We have a use case where we keep a monotonically increasing key that signifies the current version of a customer's data. The system uses it to identify the most recent version and to resolve conflicts for third-party callers that may receive multiple versions when there is a delay between reading the data and processing it.
CUSTOMER_RESOURCE_ID CURRENT_VERSION
132323 1234
If something changes for this resource, we increment the version from 1234 to 1235 (it is fine even if we jump to 1300, as long as the value never goes down). For this, we currently need to first read the value and then update it.
Another alternative is to use the database's timestamp and keep setting the version to the DB timestamp, which is always increasing. Since this is a single system, clock skew can only happen if we change the database. We are also not overly concerned about multiple threads updating the data within the timestamp's smallest granularity, because a separate lock ensures that only one thread updates the resource at a time.
I was wondering if we could use the database's system timestamp to replace the select-then-increment with a single update.
Is there any concern with this approach? I assume it puts less overhead on the database, but I don't know how much we would save.
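To make the two options concrete, here is a sketch of both (assuming the table is named CUSTOMER_RESOURCE; the timestamp function is DB-specific):

-- option 1: atomic increment, no separate SELECT needed
UPDATE CUSTOMER_RESOURCE
SET CURRENT_VERSION = CURRENT_VERSION + 1
WHERE CUSTOMER_RESOURCE_ID = 132323;

-- option 2: "just update" with the DB clock
-- (e.g. UNIX_TIMESTAMP() in MySQL, or an epoch value derived from SYSTIMESTAMP in Oracle)
UPDATE CUSTOMER_RESOURCE
SET CURRENT_VERSION = UNIX_TIMESTAMP()
WHERE CUSTOMER_RESOURCE_ID = 132323;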
Many approaches can be discussed. In my opinion:
-- You can use an incrementing sequence, even as the column's default value, instead of the database timestamp. Millisecond granularity may cause problems for you.
-- In addition, if your table is not very big and your resources are sufficient, you can add an is_last_version column that is updated on every insert, so you can query a customer's latest version efficiently (i.e. select * from customers where customer_id = 123 and is_last_version = 1, to get rid of the cost of ordering). That way you always know which row is the latest version, but each insert will take longer. You should test it.
Example:
CUSTOMER_RESOURCE:
CUSTOMER_RESOURCE_ID  CURRENT_VERSION_ID  IS_LAST_VERSION  INSERT_TIME
                      (default seq)                        (default sysdate, optional)
132323 1 1
132324 2 1
132325 3 1
132326 4 0
132327 5 0
132328 6 1
132326 7 0
132329 8 1
132326 9 1
132327 10 1
On insert:
update CUSTOMER_RESOURCE
SET IS_LAST_VERSION = 0
where CUSTOMER_RESOURCE_ID = 132326;
insert into CUSTOMER_RESOURCE(CUSTOMER_RESOURCE_ID,IS_LAST_VERSION) VALUES (132326,1);
COMMIT;
To select the last version of a customer:
select * from CUSTOMER_RESOURCE where CUSTOMER_RESOURCE_ID = 123 and IS_LAST_VERSION = 1;
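A possible DDL for such a table (a sketch, assuming Oracle 12c or later for the sequence-backed default; the sequence name is illustrative):

CREATE SEQUENCE customer_version_seq;

CREATE TABLE CUSTOMER_RESOURCE (
  CUSTOMER_RESOURCE_ID NUMBER NOT NULL,
  CURRENT_VERSION_ID   NUMBER DEFAULT customer_version_seq.NEXTVAL NOT NULL,
  IS_LAST_VERSION      NUMBER(1) DEFAULT 1 NOT NULL,
  INSERT_TIME          DATE DEFAULT SYSDATE
);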
In my Spring Boot application I noticed a strange issue when inserting new rows.
My ids are generated by a sequence, but after I restart the application the ids start from 21.
Example:
First launch: I insert 3 rows; the sequence generates ids 1, 2, 3.
Second launch, after a restart: I insert 3 rows and the ids are generated starting from 21, so the ids are 21, 22, ...
Every restart the starting id increases by 20 - the gap is always 20.
See my database table (1, 2, then 21 after the restart).
My JPA entity
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(unique = true, nullable = false)
private Long id;
I tried some Stack Overflow solutions, but they are not working. For example, I tried this, and it did not help:
spring.jpa.properties.hibernate.id.new_generator_mappings=false
I want to insert rows with sequential ids like 1, 2, 3, 4, not 1, 2, 21, 22. How can I resolve this problem?
Although I think the question comments already provide all the information necessary to understand the problem, please let me try to explain some things and fix some inaccuracies.
According to your source code you are using the IDENTITY id generation strategy:
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(unique = true, nullable = false)
private Long id;
You are using an Oracle database, and this is very relevant information for the question.
Support for IDENTITY columns was introduced in Oracle 12c (probably Release 1), and in Hibernate around version 5.1, although here on SO it is indicated that you need at least 5.3.
Either way, IDENTITY columns in Oracle are supported by the use of database SEQUENCEs: i.e., for every IDENTITY column a corresponding sequence is created. As you can read in the Oracle documentation, this explains why, among other things, all the options for creating sequences can be applied to the IDENTITY column definition, like min and max ranges, cache size, etc.
By default a sequence in Oracle has a cache size of 20 as indicated in a tiny note in the aforementioned Oracle documentation:
Note: When you create an identity column, Oracle recommends that you
specify the CACHE clause with a value higher than the default of 20 to
enhance performance.
This default cache size is what explains the non-consecutive numbers you are seeing in your id values.
This behavior is not exclusive to Hibernate: just issue a simple JDBC insert statement, or SQL commands with any suitable tool, and you will observe the same thing.
To solve the issue create your table indicating NOCACHE for your IDENTITY column:
CREATE TABLE your_table (
id NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY NOCACHE,
--...
)
Note that you need to use NOCACHE and not CACHE 0, as indicated in the question comments and in a previous version of other answers; CACHE 0 is an error, because the value of the CACHE option must be at least 2.
You could probably also modify the column without recreating the whole table:
ALTER TABLE your_table MODIFY (ID GENERATED BY DEFAULT ON NULL AS IDENTITY NOCACHE);
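To verify what the identity column is actually configured with, you can query the data dictionary (a sketch, assuming Oracle 12c+):

SELECT table_name, column_name, generation_type, identity_options
FROM user_tab_identity_cols
WHERE table_name = 'YOUR_TABLE';

The IDENTITY_OPTIONS column should include the effective CACHE_SIZE.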
Having said all that, please be aware that the cache mechanism is in fact an optimization and not a drawback: in the end, and this is just my opinion, those ids are only non-natural, system-assigned IDs, and in the general use case the cache's benefits outweigh its drawbacks.
Please consider reading this great article about IDENTITY columns in Oracle.
The provided answer related to the use of the hilo optimizer could be right, but it requires explicitly configuring the optimizer in your id field declaration, which does not seem to be the case here.
This is related to the hi/lo algorithm that Hibernate uses for incrementing sequence values. Read more in this example: https://www.baeldung.com/hi-lo-algorithm-hibernate.
This is an optimization used by Hibernate: it consumes some values from the DB sequence into a pool (in the Java runtime) and uses them while executing the INSERT statements on the table. If you turn this optimization off by setting allocationSize=1, the desired behavior (no gaps in ids) becomes possible (mostly, though not always), but at the price of two database round trips for each INSERT.
The examples below give an idea of what is going on at a high level of abstraction.
(The internal implementation is more complex, but we don't care about it here.)
Scenario: user makes 21 inserts during some period of time
Example 1 (current behavior, allocationSize=20)
#1 insert: // first cycle
- need next MY_SEQ value, but MY_SEQ_PREFETCH_POOL is empty
- select 20 values from MY_SEQ into MY_SEQ_PREFETCH_POOL // call DB
- take it from MY_SEQ_PREFETCH_POOL >> remaining=20-1
- execute INSERT // call DB
#2-#20 insert:
- need next MY_SEQ value,
- take it from MY_SEQ_PREFETCH_POOL >> remaining=20-i
- execute INSERT // call DB
#21 insert: // new cycle
- need next MY_SEQ value, but MY_SEQ_PREFETCH_POOL is empty
- select 20 values from MY_SEQ into MY_SEQ_PREFETCH_POOL // call DB
- take it from MY_SEQ_PREFETCH_POOL >> remaining=19
- execute INSERT // call DB
Example 2 (alternative behavior, allocationSize=1)
#1-21 insert:
- need next MY_SEQ value, but MY_SEQ_PREFETCH_POOL is empty
- select 1 value from MY_SEQ into MY_SEQ_PREFETCH_POOL // call DB
- take it from MY_SEQ_PREFETCH_POOL >> remaining=0
- execute INSERT // call DB
Example 1: 23 total DB calls (2 sequence fetches + 21 inserts)
Example 2: 42 total DB calls (21 sequence fetches + 21 inserts)
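For context, when Hibernate's pooled optimizer is in use with allocationSize=20, the DB-side sequence is typically declared to increment by the allocation size, and Hibernate hands out the values in between (a sketch; which optimizer is chosen depends on the Hibernate version and configuration):

CREATE SEQUENCE MY_SEQ START WITH 1 INCREMENT BY 20; -- one DB call yields a block of 20 ids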
Manual declaration of the sequence in the database will not help in this case because, for instance, in this statement
CREATE SEQUENCE ABC START WITH 1 INCREMENT BY 1 CYCLE NOCACHE;
we control only the "cache" used in the database's internal runtime, which is not visible to Hibernate. That cache affects sequence gaps in situations where the DB is stopped and started again, which is not the case here.
When Hibernate consumes values from the sequence, the state of the sequence is changed on the DB side. We can compare it to booking hotel rooms: a company (Hibernate) books 20 rooms for a conference in a hotel (the DB), but only 2 participants arrive. The other 18 rooms stay empty and cannot be used by other guests. In this case the "booking period" is forever.
More details on how to configure Hibernate to work with sequences are here:
https://ntsim.uk/posts/how-to-use-hibernate-identifier-sequence-generators-properly
Here is a short answer for an older version of Hibernate; it still has relevant ideas:
https://stackoverflow.com/a/5346701/2774914
I created a sequence called markush_seq on a PostgreSQL 10.7 database.
I read from the sequence with
select nextval('markush_seq')
using a Java web service.
When I run the web service in Eclipse (using Java 1.8.161) or call the sequence directly from SQL Developer, it works fine and the sequence increments by 1 each time, e.g.:
http://localhost:8086/wipdbws/read-markush-seq
21767823690
21767823691
21767823692
However when I run the webservice on AWS (which uses java 1.8.252) and read from the seq using:
https://aws-location/wipdbws/read-markush-seq
I get the sequence number returned as eg:
21767823692
21767823702
21767823693
21767823703
21767823694
21767823704
The sequence in AWS appears to be a combination of 2 incrementing sequences, 10 apart.
It's the same Java code; the only things that have changed are:
The location of the webservice
a. AWS – USWEST
b. Eclipse - London
The java version:
a. 1.8.161 in London
b. 1.8.252 in US WEST
The seq details are:
SELECT * FROM information_schema.sequences
where sequence_name='markush_seq';
select * from pg_sequences where sequencename='markush_seq';
Any suggestion appreciated.
This is likely due to multiple sessions accessing the sequence, combined with the sequence's cache setting.
The documentation says:
although multiple sessions are guaranteed to allocate distinct
sequence values, the values might be generated out of sequence when
all the sessions are considered. For example, with a cache setting of
10, session A might reserve values 1..10 and return nextval=1, then
session B might reserve values 11..20 and return nextval=11 before
session A has generated nextval=2. Thus, with a cache setting of one
it is safe to assume that nextval values are generated sequentially;
with a cache setting greater than one you should only assume that the
nextval values are all distinct, not that they are generated purely
sequentially. Also, last_value will reflect the latest value reserved
by any session, whether or not it has yet been returned by nextval.
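You can confirm and adjust the cache setting directly (a sketch; the pg_sequences view is available from PostgreSQL 10 on):

-- check the cache size of the sequence
SELECT schemaname, sequencename, cache_size
FROM pg_sequences
WHERE sequencename = 'markush_seq';

-- make allocation strictly sequential across sessions
ALTER SEQUENCE markush_seq CACHE 1;

Note that CACHE 1 trades a little throughput for ordering; values can still be non-contiguous after rollbacks or crashes.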
In Oracle, what is the default ordering of rows for a select query if no "order by" clause is specified?
Is it
the order in which the rows were inserted
there is no default ordering at all
none of the above.
According to Tom Kyte: "Unless and until you add "order by" to a query, you cannot say ANYTHING about the order of the rows returned. Well, short of 'you cannot rely on the order of the rows being returned'."
See this question at asktom.com.
As for ROWNUM, it doesn't physically exist, so it can't be "freed". ROWNUM is assigned after a record is retrieved from a table, which is why "WHERE ROWNUM = 5" will always fail to select any records.
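To illustrate: ROWNUM is assigned and incremented only as rows are accepted by the predicate, so it can never reach 5 directly (table name is illustrative):

-- returns no rows: the first candidate row gets ROWNUM = 1, fails the
-- predicate, and ROWNUM is therefore never incremented past 1
SELECT * FROM emp WHERE ROWNUM = 5;

-- works: ROWNUM 1..5 are assigned as rows are accepted
SELECT * FROM emp WHERE ROWNUM <= 5;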
@ammoQ: you might want to read this AskTom article on GROUP BY ordering. In short:
Does a Group By clause in a query guarantee that the output data will be sorted on the Group By columns in order, even if there is NO Order By clause?
and we said...
ABSOLUTELY NOT. It never has, it never did, it never will.
There is no explicit default ordering. For obvious reasons, if you create a new table, insert a few rows and do a "select *" without a "where" clause, it will (very likely) return the rows in the order they were inserted.
But you should never, ever rely on a default order happening. If you need a specific order, use an "order by" clause. For example, in Oracle versions up to 9i, doing a "group by" also caused the rows to be sorted by the group expression(*). In 10g, this behaviour no longer exists! Upgrading Oracle installations has caused me some work because of this.
(*) disclaimer: while this is the behaviour I observed, it was never guaranteed
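If code relied on that old 9i side effect, the fix is simply to state the order explicitly (a sketch with hypothetical table and column names):

SELECT deptno, COUNT(*)
FROM emp
GROUP BY deptno
ORDER BY deptno; -- required in 10g+ if you need the groups sorted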
It has already been said that Oracle is allowed to give you the rows in any order it wants when you don't specify an ORDER BY clause. Speculating about what the order will be without an ORDER BY clause is pointless, and relying on it in your code is a "career limiting move".
A simple example:
SQL> create table t as select level id from dual connect by level <= 10
2 /
Table created.
SQL> select id from t
2 /
ID
----------
1
2
3
4
5
6
7
8
9
10
10 rows selected.
SQL> delete t where id = 6
2 /
1 row deleted.
SQL> insert into t values (6)
2 /
1 row created.
SQL> select id from t
2 /
ID
----------
1
2
3
4
5
7
8
9
10
6
10 rows selected.
And this is after only a simple delete and insert. Numerous other situations are thinkable: parallel execution, partitions, index-organized tables, to name just a few.
Bottom line, as already very well said by ammoQ: if you need the rows sorted, use an ORDER BY clause.
You absolutely, positively cannot rely on any ordering unless you specify order by. For Oracle in particular, I've actually seen the exact same query (without joins), run twice within a few seconds of each other, on a table that didn't change in the interim, return a wildly different order. This seems to be more likely when the result set is large.
The parallel execution mentioned by Rob van Wijk probably explains this. See also Oracle's Using Parallel Execution doc.
It is impacted by indexes:
if the query is answered via an index, the rows may come back in ascending index order;
if no index is used, they may come back in the order they were inserted.
Neither is guaranteed, though.
You can influence the physical order in which inserted data is stored by using the ORGANIZATION clause of the CREATE TABLE statement.
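For example, an index-organized table stores its rows in primary key order (a sketch; note that this still does not guarantee the order of rows returned by a query without ORDER BY):

CREATE TABLE t_iot (
  id  NUMBER PRIMARY KEY,
  val VARCHAR2(30)
) ORGANIZATION INDEX;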
Although it should be rownum (your #2), it really isn't guaranteed and you shouldn't trust it 100%.
I believe it uses Oracle's hidden Rownum attribute.
So your #1 is probably right assuming there were no deletes done that might have freed rownums for later use.
EDIT: As others have said, you really shouldn't rely on this, ever. Besides deletes, there are a lot of different conditions that can affect the default ordering behavior.
I need to limit multiple service usages for multiple customers. For example, customer customer1 can send max 1000 SMS per month. My implementation is based on one MySQL table with 3 columns:
date TIMESTAMP
name VARCHAR(128)
value INTEGER
For every service usage (sending an SMS) one row is inserted into the table. value holds the usage count (e.g. if an SMS was split into 2 parts, then value = 2). name holds the limiter name (e.g. customer1-sms).
To find out how many times the service was used this month (March 2011), a simple query is executed:
SELECT SUM(value) FROM service_usage WHERE name = 'customer1-sms' AND date > '2011-03-01';
The problem is that this query is slow (0.3 sec). We are using indexes on columns date and name.
Is there a better way to implement service usage limiting? My requirement is that it must stay flexible (e.g. I may need usage within the last 10 minutes, or within the current month). I am using Java.
Thanks in advance
You should have one composite index on both columns, not two separate indexes, one per column. This should make the query very fast.
If it still isn't fast enough, you could use a table with a month, a name and a value, and increment the value for the current month each time an SMS is sent. This would remove the SUM from your query. It would still need an index on (month, name) to be as fast as possible, though. See the sketches below.
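A sketch of both suggestions (index and table names are illustrative; the upsert syntax is MySQL's):

-- one composite index covering the WHERE clause
CREATE INDEX idx_usage_name_date ON service_usage (name, date);

-- pre-aggregated monthly counters
CREATE TABLE service_usage_monthly (
  month DATE NOT NULL, -- first day of the month
  name  VARCHAR(128) NOT NULL,
  value INTEGER NOT NULL,
  PRIMARY KEY (month, name)
);

INSERT INTO service_usage_monthly (month, name, value)
VALUES ('2011-03-01', 'customer1-sms', 2)
ON DUPLICATE KEY UPDATE value = value + VALUES(value);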
I found one solution to my problem. Instead of inserting each usage increment on its own, I will insert a running total (the previous value plus the increment):
BEGIN;
-- select the last (newest) running total; FOR UPDATE so that concurrent
-- transactions cannot read the same value
SELECT value FROM service_usage WHERE name = %name ORDER BY date DESC LIMIT 1 FOR UPDATE;
-- insert the new running total
INSERT INTO service_usage (date, name, value) VALUES (CURRENT_TIMESTAMP, %name, %value + %increment);
COMMIT;
To find out service usage since %date:
SELECT value AS value1 FROM service_usage WHERE name = %name ORDER BY date DESC LIMIT 1;
SELECT value AS value2 FROM service_usage WHERE name = %name AND date <= %date ORDER BY date DESC LIMIT 1;
The result is value1 - value2.
This way I'll need transactions. I'll probably implement it as a stored procedure.
Any additional hints are still appreciated :-)
It's worth trying to replace your "=" with "like". Not sure why, but in the past I've seen this perform far more quickly than the "=" operator on varchar columns.
SELECT SUM(value) FROM service_usage WHERE name like 'customer1-sms' AND date > '2011-03-01';
Edited after comments:
Okay, now I can sort of re-create your issue: the first time I run the query it takes around 0.03 seconds; subsequent runs take 0.001 seconds. Inserting new records causes the query to revert to 0.03 seconds.
Suggested solution:
COUNT does not show the same slow-down. I would change the business logic so that every time the user sends an SMS you insert a record with value "1"; if the message is a multipart message, simply insert two rows.
Replace the "sum" with a "count".
I've applied this to my test data, and even after inserting a new record, the "count" query returns in 0.001 second.
A little background on what I want to do:
Consider the case where different people from a firm get, once a year, an all expenses paid trip to somewhere. There may be 1000 persons that could qualify for the trip but only 16 places are available.
Each of these 16 spots has an associated index, which must be from 1 to 16. The people on the reservation list have indexes starting from 17.
The first 16 persons that apply get a definite spot on the trip. The rest end up on the reservation list. If one of the first 16 persons cancels, the first person with a reservation gets his place and all the indexes are renumbered to compensate for the person that canceled.
All of this is managed in a Java web app with an Oracle DB.
Now, my problem:
I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done on the Java side, followed by a separate query to insert the record). It is obvious why we end up with multiple spots or reservations sharing the same index: we might get, say, 19 people on the trip because 4 of them have index 10.
How can I manage this? I have been thinking of 3 ways so far:
Use an isolation level of Serializable for the DB transactions (don’t like this one);
Insert a record with no INDEX_NR and then have a trigger manage things… in some way (I've never worked with triggers before);
Each record also has a UPDATED column. Could I use this in some way? (note that I can’t lose the INDEX_NR since other parts of the app make use of it).
Is there a best way to do this?
Why make it complicated?
Just insert all reservations as they are entered, along with a timestamp of when they reserved a spot.
Then in your query, simply use the timestamp to sort them.
There is of course a chance that two people reserved a spot in the very same millisecond; in that case, just use a random (or other tie-breaking) method to assign the order. See the sketch below.
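A sketch of the resulting query (assuming Oracle 12c+ row-limiting syntax and hypothetical column names; a secondary sort key such as the primary key is one deterministic way to break ties):

SELECT *
FROM trip_application
ORDER BY reserved_at, id -- id breaks same-millisecond ties
FETCH FIRST 16 ROWS ONLY;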
Why do you need to explicitly store the index? Instead you could store each person's order (which never changes) along with an active flag. In your example if person #16 pulls out you simply mark them as inactive.
To compute whether a person qualifies for the trip you simply count the number of active people with order less than that person:
select count(*)
from CompetitionEntry
where PersonOrder < :thisPersonOrder -- the stored order of the person being checked
and Active = 1
The person qualifies if this count is less than 16.
This approach removes the need for bulk updates to the database (you only ever update one row) and hence mostly mitigates your problem of transactional integrity.
Another way would be to explicitly lock a record in another table during the select.
-- Initial Setup
CREATE TABLE NUMBER_SOURCE (ID NUMBER(4));
INSERT INTO NUMBER_SOURCE(ID) VALUES (0);
-- Your regular code
SELECT ID AS NEXT_INDEX_NR FROM NUMBER_SOURCE FOR UPDATE; -- lock!
UPDATE NUMBER_SOURCE SET ID = ID + 1;
INSERT INTO TABLE ....
COMMIT; -- releases lock!
No other transaction will be able to perform the query on the table NUMBER_SOURCE until the commit (or rollback).
When adding people to the table, give them an ID in such a way that the ID is ascending in the order in which they were added. This can be a timestamp.
Select all the records from the table which qualify, order by ID, and update their INDEX_NR
Select * from table where INDEX_NR <= 16 order by INDEX_NR
Step #2 seems complicated but it's actually quite simple:
update (
select *
from TABLE
where ...
order by ID
)
set INDEX_NR = INDEXSEQ.NEXTVAL
Don't forget to reset the sequence to 1.
Calculate your index at runtime:
CREATE OR REPLACE VIEW v_person
AS
SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS index_rn
FROM t_person
CREATE OR REPLACE TRIGGER trg_person_ii
INSTEAD OF INSERT ON v_person
BEGIN
INSERT
INTO t_person (id, name)
VALUES (:new.id, :new.name);
END;
CREATE OR REPLACE TRIGGER trg_person_iu
INSTEAD OF UPDATE ON v_person
BEGIN
UPDATE t_person
SET id = :new.id,
name = :new.name
WHERE id = :old.id;
END;
CREATE OR REPLACE TRIGGER trg_person_id
INSTEAD OF DELETE ON v_person
BEGIN
DELETE
FROM t_person
WHERE id = :old.id;
END;
INSERT
INTO v_person
VALUES (1, 'test', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
INSERT
INTO v_person
VALUES (2, 'test 2', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
2 test 2 2
DELETE
FROM v_person
WHERE id = 1
SELECT *
FROM v_person
--
id name index_rn
2 test 2 1
"I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done Java side and then a new query to insert the record). It is obvious why we have multiple spots or reservations with the same index."
Yeah. Oracle's MVCC ("snapshot isolation") used incorrectly by someone who shouldn't have been in IT to begin with.
Really, Peter is right. Your index number is, or rather should be, a sort of "ranking number" over the ordered timestamps that he mentions (this carries the requirement that the DBMS can guarantee that any timestamp value appears only once in the entire database).
You say you are concerned with "regression bugs". I say: why do you need to be concerned with regression bugs in an application that is demonstrably beyond curing? Because your bosses paid a lot of money for the crap they've been given, and you don't want to be the pianist who gets shot for bringing the message?
The solution depends on what you have under your control. I assume that you can change both the database and the Java code, but would rather not modify the database schema, since you would otherwise have to adapt too much Java code.
A cheap solution might be to add a uniqueness constraint on the pair (trip_id, index_nr), or just on index_nr if there is only one trip. Additionally, add a check constraint check(index_nr > 0) - unless index_nr is already unsigned (see the sketch below). Everything else is then done in Java: when inserting a new applicant as you described, you have to add code that catches the exception thrown when someone else got inserted concurrently. If a record is updated or deleted, you either have to live with holes between sequence numbers (by selecting the 16 candidates with the lowest index_nr, as shown by Quassnoi in his view) or fill them up by hand (similarly to what Aaron suggested) after every update/delete.
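A sketch of those two constraints (table and constraint names are hypothetical):

ALTER TABLE trip_application
  ADD CONSTRAINT uq_trip_index UNIQUE (trip_id, index_nr);

ALTER TABLE trip_application
  ADD CONSTRAINT ck_index_positive CHECK (index_nr > 0);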
If index_nr is mostly used read-only in the application, a better solution might be to combine the answers of Peter and Quassnoi: use either a timestamp (automatically inserted by the database by defining the current time as the default) or an auto-incremented integer (inserted by the database as the default) as the value stored in the table, and use a view (like the one Quassnoi defined) to access the table and the automatically calculated index_nr from Java. But also define both constraints, as in the cheap solution.