We want to programmatically copy all records from one table to another, periodically.
Currently I use SELECT * FROM users LIMIT 2 OFFSET <offset> to fetch the records.
The table's records look like this:
user_1
user_2
user_3
user_4
user_5
user_6
When I fetched the first page (user_1, user_2), the record "user_2" was deleted from the source table.
Now the second page I fetch is (user_4, user_5), and the third page is (user_6).
This means I lose the record "user_3" at the destination table.
The real source table may have 1,000,000 records. How can I resolve this problem effectively?
First, you should use a unique index on the source table and use it in an ORDER BY clause to make sure that the order of the rows is consistent over time. Next, do not use offsets; instead, start after the last element fetched.
Something like:
SELECT * FROM users ORDER BY id LIMIT 2;
for the first time, and then
SELECT * FROM users WHERE id > last_received_id ORDER BY id LIMIT 2;
for the next ones.
This will be immune to asynchronous deletions.
If you have no unique index but do have a non-unique one, you can still apply the above solution with a non-strict comparison operator (>=). You will consistently re-fetch the last rows; that would certainly break with LIMIT 2, but it can work for reasonable values.
If you have no index at all - which is known to cause various other problems - the only reliable way is to issue one single big SELECT and page through it with a SQL cursor.
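Going back to the indexed case, here is a minimal JDBC sketch of the copy loop described above. It is only an illustration under some assumptions: the destination table users_copy, the name column, and the batch size are placeholders, not something from the question.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class KeysetCopier {

    private static final int BATCH_SIZE = 1000;

    // Copies all rows from "users" into "users_copy", paging by the unique id
    // instead of by OFFSET, so concurrent deletions cannot shift the window.
    public static void copyAll(Connection src, Connection dst) throws SQLException {
        long lastId = 0L; // assumes ids are positive; start before the first row
        while (true) {
            try (PreparedStatement select = src.prepareStatement(
                    "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT " + BATCH_SIZE)) {
                select.setLong(1, lastId);
                try (ResultSet rs = select.executeQuery();
                     PreparedStatement insert = dst.prepareStatement(
                             "INSERT INTO users_copy (id, name) VALUES (?, ?)")) {
                    int fetched = 0;
                    while (rs.next()) {
                        lastId = rs.getLong("id"); // remember the last key we saw
                        insert.setLong(1, lastId);
                        insert.setString(2, rs.getString("name"));
                        insert.addBatch();
                        fetched++;
                    }
                    insert.executeBatch();
                    if (fetched < BATCH_SIZE) {
                        return; // last (possibly partial) page reached
                    }
                }
            }
        }
    }
}

Deleting rows between iterations only makes later pages start a little earlier; it never skips a surviving row, because the window is anchored on the last id actually seen.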
Related
My database has millions of records that are displayed to users on a portal. Pagination is done using OFFSET, and the data is sorted by some column. Is there any alternative to OFFSET? Using an indexed column such as an auto-increment or unique field does not work if I have to sort by some different column, and even when a row is deleted it does not return the expected results.
I am running SQL queries in my Java application. I've tried an approach that only adds a LIMIT to my queries, so the offset is always zero and the limit is (limit + offset) as per the pagination logic.
Ex: the user requests 10 records per page and navigates to page 51.
Alternate logic: limit 10 offset 500 -> limit = 510
The query looks like:
select * from history order by log_date limit 510;
So with the help of the absolute method of ResultSet I navigate to the row number given by the specified offset and fetch the results after that row:
// 510 rows are returned by the query
rs.absolute(500);
while (rs.next()) {
    // store data in the object
}
But even this way I am telling the database to return 510 records, and if the user navigates to the last page it will fetch all the rows, which will be very inefficient.
So you are hitting the DB on every new page request. With that cleared up:
In the sample you gave, you are fetching the entire result set into the Java application and then filtering it there. Let the database do the filtering and give you only the result. Send the PAGE_NUMBER you want into the DB query itself.
select * from history order by log_date
OFFSET PAGE_NUMBER*MAX_ROWS_TO_SELECT ROWS FETCH NEXT MAX_ROWS_TO_SELECT ROWS ONLY;
As in the sample above, you only have to fetch 10 records. This is the efficient way to get the required page content with a result set of the right size.
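For illustration, a rough JDBC version of that idea, assuming a database that supports the standard OFFSET ... FETCH syntax (e.g. Oracle 12c+ or SQL Server) and a 1-based page number; table and column names follow the question, the page size of 10 is from the example:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HistoryPage {

    private static final int PAGE_SIZE = 10;

    // Lets the database skip the unwanted rows instead of fetching them into Java.
    public static void printPage(Connection conn, int pageNumber) throws SQLException {
        String sql = "SELECT * FROM history ORDER BY log_date "
                   + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, (pageNumber - 1) * PAGE_SIZE); // rows to skip (page 51 -> 500)
            ps.setInt(2, PAGE_SIZE);                    // rows to return
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("log_date"));
                }
            }
        }
    }
}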
My solution is only applicable to Phoenix 4.7 (HDP 2.5). The way I found was to sort the data in ascending order by the primary key; if there is a composite key, the data is sorted by the first column of the key. Then OFFSET and LIMIT work properly.
I'm trying to implement a counter with Java, Spring, Hibernate, and Oracle SQL. Each record represents a count for a given timestamp. Let's say each record is uniquely identified by the minute, and each record holds a count column. The service should expect to receive a ton of concurrent requests that may update the counter column of the same record.
In my table, if the record does not exist, just insert the record in and set its count to 1. Otherwise, find the record by timestamp and increase its existing counter column by 1.
In order to ensure that we maintain data consistency and integrity, I'm using pessimistic locking. For example, if 20 counts come in at the same time, not necessarily from the same user, it's possible that we overwrite the record based on a stale read taken before the update. With locking, I'm ensuring that if 20 counts come in, the net effect on the database represents all 20 counts.
So locking is fine, but the problem is that if the record never existed in the first place and two or more concurrent requests come in trying to update the not-yet-existent record, I've observed that a duplicate record gets inserted, because we cannot lock a record that doesn't exist yet. How can we ensure that no duplicates get created in the table? Should it be controlled via Oracle? Or can I manage this via my app and Hibernate?
Thank you.
One way to avoid this sort of problem altogether would be to just generate the count at the time you actually query the data. Oracle has an analytic function, ROW_NUMBER(), which can assign a row number to each record in the result set of a query. As a rough example, consider the following query:
SELECT
ts,
ROW_NUMBER() OVER (ORDER BY ts) rn
FROM yourTable
The count you want would be in the rn column, representing the number of records appearing since the first entry in the table. Of course, you could further restrict the query.
This approach is robust to record removal, as the count always starts at 1. One drawback is that row-number functionality is not supported by Hibernate's HQL/Criteria; you would have to run this either as a native query or as a stored procedure.
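If it helps, here is a rough sketch of running it as a native query; the table and column names follow the example above, and on older Hibernate versions createSQLQuery would be used instead of createNativeQuery:

import org.hibernate.Session;
import java.util.List;

public class TimestampCounts {

    // ROW_NUMBER() cannot be expressed in HQL/Criteria, so the query runs as native SQL.
    // Each returned row is an Object[] of {ts, rn}.
    @SuppressWarnings("unchecked")
    public static List<Object[]> fetch(Session session) {
        return session
                .createNativeQuery("SELECT ts, ROW_NUMBER() OVER (ORDER BY ts) rn FROM yourTable")
                .list();
    }
}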
We have a "audit" table that we create lots of rows in. Our persistence layer queries the audit table sequence to create a new row in the audit table. With millions of rows being created daily the select statement to get the next value from the sequence is one of our top ten most executed queries. We would like to reduce the number of database roundtrips just to get the sequence next value (primary key) before inserting a new row in the audit table. We know you can't batch select statements from JDBC. Are there any common techniques for reducing database roundtrips to get a sequence next value?
Get a batch of sequence values (e.g. 1000) in advance with a single select:
select your_sequence.nextval
from dual
connect by level <= 1000
Cache the obtained sequence values and use them for the next 1000 audit inserts.
Repeat this when you have run out of cached sequence values.
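A minimal sketch of that idea in JDBC; the sequence name is a placeholder and the batch size and the queue used as a cache are entirely up to you:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayDeque;
import java.util.Deque;

public class SequenceCache {

    private static final int BATCH_SIZE = 1000;
    private final Deque<Long> cached = new ArrayDeque<>();
    private final Connection conn;

    public SequenceCache(Connection conn) {
        this.conn = conn;
    }

    // Returns the next cached value, refilling the cache with one round trip
    // (the CONNECT BY trick above) whenever it runs empty.
    public long next() throws SQLException {
        if (cached.isEmpty()) {
            refill();
        }
        return cached.pop();
    }

    private void refill() throws SQLException {
        String sql = "SELECT your_sequence.NEXTVAL FROM dual CONNECT BY level <= " + BATCH_SIZE;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                cached.push(rs.getLong(1));
            }
        }
    }
}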
Skip the select statement for the sequence and generate the sequence value in the insert statement itself.
insert into audit (ID, ..) values (my_sequence.nextval, ..)
No need for an extra select. If you need the sequence value, get it by adding a returning clause.
insert into audit (ID, ..) values (my_sequence.nextval, ..) returning ID into ..
Save some extra time by specifying a cache value for the sequence.
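From JDBC, one way to read the generated ID back in the same round trip is the generated-keys mechanism, roughly like this; the audit table and message column are placeholders, and naming the ID column in the String array is how the Oracle driver knows which value to return:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class AuditInsert {

    // Inserts a row using the sequence directly in the statement and reads the
    // generated ID back without a separate select.
    public static long insert(Connection conn, String message) throws SQLException {
        String sql = "INSERT INTO audit (id, message) VALUES (my_sequence.NEXTVAL, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql, new String[] {"ID"})) {
            ps.setString(1, message);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                keys.next();
                return keys.getLong(1);
            }
        }
    }
}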
I suggest you change the "INCREMENT BY" option of the sequence and set it to a number like 100 (you have to decide what step size must be taken by your sequence, 100 is an example.)
then implement a class called SequenceGenerator, in this class you have a property that contains the nextValue, and every 100 times, calls the sequence.nextVal in order to keep the db sequence up to date.
in this way you will go to db every 100 inserts for the sequence nextVal
every time the application starts, you have to initialize the SequenceGenerator class with the sequence.nextVal.
the only downside of this approach is that if your application stops for any reason, you will loose some of the sequences values and there will be gaps in your ids. but it should not be a logical problem if you don't have anu business logic on the id values.
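A rough sketch of such a SequenceGenerator, assuming the sequence has already been altered to INCREMENT BY 100; the sequence name audit_seq is a placeholder:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SequenceGenerator {

    private static final int INCREMENT = 100; // must match the sequence's INCREMENT BY
    private final Connection conn;
    private long nextValue;
    private long remaining = 0; // how many ids are left in the current block

    public SequenceGenerator(Connection conn) {
        this.conn = conn;
    }

    // Hands out ids locally and only hits the database once per block of 100.
    public synchronized long next() throws SQLException {
        if (remaining == 0) {
            nextValue = fetchFromDb(); // start of a new block of 100 ids
            remaining = INCREMENT;
        }
        remaining--;
        return nextValue++;
    }

    private long fetchFromDb() throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT audit_seq.NEXTVAL FROM dual")) {
            rs.next();
            return rs.getLong(1);
        }
    }
}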
There's a DB that contains approximately 300-400 records. I can make a simple query for fetching 30 records like:
SELECT * FROM table
WHERE isValidated = false
LIMIT 30
Some more words about the content of the DB table. There's a column named isValidated that can (as you correctly guessed) take one of two values: true or false. After a query, some of the records should be marked as validated (isValidated=true) - approximately 5-6 records out of each batch of 30. Correspondingly, each subsequent query will again fetch the still-unvalidated records (isValidated=false) from the previous query. In fact, I'll never get to the end of the table with such an approach.
The validation process is done with Java + Hibernate. I'm new to Hibernate, so I use a Criterion for making this simple query.
Are there any best practices for such a task? The variant of adding a flag field (that marks records which were already fetched) is inappropriate (over-engineering for this DB).
Maybe there's a way to create some virtual table where records that have already been processed are stored, or something like that. BTW, after all the records are processed, it is planned to start processing them again (it is possible that some of them will need to be validated).
Thank you for your help in advance.
I can imagine several solutions:
- Store everything in memory. You only have 400 records, so this could be a perfectly fine solution given such a small number.
- Use an ORDER BY clause (which you should do anyway) on a unique column (the PK, for example), store the ID of the last loaded record, and make sure the next query uses WHERE id > :lastId, as shown in the sketch below.
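A minimal sketch of the second option with Hibernate; it assumes a hypothetical mapped entity Record with a numeric id property (the PK) and a boolean isValidated property, which you would replace with your actual mapping:

import org.hibernate.Session;
import java.util.List;

public class ValidationPager {

    private long lastId = 0L; // id of the last record seen in the previous batch

    // Keyset pagination: continue after the last id already handed out, so rows
    // validated (or removed) in the meantime cannot shift the window.
    @SuppressWarnings("unchecked")
    public List<Record> nextBatch(Session session) {
        List<Record> batch = session
                .createQuery("from Record r where r.isValidated = false and r.id > :lastId order by r.id")
                .setParameter("lastId", lastId)
                .setMaxResults(30)
                .list();
        if (!batch.isEmpty()) {
            lastId = batch.get(batch.size() - 1).getId();
        } else {
            lastId = 0L; // end of table reached: the next call starts a fresh pass
        }
        return batch;
    }
}

Resetting lastId at the end of a pass also covers the requirement of re-processing the whole table again later.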
I'm trying to retrieve the most frequently occurring values from a SQLite table containing a few hundred million rows.
The query so far may look like this:
SELECT value, COUNT(value) AS count FROM table GROUP BY value ORDER BY count DESC LIMIT 10
There is an index on the value field.
However, with the ORDER BY clause, the query takes so much time I've never seen the end of it.
What could be done to drastically improve such queries on such big amount of data?
I tried adding a HAVING clause (e.g. HAVING count > 100000) to lower the number of rows to be sorted, without success.
Note that I don't care much about the time required for insertion (it still needs to be reasonable, but priority is given to selection), so I'm open to solutions involving computation at insertion time ...
Thanks in advance,
1) Create a new table where you store one row per unique "value" together with its "count", and put a descending index on the count column.
2) Add a trigger to the original table that maintains this new table (insert and update) as necessary to increment/decrement the count.
3) Run your query off this new table; it will run fast because of the descending count index.
This query forces you to look at every row in the table; that is what is taking the time.
I almost never recommend this, but in this case you could maintain the count in a denormalized fashion in an external table.
Place the value and its count into another table, maintained during insert, update, and delete via triggers.
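As an illustration, here is a hedged sketch of that trigger-maintained count table, installed through JDBC. The names main_table, value_counts, and the triggers are placeholders (the question's table is simply called "table"), and the trigger syntax is SQLite's:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class ValueCountSetup {

    // Creates a denormalized count table plus the triggers that keep it in sync,
    // so the top-10 query only has to read the (indexed) counts.
    public static void install(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE IF NOT EXISTS value_counts ("
                     + " value TEXT PRIMARY KEY, cnt INTEGER NOT NULL)");
            st.execute("CREATE INDEX IF NOT EXISTS idx_value_counts_cnt"
                     + " ON value_counts (cnt DESC)");
            st.execute("CREATE TRIGGER IF NOT EXISTS trg_main_insert AFTER INSERT ON main_table "
                     + "BEGIN "
                     + " INSERT OR IGNORE INTO value_counts (value, cnt) VALUES (NEW.value, 0); "
                     + " UPDATE value_counts SET cnt = cnt + 1 WHERE value = NEW.value; "
                     + "END");
            st.execute("CREATE TRIGGER IF NOT EXISTS trg_main_delete AFTER DELETE ON main_table "
                     + "BEGIN "
                     + " UPDATE value_counts SET cnt = cnt - 1 WHERE value = OLD.value; "
                     + "END");
        }
    }
}

With this in place, the original top-10 query becomes a cheap SELECT value, cnt FROM value_counts ORDER BY cnt DESC LIMIT 10, at the cost of a little extra work on every insert and delete.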