Multiple Comparison Conditions on Sort Key for DynamoDB? - java

I am using AWS SDK2 DynamoDB for an application with the following schema:
personId | startDate | endDate | name | age | job
1 | 10/2/2013 | 10/3/2020 | Bob | 12 | SWE
1 | 8/2/2013 | 10/3/2021 | Bob | 12 | EE
2 | 10/2/2013 | 10/3/2021 | Joe | 17 | Student
3 | 11/2/2013 | 10/3/2022 | Kim | 16 | Boss
My goal is to be able to query the table by a personId and a date in order to retrieve a person object. Currently, I am thinking about having a partition key on personId; however, I am not sure how to design my sortKey.
For example, say I want a person with personId = 1 and date = 10/5/2019; I would expect the first and second entries of the example table to be returned, because the date is between their startDate and endDate. How can I design the sort key so that I can use an appropriate key condition expression to say something like date between startDate and endDate? I know that a filter expression can be used, but I was wondering whether the sort key can be designed so that a filter is not needed, since filtering is more costly.

No, you can't.
Why do you need a sort key anyway? If personId is unique, you don't need a composite primary key (hash + sort).
And since you know the personId when reading, you can just use GetItem(table, HK=:personId)
Edit
Given the additional information that personId is not unique...
Have startDate be your sort key...
You'll be able to Query(table, HK=:personId, SK <= :date, ScanIndexForward=false, limit=1)
This will return the single record with the most recent start date less than or equal to the date you passed in.
The SQL equivalent would look like
select *
from table
where personId = :personId
and startDate <= :date
order by startDate desc
fetch first row only
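In the AWS SDK for Java 2.x, that query might look like the sketch below. The table name and the ISO-8601 string dates are assumptions on my part: for SK <= :date to work, startDate must be stored in a lexicographically sortable format such as 2019-10-05, not 10/5/2019.

import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;

// Returns the record for this person with the most recent startDate
// at or before the given date ("people" is an assumed table name).
QueryResponse latestRecordOnOrBefore(DynamoDbClient ddb, String personId, String isoDate) {
    QueryRequest request = QueryRequest.builder()
            .tableName("people")
            .keyConditionExpression("personId = :pid AND startDate <= :d")
            .expressionAttributeValues(Map.of(
                    ":pid", AttributeValue.builder().s(personId).build(),
                    ":d", AttributeValue.builder().s(isoDate).build()))
            .scanIndexForward(false) // newest startDate first
            .limit(1)                // keep only the most recent
            .build();
    return ddb.query(request);
}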
EDIT2
Didn't see the updated info in the question. If a person can have overlapping jobs, then you'll need to leave off the limit and do the filtering (either in DDB or in your client app).
DDB isn't the best fit for everything. With data laid out as you've described, you've basically got two choices: pay more for DDB RCUs, or pay more for an RDBMS.
Having said that, I'll point out that a DDB RCU covers up to 4KB of data.
If your DDB rows are smaller, you can return multiple rows with a single Query() call costing only 1 RCU.
If your row size is 512 bytes, you could return 8 rows for a single RCU.

Related

Any drawbacks of using ID instead of DATE column?

I will be storing users' archived passwords in the ArchivalPassword table:
CREATE TABLE public.ArchivalPassword (
id SERIAL,
userid INTEGER NOT NULL,
content VARCHAR(100) NOT NULL,
CONSTRAINT archivalpassword_pkey PRIMARY KEY(id),
CONSTRAINT archivalpassword_user FOREIGN KEY (userid)
REFERENCES public.user(id)
ON DELETE CASCADE
ON UPDATE CASCADE
NOT DEFERRABLE
)
WITH (oids = false);
CREATE INDEX fki_archivalpassword_user ON public.archivalpassword
USING btree (userid);
For each user I store a limited number of passwords (based on the archived.passwords.limit property). When a user changes their password, I fetch the count of archived passwords from the ArchivalPassword table, and if it is greater than the limit, I calculate how many have to be deleted and delete them.
The requirement is that I delete the oldest passwords. The question is whether I can assume that a password with a lower ID is older than one with a greater ID, or whether I need to add an EXPIREDAT column (a date) to determine which passwords need to be deleted (the ones with the oldest dates in the EXPIREDAT column).
Here is the hypothetical EXPIREDAT column definition:
expiredat TIMESTAMP(0) WITH TIME ZONE DEFAULT '2017-03-20 00:00:00+01' NOT NULL;
And the ID sequence definition:
CREATE SEQUENCE public.archivalpassword_id_seq
INCREMENT 1 MINVALUE 1
MAXVALUE 9223372036854775807 START 1
CACHE 1;
Can you see any drawbacks of using the ID column in the described case?
Assuming your id column is something like a BIGSERIAL, it has a sequence definition from which the next id is automatically allocated. Under normal circumstances the ids will reliably be allocated in order as users change their passwords. The sequence definition can, however, be manually changed so that it starts at a different number, and if anyone did this then the id numbers would no longer represent chronological order.
I would personally opt to use the EXPIREDAT column though, as that will always be accurate and the intention is clear. I'm not sure why you say "but then I would have to sort the dates instead of the integers"; assuming you are letting Postgres do the sorting, there isn't much difference.
If you have many users, then an integer (the serial data type in Postgres) is faster than a date-and-time (the timestamp data type in Postgres) column for accessing the record. I'm also not sure a date-only column would work well if a password changes multiple times on the same day.
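If you do add the EXPIREDAT column, the pruning step described in the question might look like the following JDBC sketch (assumptions: the table and column names from the DDL above, and Postgres, which allows LIMIT in the subquery):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Deletes a user's oldest archived passwords, keeping only the
// newest `limit` rows as ordered by EXPIREDAT.
void pruneArchivedPasswords(Connection conn, int userId, int limit) throws SQLException {
    String sql = "DELETE FROM ArchivalPassword"
               + " WHERE userid = ? AND id NOT IN ("
               + "   SELECT id FROM ArchivalPassword"
               + "   WHERE userid = ? ORDER BY expiredat DESC LIMIT ?)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, userId);
        ps.setInt(2, userId);
        ps.setInt(3, limit);
        ps.executeUpdate();
    }
}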

How to return rows that are "missing" from table - Employee Absent Report [closed]

I have two tables, like this:
master
---------
empcode INT PRIMARY KEY
name VARCHAR
dept VARCHAR
emp_tx
----------
empcode INT references MASTER(empcode)
s_date DATETIME
The emp_tx table records the employee "in" and "out" transactions. The column s_date stores the time (as a DATETIME value) when the "in" or "out" event occurred. The transactions are recorded at the office (through a fingerprint biometric system).
Example data from emp_tX table:
empcode s_datetime
------- ------------------
1110 2012-12-12 09:31:42 (employee in time to the office)
1110 2012-12-12 13:34:17 (employee out time for lunch)
1110 2012-12-12 14:00:17 (employee in time after lunch)
1110 2012-12-12 18:00:12 (employee out time after working hours)
1112
etc.
Note:
If an employee is absent from the office on a given day, then no row will be inserted into the emp_tx transaction table for that date. An absence of an employee on a given date will be indicated by a row "missing" for that employee and that date.
Can anyone help me to get a SQL Query that returns the dates that employees were absent, to produce an Employee Absent Report?
The input to the query will be two DATE values, a "from" date and a "to" date, which specify a range of dates. The query should return all occurrences of "absence" (or rather, non-occurrences): every case where no row is found in the EMP_TX table for an empcode on a date between the "from" and "to" dates.
Expected output:
If we input '2012-12-12' as the "from" date, and '2012-12-20' as the "to" date, the query should return rows something like this:
Empcode EmpName Department AbsentDate TotalNoofAbsent days
------- ------- ---------- ----------- --------------------
1110 ABC Accounts 2012-12-12
1110 ABC Accounts 2012-12-14 2
1112 xyz Software 2012-12-19
1112 xyz Software 2012-12-17 2
I've tried this query, and I am sure it is not returning the rows I want:
select tx.date from Emp_TX as tx where Date(S_Date) not between '2012-12-23' and '2012-12-30'
Thanks.
If an "absence" is defined as the non-appearance of a row in the emp_tx table for a particular empcode for a particular date (date=midnight to midnight 24 hour period), and ...
If it's acceptable to not show an "absence" for a date when there are NO transactions in the emp_tx table for that date (i.e. exclude a date when ALL empcodes are absent on that date), then ...
You can get the first four columns of the specified result set with a query like this: (untested)
SELECT m.empcode AS `EmpCode`
, m.name AS `EmpName`
, m.dept AS `Department`
, d.dt AS `AbsentDate`
FROM ( SELECT DATE(t.s_date) AS dt
FROM emp_tx t
WHERE t.s_date >= '2012-12-12'
AND t.s_date < DATE_ADD( '2012-12-20' ,INTERVAL 1 DAY)
GROUP BY DATE(t.s_date)
ORDER BY DATE(t.s_date)
) d
CROSS
JOIN master m
LEFT
JOIN emp_tx p
ON p.s_date >= d.dt
AND p.s_date < d.dt + INTERVAL 1 DAY
AND p.empcode = m.empcode
WHERE p.empcode IS NULL
ORDER
BY m.empcode
, d.dt
Getting that fifth column TotalNoofAbsent returned in the same resultset is possible, but it's going to make that query really messy. This detail might be more efficiently handled on the client side, when processing the returned resultset.
How the query works
The inline view aliased as d gets us the set of "date" values that we are checking. Using the emp_tx table as a source of these "date" values is a convenient way to do this. Note that the DATE() function returns just the "date" portion of its DATETIME argument; we're using a GROUP BY to get a distinct list of dates (i.e. no duplicate values). (What we're after, with this inline view query, is a distinct set of DATE values between the two values passed in as arguments. There are other, more involved, ways of generating a list of DATE values.)
As long as every "date" value that you will consider as an "absence" appears somewhere in the table (that is, at least one empcode had at least one transaction on each date that is of interest), and as long as the number of rows in the emp_tx table isn't excessive, the inline view query will work reasonably well.
(NOTE: The query in the inline view can be run separately, to verify that the results are correct and as we expect.)
The next step is to take the results from the inline view and perform a CROSS JOIN operation (to generate a Cartesian product), matching EVERY empcode with EVERY date returned from the inline view. The result of this operation represents every possible occurrence of "attendance".
The final step in the query is to perform an "anti-join" operation, using a LEFT JOIN and an IS NULL predicate in the WHERE clause. The LEFT JOIN (outer join) returns every possible attendance occurrence (from the left side), INCLUDING those that don't have a matching row (attendance record) in the emp_tx table.
The "trick" is to include a predicate (in the WHERE clause) that discards all of the rows where a matching attendance record was found, so that what we are left with is all combinations of empcode and date (possible attendance occurrences) where there was NO MATCHING attendance transaction.
(NOTE: I've purposefully left the references to the s_date (DATETIME) column "bare" in the predicates, and used range predicates. This will allow MySQL to make effective use of an appropriate index that includes that column.)
If we were to wrap the column references in the predicates inside a function e.g. DATE(p.s_date), then MySQL won't be able to make effective use of an index on the s_date column.
As one of the comments (on your question) points out, we're not making any distinction between transactions that mark an employee either as "coming in" or "going out". We are ONLY looking for the existence of a transaction for that empcode in a given 24-hour "midnight to midnight" period.
There are other approaches to getting the same result set, but the "anti-join" pattern usually turns out to give the best performance with large sets.
For best performance, you'll likely want covering indexes:
... ON master (empcode, name, dept)
... ON emp_tx (s_date, empcode)
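If you are running the report from Java, a minimal JDBC sketch might look like this (it assumes absenceSql holds the query above with the two date literals replaced by ? placeholders):

import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Prints one line per (employee, absent date) in the given range.
void printAbsences(Connection conn, String absenceSql, Date from, Date to) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(absenceSql)) {
        ps.setDate(1, from); // WHERE t.s_date >= ?
        ps.setDate(2, to);   // AND t.s_date < DATE_ADD(?, INTERVAL 1 DAY)
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.printf("%d %s %s %s%n",
                        rs.getInt("EmpCode"), rs.getString("EmpName"),
                        rs.getString("Department"), rs.getDate("AbsentDate"));
            }
        }
    }
}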
Unfortunately, your query is going to get you a ton of results... It will always return all dates for an employee outside the range you gave. You want to check that NO record EXISTS between your dates.
It may be possible to do this in pure SQL... I can't think of a way offhand without using cursors or something DB-specific. This Java sketch will give you one employee's absences:
List<Date> findAbsences(Connection conn, int empCode, Date inDate, Date outDate)
        throws SQLException {
    List<Date> result = new LinkedList<Date>();
    String sql = "SELECT 1 FROM EMP_TX WHERE EmpCode = ? AND S_Date >= ? AND S_Date < ?";
    Calendar c = Calendar.getInstance();
    c.setTime(inDate);
    while (!c.getTime().after(outDate)) {
        Calendar next = (Calendar) c.clone();
        next.add(Calendar.DATE, 1);              // end of the current day
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, empCode);
            ps.setTimestamp(2, new Timestamp(c.getTimeInMillis()));
            ps.setTimestamp(3, new Timestamp(next.getTimeInMillis()));
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {                // no transaction that day => absent
                    result.add(c.getTime());
                }
            }
        }
        c = next;                                // advance one day
    }
    return result;
}

JPA Select latest instance for each item

Let's say I have a Meeting entity. Each meeting has a single attendee and a meeting date. Within my meeting table I may have multiple meetings for each attendee, with different dates for each. I need a JPA query that will select only the latest meeting for all attendees. For instance, if my table looks like this
Meeting ID | Attendee ID | Meeting Date
1 | 1 | 6/1/2011
2 | 2 | 6/1/2011
3 | 1 | 6/6/2011
4 | 3 | 6/6/2011
My result should be
Meeting ID | Attendee ID | Meeting Date
2 | 2 | 6/1/2011
3 | 1 | 6/6/2011
4 | 3 | 6/6/2011
Using JPA 2 against postgres. Meeting has 1-1 to attendee and a simple timestamp date. I suspect I'm going to need to do a group by and max(blah) and maybe a join to myself, but I'm not sure of the best way to approach this.
Update:
After spending the evening playing with this, I still do not have an acceptable JPQL solution to this. Here is what I have so far:
select m from Meeting m
where m.meetingDate in
( select max(meet.meetingDate)
from Meeting meet group by meet.attendee )
I have various other conditions that are not relevant to this question, like filtering by attendee department and whatnot. The only reason this works is that we are tracking the meeting date to the second (or finer), and the chance that there will be two meetings at exactly the same time is minimal. We are putting some Java code around it to keep only the last meeting for each attendee, just in case we do get two at the same time, but that's a pretty crappy solution. It really shouldn't be too difficult to get it all in a query, but I have yet to figure it out.
Update 2: Adding the sql tag, because if I need to use SQL to create a view and a JPA object mapped to that view, I'm OK with that.
In SQL the solution is very simple - join the table with a subquery, which gives you the most recent meeting for each attendee:
select m.* from Meeting m
join ( select max(meetingDate) as newest, attendee
from Meeting group by attendee ) latest
on m.meetingDate = latest.newest AND m.attendee = latest.attendee
This works, and works fast!
The problem with JPA is that it (or most implementations) won't allow a subquery for a join. After spending several hours trying what will compile in the first place, and then, how slow it is, I decided that I hate JPA. Solutions like the ones above - like EXISTS (SELECT .. ) or IN ( SELECT .. ) - take ages to execute, orders of magnitude slower than they should.
Having a solution that works meant that I just needed to access that solution from JPA. There are two magic words in SQL that help you do just that:
CREATE VIEW
and life becomes so much simpler... Just define such an entity and use it.
Caution: it's read-only.
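As a sketch of that approach (the view and column names here are assumptions for illustration), the entity mapped onto the view could look like:

import java.sql.Timestamp;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

// Read-only entity over a "latest meeting per attendee" view:
// every column is non-insertable and non-updatable, and there are no setters.
@Entity
@Table(name = "latest_meeting")
public class LatestMeeting {

    @Id
    @Column(name = "meeting_id", insertable = false, updatable = false)
    private Long meetingId;

    @Column(name = "attendee_id", insertable = false, updatable = false)
    private Long attendeeId;

    @Column(name = "meeting_date", insertable = false, updatable = false)
    private Timestamp meetingDate;

    public Long getMeetingId() { return meetingId; }
    public Long getAttendeeId() { return attendeeId; }
    public Timestamp getMeetingDate() { return meetingDate; }
}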
Of course, any JPA purists will look down on you when you do that, so if anyone has a pure JPA solution, please let us both know!
I think I've got it with this query.
select m from Meeting m
where m.meetingDate =
(select max(m1.meetingDate)
from Meeting m1
where m1.attendee = m.attendee )
and not exists
(select m2 from Meeting m2
where m2.attendee = m.attendee
and m2.meetingDate > m.meetingDate)
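Running that JPQL through the EntityManager would look roughly like this (assuming a Meeting entity with attendee and meetingDate fields, as described in the question):

import java.util.List;
import javax.persistence.EntityManager;

// Fetches the latest meeting for each attendee using the query above.
List<Meeting> findLatestMeetings(EntityManager em) {
    return em.createQuery(
            "select m from Meeting m"
          + " where m.meetingDate = (select max(m1.meetingDate)"
          + "   from Meeting m1 where m1.attendee = m.attendee)"
          + " and not exists (select m2 from Meeting m2"
          + "   where m2.attendee = m.attendee"
          + "   and m2.meetingDate > m.meetingDate)",
            Meeting.class)
        .getResultList();
}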
Well in SQL that would be quite simple I think, so I assume that can be mapped to JPA:
SELECT m.AttendeeId, MAX(m.MeetingDate) from Meeting m GROUP BY m.AttendeeId
Edit: If you need the meetingId itself as well, you can do that with a simple subquery that returns the meetingId for a meeting where the other two values are equal. Just make sure you handle the case where there are several meetingIds for the same attendee and date (e.g. pick the first result, since they should all be equally good, although I'd doubt that such data even makes sense for meetings).
Plain SQL
As Bulba said, the appropriate way is to join with a subquery that groups by attendee.
JPA, JPQL
The problem is that you can't join a subquery.
Here is a workaround.
Let's see what you get from the subquery with GROUP BY: a list of pairs (attendee_id, max(meeting_date)).
Each such pair is like a new unique id for the row with the max date that you want to join on.
Then note that each row in the table forms a pair (attendee_id, meeting_date).
So every row has an id in the form of the pair (attendee_id, meeting_date).
Let's take a row only if its id-pair belongs to the list received from the subquery.
For simplicity, let's represent this id-pair as a concatenation of attendee_id and meeting_date: concat(attendee_id, meeting_date).
Then the query in SQL (similarly for JPQL and the JPA CriteriaBuilder) would be as follows:
SELECT * FROM meetings
WHERE concat(attendee_id, meeting_date) IN
(SELECT concat(attendee_id, max(meeting_date)) FROM meetings GROUP BY attendee_id)
Note that there is only one subquery per query, not one subquery for each row like in some answers.
Afraid of comparing strings?
We have a special offer for you!
Let's encode that id-pair into a number.
It will be a sum of attendee_id and meeting_date, with modifications to ensure uniqueness of the code. We can take the number representation of the date as Unix time.
We will fix the max date that our code can capture, because the final code has a max value limit (e.g. bigint (int8) < 2^63). Let's take for convenience a max date of 2149-06-07 03:00:00. It equals 5662310400 in seconds and 65536 in days.
I will assume here that we need day precision for the date (so we ignore hours and below).
To construct a unique code we can interpret it as a number in a numeral system with base 65536. The last symbol (a number from 0 to 2^16 - 1) of our code in such a numeral system is the number of days. The other symbols capture attendee_id. In this interpretation the code looks like XXXX, where each X is in the range [0, 2^16 - 1] (to be more accurate, the first X is in the range [0, 2^15 - 1] because of one bit for the sign); the first three X's represent attendee_id and the last X represents meeting_date.
So the max value of attendee_id that our code can capture is 2^47 - 1.
The code can be computed as attendee_id * 65536 + "date in days".
In postgresql it will be:
attendee_id*65536 + date_part('epoch', meeting_date)/(60*60*24)
Where date_part returns the date in seconds, which we convert to days by dividing by a constant.
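For illustration, the same encoding computed on the Java side (day precision, matching the SQL above):

import java.time.LocalDate;

// Encodes the pair (attendee_id, meeting_date) into a single number:
// attendee_id * 65536 plus the number of days since the Unix epoch.
long encodePair(long attendeeId, LocalDate meetingDate) {
    long epochDays = meetingDate.toEpochDay(); // days since 1970-01-01
    return attendeeId * 65536L + epochDays;
}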
And the final query to get the latest meetings for all attendees:
SELECT * FROM meetings
WHERE attendee_id*65536 + date_part('epoch', meeting_date)/(60*60*24)
IN (SELECT attendee_id*65536 + date_part('epoch', max(meeting_date))/(60*60*24) from meetings GROUP BY attendee_id);
Benchmarking
I have created a table with the structure as in the question and populated it with 100000 rows, randomly selecting attendee_id from [1, 10000] and a random date from the range [1970-01-01, 2017-09-16]. I have benchmarked (with EXPLAIN ANALYZE) queries using the following techniques:
Correlated subquery
SELECT * FROM meetings m1 WHERE m1.meeting_date=
(SELECT max(m2.meeting_date) FROM meetings m2 WHERE m2.attendee_id=m1.attendee_id);
Execution time: 873260.878 ms
Join subquery with group by
SELECT * FROM meetings m
JOIN (SELECT attendee_id, max(meeting_date) from meetings GROUP BY attendee_id) attendee_max_date
ON attendee_max_date.attendee_id = m.attendee_id;
Execution time: 103.427 ms
Use pair (attendee_id, date) as a key
Concat attendee_id and meeting_date as strings
SELECT * FROM meetings WHERE concat(attendee_id, meeting_date) IN
(SELECT concat(attendee_id, max(meeting_date)) from meetings GROUP BY attendee_id);
Execution time: 207.720 ms
Encode attendee_id and meeting_date to a single number(code)
SELECT * FROM meetings
WHERE attendee_id*65536 + date_part('epoch',meeting_date)/(60*60*24)
IN (SELECT attendee_id*65536 + date_part('epoch',max(meeting_date))/(60*60*24) from meetings GROUP BY attendee_id);
Execution time: 127.595 ms
Here is a Git repository with the table schema, the table data (as CSV), the code for populating the table, and the queries.
Try this
SELECT MAX(m.MeetingDate) FROM Meeting m

Service usage limiter implementation

I need to limit multiple service usages for multiple customers. For example, customer customer1 can send max 1000 SMS per month. My implementation is based on one MySQL table with 3 columns:
date TIMESTAMP
name VARCHAR(128)
value INTEGER
For every service usage (sending an SMS) one row is inserted into the table. value holds the usage count (e.g. if an SMS was split into 2 parts then value = 2). name holds the limiter name (e.g. customer1-sms).
To find out how many times the service was used this month (March 2011), a simple query is executed:
SELECT SUM(value) FROM service_usage WHERE name = 'customer1-sms' AND date > '2011-03-01';
The problem is that this query is slow (0.3 sec). We are using indexes on columns date and name.
Is there some better way to implement service usage limitation? My requirement is that it must be flexible (e.g. sometimes I need to know the usage within the last 10 minutes, other times within the current month). I am using Java.
Thanks in advance
You should have one index on both columns, not two indexes on each of the columns. This should make the query very fast.
If it still doesn't, then you could use a table with a month, a name and a value, and increment the value for the current month each time an SMS is sent. This would remove the sum from your query. It would still need an index on (month, name) to be as fast as possible, though.
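A sketch of that counter-table variant in JDBC (assumptions: a table named monthly_usage with a primary key on (month, name); ON DUPLICATE KEY UPDATE is MySQL syntax, matching the question):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Adds `increment` to the counter for (month, name), creating the row
// on the first usage within the month.
void recordUsage(Connection conn, String month, String name, int increment) throws SQLException {
    String sql = "INSERT INTO monthly_usage (month, name, value) VALUES (?, ?, ?)"
               + " ON DUPLICATE KEY UPDATE value = value + VALUES(value)";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setString(1, month); // e.g. "2011-03"
        ps.setString(2, name);  // e.g. "customer1-sms"
        ps.setInt(3, increment);
        ps.executeUpdate();
    }
}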
I found one solution to my problem. Instead of inserting a per-usage increment, I will insert the running total (the last value plus the increment):
BEGIN;
-- select the last value
SELECT value FROM service_usage WHERE name = %name ORDER BY date DESC LIMIT 1;
-- insert the incremented total
INSERT INTO service_usage (date, name, value) VALUES (CURRENT_TIMESTAMP, %name, %value + %increment);
COMMIT;
To find out service usage since %date:
SELECT value AS value1 FROM test WHERE name = %name ORDER BY date DESC LIMIT 1;
SELECT value AS value2 FROM test WHERE name = %name AND date <= %date ORDER BY date DESC LIMIT 1;
The result will be value1 - value2
This way I'll need transactions. I'll probably implement it as stored procedure.
Any additional hints are still appreciated :-)
It's worth trying to replace your "=" with "like". Not sure why, but in the past I've seen this perform far more quickly than the "=" operator on varchar columns.
SELECT SUM(value) FROM service_usage WHERE name like 'customer1-sms' AND date > '2011-03-01';
Edited after comments:
Okay, now I can sorta re-create your issue - the first time I run the query, it takes around 0.03 seconds, subsequent runs of the query take 0.001 second. Inserting new records causes the query to revert to 0.03 seconds.
Suggested solution:
COUNT does not show the same slow-down. I would change the business logic so that every time the user sends an SMS you insert a record with value "1"; if the message is a multipart message, simply insert two rows.
Replace the "sum" with a "count".
I've applied this to my test data, and even after inserting a new record, the "count" query returns in 0.001 second.

How to manage consecutive column values in table rows

A little presentation for what I want to do:
Consider the case where different people from a firm get, once a year, an all expenses paid trip to somewhere. There may be 1000 persons that could qualify for the trip but only 16 places are available.
Each of these 16 spots has an associated index, which must be from 1 to 16. The ones on the reservation list have indexes starting from 17.
The first 16 persons that apply get a definite spot on the trip. The rest end up on the reservation list. If one of the first 16 persons cancels, the first person with a reservation gets his place and all the indexes are renumbered to compensate for the person that canceled.
All of this is managed in a Java web app with an Oracle DB.
Now, my problem:
I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done on the Java side, followed by a new query to insert the record). It is obvious why we end up with multiple spots or reservations that have the same index. So we get, let's say, 19 people on the trip because 4 of them have index 10, for example.
How can I manage this? I have been thinking of 3 ways so far:
Use an isolation level of Serializable for the DB transactions (don’t like this one);
Insert a record with no INDEX_NR and then have a trigger manage the things… in some way (never worked with triggers before);
Each record also has a UPDATED column. Could I use this in some way? (note that I can’t lose the INDEX_NR since other parts of the app make use of it).
Is there a best way to do this?
Why make it complicated?
Just insert all reservations as they are entered, along with a timestamp of when they reserved a spot.
Then in your query just use the timestamp to sort them.
There is of course the chance that two people reserved a spot at the very same millisecond; in that case just use a random method to assign order.
Why do you need to explicitly store the index? Instead you could store each person's order (which never changes) along with an active flag. In your example if person #16 pulls out you simply mark them as inactive.
To compute whether a person qualifies for the trip you simply count the number of active people with order less than that person:
select count(*)
from CompetitionEntry
where PersonOrder < 16
and Active = 1
This approach removes the need for bulk updates to the database (you only ever update one row) and hence mostly mitigates your problem of transactional integrity.
Another way would be to explicitly lock a record on another table on the select.
-- Initial Setup
CREATE TABLE NUMBER_SOURCE (ID NUMBER(4));
INSERT INTO NUMBER_SOURCE(ID) VALUES (0);
-- Your regular code
SELECT ID AS NEXT_INDEX_NR FROM NUMBER_SOURCE FOR UPDATE; -- lock!
UPDATE NUMBER_SOURCE SET ID = ID + 1;
INSERT INTO TABLE ....
COMMIT; -- releases lock!
No other transaction will be able to perform the query on the table NUMBER_SOURCE until the commit (or rollback).
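From the Java side, the same pattern might be sketched with JDBC as follows (autocommit must be off so the row lock is held until the caller commits; table name as above):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Allocates the next index while holding a row lock on NUMBER_SOURCE.
// The caller inserts the reservation row with the returned index and
// then commits, which releases the lock.
int allocateIndex(Connection conn) throws SQLException {
    conn.setAutoCommit(false);
    int current;
    try (Statement st = conn.createStatement()) {
        // other transactions block on this SELECT until we commit
        try (ResultSet rs = st.executeQuery("SELECT ID FROM NUMBER_SOURCE FOR UPDATE")) {
            rs.next();
            current = rs.getInt(1);
        }
        st.executeUpdate("UPDATE NUMBER_SOURCE SET ID = ID + 1");
    }
    return current + 1;
}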
When adding people to the table, give them an ID in such a way that the ID is ascending in the order in which they were added. This can be a timestamp.
Select all the records from the table which qualify, order by ID, and update their INDEX_NR
Select * from table where INDEX_NR <= 16 order by INDEX_NR
Step #2 seems complicated but it's actually quite simple:
update (
select *
from TABLE
where ...
order by ID
)
set INDEX_NR = INDEXSEQ.NEXTVAL
Don't forget to reset the sequence to 1.
Calculate your index at runtime:
CREATE OR REPLACE VIEW v_person
AS
SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS index_rn
FROM t_person
CREATE OR REPLACE TRIGGER trg_person_ii
INSTEAD OF INSERT ON v_person
BEGIN
INSERT
INTO t_person (id, name)
VALUES (:new.id, :new.name);
END;
CREATE OR REPLACE TRIGGER trg_person_iu
INSTEAD OF UPDATE ON v_person
BEGIN
UPDATE t_person
SET id = :new.id,
name = :new.name
WHERE id = :old.id;
END;
CREATE OR REPLACE TRIGGER trg_person_id
INSTEAD OF DELETE ON v_person
BEGIN
DELETE
FROM t_person
WHERE id = :old.id;
END;
INSERT
INTO v_person
VALUES (1, 'test', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
INSERT
INTO v_person
VALUES (2, 'test 2', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
2 test 2 2
DELETE
FROM v_person
WHERE id = 1
SELECT *
FROM v_person
--
id name index_rn
2 test 2 1
"I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done Java side and then a new query to insert the record). It is obvious why we have multiple spots or reservations with the same index."
Yeah. Oracle's MVCC ("snapshot isolation") used incorrectly by someone who shouldn't have been in IT to begin with.
Really, Peter is right. Your index number is, or rather should be, a sort of "ranking number" over the ordered timestamps that he mentions (this carries the requirement that the DBMS can guarantee that any timestamp value appears only once in the entire database).
You say you are concerned with "regression bugs". I say "Why do you need to be concerned with "regression bugs" in an application that is DEMONSTRABLY beyond curing ?". Because your bosses paid a lot of money for the crap they've been given and you don't want to be the pianist that gets shot for bringing the message ?
The solution depends on what you have under your control. I assume that you can change both the database and the Java code, but would refrain from modifying the database schema, since you would otherwise have to adapt too much Java code.
A cheap solution might be to add a uniqueness constraint on the pair (trip_id, index_nr), or just on index_nr if there is only one trip. Additionally add a check constraint check(index_nr > 0), unless index_nr is already unsigned. Everything else is then done in Java: when inserting a new applicant as described, you have to add code catching the exception thrown when someone else got inserted concurrently. If a record is updated or deleted, you either have to live with holes between sequence numbers (by selecting the 16 candidates with the lowest index_nr, as shown by Quassnoi in his view) or fill them up by hand (similarly to what Aaron suggested) after every update/delete.
If index_nr is mostly used in the application as read-only, a better solution might be to combine the answers of Peter and Quassnoi: Use either a time stamp (automatically inserted by the database by defining the current time as default) or an auto-incremented integer (as default inserted by the database) as value stored in the table. And use a view (like the one defined by Quassnoi) to access the table and the automatically calculated index_nr from Java. But also define both constraints like for the cheap solution.
