JPA Select latest instance for each item - java

Let's say I have a Meeting entity. Each meeting has a single attendee and a meeting date. Within my meeting table I may have multiple meetings for each attendee, with different dates for each. I need a JPA query that will select only the latest meeting for all attendees. For instance, if my table looks like this
Meeting ID | Attendee ID | Meeting Date
1 | 1 | 6/1/2011
2 | 2 | 6/1/2011
3 | 1 | 6/6/2011
4 | 3 | 6/6/2011
My result should be
Meeting ID | Attendee ID | Meeting Date
2 | 2 | 6/1/2011
3 | 1 | 6/6/2011
4 | 3 | 6/6/2011
Using JPA 2 against postgres. Meeting has 1-1 to attendee and a simple timestamp date. I suspect I'm going to need to do a group by and max(blah) and maybe a join to myself, but I'm not sure of the best way to approach this.
Update:
After spending the evening playing with this, I still do not have an acceptable JPQL solution to this. Here is what I have so far:
select m from Meeting m
where m.meetingDate in
( select max(meet.meetingDate)
from Meeting meet group by meet.attendee )
I have various other conditions that are not relevant to this question, like filtering by attendee department and whatnot. The only reason this works is because we are tracking meeting date to the second (or finer) and the chance that there will be two meetings at exactly the same time is minimal. We are putting some Java code around it to keep only the last meeting for each attendee just in case we do get two at the same time, but that's a pretty crappy solution. It really shouldn't be too difficult to get it all in a query, but I have yet to figure it out.
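The Java-side safety net mentioned above (keeping only the last meeting per attendee when two rows share a timestamp) could look roughly like the sketch below. MeetingRow and its field names are hypothetical stand-ins for the real entity, and breaking ties by the higher meeting ID is an assumption, not something the question specifies.

```java
import java.time.Instant;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Meeting entity; field names are assumptions.
record MeetingRow(long meetingId, long attendeeId, Instant meetingDate) {}

class LatestMeetingFilter {

    // Latest date wins; a tie on the date is broken by the higher meeting id
    // (an assumed rule), so exactly one row survives per attendee.
    private static final Comparator<MeetingRow> BY_DATE_THEN_ID =
            Comparator.comparing(MeetingRow::meetingDate)
                      .thenComparingLong(MeetingRow::meetingId);

    static Collection<MeetingRow> latestPerAttendee(List<MeetingRow> rows) {
        Map<Long, MeetingRow> latest = rows.stream()
                .collect(Collectors.toMap(
                        MeetingRow::attendeeId,                 // group key
                        row -> row,                             // first row seen for the key
                        BinaryOperator.maxBy(BY_DATE_THEN_ID))); // keep the later one
        return latest.values();
    }

    public static void main(String[] args) {
        // Sample rows taken from the table in the question.
        List<MeetingRow> rows = List.of(
                new MeetingRow(1, 1, Instant.parse("2011-06-01T00:00:00Z")),
                new MeetingRow(2, 2, Instant.parse("2011-06-01T00:00:00Z")),
                new MeetingRow(3, 1, Instant.parse("2011-06-06T00:00:00Z")),
                new MeetingRow(4, 3, Instant.parse("2011-06-06T00:00:00Z")));
        latestPerAttendee(rows).forEach(System.out::println);
    }
}
```

This only cleans up the result of the query; it does not make the query itself any cheaper.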
Update2: Adding sql tag because if I need to use sql to create a view and create a JPA object to map to the view I'm ok with that.

In SQL the solution is very simple - join the table with a subquery, which gives you the most recent meeting for each attendee:
select * from Meeting m
join ( select max(meetingDate) as newest, attendee
from Meeting group by attendee ) latest
on m.meetingDate = latest.newest AND m.attendee = latest.attendee
This works, and works fast!
The problem with JPA is that it (or most implementations) won't allow a subquery in a join. After spending several hours finding out first what would compile at all, and then how slow the result was, I decided that I hate JPA. Solutions like the ones above - like EXISTS (SELECT .. ) or IN ( SELECT .. ) - take ages to execute, orders of magnitude slower than they should.
Having a solution that works meant that I just needed to access that solution from JPA. There are two magic words in SQL that help you do just that:
CREATE VIEW
and life becomes so much simpler... Just define an entity mapped to that view and use it.
Caution: it's read-only.
Of course, any JPA purists will look down on you when you do that, so if anyone has a pure JPA solution, please let us both know!

I think I've got it with this query.
select m from Meeting m
where m.meetingDate =
(select max(m1.meetingDate)
from Meeting m1
where m1.attendee = m.attendee )
and not exists
(select m2 from Meeting m2
where m2.attendee = m.attendee
and m2.meetingDate > m.meetingDate)

Well in SQL that would be quite simple I think, so I assume that can be mapped to JPA:
SELECT m.AttendeeId, MAX(m.MeetingDate) from Meeting m GROUP BY m.AttendeeId
Edit: If you need the MeetingId itself as well, you can get it with a simple subquery that returns the MeetingId of the meeting whose other two values are equal. Just make sure you handle the case where there are several MeetingIds for the same attendee and date (e.g. pick the first result, since they should all be equally good - although I doubt that such data even makes sense for meetings).

Plain SQL
As Bulba has said, the appropriate way is to join a subquery with a group by.
JPA, JPQL
The problem is that you can't join a subquery.
Here is a workaround.
Let's see what you get in the subquery with group by. You get a list of pairs (attendee_id, max(meeting_date)).
This pair is like a new unique id for the row with the max date that you want to join on.
Then note that each row in the table forms a pair (attendee_id, meeting_date).
So every row has an id in the form of a pair (attendee_id, meeting_date).
Let's take a row only if it forms an id that belongs to the list returned by the subquery.
For simplicity, let's represent this id-pair as a concatenation of attendee_id and meeting_date: concat(attendee_id, meeting_date).
Then the query in SQL (and similarly in JPQL and the JPA CriteriaBuilder) would be as follows:
SELECT * FROM meetings
WHERE concat(attendee_id, meeting_date) IN
(SELECT concat(attendee_id, max(meeting_date)) FROM meetings GROUP BY attendee_id)
Note that there is only one subquery per query, not one subquery for each row like in some answers.
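The pair-as-key idea can also be sketched in plain Java, which may make the shape of the query easier to see. Here the pair is kept as a small value object instead of a concatenated string; Visit and PairKey are hypothetical names introduced only for this illustration.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical row type and key type, not part of the original schema.
record Visit(long id, long attendeeId, LocalDate meetingDate) {}
record PairKey(long attendeeId, LocalDate meetingDate) {}

class PairKeyFilter {

    static List<Visit> latestByPairKey(List<Visit> rows) {
        // "Subquery": one (attendee_id, max(meeting_date)) pair per attendee,
        // computed once for the whole table.
        Map<Long, LocalDate> maxDate = rows.stream()
                .collect(Collectors.toMap(
                        Visit::attendeeId,
                        Visit::meetingDate,
                        (a, b) -> a.isAfter(b) ? a : b));
        Set<PairKey> keys = maxDate.entrySet().stream()
                .map(e -> new PairKey(e.getKey(), e.getValue()))
                .collect(Collectors.toSet());

        // "Outer query": keep a row only if its own pair is in that set.
        return rows.stream()
                .filter(r -> keys.contains(new PairKey(r.attendeeId(), r.meetingDate())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Visit> rows = List.of(
                new Visit(1, 1, LocalDate.of(2011, 6, 1)),
                new Visit(2, 2, LocalDate.of(2011, 6, 1)),
                new Visit(3, 1, LocalDate.of(2011, 6, 6)),
                new Visit(4, 3, LocalDate.of(2011, 6, 6)));
        latestByPairKey(rows).forEach(System.out::println);
    }
}
```

Like the SQL version, this keeps every row that shares the maximum date for an attendee, so ties are not resolved.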
Afraid of comparing strings?
We have a special offer for you!
Let's encode that id-pair as a number.
It will be a sum of attendee_id and meeting_date, but with modifications to ensure that the code is unique. We can take the Unix time as the numeric representation of the date.
We have to cap the maximum date that the code can represent, because the final code has an upper limit (e.g. bigint (int8) < 2^63). For convenience, let's take the max date as 2149-06-07 03:00:00, which equals 5662310400 in seconds and 65536 in days.
I will assume here that we need date precision in days (so we ignore hours and below).
To construct a unique code we can interpret it as a number in a numeral system with base 65536. The last symbol (a number from 0 to 2^16-1) of our code in this numeral system is the number of days. The other symbols capture attendee_id. In this interpretation the code looks like XXXX, where each X is in the range [0, 2^16-1] (to be more accurate, the first X is in the range [0, 2^15-1] because 1 bit is used for the sign); the first three X represent attendee_id and the last X represents meeting_date.
So the max value of attendee_id our code can capture is 2^47-1.
The code can be computed as attendee_id*65536 + "date in days".
In postgresql it will be:
attendee_id*65536 + date_part('epoch', meeting_date)/(60*60*24)
Here date_part returns the date in seconds, which we convert to days by dividing by a constant.
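To make the arithmetic concrete, here is a minimal Java sketch of the same encoding at whole-day precision. The class and method names are hypothetical, and the range checks are an addition that the SQL expression does not perform.

```java
import java.time.LocalDate;

class MeetingCode {

    static final long BASE = 65_536L;                    // 2^16 values for the day number
    static final long MAX_ATTENDEE_ID = (1L << 47) - 1;  // 2^47 - 1, so the code fits in a signed 64-bit value

    // code = attendee_id * 65536 + days since 1970-01-01
    static long encode(long attendeeId, LocalDate date) {
        long days = date.toEpochDay();
        if (attendeeId < 0 || attendeeId > MAX_ATTENDEE_ID) {
            throw new IllegalArgumentException("attendee id out of range: " + attendeeId);
        }
        if (days < 0 || days >= BASE) {
            throw new IllegalArgumentException("date out of range: " + date);
        }
        return attendeeId * BASE + days;
    }

    // The two parts can be recovered, which is why the code is unique per pair.
    static long attendeeOf(long code) {
        return code / BASE;
    }

    static LocalDate dateOf(long code) {
        return LocalDate.ofEpochDay(code % BASE);
    }

    public static void main(String[] args) {
        long code = encode(1, LocalDate.of(2011, 6, 6));
        System.out.println(code + " -> attendee " + attendeeOf(code) + ", date " + dateOf(code));
    }
}
```

For a fixed attendee the code grows with the date, so the code of the maximum date is also the maximum code for that attendee.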
And final query to get the latest meetings for all attendees:
SELECT * FROM meetings
WHERE attendee_id*65536 + date_part('epoch', meeting_date)/(60*60*24)
IN (SELECT attendee_id*65536 + date_part('epoch', max(meeting_date))/(60*60*24) from meetings GROUP BY attendee_id);
Benchmarking
I have created a table with the structure from the question and populated it with 100000 rows, randomly selecting attendee_id from [1, 10000] and a random date from the range [1970-01-01, 2017-09-16]. I have benchmarked (with EXPLAIN ANALYZE) queries using the following techniques:
Correlated subquery
SELECT * FROM meetings m1 WHERE m1.meeting_date=
(SELECT max(m2.meeting_date) FROM meetings m2 WHERE m2.attendee_id=m1.attendee_id);
Execution time: 873260.878 ms
Join subquery with group by
SELECT * FROM meetings m
JOIN (SELECT attendee_id, max(meeting_date) from meetings GROUP BY attendee_id) attendee_max_date
ON attendee_max_date.attendee_id = m.attendee_id;
Execution time: 103.427 ms
Use pair (attendee_id, date) as a key
Concat attendee_id and meeting_date as strings
SELECT * FROM meetings WHERE concat(attendee_id, meeting_date) IN
(SELECT concat(attendee_id, max(meeting_date)) from meetings GROUP BY attendee_id);
Execution time: 207.720 ms
Encode attendee_id and meeting_date to a single number(code)
SELECT * FROM meetings
WHERE attendee_id*65536 + date_part('epoch',meeting_date)/(60*60*24)
IN (SELECT attendee_id*65536 + date_part('epoch',max(meeting_date))/(60*60*24) from meetings GROUP BY attendee_id);
Execution time: 127.595 ms
Here is a git with table scheme, table data (as csv), code for populating table, and queries.

Try this
SELECT MAX(m.MeetingDate) FROM Meeting m

Related

Access the last entries in the database with a java program

I am currently writing a Java program.
Brief description of the component:
I have an "Entries" table.
This table has the following columns:
Date (which is entered automatically when the user makes a double entry)
Input (This is the double number from the user)
With 5 entries, for example, the program should now access the last 2 entries made by the user and reflect them in the program.
For example, the table looks like this:
Date --------- Entry
21.01.2022 -- 500
01.03.2022 -- 551
04.05.2022 -- 629
30.06.2022 -- 701
15.07.2022 -- 781
Then the program should give me the 701 and the 781.
What is the most sensible way to do this?
It makes no sense to use the following "SQL statement": Select where date 06/30/2022 because it is no longer useful when the user makes a new entry.
Please help!!
select Entry
from your_table
order by date desc -- show most recent entries above in the results
fetch first 2 rows only; -- show first 2 records only
You might be running on an old DB (version less than 12) so "fetch" might not be introduced.
Try this then
select *
from (select Entry
from your_table
order by date desc)
where rownum <= 2;
To answer your question about how to get the penultimate value:
select *
from (select Entry, row_number() over(order by date_col desc) rn
from your_table)
where rn = 2;
the "rn = 2" condition will get you not the last date but a date before the last. Setting it to 1 will get you the row with most recent date
You can use the following SQL statement to select the last two rows:
select * from Entries order by date desc limit 2;
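If the rows have already been loaded on the Java side, the same "order by date descending, keep two" logic can be sketched with a stream. LedgerEntry is a hypothetical type that mirrors the two columns from the question; doing the work in the database, as in the queries above, is usually the better choice.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical mirror of the "Entries" table: a date and the number entered.
record LedgerEntry(LocalDate date, double input) {}

class LastEntries {

    // Returns the n most recent entries, newest first.
    static List<LedgerEntry> lastN(List<LedgerEntry> entries, int n) {
        return entries.stream()
                .sorted(Comparator.comparing(LedgerEntry::date).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data from the question.
        List<LedgerEntry> entries = List.of(
                new LedgerEntry(LocalDate.of(2022, 1, 21), 500),
                new LedgerEntry(LocalDate.of(2022, 3, 1), 551),
                new LedgerEntry(LocalDate.of(2022, 5, 4), 629),
                new LedgerEntry(LocalDate.of(2022, 6, 30), 701),
                new LedgerEntry(LocalDate.of(2022, 7, 15), 781));
        lastN(entries, 2).forEach(e -> System.out.println(e.date() + " " + e.input()));
    }
}
```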

Multiple Comparison Conditions on Sort Key for DynamoDB?

I am using AWS SDK2 DynamoDB for an application with the following schema:
personId startDate endDate name age job
1 10/2/2013 10/3/2020 Bob 12 SWE
1 8/2/2013 10/3/2021 Bob 12 EE
2 10/2/2013 10/3/2021 Joe 17 Student
3 11/2/2013 10/3/2022 Kim 16 Boss
My goal is to be able to query the table by a personId and a date in order to retrieve a person object. Currently, I am thinking about having a partition key on personId; however, I am not sure how to design my sortKey.
For example, say I want a person with personId = 1 and date = 10/5/2019, I would expect the first and second entry of the example table to be returned because the date is between the startDate and endDate. How can I design the sortKey so that I can use an appropriate key condition expression to say something like date between startDate and endDate? I know that filter expression can be used, but I was wondering if there was a way to design the sort key so that filter is not needed as it is more costly.
No you can't.
Why do you need a sort key anyway? If personId is unique, you don't need a composite primary key (hash + sort).
And since you know the personId when reading, you can just use GetItem(table, HK=:personId)
Edit
Given the additional information that personId is not unique...
Have startDate be your sort key...
You'll be able to Query(table, HK=:personId, SK <= :date, ScanIndexForward=false, limit=1)
This will return the single record with the most recent start date less than or equal to the date you passed in.
The SQL equivalent would look like
select *
from table
where personId = :personId
and startDate <= :date
order by startDate desc
fetch first row only
EDIT2
Didn't see the updated info in the question. If a person can have overlapping jobs, then you'll need to leave off the limit and do the filtering (either in DDB or in your client app)
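As an in-memory analogy only (this is not the DynamoDB SDK), the access pattern can be sketched with a sorted map: the sort key condition narrows the range to startDate <= date, and the endDate check still has to happen as a separate filter step. JobPeriod and the method name are hypothetical, and the dates assume the question's table uses month/day/year.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical item type: one row of the table for a single personId.
record JobPeriod(LocalDate startDate, LocalDate endDate, String job) {}

class SortKeyRangeSketch {

    // "partition" holds the items of one personId, ordered by the sort key startDate.
    static List<JobPeriod> activeOn(NavigableMap<LocalDate, JobPeriod> partition, LocalDate date) {
        return partition.headMap(date, true)                 // key condition: startDate <= date
                .descendingMap()                             // newest start date first
                .values().stream()
                .filter(p -> !p.endDate().isBefore(date))    // filter step: endDate >= date
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        NavigableMap<LocalDate, JobPeriod> person1 = new TreeMap<>();
        person1.put(LocalDate.of(2013, 10, 2),
                new JobPeriod(LocalDate.of(2013, 10, 2), LocalDate.of(2020, 10, 3), "SWE"));
        person1.put(LocalDate.of(2013, 8, 2),
                new JobPeriod(LocalDate.of(2013, 8, 2), LocalDate.of(2021, 10, 3), "EE"));
        activeOn(person1, LocalDate.of(2019, 10, 5)).forEach(System.out::println);
    }
}
```

The range lookup is cheap because the map is ordered by start date; the end date is not part of the key, which is why it cannot be folded into the same lookup.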
DDB isn't the best for everything, with data as you've laid out, you've basically got two choices, pay more for DDB RCUs or pay more for an RDB.
Having said that, I'll point out that a DDB RCU covers up to 4KB of data.
If your DDB rows are smaller, you can return multiple rows with a single Query() call costing only 1 RCU.
If your row size is 512 bytes, you could return 8 rows for a single RCU.

SQLite / Java Compare most recent time stamp with one from 24hr ago for each name

I am trying to get data from the most recent time stamp, and the time stamp from 24 hr ago, for each name in the table. My current method makes two separate queries and combines the results. This, however, is quite slow, and also prevents me from sorting the data (by comments etc.).
The below query gets the data from 24 hr ago (last)
SELECT Price, comment ,name, timestamp
FROM details INNER JOIN car ON details .ID=car.ID
WHERE timestamp >= datetime('now','-1 day') and name = 'BMW'
order by timestamp asc limit 1
I then have another similar query which returns data with the most recent time stamp (first).
I have a method in Java which contains the above queries, and passes in a new car name into the name = " " part. This returns first and last for each car, I then compare price and comment details and return the results.
However, this is proving to be a very slow process. It also means that I can't order the results efficiently.
I have also tried with union; however, it does not provide the desired results
SELECT Price, comment ,name, max(timestamp)
FROM details INNER JOIN name ON details .ID=name .ID
UNION
SELECT Price, comment,name, min(timestamp)
FROM details INNER JOIN name ON details .ID=name .ID
WHERE timestamp >= datetime('now','-1 day')
group by name
order by comment desc
limit 40
What is the correct way to perform this query?
To get one output row for each name, use GROUP BY:
SELECT Price, comment, name, max(timestamp)
FROM details INNER JOIN name USING (ID)
GROUP BY name.ID
UNION ALL
SELECT Price, comment, name, min(timestamp)
FROM details INNER JOIN name USING (ID)
WHERE timestamp >= datetime('now','-1 day')
GROUP BY name.ID
ORDER BY ...;
Assuming that cars.name is unique (i.e. equivalent to cars.id) and that you want the results on separate rows, you can do the aggregation in a subquery to get the two timestamps. Then, join in the additional information you want:
SELECT c.name, d.Price, d.comment, d.name, d.timestamp
FROM car c JOIN
details d
ON d.ID = c.ID JOIN
(SELECT dd.ID, MAX(dd.timestamp) as maxts, MIN(dd.timestamp) as mints
FROM details dd
WHERE dd.timestamp >= datetime('now', '-1 day')
GROUP BY dd.ID
) dd
ON dd.ID = c.ID AND d.timestamp IN (dd.mints, dd.maxts)
ORDER BY timestamp asc;

How to count different values in one column of table in sql using java

I have a situation where I have to count the number of strings of the same type in one column of a table. For example, a column would have values such as A = absent and P = present:
A A A P P P P P P P A
So I need to count all the strings of the same type; for P (i.e. present), the query should give a count of 7.
What would be the ideal query for this?
I think this is the simplest query that I can think of in terms of SQL: Select count(*) from table where attendance='P'
Update:
Be sure to use a parameterized prepared statement in your Java code to prevent SQL injection.
SELECT attendance, COUNT(*)
FROM Roster
GROUP BY attendance;
Will give you the count of each value in the column. So, for a table having 4 A values and 7 P values, the query will return
attendance | COUNT(*)
-----------|---------
A          | 4
P          | 7
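For comparison, if the column values are already in memory on the Java side, the same grouping can be sketched with a collector. The class name is hypothetical; for data that lives in the database, the GROUP BY query above remains the natural choice.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class AttendanceCounter {

    // Counts how often each distinct value occurs, like GROUP BY with COUNT(*).
    static Map<String, Long> countByValue(List<String> column) {
        return column.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // The sample column from the question.
        List<String> column = List.of("A", "A", "A", "P", "P", "P", "P", "P", "P", "P", "A");
        System.out.println(countByValue(column));
    }
}
```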
Aside: Since your table has repetitive values in its attendance column, you should consider pulling all possible values for attendance out into their own "enumeration" of sorts. SQL doesn't offer enumerations, but there are a few ways to achieve a similar effect.
One method is to create a look up table that contains an ID column (can be an auto-increment), and the values that you want for attendance. Then, in your Roster table, use a foreign key to reference the ID of the correct value.
To save yourself time in the future, you can create a View which uses the attendance values rather than the IDs.
You can read up on Database Normalization if you're interested in improving your database's design.

How to manage consecutive column values in table rows

A little presentation for what I want to do:
Consider the case where different people from a firm get, once a year, an all expenses paid trip to somewhere. There may be 1000 persons that could qualify for the trip but only 16 places are available.
Each of these 16 spots has an associated index which must be from 1 to 16. The ones on the reservation list have indexes starting from 17.
The first 16 persons that apply get a definite spot on the trip. The rest end up on the reservation list. If one of the first 16 persons cancels, the first person with a reservation gets his place and all the indexes are renumbered to compensate for the person that canceled.
All of this is managed in a Java web app with an Oracle DB.
Now, my problem:
I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done Java side and then a new query to insert the record). It is obvious why we have multiple spots or reservations with the same index. So, we get, let’s say, 19 people on the trip because 4 of them have index 10, for example.
How can I manage this? I have been thinking of 3 ways so far:
Use an isolation level of Serializable for the DB transactions (don’t like this one);
Insert a record with no INDEX_NR and then have a trigger manage the things… in some way (never worked with triggers before);
Each record also has a UPDATED column. Could I use this in some way? (note that I can’t lose the INDEX_NR since other parts of the app make use of it).
Is there a best way to do this?
Why make it complicated ?
Just insert all reservations as they are entered, along with a timestamp of when each person reserved a spot.
Then in your query just use the timestamp to sort them.
There is of course the chance that some people reserve a spot at the very same millisecond; in that case just use a random method to assign the order.
Why do you need to explicitly store the index? Instead you could store each person's order (which never changes) along with an active flag. In your example if person #16 pulls out you simply mark them as inactive.
To compute whether a person qualifies for the trip you simply count the number of active people with order less than that person:
select count(*)
from CompetitionEntry
where PersonOrder < 16
and Active = 1
This approach removes the need for bulk updates to the database (you only ever update one row) and hence mostly mitigates your problem of transactional integrity.
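A minimal Java sketch of this idea follows: the stored order never changes, and the position is derived at read time by counting the active entries ahead of a person. TripEntry, TripRanking and the constant are hypothetical names; the 16 spots come from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row: a fixed application order plus an active flag.
record TripEntry(String person, long order, boolean active) {}

class TripRanking {

    static final int SPOTS = 16;

    // 1-based position among active entries, derived instead of stored.
    static long positionOf(TripEntry entry, List<TripEntry> all) {
        long activeAhead = all.stream()
                .filter(e -> e.active() && e.order() < entry.order())
                .count();
        return activeAhead + 1;
    }

    static boolean hasSpot(TripEntry entry, List<TripEntry> all) {
        return entry.active() && positionOf(entry, all) <= SPOTS;
    }

    public static void main(String[] args) {
        List<TripEntry> all = new ArrayList<>();
        for (int i = 1; i <= 17; i++) {
            all.add(new TripEntry("person" + i, i, true));
        }
        TripEntry last = all.get(16);
        System.out.println("before cancellation: " + hasSpot(last, all));
        // Person 5 cancels: only that one row changes.
        all.set(4, new TripEntry("person5", 5, false));
        System.out.println("after cancellation: " + hasSpot(last, all));
    }
}
```

A cancellation touches a single row, and everyone behind that person moves up implicitly the next time positions are computed.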
Another way would be to explicitly lock a record on another table on the select.
-- Initial Setup
CREATE TABLE NUMBER_SOURCE (ID NUMBER(4));
INSERT INTO NUMBER_SOURCE(ID) VALUES (0);
-- Your regular code
SELECT ID AS NEXT_INDEX_NR FROM NUMBER_SOURCE FOR UPDATE; -- lock!
UPDATE NUMBER_SOURCE SET ID = ID + 1;
INSERT INTO TABLE ....
COMMIT; -- releases lock!
No other transaction will be able to perform the query on the table NUMBER_SOURCE until the commit (or rollback).
1. When adding people to the table, give them an ID in such a way that the ID is ascending in the order in which they were added. This can be a timestamp.
2. Select all the records from the table which qualify, order by ID, and update their INDEX_NR
3. Select * from table where INDEX_NR <= 16 order by INDEX_NR
Step #2 seems complicated but it's actually quite simple:
update (
select *
from TABLE
where ...
order by ID
)
set INDEX_NR = INDEXSEQ.NEXTVAL
Don't forget to reset the sequence to 1.
Calculate your index in runtime:
CREATE OR REPLACE VIEW v_person
AS
SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS index_rn
FROM t_person
CREATE OR REPLACE TRIGGER trg_person_ii
INSTEAD OF INSERT ON v_person
BEGIN
INSERT
INTO t_person (id, name)
VALUES (:new.id, :new.name);
END;
CREATE OR REPLACE TRIGGER trg_person_iu
INSTEAD OF UPDATE ON v_person
BEGIN
UPDATE t_person
SET id = :new.id,
name = :new.name
WHERE id = :old.id;
END;
CREATE OR REPLACE TRIGGER trg_person_id
INSTEAD OF DELETE ON v_person
BEGIN
DELETE
FROM t_person
WHERE id = :old.id;
END;
INSERT
INTO v_person
VALUES (1, 'test', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
INSERT
INTO v_person
VALUES (2, 'test 2', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
2 test 2 2
DELETE
FROM v_person
WHERE id = 1
SELECT *
FROM v_person
--
id name index_rn
2 test 2 1
"I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done Java side and then a new query to insert the record). It is obvious why we have multiple spots or reservations with the same index."
Yeah. Oracle's MVCC ("snapshot isolation") used incorrectly by someone who shouldn't have been in IT to begin with.
Really, Peter is right. Your index number is, or rather should be, a sort of "ranking number" on the ordered timestamps that he mentions (this holds a requirement that the DBMS can guarantee that any timestamp value appears only once in the entire database).
You say you are concerned with "regression bugs". I say "Why do you need to be concerned with "regression bugs" in an application that is DEMONSTRABLY beyond curing ?". Because your bosses paid a lot of money for the crap they've been given and you don't want to be the pianist that gets shot for bringing the message ?
The solution depends on what you have under your control. I assume that you can change both the database and the Java code, but want to refrain from modifying the database schema since you would otherwise have to adapt too much Java code.
A cheap solution might be to add a uniqueness constraint on the pair (trip_id, index_nr), or just on index_nr if there is only one trip. Additionally, add a check constraint check(index_nr > 0), unless index_nr is already unsigned. Everything else is then done in Java: when inserting a new applicant as you described, you have to add code that catches the exception raised when someone else was inserted concurrently. If a record is updated or deleted, you either have to live with holes between sequence numbers (by selecting the 16 candidates with the lowest index_nr, as shown by Quassnoi in his view) or fill them up by hand (similarly to what Aaron suggested) after every update/delete.
If index_nr is mostly used read-only in the application, a better solution might be to combine the answers of Peter and Quassnoi: store either a timestamp (inserted automatically by the database by defining the current time as the default) or an auto-incremented integer (likewise inserted by the database as the default) in the table, and use a view (like the one defined by Quassnoi) to access the table and the automatically calculated index_nr from Java. But also define both constraints, as in the cheap solution.
