I need to limit the usage of multiple services for multiple customers. For example, customer customer1 can send at most 1000 SMS per month. My implementation is based on one MySQL table with 3 columns:
date TIMESTAMP
name VARCHAR(128)
value INTEGER
For every service usage (sending an SMS) one row is inserted into the table. value holds the usage count (e.g. if an SMS was split into 2 parts then value = 2). name holds the limiter name (e.g. customer1-sms).
To find out how many times the service was used this month (March 2011), a simple query is executed:
SELECT SUM(value) FROM service_usage WHERE name = 'customer1-sms' AND date > '2011-03-01';
The problem is that this query is slow (0.3 sec). We are using indexes on columns date and name.
Is there some better way to implement service usage limitation? My requirement is that it must be flexible (e.g. if I need to know the usage within the last 10 minutes, or within the current month). I am using Java.
Thanks in advance
You should have one composite index covering both columns, e.g. CREATE INDEX idx_name_date ON service_usage (name, date), not two separate indexes on the individual columns. MySQL will generally use only one index per table for this query, so the composite index lets it seek to the matching name and range-scan the dates. This should make the query very fast.
If it still isn't fast enough, then you could use a table with a month, a name and a value, and increment the value for the current month each time an SMS is sent. This removes the SUM from your query. It would still need an index on (month, name) to be as fast as possible, though.
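A minimal sketch of that idea over JDBC, assuming a hypothetical table service_usage_monthly(month, name, value) with a unique key on (month, name); MySQL's INSERT ... ON DUPLICATE KEY UPDATE makes the increment a single atomic statement:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MonthlyUsageCounter {
    // Adds `increment` to the counter row for (month, limiterName),
    // creating the row on first use. Relies on the UNIQUE KEY (month, name).
    public void recordUsage(Connection conn, String month, String limiterName,
                            int increment) throws SQLException {
        String sql = "INSERT INTO service_usage_monthly (month, name, value) "
                   + "VALUES (?, ?, ?) "
                   + "ON DUPLICATE KEY UPDATE value = value + VALUES(value)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, month);        // e.g. "2011-03"
            ps.setString(2, limiterName);  // e.g. "customer1-sms"
            ps.setInt(3, increment);       // e.g. 2 for a two-part SMS
            ps.executeUpdate();
        }
    }
}

Reading the current month's usage then becomes a single-row lookup instead of a SUM over many rows; the original row-per-usage table can be kept alongside for the flexible windows (e.g. the last 10 minutes).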
I found one solution to my problem. Instead of inserting each usage increment as its own row, I will insert a running total: read the latest value and insert it incremented:
BEGIN;
-- select the latest value
SELECT value FROM service_usage WHERE name = %name ORDER BY date DESC LIMIT 1;
-- insert the incremented value
INSERT INTO service_usage (date, name, value) VALUES (CURRENT_TIMESTAMP, %name, %value + %increment);
COMMIT;
To find out service usage since %date:
SELECT value AS value1 FROM service_usage WHERE name = %name ORDER BY date DESC LIMIT 1;
SELECT value AS value2 FROM service_usage WHERE name = %name AND date <= %date ORDER BY date DESC LIMIT 1;
The result will be value1 - value2.
This way I'll need transactions. I'll probably implement it as a stored procedure.
Any additional hints are still appreciated :-)
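Since the app is in Java, here is a hedged JDBC sketch of that read-increment-insert step against the service_usage table from the question. Note the SELECT ... FOR UPDATE: without it, two concurrent transactions could read the same latest value and write conflicting totals:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RunningTotalRecorder {
    // Inserts a new running-total row for `name`, incremented by `increment`.
    public void record(Connection conn, String name, int increment) throws SQLException {
        conn.setAutoCommit(false);
        try {
            int last = 0;
            // Lock the latest row so concurrent writers serialize on it.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT value FROM service_usage WHERE name = ? "
                  + "ORDER BY date DESC LIMIT 1 FOR UPDATE")) {
                ps.setString(1, name);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        last = rs.getInt(1);
                    }
                }
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO service_usage (date, name, value) "
                  + "VALUES (CURRENT_TIMESTAMP, ?, ?)")) {
                ps.setString(1, name);
                ps.setInt(2, last + increment);
                ps.executeUpdate();
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}

One gap remains: for the very first row of a given name there is nothing to lock yet, so two first inserts can still race; the stored procedure mentioned above, or a unique constraint, would have to close that.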
It's worth trying to replace your "=" with "like". Not sure why, but in the past I've seen this perform far more quickly than the "=" operator on varchar columns.
SELECT SUM(value) FROM service_usage WHERE name like 'customer1-sms' AND date > '2011-03-01';
Edited after comments:
Okay, now I can sorta re-create your issue - the first time I run the query, it takes around 0.03 seconds, subsequent runs of the query take 0.001 second. Inserting new records causes the query to revert to 0.03 seconds.
Suggested solution:
COUNT does not show the same slow-down. I would change the business logic so that every time the user sends an SMS you insert a record with value "1"; if the message is a multipart message, simply insert two rows.
Replace the "sum" with a "count".
I've applied this to my test data, and even after inserting a new record, the "count" query returns in 0.001 second.
I have a requirement where I need to generate an account number and insert it into a table column in the following format.
"TBA2222011300000001" = where "TBA" is the value of another column or user sent data "22220113" implies the current date and "00000001" is a seven digit sequence that needs to be incremented and appended for every insert.
How can I append the sequence to the column? Should I do it in Java, or can it be done at the DB end? I am currently using Postgres with Java and Spring Boot.
https://www.postgresql.org/docs/current/ddl-generated-columns.html
A generated column is a special column that is always computed from other columns.
Several restrictions apply to the definition of generated columns and tables involving generated columns: the generation expression can only use immutable functions and cannot use subqueries or reference anything other than the current row in any way.
now() is not an immutable function (it is merely stable), so you cannot use a generated column here.
A plain DEFAULT does not work either, because a column default cannot reference other columns of the row, and the account number depends on account_type and account_id.
https://www.postgresql.org/docs/current/ddl-default.html
So now the only option is a trigger.
CREATE TABLE account_info(
account_id INT GENERATED ALWAYS AS IDENTITY,
account_type text NOT NULL,
account_number text);
So what you want is to automate:
UPDATE account_info set
account_number =
concat(
account_type,
to_char(CURRENT_DATE, 'yyyymmdd'),
to_char(account_id, 'FM00000000'));
create the function
create or replace function update_account_number() returns trigger as $$
BEGIN
UPDATE account_info SET
account_number =
concat(
account_type,
to_char(CURRENT_DATE, 'yyyymmdd'),
to_char(account_id, 'FM00000000'))
WHERE account_id = NEW.account_id; -- only touch the row that was just inserted
RETURN NULL;
end;
$$ LANGUAGE plpgsql;
create the trigger:
CREATE OR REPLACE TRIGGER update_account_number
AFTER INSERT ON account_info
FOR EACH ROW
EXECUTE FUNCTION update_account_number();
Have an id column which is an identity column in Postgres, with the start value and increment as required.
For your reference, to create an identity column as desired:
https://www.postgresqltutorial.com/postgresql-identity-column/
Have one more column for createdDate.
Then the account number is simply a derived value: TBA + formatted(date) + formatted(id). For example, see the function further below.
No, not a trigger, just a function. There won't be an account number column in your table at all. It will simply be a function that takes the date and the identity as input and gives the account number as output. Since the account number depends only on id and date, there is no need to store this value at all; whenever you need the account number, just call that function. The account number will not exist as stored data, it will always be calculated from id and date. Simple.
Refer to this part of the article linked further below:
Method 1: Derived Value called "markup"
The first method we may want to add to this table is an accountNumber function, for calculating our account number based on the current date and id. Since this value will always be based on two other stored values, there is no sense in storing it (other than possibly in a pre-calculated index). To do this, we:
CREATE FUNCTION accountNumber(id int, created date) RETURNS varchar AS
$$ SELECT 'TBA' || format(created) || format(id) $$
LANGUAGE SQL IMMUTABLE;
You need to put in the logic for format(created) and format(id) as per your requirement (date first, then the zero-padded id, to match the required format).
There is no point in storing a value which can easily be derived from the other 2 columns. It would unnecessarily consume space, and maintaining data integrity and checks would be an overhead.
Creating a function for derived values:
https://ledgersmbdev.blogspot.com/2012/08/postgresql-or-modelling-part-2-intro-to.html
You can use the function in output as well as in searches.
An index (e.g. an expression index over the function) can also be utilized as required.
I did the following to generate the desired account number.
Created a new sequence and zero-padded its value:
select to_char(nextval('finance_accounts_id_seq'), 'fm00000000')
Got the current date in Java using DateTimeFormatter:
DateTimeFormatter dmf = DateTimeFormatter.ofPattern("yyyyMMdd");
String date = LocalDate.now().format(dmf);
Got the "TBA" from request param of the user.
I am using AWS SDK2 DynamoDB for an application with the following schema:
personId  startDate  endDate    name  age  job
1         10/2/2013  10/3/2020  Bob   12   SWE
1         8/2/2013   10/3/2021  Bob   12   EE
2         10/2/2013  10/3/2021  Joe   17   Student
3         11/2/2013  10/3/2022  Kim   16   Boss
My goal is to be able to query the table by a personId and a date in order to retrieve a person object. Currently, I am thinking about having a partition key on personId; however, I am not sure how to design my sortKey.
For example, say I want a person with personId = 1 and date = 10/5/2019, I would expect the first and second entry of the example table to be returned because the date is between the startDate and endDate. How can I design the sortKey so that I can use an appropriate key condition expression to say something like date between startDate and endDate? I know that filter expression can be used, but I was wondering if there was a way to design the sort key so that filter is not needed as it is more costly.
No you can't.
Why do you need a sort key anyway? If personId is unique, you don't need a composite primary key (hash + sort).
And since you know the personId when reading, you can just use GetItem(table, HK=:personId)
Edit
Given the additional information that personId is not unique...
Have startDate be your sort key...
You'll be able to Query(table, HK=:personId, SK <= :date, ScanIndexForward=false, limit=1)
This will return the single record with the most recent start date less than or equal to the date you passed in.
The SQL equivalent would look like
select *
from table
where personId = :personId
and startDate <= :date
order by startDate desc
fetch first row only
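With the AWS SDK2 for Java that the question mentions, the query could look roughly like the sketch below. The table name "people" and the ISO-8601 date format are assumptions, not part of the original question:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.QueryResponse;
import java.util.Map;

public class PersonLookup {
    // Returns the item with the most recent startDate <= the given date.
    // Assumes partition key "personId" and sort key "startDate".
    public QueryResponse latestVersion(DynamoDbClient ddb, String personId, String isoDate) {
        QueryRequest req = QueryRequest.builder()
                .tableName("people") // assumed table name
                .keyConditionExpression("personId = :pid AND startDate <= :d")
                .expressionAttributeValues(Map.of(
                        ":pid", AttributeValue.builder().s(personId).build(),
                        ":d",   AttributeValue.builder().s(isoDate).build()))
                .scanIndexForward(false) // newest startDate first
                .limit(1)
                .build();
        return ddb.query(req);
    }
}

Note that the dates must be stored in a lexicographically sortable format such as ISO-8601 ("2019-10-05") for the <= comparison to behave; the M/D/YYYY strings in the example table would not sort correctly.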
EDIT2
Didn't see the updated info in the question. If a person can have overlapping jobs, then you'll need to leave off the limit and do the filtering (either in DDB or in your client app).
DDB isn't the best fit for everything. With data laid out as you have it, you've basically got two choices: pay more for DDB RCUs, or pay more for an RDBMS.
Having said that, I'll point out that a DDB RCU covers up to 4KB of data.
If your DDB rows are smaller, you can return multiple rows with a single Query() call costing only 1 RCU.
If your row size is 512 bytes, you could return 8 rows for a single RCU.
We have a use case where we keep a monotonically increasing key signifying the current version of a customer's data. It is used in the system to figure out the most recent version and to resolve conflicts for third-party callers: if there is a delay between reading the data and processing it, they may see multiple versions.
CUSTOMER_RESOURCE_ID CURRENT_VERSION
132323 1234
If something changes for this resource, we increment the version from 1234 to 1235 (it is totally fine even if we increase it to 1300 as long as it doesn't go down). For this, we need to first read the value and then update it.
The other alternative is to use the DB's timestamp and keep updating the version with the DB timestamp, which is always increasing. Since this is just one system, clock skew can only happen when we change the DB. Also, we are not overly concerned about the case where multiple threads update the data within a fraction of time (i.e. within the smallest granularity of the timestamp), as we have another lock under which only one thread updates the resource at a time.
I was wondering if we could use the database's system timestamp to replace the select-and-increment with a plain update.
Is there any concern with this approach? I assume it means less overhead on the database, but I don't know how much we would save.
Many approaches can be discussed. In my opinion:
-- You can use an incremental sequence (even as a default column value) instead of the database timestamp. Millisecond granularity might cause problems for you.
-- In addition, if your table is not very big and your resources are sufficient, you can add an is_last_version column that is updated with every insert, so that you can query the last version of a customer cheaply (i.e. select * from customers where customer_id = 123 and is_last_version = 1, which gets rid of the cost of ordering). That way you know which row is the last version, but each insert will take longer. You should test it.
Example:
CUSTOMER_RESOURCE:
CUSTOMER_RESOURCE_ID  CURRENT_VERSION_ID  IS_LAST_VERSION  INSERT_TIME
                      (default seq)                        (default sysdate, optionally)
132323                1                   1
132324                2                   1
132325                3                   1
132326                4                   0
132327                5                   0
132328                6                   1
132326                7                   0
132329                8                   1
132326                9                   1
132327                10                  1
When inserting (both statements in one transaction):
update CUSTOMER_RESOURCE
SET IS_LAST_VERSION = 0
where CUSTOMER_RESOURCE_ID = 132326;
insert into CUSTOMER_RESOURCE(CUSTOMER_RESOURCE_ID,IS_LAST_VERSION) VALUES (132326,1);
COMMIT;
When selecting the last version of a customer:
select * from CUSTOMER_RESOURCE where CUSTOMER_RESOURCE_ID = 132326 and IS_LAST_VERSION = 1;
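As an additional note on the question's select-and-increment concern: the read-then-update can be collapsed into a single atomic UPDATE, no timestamp needed. A hedged JDBC sketch, with table and column names assumed from the question:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class VersionBumper {
    // Bumps the version in a single atomic statement; no SELECT beforehand,
    // so two writers can never read the same old value.
    public long bump(Connection conn, long customerResourceId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE customer_resource SET current_version = current_version + 1 "
              + "WHERE customer_resource_id = ?")) {
            ps.setLong(1, customerResourceId);
            ps.executeUpdate();
        }
        // The read-back is safe here only because, as stated in the question,
        // an external lock allows one writer per resource at a time.
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT current_version FROM customer_resource WHERE customer_resource_id = ?")) {
            ps.setLong(1, customerResourceId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}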
A little presentation of what I want to do:
Consider the case where different people from a firm get, once a year, an all-expenses-paid trip somewhere. There may be 1000 persons that could qualify for the trip, but only 16 places are available.
Each of these 16 spots has an associated index, which must be from 1 to 16. The ones on the reservation list have indexes starting from 17.
The first 16 persons that apply get a definite spot on the trip. The rest end up on the reservation list. If one of the first 16 persons cancels, the first person with a reservation gets his place, and all the indexes are renumbered to compensate for the person that canceled.
All of this is managed in a Java web app with an Oracle DB.
Now, my problem:
I have to manage the index correctly (all sequential, no duplicate indexes), with possibly hundreds of people simultaneously applying for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done on the Java side, followed by a second query to insert the record). It is obvious why we end up with multiple spots or reservations having the same index. So we get, let's say, 19 people on the trip because 4 of them have index 10, for example.
How can I manage this? I have been thinking of 3 ways so far:
Use an isolation level of Serializable for the DB transactions (don’t like this one);
Insert a record with no INDEX_NR and then have a trigger manage things… in some way (I have never worked with triggers before);
Each record also has a UPDATED column. Could I use this in some way? (note that I can’t lose the INDEX_NR since other parts of the app make use of it).
Is there a best way to do this?
Why make it complicated?
Just insert all reservations as they are entered, along with a timestamp of when they reserved a spot.
Then in your query, use the timestamp to sort them.
There is of course the chance that two people reserved a spot at the very same millisecond; then just use a random method to assign the order.
Why do you need to explicitly store the index? Instead you could store each person's order (which never changes) along with an active flag. In your example if person #16 pulls out you simply mark them as inactive.
To compute whether a person qualifies for the trip, you simply count the number of active people with order less than that person's; they qualify if the count is below 16:
select count(*)
from CompetitionEntry
where PersonOrder < :thisPersonOrder
and Active = 1
This approach removes the need for bulk updates to the database (you only ever update one row) and hence mostly mitigates your problem of transactional integrity.
Another way would be to explicitly lock a record on another table on the select.
-- Initial Setup
CREATE TABLE NUMBER_SOURCE (ID NUMBER(4));
INSERT INTO NUMBER_SOURCE(ID) VALUES (0);
-- Your regular code
SELECT ID AS NEXT_INDEX_NR FROM NUMBER_SOURCE FOR UPDATE; -- lock!
UPDATE NUMBER_SOURCE SET ID = ID + 1;
INSERT INTO TABLE ....
COMMIT; -- releases lock!
No other transaction will be able to perform the query on the table NUMBER_SOURCE until the commit (or rollback).
When adding people to the table, give them an ID in such a way that the ID is ascending in the order in which they were added. This can be a timestamp.
Select all the records from the table which qualify, order by ID, and update their INDEX_NR
Select * from table where INDEX_NR <= 16 order by INDEX_NR
Step #2 seems complicated but it's actually quite simple:
update (
select *
from TABLE
where ...
order by ID
)
set INDEX_NR = INDEXSEQ.NEXTVAL
Don't forget to reset the sequence to 1 before each renumbering.
Calculate your index at runtime:
CREATE OR REPLACE VIEW v_person
AS
SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS index_rn
FROM t_person
CREATE OR REPLACE TRIGGER trg_person_ii
INSTEAD OF INSERT ON v_person
BEGIN
INSERT
INTO t_person (id, name)
VALUES (:new.id, :new.name);
END;
CREATE OR REPLACE TRIGGER trg_person_iu
INSTEAD OF UPDATE ON v_person
BEGIN
UPDATE t_person
SET id = :new.id,
name = :new.name
WHERE id = :old.id;
END;
CREATE OR REPLACE TRIGGER trg_person_id
INSTEAD OF DELETE ON v_person
BEGIN
DELETE
FROM t_person
WHERE id = :old.id;
END;
INSERT
INTO v_person
VALUES (1, 'test', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
INSERT
INTO v_person
VALUES (2, 'test 2', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
2 test 2 2
DELETE
FROM v_person
WHERE id = 1
SELECT *
FROM v_person
--
id name index_rn
2 test 2 1
"I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done Java side and then a new query to insert the record). It is obvious why we have multiple spots or reservations with the same index."
Yeah. Oracle's MVCC ("snapshot isolation") used incorrectly by someone who shouldn't have been in IT to begin with.
Really, Peter is right. Your index number is, or rather should be, a sort of "ranking number" over the ordered timestamps that he mentions (this carries the requirement that the DBMS can guarantee that any timestamp value appears only once in the entire database).
You say you are concerned with "regression bugs". I say: why be concerned with regression bugs in an application that is demonstrably beyond curing? Because your bosses paid a lot of money for the crap they've been given, and you don't want to be the pianist who gets shot for delivering the message?
The solution depends on what you have under your control. I assume that you can change both the database and the Java code, but would refrain from modifying the database schema, since you would otherwise have to adapt too much Java code.
A cheap solution might be to add a uniqueness constraint on the pair (trip_id, index_nr), or just on index_nr if there is only one trip. Additionally, add a check constraint check(index_nr > 0), unless index_nr is already unsigned. Everything else is then done in Java: when inserting a new applicant as you describe, add code that catches the exception thrown when someone else was inserted concurrently (a retry loop is sketched at the end of this answer). If a record is updated or deleted, you either live with holes between the sequence numbers (by selecting the 16 candidates with the lowest index_nr, as shown by Quassnoi in his view) or fill them up by hand (similarly to what Aaron suggested) after every update/delete.
If index_nr is mostly used read-only by the application, a better solution might be to combine the answers of Peter and Quassnoi: store either a timestamp (inserted automatically by defining the current time as the column default) or an auto-incremented integer (also inserted by the database as a default) in the table, and use a view (like the one defined by Quassnoi) to access the table and the automatically calculated index_nr from Java. But also define both constraints as in the cheap solution.
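For the cheap solution above, a hedged sketch of the catch-and-retry loop over JDBC, assuming a hypothetical TRIP_ENTRY(TRIP_ID, INDEX_NR, PERSON) table with the unique constraint on (TRIP_ID, INDEX_NR):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class TripRegistration {
    // Inserts the applicant with the next free index; retries when a
    // concurrent insert grabbed the same index first. Assumes auto-commit;
    // with explicit transactions, roll back before retrying.
    public int register(Connection conn, long tripId, String person) throws SQLException {
        while (true) {
            int next = nextIndex(conn, tripId);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO TRIP_ENTRY (TRIP_ID, INDEX_NR, PERSON) VALUES (?, ?, ?)")) {
                ps.setLong(1, tripId);
                ps.setInt(2, next);
                ps.setString(3, person);
                ps.executeUpdate();
                return next; // 1..16 is a spot, 17+ is the reservation list
            } catch (SQLIntegrityConstraintViolationException e) {
                // another transaction took this index concurrently - retry
            }
        }
    }

    private int nextIndex(Connection conn, long tripId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT NVL(MAX(INDEX_NR), 0) + 1 FROM TRIP_ENTRY WHERE TRIP_ID = ?")) {
            ps.setLong(1, tripId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }
}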
How do I implement paging in Hibernate? The Query object has methods called setMaxResults and setFirstResult, which are certainly helpful. But where can I get the total number of results, so that I can show a link to the last page of results, and print things such as "results 200 to 250 of xxx"?
You can use Query.setMaxResults(int results) and Query.setFirstResult(int offset).
Edit: there's no way to know how many results you'll get in advance. So you must first query with "select count(*)...". A little ugly, IMHO.
You must do a separate query to get the max results... and between time A, when the client issues the first paging request, and time B, when another request is issued, new records may be added or some records may now fit the criteria, so you have to query the count again to reflect that. I usually do this in HQL like this:
Long count = (Long) session.createQuery("select count(*) from ....").uniqueResult();
for Criteria queries I usually push my data into a DTO like this
ScrollableResults scrollable = criteria.scroll(ScrollMode.SCROLL_INSENSITIVE);
if(scrollable.last()){//returns true if there is a resultset
genericDTO.setTotalCount(scrollable.getRowNumber() + 1);
criteria.setFirstResult(command.getStart())
.setMaxResults(command.getLimit());
genericDTO.setLineItems(Collections.unmodifiableList(criteria.list()));
}
scrollable.close();
return genericDTO;
You could perform two queries: a count(*) query, which should be cheap if you are not joining too many tables together, and a second query with the limits set. Then you know how many items exist but only fetch the ones being viewed.
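A minimal sketch of that two-query approach in plain HQL, assuming a hypothetical Item entity with an active flag; the count query and the page query must share the same WHERE clause so the total matches the page contents:

import java.util.List;
import org.hibernate.Session;

public class ItemPager {
    // Returns one page; the total is what you'd show in "results X to Y of Z".
    public List<Item> page(Session session, int pageNumber, int pageSize) {
        Long total = (Long) session
                .createQuery("select count(*) from Item i where i.active = true")
                .uniqueResult();
        System.out.println("total rows: " + total);
        return session
                .createQuery("from Item i where i.active = true order by i.id")
                .setFirstResult(pageNumber * pageSize) // offset of the first row
                .setMaxResults(pageSize)               // page length
                .list();
    }
}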
You can do one thing: just prepare the Criteria query as per your business requirement, with all predicates, sorting, searching etc.,
and then do as below:
CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder();
CriteriaQuery<Feedback> criteriaQuery = criteriaBuilder.createQuery(Feedback.class);
Root<Feedback> root = criteriaQuery.from(Feedback.class);
// Prepare all your predicates as per your business need, e.g.:
Predicate yourPredicateAsPerYourBusinessNeed =
        criteriaBuilder.equal(root.get("applicationName"), applicationName);
criteriaQuery.select(root).where(yourPredicateAsPerYourBusinessNeed).distinct(true);
TypedQuery<Feedback> criteriaQueryWithPredicate = em.createQuery(criteriaQuery);
// Getting the total count here (note: this streams every matching row just to count them)
long totalCount = criteriaQueryWithPredicate.getResultStream().distinct().count();
Now we have the total count in hand.
So we can apply pagination to the same query, as below:
List<Feedback> feedbackList = criteriaQueryWithPredicate.setFirstResult(offset).setMaxResults(pageSize).getResultList();
Now you can prepare a wrapper with the list returned by the DB along with the totalCount, the startingPageNo (the offset here), the page size etc., and return it to your service/controller class.
I am 101% sure this will solve your problem, because I was facing the same problem and sorted it out the same way.
Thanks - Sunil Kumar Mali
You can just set setMaxResults to the maximum number of rows you want returned. There is no harm in setting this value greater than the number of rows actually available. The problem with the other solutions is that they assume the ordering of records remains the same on each repeat of the query, and that no changes happen between commands.
To avoid that, if you really want to scroll through results, it is best to use ScrollableResults. Don't throw this object away between pages; use it to keep the records in the same order. To find out the number of records from the ScrollableResults, simply move to the last() position and get the row number. Remember to add 1 to this value, since row numbers start counting at 0.
I personally think you should handle the paging in the front-end. I know this isn't that efficient, but at least it would be less error-prone.
If you used the count(*) approach, what would happen if records were deleted from the table between requests for a certain page? Lots of things could go wrong this way.