Postgres data with interval of 8 hours - java

I have a table with a large number of rows. It has columns for a timestamp (in milliseconds), a value, and a siteId (a foreign key). I want to fetch the data of the last three months for every siteId, but only at an interval of 8 hours in the timestamp. The table holds a row every 5 minutes for every siteId, so fetching the full three months returns millions of rows; that is why I only want one row per 8 hours. There can also be gaps: if a siteId has no row exactly at the 8-hour mark, I want its next available row, which may be 5 minutes (or 10 minutes, ...) past that mark.
It is hard to write a query for that, and fetching everything and massaging the data in Java afterwards takes too long.
I am using Postgres, Java and JPA. Can I do this via a query, or via some JPA utility, to ease the load on the CPU? I want to reduce the time taken (currently about 9 seconds per query) as much as possible. Can you guys help me? Thanks in advance.
My Table structure:
| timestamp     | value | siteId |
|---------------|-------|--------|
| 1610370000000 | 22    | 123    |
| 1610370700000 | 21    | 123    |
| 1610370028000 | 22    | 123    |
| 1610369889000 | 23    | 123    |
| 1610370000000 | 22    | 124    |
| 1613534400000 | 21    | 124    |
| 1610369889000 | 22    | 124    |
| 1610370005000 | 23    | 125    |
So every site has data for every 5 minutes. I want the data of the last three months with an interval of at least 8 hours for every site. Hope this helps.

Assuming you want the same data structure as in the example in your question, from the last 3 months, on an 8-hour interval, for each siteId.
Try this:
select distinct on (siteId, "group")
       siteId,
       value,
       timestamp_,
       ceil((extract(epoch from current_timestamp) * 1000 - timestamp_) / 28800000) as "group"
from test
where to_timestamp(timestamp_ / 1000)
      between current_timestamp - interval '3 month' and current_timestamp
order by 1, 4, 3
DEMO
Here I divide the difference between current_timestamp and the timestamp_ field by 28800000 (8 * 60 * 60 * 1000) to compute the group, and then take the first row of each group by using distinct on.
You can switch from order by 1,4,3 (which returns the row with the minimum timestamp_ of each range) to order by 1,4,3 desc (which returns the row with the maximum timestamp_ of each range) to check which result you want.
I am not sure about the performance, but it should work better than fetching everything into Java.
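If it helps, here is a rough sketch of how that query could be run from Java as a JPA native query; the class name is made up, and the table/column names simply follow the example above:

import javax.persistence.EntityManager;
import javax.persistence.Query;
import java.util.List;

public class SiteSampleQuery {

    // Runs the DISTINCT ON query above as a native query and returns the raw rows.
    @SuppressWarnings("unchecked")
    public static List<Object[]> lastThreeMonthsEvery8Hours(EntityManager em) {
        String sql =
            "select distinct on (siteId, \"group\") siteId, value, timestamp_, "
          + "       ceil((extract(epoch from current_timestamp) * 1000 - timestamp_) / 28800000) as \"group\" "
          + "from test "
          + "where to_timestamp(timestamp_ / 1000) "
          + "      between current_timestamp - interval '3 month' and current_timestamp "
          + "order by 1, 4, 3";
        Query query = em.createNativeQuery(sql);
        return query.getResultList(); // each Object[] is {siteId, value, timestamp_, group}
    }
}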

If you need all the data but run into problems transferring such a huge amount, I would recommend a pagination approach.
https://www.baeldung.com/jpa-pagination
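A minimal sketch of offset-based paging with plain JPA (no Spring Data), reusing the table and column names from the example above purely as an illustration:

import javax.persistence.EntityManager;
import javax.persistence.Query;
import java.util.List;

public class PagedFetch {

    // Fetches one page of raw rows; pageNumber starts at 0.
    @SuppressWarnings("unchecked")
    public static List<Object[]> fetchPage(EntityManager em, int pageNumber, int pageSize) {
        Query query = em.createNativeQuery(
                "select timestamp_, value, siteId from test order by timestamp_, siteId");
        query.setFirstResult(pageNumber * pageSize); // offset = page number * page size
        query.setMaxResults(pageSize);               // rows per page
        return query.getResultList();
    }
}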

Related

How to set a second auto-increment column per user?

I know this question has been asked before, and most of the answers warn against doing this or suggest a solution for MyISAM, but I'm using InnoDB.
I'm trying to generate an invoice or bill for the user to give out to me (a business plan requirement)!
The thing is that in my country the reference IDs of the bills of the same person have to be sequential, or he will be audited for the missing bills. For example, if he gave me one bill numbered 0001 and the second one is 0005, he will be interrogated about the missing four.
So I need a custom auto-increment per UserID.
User 1 - idUser = 1, idBill = 1
User 1 - idUser = 1, idBill = 2
User 2 - idUser = 2, idBill = 1
Some threads suggested using triggers, while others warned about table locks. I am personally not familiar with triggers, so I steer away from them since they require maintenance.
I am using Java and MySQL.
An example:
-- Main InnoDB table; secondary_id is filled in by the trigger below.
CREATE TABLE main (id INT AUTO_INCREMENT PRIMARY KEY,
                   primary_id CHAR(3),
                   secondary_id INT) ENGINE = InnoDB;

-- Auxiliary MyISAM table: with a composite primary key, MyISAM generates a
-- separate AUTO_INCREMENT sequence for each distinct primary_id value.
CREATE TABLE auxiliary (primary_id CHAR(3),
                        secondary_id INT AUTO_INCREMENT,
                        PRIMARY KEY (primary_id, secondary_id)) ENGINE = MyISAM;

-- In the mysql command-line client, wrap the trigger in DELIMITER statements.
CREATE TRIGGER generate_secondary_id
BEFORE INSERT ON main
FOR EACH ROW
BEGIN
    INSERT INTO auxiliary (primary_id) VALUES (NEW.primary_id);
    SET NEW.secondary_id = LAST_INSERT_ID();
END
INSERT INTO main (primary_id) VALUES
('A01'),
('A01'),
('B01'),
('C01'),
('A01'),
('B01'),
('A01'),
('B01');
SELECT * FROM main;
id | primary_id | secondary_id
-: | :--------- | -----------:
1 | A01 | 1
2 | A01 | 2
3 | B01 | 1
4 | C01 | 1
5 | A01 | 3
6 | B01 | 2
7 | A01 | 4
8 | B01 | 3
db<>fiddle here
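On the Java side, a minimal JDBC sketch of how an insert into main could look and how the generated per-user bill number could be read back; the connection settings are placeholders and the read-back query is a simplification for illustration only:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BillInsertExample {

    public static void main(String[] args) throws SQLException {
        // Connection URL and credentials are placeholders.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/billing", "user", "password")) {

            // The BEFORE INSERT trigger fills secondary_id for us.
            try (PreparedStatement insert =
                     con.prepareStatement("INSERT INTO main (primary_id) VALUES (?)")) {
                insert.setString(1, "A01");
                insert.executeUpdate();
            }

            // Read back the per-user bill number just generated (simplified;
            // a real application would do the insert and this read in one transaction).
            try (PreparedStatement select = con.prepareStatement(
                     "SELECT MAX(secondary_id) FROM main WHERE primary_id = ?")) {
                select.setString(1, "A01");
                try (ResultSet rs = select.executeQuery()) {
                    if (rs.next()) {
                        System.out.println("New bill number for A01: " + rs.getInt(1));
                    }
                }
            }
        }
    }
}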

Database schema for leave report with changing structure

I have a requirement to show leave history and forecast. The data is received weekly in a report which I need to store in a table. I can use any DB supported by Java.
A sample of the data looks like this:
To be able to show past totals by department I need to store the data that comes out in the report each week.
My question is how to store the forecast data, given that the structure of the report keeps changing. In the sample above the last 12 columns are the 12 months following the date the report was run; next month the first of those columns will be October, and so on.
I have created a fiddle here
I have considered just storing the last 4 weeks of reports (each report in a separate table) and inserting work-group totals into a separate totals table, where each row would represent a department and its totals.
If there is a better way - what sort of data structure/schema should I use?
I can think of 3 approaches:
1. You can add a date column and a forecast column and get rid of the columns that are named after months/years; it is like the transpose action in Excel. Additionally, since Dept, Leave_Balance and projected_balance_6m will not be at the same grain as the new columns, I'd create a new table (a JPA entity sketch for this layout is shown after the list). Example rows from the new table would look like this:
+------------+-----------+----------+
| EmployeeID | YearMonth | Forecast |
+------------+-----------+----------+
| 456 | 201901 | 0 |
| 456 | 201902 | 5 |
+------------+-----------+----------+
2. Again in a new table, you can add a year column and make the forecast column names resemble months. This wouldn't be as continuous as your current layout, but it is easier to handle in BI software.
+------------+------+-----+-----+-----+-----+-----+-----+
| EmployeeID | Year | Jan | Feb | Mar | Apr | May | Jun |
+------------+------+-----+-----+-----+-----+-----+-----+
| 456 | 2019 | 0 | 0 | 0 | 0 | 0 | 0 |
| 456 | 2020 | 0 | 5 | 0 | 6 | 0 | 0 |
| 123 | 2020 | 0 | 0 | 1 | 0 | 0 | 0 |
+------------+------+-----+-----+-----+-----+-----+-----+
3. Another approach would be to rename the columns relative to the current date. Here, cur is SEPT19, cur+1 is OCT19, and so on. This solution has the least impact on the existing structure, but its drawback is that it is not clear when you last updated the table or which month cur actually refers to, so that information should be made available somewhere.
+-----+------+-------+---------------+--------------+-----+-------+-------+
| ID | Name | Dept | Leave_Balance | p_balance_6m | cur | cur+1 | cur+2 |
+-----+------+-------+---------------+--------------+-----+-------+-------+
| 456 | Mary | Sales | 32.3 | 45.6 | 0 | 0 | 0 |
+-----+------+-------+---------------+--------------+-----+-------+-------+
I like the first and second solutions more because they are more self-contained. Your choice will depend on how much you want to rely on your BI software (Tableau, QlikView etc.).
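As a small illustration of the first approach only, a minimal JPA entity for the EmployeeID / YearMonth / Forecast table could look roughly like this (all names are invented):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.IdClass;
import java.io.Serializable;
import java.util.Objects;

// One row per employee per month, matching the EmployeeID / YearMonth / Forecast layout above.
@Entity
@IdClass(LeaveForecast.Key.class)
public class LeaveForecast {

    @Id
    private long employeeId;

    @Id
    private int yearMonth;    // e.g. 201901 for January 2019

    private double forecast;  // forecast leave for that month

    protected LeaveForecast() {
        // required by JPA
    }

    public LeaveForecast(long employeeId, int yearMonth, double forecast) {
        this.employeeId = employeeId;
        this.yearMonth = yearMonth;
        this.forecast = forecast;
    }

    // Composite key class required by @IdClass.
    public static class Key implements Serializable {
        private long employeeId;
        private int yearMonth;

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return employeeId == k.employeeId && yearMonth == k.yearMonth;
        }

        @Override
        public int hashCode() {
            return Objects.hash(employeeId, yearMonth);
        }
    }
}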

SQL count occurrence of every record (integer) in one table with multiple columns

I am working on an algorithm, using SQL and Java, that deals with big datasets.
In SQL I have a table with all the data, and I want to do as much as possible with SQL queries before loading anything into Java.
I generate random datasets (in Java), consisting exclusively of integers between 1 and 40001, and then insert them into a MySQL table.
The rows can have different lengths, with a maximum of 30 items per row (including the ID). So normally the number of columns is 30 (COL1, COL2, COL3, ..., COL30), but this number will also become random at some point.
What I want to do is count the occurrences of every distinct item in the table and put them in a new table together with their counts. This is tricky, however, because I want to count over the entire table, not just one column. How do I do this?
To specify:
Take this table for example (this is a very small one in comparison with my usual tables):
ID | COL1 | COL2 | COL3 | COL4 | COL5 |
---------------------------------------
 1 |    8 |   35 |   42 |   12 |   27 |
 2 |   22 |   42 |   35 |    8 | NULL |
 3 |   18 |   22 |    8 | NULL | NULL |
 4 |   42 |   12 |   27 |   35 |    8 |
 5 |   18 |   27 |   12 |   22 | NULL |
What I want to extract from this table is this:
Item | Count
-------------
   8 |     4
  35 |     3
  42 |     3
  12 |     3
  27 |     3
  22 |     3
  18 |     2
It is also the case that an item can't be in the same row more than once, if that helps.
Can anyone help me? Or can it simply not be done in SQL? Would it be better to do this in Java, performance-wise?
Thanks in advance!
You can do this by unpivoting the data and then aggregating:
select col, count(*)
from (select col1 as col from t union all
      select col2 from t union all
      . . .
      select col30 from t
     ) t
group by col;
If you don't have a known set of columns, then you will need to use dynamic SQL.
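One possible sketch of that dynamic SQL (assuming MySQL, a table named t, and value columns all named COL followed by a number) is to build the UNION ALL query in Java from INFORMATION_SCHEMA:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class ItemCounter {

    // Builds and runs the UNION ALL unpivot query for whatever COLn columns exist.
    public static void printItemCounts(Connection con) throws SQLException {
        List<String> columns = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS " +
                "WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 't' " +
                "AND COLUMN_NAME LIKE 'COL%'");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                columns.add(rs.getString(1));
            }
        }
        if (columns.isEmpty()) {
            return; // nothing to count
        }

        StringJoiner union = new StringJoiner(" union all ");
        for (String column : columns) {
            union.add("select " + column + " as col from t");
        }
        String sql = "select col, count(*) as cnt from (" + union
                + ") x where col is not null group by col";

        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getInt("col") + " | " + rs.getInt("cnt"));
            }
        }
    }
}

The extra where col is not null simply skips the empty cells of the shorter rows.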

Java & MySQL- Rankings

I am making a game and I need a ranking system. I already save all the stats, like kills, deaths, wins, innocent shots, etc., in MySQL. At the moment I am clueless about how to rank everyone. I want to do it in MySQL, but the stats are updated very quickly. I was thinking I could load all the ranks into a HashMap when the game starts, but that would be very inefficient since there are thousands of players. I also want to use most of the stats to work out the ranking. Could someone explain how I could do this? Thanks!
One way would be to use MySQL events to trigger a stored procedure. The stored procedure would compute the ranking and store the rank in the database. You would then set the event schedule to whatever interval you want, say every 10 minutes.
mysql events: https://dev.mysql.com/doc/refman/5.7/en/events.html
CREATE
    [DEFINER = { user | CURRENT_USER }]
    EVENT
    [IF NOT EXISTS]
    event_name
    ON SCHEDULE schedule
    [ON COMPLETION [NOT] PRESERVE]
    [ENABLE | DISABLE | DISABLE ON SLAVE]
    [COMMENT 'comment']
    DO event_body;

schedule:
    AT timestamp [+ INTERVAL interval] ...
  | EVERY interval
    [STARTS timestamp [+ INTERVAL interval] ...]
    [ENDS timestamp [+ INTERVAL interval] ...]

interval:
    quantity {YEAR | QUARTER | MONTH | DAY | HOUR | MINUTE |
              WEEK | SECOND | YEAR_MONTH | DAY_HOUR | DAY_MINUTE |
              DAY_SECOND | HOUR_MINUTE | HOUR_SECOND | MINUTE_SECOND}
Then you would setup a stored procedure to generate the rank and set it for your players: https://dev.mysql.com/doc/refman/5.7/en/stored-programs-views.html
An example procedure would be:
CREATE PROCEDURE simpleproc (OUT param1 INT)
BEGIN
    SELECT COUNT(*) INTO param1 FROM t;
END
You can make the procedures as complex as you need to.
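To make that concrete, a rough JDBC sketch of how this could be wired up from Java; the event, procedure and table names (refresh_rankings, refresh_player_ranks, player_rank) are invented for illustration, and the MySQL event scheduler must be enabled (event_scheduler=ON) for the event to run:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RankingSetup {

    public static void main(String[] args) throws SQLException {
        // Placeholder connection settings.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/game", "user", "password");
             Statement st = con.createStatement()) {

            // Recompute all ranks every 10 minutes by calling an (assumed) stored
            // procedure refresh_player_ranks that writes into a player_rank table.
            st.execute("CREATE EVENT IF NOT EXISTS refresh_rankings "
                     + "ON SCHEDULE EVERY 10 MINUTE "
                     + "DO CALL refresh_player_ranks()");

            // Reading a single player's precomputed rank stays cheap for the game server.
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT rank_position FROM player_rank WHERE player_id = ?")) {
                ps.setInt(1, 42);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println("Player 42 rank: " + rs.getInt(1));
                    }
                }
            }
        }
    }
}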

How to process columns of an SQLite table in Java (Android)?

I have an SQLite table like:
+-----+-------------+------------------+
| _id | lap_time_ms | formatted_elapse |
+-----+-------------+------------------+
|  1  |        5600 |          00:05.6 |
|  2  |        4612 |          00:04.6 |
|  3  |        4123 |          00:04.1 |
|  4  |       15033 |          00:15.0 |
|  5  |        4523 |          00:04.5 |
|  6  |        6246 |          00:06.2 |
+-----+-------------+------------------+
where lap_time_ms is of type long and represents the lap time in milliseconds, while formatted_elapse is a String that holds the formatted (displayable) form of lap_time_ms.
My question is that if I want to add (say) 5 seconds (5000) to each lap_time_ms then I use a statement like:
DB.execSQL("update table_name set lap_time_ms = lap_time_ms + 5000");
This works fine; however, the problem is that formatted_elapse still retains its outdated value!
So, what is the best way to update the values in the formatted_elapse column if I have a function like:
public static String getFormattedTime(long milliseconds) {
    // custom code that processes the long
    return processedString;
}
It may be a long shot (metaphorically speaking, of course ;) but is it possible to have SQLite link the two columns so that if I update a lap_time_ms row, formatted_elapse is updated automatically?
Thanks in advance!
In theory, it would be possible to create a trigger to update that column, but only if the formatting can be done with some built-in SQLite function (Android does not allow user-defined functions):
CREATE TRIGGER update_formatted_elapse
AFTER UPDATE OF lap_time_ms ON MyTable
FOR EACH ROW
BEGIN
    UPDATE MyTable
    -- lap_time_ms is in milliseconds; 'unixepoch' expects seconds, hence the division
    SET formatted_elapse = strftime('%M:%f', NEW.lap_time_ms / 1000.0, 'unixepoch')
    WHERE _id = NEW._id;
END;
However, it would be bad design to store the formatted string in the database; it would be duplicated information that is in danger of becoming inconsistent.
Drop the formatted_elapse column and just call getFormattedTime in your code whenever you need it.
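For example, a possible implementation of getFormattedTime (one way to produce the mm:ss.tenths format shown in the table; the exact format is an assumption) that can be called whenever the value is read from the cursor:

import java.util.Locale;

public class LapTimeFormatter {

    // Formats a lap time in milliseconds as mm:ss.t (e.g. 5600 -> "00:05.6").
    public static String getFormattedTime(long milliseconds) {
        long minutes = milliseconds / 60000;
        long seconds = (milliseconds % 60000) / 1000;
        long tenths  = (milliseconds % 1000) / 100;
        return String.format(Locale.US, "%02d:%02d.%d", minutes, seconds, tenths);
    }

    public static void main(String[] args) {
        System.out.println(getFormattedTime(5600));  // 00:05.6
        System.out.println(getFormattedTime(15033)); // 00:15.0
    }
}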
