I manage a database holding a large amount of climate data collected from various stations. It's an Oracle 12.2 DB, and here's a synopsis of the relevant tables:
FACT = individual measurements at a particular time
UTC_START = time in UTC at which the measurement began
LST_START = time in local standard time (to the particular station) at which the measurement began
SERIES_ID = ID of the series to which the measurement belongs (FK to SERIES)
STATION_ID = ID of the station at which the measurement occurred (FK to STATION)
VALUE = value of the measurement
Note that UTC_START and LST_START always have a constant difference per station (the LST offset from UTC). I have confirmed that there are no instances where the difference between UTC_START and LST_START is anything other than what is expected.
SERIES = descriptive data for series of data
SERIES_ID = ID of the series (PK)
NAME = text name of the series (e.g. Temperature)
STATION = descriptive data for stations
STATION_ID = ID of the station (PK)
SITE_ID = ID of the site at which a station is located (most sites have one station, but a handful have 2)
SITE_RANK = rank of the station within the site, if the site has more than one station.
EXT_ID = external ID for a site (provided to us)
The EXT_ID of a site applies to all stations at that site (but may not be populated unless SITE_RANK == 1; not ideal, I know, but not the issue here), and data from lower-ranked stations is preferred. To organize this data into a consumable format, we're using a PIVOT to collect measurements occurring at the same site/time into rows.
Here's the query:
WITH
primaries AS (
SELECT site_id, ext_id
FROM station
WHERE site_rank = 1
),
data as (
SELECT d.site_id, d.utc_start, d.lst_start, s.name, d.value FROM (
SELECT s.site_id, f.utc_start, f.lst_start, f.series_id, f.value,
ROW_NUMBER() over (PARTITION BY s.site_id, f.utc_start, f.series_id ORDER BY s.site_rank) as ORDINAL
FROM fact f
JOIN station s on f.station_id = s.station_id
) d
JOIN series s ON d.series_id = s.series_id
WHERE d.ordinal = 1
AND d.site_id = ?
AND d.utc_start >= ?
AND d.utc_start < ?
),
records as (
SELECT * FROM data
PIVOT (
MAX(VALUE) AS VALUE
FOR NAME IN (
-- these are a few series that we would want to collect by UTC_START
't5' as t5,
'p5' as p5,
'solrad' as solrad,
'str' as str,
'stc_05' as stc_05,
'rh' as rh,
'smv005_05' as smv005_05,
'st005_05' as st005_05,
'wind' as wind,
'wet1' as wet1
)
)
)
SELECT r.*, p.ext_id
FROM records r JOIN primaries p on r.site_id = p.site_id
Here's where things get odd. This query works perfectly in SQLAlchemy, IntelliJ (using the OJDBC thin driver), and Oracle SQL Developer. But when it's run from within our Java program (same JDBC URL and credentials, using plain old JDBC statements and result sets), it gives results that don't make sense. Specifically, for the same station it will return 2 rows with the same UTC_START but different LST_START (recall that I have verified that this 100% does not occur anywhere in the FACT table). Just to ensure there was no weird parameter handling going on, we tested hard-coding values in for the placeholders and copy-and-pasted the exact same query between the various clients, and the only one that returns these strange results is the Java program (which is using the exact same OJDBC jar as IntelliJ).
If anyone has any insight or possible causes, it would be greatly appreciated. We're at a bit of a loss right now.
It turns out that Nathan's comment was correct. Though it seems counter-intuitive (to me, at least), calling ResultSet.getString on a DATE column will in fact convert to Timestamp first. Timestamp has the unfortunate default behavior of using the system default timezone unless you explicitly specify otherwise.
This default behavior meant that daylight saving time was taken into account when we didn't intend it to be, leading to the odd behavior described.
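The effect is easy to reproduce without a database. The sketch below (plain Java; the class and method names are mine, not from the original code) renders the same instant through java.sql.Timestamp under two different JVM default time zones. The string form shifts, which is exactly what getString() on a DATE column exposed:

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampTzDemo {
    // Render the same instant as a Timestamp string under a given JVM
    // default time zone. Mutating the default zone is for demo purposes only.
    static String render(long epochMillis, String zoneId) {
        TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
        return new Timestamp(epochMillis).toString();
    }

    public static void main(String[] args) {
        long noonUtc = 12L * 60 * 60 * 1000; // 1970-01-01T12:00:00Z
        System.out.println(render(noonUtc, "UTC"));              // 1970-01-01 12:00:00.0
        System.out.println(render(noonUtc, "America/New_York")); // 1970-01-01 07:00:00.0
    }
}
```

Same instant, two different strings: the JVM default zone leaks into the text form. Using getTimestamp(col, Calendar) with an explicit UTC calendar avoids the ambiguity.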
I have a table in my database where I register readings from several sensors this way:
CREATE TABLE [test].[readings] (
[timestamp_utc] DATETIME2(0) NOT NULL, -- 48bits
[sensor_id] INT NOT NULL, -- 32 bits
[site_id] INT NOT NULL, -- 32 bits
[reading] REAL NOT NULL, -- 32 bits
PRIMARY KEY([timestamp_utc], [sensor_id], [site_id])
)
CREATE TABLE [test].[sensors] (
[sensor_id] int NOT NULL ,
[measurement_type_id] int NOT NULL,
[site_id] int NOT NULL ,
[description] varchar(255) NULL ,
PRIMARY KEY ([sensor_id], [site_id])
)
And I want to easily make statistics out of all these readings.
Some queries I would like to do:
Get me all readings for site_id = X between date_hour1 and date_hour2
Get me all readings for site_id = X and sensor_id in <list> between date_hour1 and date_hour2
Get me all readings for site_id = X and sensor measurement type = Z between date_hour1 and date_hour2
Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2
Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2 but in UTC+3 (this should give a different result than previous query because now the beginning and ending of days are shifted by 3h)
Get me min, max, std, mean for all readings for site_id = X between date_hour1 and date_hour2
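The fifth item is the subtle one: applying an offset moves readings across day boundaries, so the daily averages genuinely differ. A minimal java.time sketch (class and method names are mine) showing how the same UTC instant falls into different day buckets:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DayBucket {
    // The day a UTC instant belongs to, under a given fixed offset.
    static LocalDate bucket(Instant utc, ZoneOffset offset) {
        return utc.atOffset(offset).toLocalDate();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2020-06-01T22:30:00Z");
        System.out.println(bucket(t, ZoneOffset.UTC));        // 2020-06-01
        System.out.println(bucket(t, ZoneOffset.ofHours(3))); // 2020-06-02
    }
}
```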
So far I have been using Java to query the database and perform all this processing locally. But this ends up a bit slow, and the code is a mess to write and maintain (too many loops, generic functions to perform repeated tasks, a large/verbose code base, etc.)...
To make things worse, the readings table is huge (hence the importance of the primary key, which also serves as a performance index), and maybe I should be using a time-series database for this (are there any good ones?). I am using SQL Server.
What is the best way to do this? I feel I am reinventing the wheel, because all of this is kind of an analytics app...
I know these queries sound simple, but when you try to parametrize all this you can end up with a monster like this:
-- Sums all device readings, returns timestamps in localtime according to utcOffset (if utcOffset = 00:00, then timestamps are in UTC)
CREATE PROCEDURE upranking.getSumOfReadingsForDevices
    @facilityId int,
    @deviceIds varchar(MAX),
    @beginTS datetime2,
    @endTS datetime2,
    @utcOffset varchar(6),
    @resolution varchar(6) -- NO, HOURS, DAYS, MONTHS, YEARS
AS BEGIN
    SET NOCOUNT ON -- http://stackoverflow.com/questions/24428928/jdbc-sql-error-statement-did-not-return-a-result-set
    DECLARE @deviceIdsList TABLE (
        id int NOT NULL
    );
    DECLARE @beginBoundary datetime2,
            @endBoundary datetime2;
    SELECT @beginBoundary = DATEADD(day, -1, @beginTS);
    SELECT @endBoundary = DATEADD(day, 1, @endTS);
    -- We flip the sign of the offset because we are going to convert the zone for the entire table, not @beginTS and @endTS themselves
    SELECT @utcOffset = CASE WHEN LEFT(@utcOffset, 1) = '+' THEN STUFF(@utcOffset, 1, 1, '-') ELSE STUFF(@utcOffset, 1, 1, '+') END;
    INSERT INTO @deviceIdsList
    SELECT convert(int, value) FROM string_split(@deviceIds, ',');
    SELECT SUM(reading) as reading,
           timestamp_local
    FROM (
        SELECT reading,
               upranking.add_timeoffset_to_datetime2(timestamp_utc, @utcOffset, @resolution) as timestamp_local
        FROM upranking.readings
        WHERE device_id IN (SELECT id FROM @deviceIdsList)
          AND facility_id = @facilityId
          AND timestamp_utc BETWEEN @beginBoundary AND @endBoundary
    ) as innertbl
    WHERE timestamp_local BETWEEN @beginTS AND @endTS
    GROUP BY timestamp_local
    ORDER BY timestamp_local
END
GO
This is a procedure that receives the site ID (facilityId in this case), the list of sensor IDs (deviceIds in this case), the beginning and ending timestamps, followed by their UTC offset in a string like "+xx:xx" or "-xx:xx", and terminating with the resolution, which basically says how the result will be aggregated by SUM (taking the UTC offset into consideration).
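The sign flip on the offset is the easy part to get wrong. A tiny Java equivalent of that CASE/STUFF step (class and method names are mine, purely illustrative):

```java
public class OffsetFlip {
    // "+03:00" becomes "-03:00" and vice versa, so the offset can be applied
    // to the stored UTC column instead of shifting the query bounds.
    static String flip(String utcOffset) {
        char sign = utcOffset.charAt(0);
        return (sign == '+' ? "-" : "+") + utcOffset.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(flip("+03:00")); // -03:00
        System.out.println(flip("-05:30")); // +05:30
    }
}
```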
And since I am using Java, at first glance I could use Hibernate or something, but I feel Hibernate wasn't made for this type of query.
Your structure looks good at first glance, but looking at your queries makes me think there are tweaks you may want to try. Performance is never an easy subject, and it is not easy to find a "one size fits all" answer. Here are some considerations:
Do you want better read or write performance? If you want better read performance you need to reconsider your indexes. Sure you have a primary key but most of your queries don't make use of it (all three fields). Try creating an index for [sensor_id], [site_id].
Can you use cache? If some searches are recurrent and your app is the single point of entry to your database, then evaluate if your use cases would benefit from caching.
If the readings table is huge, then consider some sort of partitioning strategy. Check out the MSSQL documentation.
If you don't need real-time data, then try some sort of search engine such as Elasticsearch.
I feel like I'm missing something very obvious here, but it seems that the only way to go about doing this is to get the value, and then see if it returns a null (empty) value, which I would rather not do.
Is there an equivalent to List.contains(Object o) in SQL? Or perhaps the JDBC has something of that nature? If so, what is it?
I am using Microsoft Access 2013.
Unfortunately I don't have any useful code to show, but here is the gist of what I am trying to do. It isn't anything unique at all. I want to have a method (Java) that returns the values of a user that are stored in the database. If the user has not previously been added to the database, the user should be added, and the default values of the user should be set. Then those newly created values will be returned. If a player has already been added to the database (with the username as the primary key), I don't want to overwrite the data that is already there.
I would also advise against using MS Access for this purpose. That said, if you are familiar with MS Office applications, the familiar UI/UX structure might help you get your footing and require less time than learning another database environment. However, MS Access tends to be quite limited, and I would advise considering alternative options if available.
The only way to see if an SQL table contains a row with some condition on a column is to actually make an SQL query. I don't see why you wouldn't do that. Just make sure that you have an index on the column that you will be constraining the results on. Also for better speed use count to prevent from retrieving all the data from the rows.
SELECT count(*) FROM foos WHERE bar = 'baz'
Assuming you have an index on the bar column this query should be pretty fast and all you have to do is check whether it returns > 0. If it does then you have rows matching your criteria.
You can use IF with EXISTS; EXISTS returns a boolean value of 1 or 0.
select
if(
exists( select * from date1 where current_date()>now() ),
'today > now',
'today is not > now'
) as 'today > now ?' ;
+--------------------+
| today > now ?      |
+--------------------+
| today is not > now |
+--------------------+
1 row in set (0.00 sec)
Another Example:
SELECT IF(
EXISTS( SELECT col from tbl where id='n' ),
colX, colY
) AS 'result'
FROM TBL;
I'm also new to SQL, and I'm using Oracle.
In Oracle, suppose we have a column TYPE and a value value.
We can use:
where value not in (select TYPE from table)
to make sure value does not exist in the TYPE column of the table.
Don't know if it helps.
You can simply use a query with a condition.
For example, if you have to check for records matching a particular column, you can use a WHERE condition:
select * from table where column1 = 'checkvalue'
You can use count to check the number of existing records matching your specified condition:
select count(*) from table where column1 = 'checkvalue'
I have created the following method, which to my knowledge works perfectly. (Using the java.sql package)
public static boolean containsUser(String username) throws SQLException
{
//connection is the Connection object used to connect to my Access database.
//A PreparedStatement is used so the username cannot break out of the query (SQL injection).
//"Users" is the name of the table, "Username" is the primary key.
PreparedStatement statement = connection.prepareStatement("SELECT * FROM Users WHERE Username = ?");
statement.setString(1, username);
ResultSet result = statement.executeQuery();
//There is no need for a loop because the primary key is unique.
return result.next();
}
It's an extremely simple and extremely basic method, but hopefully it might help someone in the future.
If there is anything wrong with it, please let me know. I don't want anyone learning from or using poorly written code.
IMPORTANT EDIT: It is now over half a decade after I wrote the above content (both question and answer), and I now advise against the solution I illustrated above.
While it does work, it prioritizes a "Java-mindset-friendly" approach to SQL. In short, it is typically a bad idea to migrate paradigms and mindsets of one language to another, as it is inevitable that you will eventually find yourself trying to fit a square peg into a round hole. The only way to make that work is to shave the corners off the square. The peg will then of course fit, but as you can imagine, starting with a circle peg in the first place would have been the better, cleaner, and less messy solution.
Instead, refer to the above upvoted answers for a more realistic, enterprise-friendly solution to this problem, especially as I imagine the people reading this are likely in a similar situation as I was when I originally wrote this.
I need to get an equivalent to this SQL that can be run using Hibernate. It doesn't work as is due to special characters like @.
SELECT place from (select @curRow := @curRow + 1 AS place, time, id FROM `testing`.`competitor` JOIN (SELECT @curRow := 0) r order by time) competitorList where competitorList.id=4;
My application manages the results of running competitions. The above query selects, for a specific competitor, his/her place based on overall time.
For simplicity I'll only list the COMPETITOR table structure (only the relevant fields). My actual query involves a few joins, but they are not relevant for the question:
CREATE TABLE competitor (
id INT,
name VARCHAR,
time INT
)
Note that competitors are not already ordered by time, thus, the ID cannot be used as rank. As well, it is possible to have two competitors with the same overall time.
Any idea how I could make this work with Hibernate?
Hard to tell without a schema, but you may be able to use something like
SELECT COUNT(*) FROM testing ts
WHERE ts.score < $obj.score
where I am using the $ to stand for whatever Hibernate notation you need to refer to the live object.
I couldn't find any way to do this, so I had to change the way I'm calculating the position. I'm now taking the top results and am creating the ladder in Java, rather than in the SQL query.
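For reference, the Java-side ladder can be as simple as counting strictly better times, which mirrors the COUNT(*) idea above and lets tied competitors share a place (a sketch; class and method names are mine):

```java
import java.util.List;

public class Ladder {
    // Place = 1 + number of competitors with a strictly smaller time,
    // so equal times share the same place.
    static int place(List<Integer> allTimes, int competitorTime) {
        int better = 0;
        for (int t : allTimes) {
            if (t < competitorTime) better++;
        }
        return better + 1;
    }

    public static void main(String[] args) {
        List<Integer> times = List.of(50, 60, 60, 70);
        System.out.println(place(times, 60)); // 2 (both 60s share 2nd place)
        System.out.println(place(times, 70)); // 4
    }
}
```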
I need to limit multiple service usages for multiple customers. For example, customer customer1 can send max 1000 SMS per month. My implementation is based on one MySQL table with 3 columns:
date TIMESTAMP
name VARCHAR(128)
value INTEGER
For every service usage (sending an SMS) one row is inserted into the table. value holds the usage count (e.g. if an SMS was split into 2 parts, then value = 2). name holds the limiter name (e.g. customer1-sms).
To find out how many times the service was used this month (March 2011), a simple query is executed:
SELECT SUM(value) FROM service_usage WHERE name = 'customer1-sms' AND date > '2011-03-01';
The problem is that this query is slow (0.3 sec). We are using indexes on columns date and name.
Is there some better way to implement service usage limitation? My requirement is that it must be flexible (e.g. I might need to know the usage within the last 10 minutes, or within the current month). I am using Java.
Thanks in advance
You should have one composite index on both columns, not two separate indexes (one on each column). This should make the query very fast.
If it still doesn't, then you could use a table with a month, a name and a value, and increment the value for the current month each time an SMS is sent. This would remove the sum from your query. It would still need an index on (month, name) to be as fast as possible, though.
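An in-memory sketch of that (month, name) → value counter (pure Java, names are mine; in the real system record() would be an UPDATE ... SET value = value + ? keyed on the month and name):

```java
import java.time.YearMonth;
import java.util.HashMap;
import java.util.Map;

public class UsageCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    private static String key(String name, YearMonth month) {
        return month + "|" + name;
    }

    // Add an increment (e.g. 2 for a two-part SMS) to this month's total.
    void record(String name, YearMonth month, int increment) {
        counts.merge(key(name, month), increment, Integer::sum);
    }

    int usage(String name, YearMonth month) {
        return counts.getOrDefault(key(name, month), 0);
    }

    public static void main(String[] args) {
        UsageCounter c = new UsageCounter();
        YearMonth m = YearMonth.of(2011, 3);
        c.record("customer1-sms", m, 1);
        c.record("customer1-sms", m, 2);
        System.out.println(c.usage("customer1-sms", m)); // 3
    }
}
```

The trade-off: reads become a single-row lookup, but you lose the per-event history, so the "last 10 minutes" requirement would still need the detail table.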
I found one solution to my problem. Instead of inserting each usage increment, I will insert the running total:
BEGIN;
-- select the last (most recent) value
SELECT value FROM service_usage WHERE name = %name ORDER BY date DESC LIMIT 1;
-- insert the incremented total into the database
INSERT INTO service_usage (date, name, value) VALUES (CURRENT_TIMESTAMP, %name, %value + %increment);
COMMIT;
To find out service usage since %date:
SELECT value AS value1 FROM test WHERE name = %name ORDER BY date DESC LIMIT 1;
SELECT value AS value2 FROM test WHERE name = %name AND date <= %date ORDER BY date DESC LIMIT 1;
The result will be value1 - value2
This way I'll need transactions. I'll probably implement it as stored procedure.
Any additional hints are still appreciated :-)
It's worth trying to replace your "=" with "like". Not sure why, but in the past I've seen this perform far more quickly than the "=" operator on varchar columns.
SELECT SUM(value) FROM service_usage WHERE name like 'customer1-sms' AND date > '2011-03-01';
Edited after comments:
Okay, now I can sorta re-create your issue - the first time I run the query, it takes around 0.03 seconds, subsequent runs of the query take 0.001 second. Inserting new records causes the query to revert to 0.03 seconds.
Suggested solution:
COUNT does not show the same slow-down. I would change the business logic so that every time the user sends an SMS you insert a record with value "1"; if the message is a multipart message, simply insert two rows.
Replace the "sum" with a "count".
I've applied this to my test data, and even after inserting a new record, the "count" query returns in 0.001 second.
A little presentation for what I want to do:
Consider the case where different people from a firm get, once a year, an all-expenses-paid trip somewhere. There may be 1000 people who qualify for the trip, but only 16 places are available.
Each of these 16 spots has an associated index, which must be from 1 to 16. The ones on the reservation list have indexes starting from 17.
The first 16 persons that apply get a definite spot on the trip. The rest end up on the reservation list. If one of the first 16 persons cancels, the first person with a reservation gets his place and all the indexes are renumbered to compensate for the person that canceled.
All of this is managed in a Java web app with an Oracle DB.
Now, my problem:
I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done Java side and then a new query to insert the record). It is obvious why we have multiple spots or reservations with the same index. So, we get, let’s say, 19 people on the trip because 4 of them have index 10, for example.
How can I manage this? I have been thinking of 3 ways so far:
Use an isolation level of Serializable for the DB transactions (don’t like this one);
Insert a record with no INDEX_NR and then have a trigger manage the things… in some way (never worked with triggers before);
Each record also has a UPDATED column. Could I use this in some way? (note that I can’t lose the INDEX_NR since other parts of the app make use of it).
Is there a best way to do this?
Why make it complicated?
Just insert all reservations as they are entered, along with a timestamp of when they reserved a spot.
Then in your query just use the timestamp to sort them.
There is of course the chance that two people reserved a spot at the very same millisecond; then just use a random method to assign order.
Why do you need to explicitly store the index? Instead you could store each person's order (which never changes) along with an active flag. In your example if person #16 pulls out you simply mark them as inactive.
To compute whether a person qualifies for the trip you simply count the number of active people with order less than that person:
select count(*)
from CompetitionEntry
where PersonOrder < 16
and Active = 1
This approach removes the need for bulk updates to the database (you only ever update one row) and hence mostly mitigates your problem of transactional integrity.
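The same check, sketched in Java against an in-memory list (the record and method names are mine, and the cutoff is parameterized rather than hard-coded at 16):

```java
import java.util.List;

public class TripQualifier {
    record Entry(int personOrder, boolean active) {}

    // Mirrors the SQL above: count active entries ahead of this person.
    static long activeAhead(List<Entry> entries, int personOrder) {
        return entries.stream()
                .filter(e -> e.active() && e.personOrder() < personOrder)
                .count();
    }

    // A person is on the trip while fewer than `places` active people are ahead.
    static boolean qualifies(List<Entry> entries, int personOrder, int places) {
        return activeAhead(entries, personOrder) < places;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry(1, true), new Entry(2, false), new Entry(3, true));
        System.out.println(activeAhead(entries, 3));    // 1 (entry 2 is inactive)
        System.out.println(qualifies(entries, 3, 16));  // true
    }
}
```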
Another way would be to explicitly lock a record on another table on the select.
-- Initial Setup
CREATE TABLE NUMBER_SOURCE (ID NUMBER(4));
INSERT INTO NUMBER_SOURCE(ID) VALUES (0);
-- Your regular code
SELECT ID AS NEXT_INDEX_NR FROM NUMBER_SOURCE FOR UPDATE; -- lock!
UPDATE NUMBER_SOURCE SET ID = ID + 1;
INSERT INTO TABLE ....
COMMIT; -- releases lock!
No other transaction will be able to perform the query on the table NUMBER_SOURCE until the commit (or rollback).
When adding people to the table, give them an ID in such a way that the ID is ascending in the order in which they were added. This can be a timestamp.
Select all the records from the table which qualify, order by ID, and update their INDEX_NR
Select * from table where INDEX_NR <= 16 order by INDEX_NR
Step #2 seems complicated but it's actually quite simple:
update (
select *
from TABLE
where ...
order by ID
)
set INDEX_NR = INDEXSEQ.NEXTVAL
Don't forget to reset the sequence to 1.
Calculate your index in runtime:
CREATE OR REPLACE VIEW v_person
AS
SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS index_rn
FROM t_person
CREATE OR REPLACE TRIGGER trg_person_ii
INSTEAD OF INSERT ON v_person
BEGIN
INSERT
INTO t_person (id, name)
VALUES (:new.id, :new.name);
END;
CREATE OR REPLACE TRIGGER trg_person_iu
INSTEAD OF UPDATE ON v_person
BEGIN
UPDATE t_person
SET id = :new.id,
name = :new.name
WHERE id = :old.id;
END;
CREATE OR REPLACE TRIGGER trg_person_id
INSTEAD OF DELETE ON v_person
BEGIN
DELETE
FROM t_person
WHERE id = :old.id;
END;
INSERT
INTO v_person
VALUES (1, 'test', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
INSERT
INTO v_person
VALUES (2, 'test 2', 1)
SELECT *
FROM v_person
--
id name index_rn
1 test 1
2 test 2 2
DELETE
FROM v_person
WHERE id = 1
SELECT *
FROM v_person
--
id name index_rn
2 test 2 1
"I have to manage the index in a correct way (all sequential, no duplicate indexes), with possible hundreds of people that simultaneously apply for the trip.
When inserting a record in the table for the trip, the way of getting the index is by
SELECT MAX(INDEX_NR) + 1 AS NEXT_INDEX_NR FROM TABLE
and using this as the new index (this is done Java side and then a new query to insert the record). It is obvious why we have multiple spots or reservations with the same index."
Yeah. Oracle's MVCC ("snapshot isolation") used incorrectly by someone who shouldn't have been in IT to begin with.
Really, Peter is right. Your index number is, or rather should be, a sort of "ranking number" on the ordered timestamps that he mentions (this holds a requirement that the DBMS can guarantee that any timestamp value appears only once in the entire database).
You say you are concerned with "regression bugs". I say "Why do you need to be concerned with "regression bugs" in an application that is DEMONSTRABLY beyond curing ?". Because your bosses paid a lot of money for the crap they've been given and you don't want to be the pianist that gets shot for bringing the message ?
The solution depends on what you have under your control. I assume that you can change both the database and the Java code, but would refrain from modifying the database schema, since you would have to adapt too much Java code otherwise.
A cheap solution might be to add a uniqueness constraint on the pair (trip_id, index_nr), or just on index_nr if there is only one trip. Additionally, add a check constraint check(index_nr > 0) - unless index_nr is already unsigned. Everything else is then done in Java: when inserting a new applicant as described by you, you have to add code catching the exception thrown when someone else got inserted concurrently. If some record is updated or deleted, you either have to live with holes between sequence numbers (by selecting the 16 candidates with the lowest index_nr, as shown by Quassnoi in his view) or fill them up by hand (similarly to what Aaron suggested) after every update/delete.
If index_nr is mostly read-only in the application, a better solution might be to combine the answers of Peter and Quassnoi: use either a timestamp (automatically inserted by the database by defining the current time as the default) or an auto-incremented integer (inserted by the database as the default) as the value stored in the table, and use a view (like the one defined by Quassnoi) to access the table and the automatically calculated index_nr from Java. But also define both constraints as in the cheap solution.