Mapping function result set into List of Pojos with JPA - java

I have the following PostgreSQL function:
CREATE OR REPLACE FUNCTION getFirstNAvailableSlots(N INTEGER, timestamp_start TIMESTAMP, timestamp_end TIMESTAMP,
cpu INTEGER, max_reservable_cpu INTEGER)
returns TABLE (t_start TIMESTAMP, t_end TIMESTAMP)
I would like to call the function, and automatically map the result set into a List of TimeSlots:
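import java.sql.Timestamp;
public class TimeSlot {
private Timestamp t_start;
private Timestamp t_end;
public TimeSlot(Timestamp t_start, Timestamp t_end) { this.t_start = t_start; this.t_end = t_end; }
}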
So I added a @Query method to a repository:
public interface WorkstationReservationRepository extends ReservationBaseRepository<WorkstationReservation>{
@Query(value="SELECT new com.warden.reservationmicroservice.dtos.TimeSlot(t_start,t_end) FROM getFirstNAvailableSlots(:N,:t_start,:t_end,:cpu,:max_reservable_cpu)")
public List<TimeSlot> getFirstNAvailableSlots(@Param("N") int N, @Param("t_start") Timestamp t_start, @Param("t_end") Timestamp t_end, @Param("cpu") int cpu, @Param("max_reservable_cpu") int max_reservable_cpu);
}
However, I get the following error:
Caused by: java.lang.IllegalArgumentException: Validation failed for query for method public abstract com.warden.reservationmicroservice.dtos.TimeSlot com.warden.reservationmicroservice.repositories.WorkstationReservationRepository.getFirstNAvailableSlots(int,java.sql.Timestamp,java.sql.Timestamp,int,int)
	at org.springframework.data.jpa.repository.query.SimpleJpaQuery.validateQuery(SimpleJpaQuery.java:100) ~[spring-data-jpa-3.0.0.jar:3.0.0]
	at org.springframework.data.jpa.repository.query.SimpleJpaQuery.<init>(SimpleJpaQuery.java:70) ~[spring-data-jpa-3.0.0.jar:3.0.0]
	at org.springframework.data.jpa.repository.query.JpaQueryFactory.fromMethodWithQueryString(JpaQueryFactory.java:55) ~[spring-data-jpa-3.0.0.jar:3.0.0]
	at org.springframework.data.jpa.repository.query.JpaQueryLookupStrategy$DeclaredQueryLookupStrategy.resolveQuery(JpaQueryLookupStrategy.java:170) ~[spring-data-jpa-3.0.0.jar:3.0.0]
	at org.springframework.data.jpa.repository.query.JpaQueryLookupStrategy$CreateIfNotFoundQueryLookupStrategy.resolveQuery(JpaQueryLookupStrategy.java:252) ~[spring-data-jpa-3.0.0.jar:3.0.0]
	at org.springframework.data.jpa.repository.query.JpaQueryLookupStrategy$AbstractQueryLookupStrategy.resolveQuery(JpaQueryLookupStrategy.java:95) ~[spring-data-jpa-3.0.0.jar:3.0.0]
	at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.lookupQuery(QueryExecutorMethodInterceptor.java:111) ~[spring-data-commons-3.0.0.jar:3.0.0]
	... 56 common frames omitted
Caused by: java.lang.IllegalArgumentException: org.hibernate.query.sqm.ParsingException: line 1:104 mismatched input '(' expecting {<EOF>, ',', '.', ID, VERSION, VERSIONED, NATURALID,
ALL, AND, ANY, AS, ASC, AVG, BETWEEN, BOTH, BY, CASE, CAST, COLLATE,
COUNT, CROSS, CUBE, CURRENT, CURRENT_DATE, CURRENT_INSTANT,
CURRENT_TIME, CURRENT_TIMESTAMP, DATE, DATETIME, DAY, DELETE, DESC,
DISTINCT, ELEMENT, ELEMENTS, ELSE, EMPTY, END, ENTRY, ERROR, ESCAPE,
EVERY, EXCEPT, EXCLUDE, EXISTS, EXTRACT, FETCH, FILTER, FIRST,
FOLLOWING, FOR, FORMAT, FROM, FULL, FUNCTION, GROUP, GROUPS, HAVING,
HOUR, IGNORE, ILIKE, IN, INDEX, INDICES, INNER, INSERT, INSTANT,
INTERSECT, INTO, IS, JOIN, KEY, LAST, LEADING, LEFT, LIKE, LIMIT,
LIST, LISTAGG, LOCAL, LOCAL_DATE, LOCAL_DATETIME, LOCAL_TIME, MAP,
MAX, MAXELEMENT, MAXINDEX, MEMBER, MICROSECOND, MILLISECOND, MIN,
MINELEMENT, MININDEX, MINUTE, MONTH, NANOSECOND, NEW, NEXT, NO, NOT,
NULLS, OBJECT, OF, OFFSET, OFFSET_DATETIME, ON, ONLY, OR, ORDER,
OTHERS, OUTER, OVER, OVERFLOW, OVERLAY, PAD, PARTITION, PERCENT,
PLACING, POSITION, PRECEDING, QUARTER, RANGE, RESPECT, RIGHT, ROLLUP,
ROW, ROWS, SECOND, SELECT, SET, SIZE, SOME, SUBSTRING, SUM, THEN,
TIES, TIME, TIMESTAMP, TIMEZONE_HOUR, TIMEZONE_MINUTE, TRAILING,
TREAT, TRIM, TRUNCATE, TYPE, UNBOUNDED, UNION, UPDATE, VALUE, VALUES,
WEEK, WHEN, WHERE, WITH, WITHIN, WITHOUT, YEAR, IDENTIFIER,
QUOTED_IDENTIFIER}
	at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:147) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:175) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:182) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.internal.AbstractSharedSessionContract.createQuery(AbstractSharedSessionContract.java:760) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.internal.AbstractSharedSessionContract.createQuery(AbstractSharedSessionContract.java:662) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.internal.AbstractSharedSessionContract.createQuery(AbstractSharedSessionContract.java:126) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:578) ~[na:na]
	at org.springframework.orm.jpa.ExtendedEntityManagerCreator$ExtendedEntityManagerInvocationHandler.invoke(ExtendedEntityManagerCreator.java:360) ~[spring-orm-6.0.3.jar:6.0.3]
	at jdk.proxy2/jdk.proxy2.$Proxy113.createQuery(Unknown Source) ~[na:na]
	at org.springframework.data.jpa.repository.query.SimpleJpaQuery.validateQuery(SimpleJpaQuery.java:94) ~[spring-data-jpa-3.0.0.jar:3.0.0]
	... 62 common frames omitted
Caused by: org.hibernate.query.sqm.ParsingException: line 1:104 mismatched input '(' expecting {<EOF>, ',', '.', ID, VERSION,
VERSIONED, NATURALID, ALL, AND, ANY, AS, ASC, AVG, BETWEEN, BOTH, BY,
CASE, CAST, COLLATE, COUNT, CROSS, CUBE, CURRENT, CURRENT_DATE,
CURRENT_INSTANT, CURRENT_TIME, CURRENT_TIMESTAMP, DATE, DATETIME, DAY,
DELETE, DESC, DISTINCT, ELEMENT, ELEMENTS, ELSE, EMPTY, END, ENTRY,
ERROR, ESCAPE, EVERY, EXCEPT, EXCLUDE, EXISTS, EXTRACT, FETCH, FILTER,
FIRST, FOLLOWING, FOR, FORMAT, FROM, FULL, FUNCTION, GROUP, GROUPS,
HAVING, HOUR, IGNORE, ILIKE, IN, INDEX, INDICES, INNER, INSERT,
INSTANT, INTERSECT, INTO, IS, JOIN, KEY, LAST, LEADING, LEFT, LIKE,
LIMIT, LIST, LISTAGG, LOCAL, LOCAL_DATE, LOCAL_DATETIME, LOCAL_TIME,
MAP, MAX, MAXELEMENT, MAXINDEX, MEMBER, MICROSECOND, MILLISECOND, MIN,
MINELEMENT, MININDEX, MINUTE, MONTH, NANOSECOND, NEW, NEXT, NO, NOT,
NULLS, OBJECT, OF, OFFSET, OFFSET_DATETIME, ON, ONLY, OR, ORDER,
OTHERS, OUTER, OVER, OVERFLOW, OVERLAY, PAD, PARTITION, PERCENT,
PLACING, POSITION, PRECEDING, QUARTER, RANGE, RESPECT, RIGHT, ROLLUP,
ROW, ROWS, SECOND, SELECT, SET, SIZE, SOME, SUBSTRING, SUM, THEN,
TIES, TIME, TIMESTAMP, TIMEZONE_HOUR, TIMEZONE_MINUTE, TRAILING,
TREAT, TRIM, TRUNCATE, TYPE, UNBOUNDED, UNION, UPDATE, VALUE, VALUES,
WEEK, WHEN, WHERE, WITH, WITHIN, WITHOUT, YEAR, IDENTIFIER,
QUOTED_IDENTIFIER}
	at org.hibernate.query.hql.internal.StandardHqlTranslator$1.syntaxError(StandardHqlTranslator.java:46) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41) ~[antlr4-runtime-4.10.1.jar:4.10.1]
	at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:543) ~[antlr4-runtime-4.10.1.jar:4.10.1]
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportInputMismatch(DefaultErrorStrategy.java:327) ~[antlr4-runtime-4.10.1.jar:4.10.1]
	at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:139) ~[antlr4-runtime-4.10.1.jar:4.10.1]
	at org.hibernate.grammars.hql.HqlParser.statement(HqlParser.java:343) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.query.hql.internal.StandardHqlTranslator.parseHql(StandardHqlTranslator.java:127) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.query.hql.internal.StandardHqlTranslator.translate(StandardHqlTranslator.java:77) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.internal.AbstractSharedSessionContract.lambda$createQuery$2(AbstractSharedSessionContract.java:747) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.query.internal.QueryInterpretationCacheStandardImpl.createHqlInterpretation(QueryInterpretationCacheStandardImpl.java:141) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.query.internal.QueryInterpretationCacheStandardImpl.resolveHqlInterpretation(QueryInterpretationCacheStandardImpl.java:128) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	at org.hibernate.internal.AbstractSharedSessionContract.createQuery(AbstractSharedSessionContract.java:744) ~[hibernate-core-6.1.6.Final.jar:6.1.6.Final]
	... 69 common frames omitted
Process finished with exit code 1

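The @Query value is parsed as JPQL (HQL in Hibernate), not SQL, and the HQL parser in Hibernate 6.1 does not accept a database function in the FROM clause, which is why it fails at the '(' after getFirstNAvailableSlots. Use a native query instead. Two caveats: the JPQL constructor expression (new com.warden.reservationmicroservice.dtos.TimeSlot(...)) is not available in native SQL, and because getFirstNAvailableSlots is a set-returning function rather than a stored procedure, PostgreSQL expects SELECT ... FROM rather than CALL (CALL only invokes procedures). Try:
@Query(value = "SELECT t_start, t_end FROM getFirstNAvailableSlots(:N, :t_start, :t_end, :cpu, :max_reservable_cpu)", nativeQuery = true)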
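Since the native query cannot build TimeSlot objects itself, one option is to have the repository method return List<Object[]> (one array per row) and map the rows in Java. The sketch below shows that mapping in plain Java so it runs without Spring. The repository method name findSlotRows is hypothetical, the TimeSlot record is a stand-in for the question's DTO, and the helper accepts both java.sql.Timestamp and java.time.LocalDateTime because the Java type Hibernate hands back for a timestamp column in a native query can depend on the version and configuration.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class TimeSlotRows {

    // Stand-in for the question's TimeSlot DTO (hypothetical record form).
    public record TimeSlot(Timestamp tStart, Timestamp tEnd) {}

    // Hypothetical repository method this helper is meant to pair with:
    // @Query(value = "SELECT t_start, t_end FROM getFirstNAvailableSlots(:N, :t_start, :t_end, :cpu, :max_reservable_cpu)",
    //        nativeQuery = true)
    // List<Object[]> findSlotRows(@Param("N") int N, @Param("t_start") Timestamp t_start,
    //         @Param("t_end") Timestamp t_end, @Param("cpu") int cpu,
    //         @Param("max_reservable_cpu") int max_reservable_cpu);

    // Converts one column value to Timestamp, accepting either type a native query may return.
    static Timestamp toTimestamp(Object value) {
        if (value == null) return null;
        if (value instanceof Timestamp ts) return ts;
        if (value instanceof LocalDateTime ldt) return Timestamp.valueOf(ldt);
        throw new IllegalArgumentException("Unexpected column type: " + value.getClass());
    }

    // Each row arrives as Object[]{t_start, t_end}, in SELECT-list order.
    public static List<TimeSlot> toTimeSlots(List<Object[]> rows) {
        List<TimeSlot> slots = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            slots.add(new TimeSlot(toTimestamp(row[0]), toTimestamp(row[1])));
        }
        return slots;
    }
}
```

A service method would then call something like TimeSlotRows.toTimeSlots(repository.findSlotRows(...)). Spring Data JPA's interface-based projections are another route for native queries, but they map by column alias, so the aliases would need to match the getter names.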

Related

Why does Hibernate sort my list that way?

I want to sort a list using Hibernate's Criteria API, but I don't understand how the framework sorts the results. I have a list of strings to query and sort. The values are as follows:
MCGuffin Super
MCGuffin Mega
McGuffin powerup
MCGuffin 1
MCGuffin Super
MCGuffin 2
MCGuffin Mega
I want to sort them in ascending order. I expect this result: 1, 2, Mega, Mega, powerup, Super, Super.
However, I end up with: Mega, Mega, powerup, Super, Super, 1, 2.
I first thought it was because of the ASCII table; however, uppercase and lowercase letters are treated the same (even though lowercase letters have higher ASCII codes).
The only thing I saw in my code that could potentially be relevant is this line:
// I don't quite understand this line
criteria.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);
Other than that, there's no sorting. We call criteria.list() to get our results.
Is this normal? Did I miss something about the way Hibernate works? How do I get the result I want?
It's not Hibernate ORM that's doing the sorting, it's your database. When you use criteria, Hibernate ORM will create the query adding the proper order by clause to the SQL.
Depending on the database you are using, there is usually a way to define the collation and specify how strings and characters will be ordered.
For example, in PostgreSQL you can define it when you create the table:
CREATE TABLE test1 (
a text COLLATE "de_DE",
b text COLLATE "es_ES",
...
);
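To see that the resulting order depends only on the comparison rules in use, here is a plain-Java sketch (not Hibernate or PostgreSQL code) that sorts the question's values two ways: String's natural code-point order, and String.CASE_INSENSITIVE_ORDER as a rough stand-in for a case-insensitive collation. Real database collations apply more elaborate locale rules, so this only illustrates the idea; the class name OrderingDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderingDemo {
    static final List<String> VALUES = List.of(
            "MCGuffin Super", "MCGuffin Mega", "McGuffin powerup", "MCGuffin 1",
            "MCGuffin Super", "MCGuffin 2", "MCGuffin Mega");

    // Natural String order compares UTF-16 code units: 'C' (67) sorts before 'c' (99),
    // so "McGuffin powerup" lands after every "MCGuffin ..." value.
    static List<String> binaryOrder() {
        List<String> copy = new ArrayList<>(VALUES);
        copy.sort(null); // null comparator means natural ordering
        return copy;
    }

    // Case-insensitive order: "MCGuffin" and "McGuffin" compare equal,
    // so the suffix decides (digits before letters, letter case ignored).
    static List<String> caseInsensitiveOrder() {
        List<String> copy = new ArrayList<>(VALUES);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(binaryOrder());
        System.out.println(caseInsensitiveOrder());
    }
}
```

With these values the case-insensitive comparator gives 1, 2, Mega, Mega, powerup, Super, Super, which is the order the question expected, while natural order puts "McGuffin powerup" last. Neither reproduces the letters-before-digits result from the question, which points to the rules configured in the database.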

Make statistics out of a SQL table [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a table in my database where I register readings from several sensors this way:
CREATE TABLE [test].[readings] (
[timestamp_utc] DATETIME2(0) NOT NULL, -- 48bits
[sensor_id] INT NOT NULL, -- 32 bits
[site_id] INT NOT NULL, -- 32 bits
[reading] REAL NOT NULL, -- 32 bits (REAL is float(24), 4 bytes)
PRIMARY KEY([timestamp_utc], [sensor_id], [site_id])
)
CREATE TABLE [test].[sensors] (
[sensor_id] int NOT NULL ,
[measurement_type_id] int NOT NULL,
[site_id] int NOT NULL ,
[description] varchar(255) NULL ,
PRIMARY KEY ([sensor_id], [site_id])
)
I want an easy way to compute statistics over all these readings.
Some queries I would like to do:
Get me all readings for site_id = X between date_hour1 and date_hour2
Get me all readings for site_id = X and sensor_id in <list> between date_hour1 and date_hour2
Get me all readings for site_id = X and sensor measurement type = Z between date_hour1 and date_hour2
Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2
Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2 but in UTC+3 (this should give a different result than previous query because now the beginning and ending of days are shifted by 3h)
Get me min, max, std, mean for all readings for site_id = X between date_hour1 and date_hour2
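To make the UTC versus UTC+3 point concrete, here is a small in-memory Java sketch of the day bucketing: each reading's UTC instant is converted to a local date at the requested offset before averaging, so the day boundaries move with the offset. The Reading record and DailyAverages class are hypothetical stand-ins for rows of [test].[readings]; this illustrates the logic and is not a suggestion to keep the aggregation in Java.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class DailyAverages {
    // Hypothetical in-memory version of one row of [test].[readings].
    public record Reading(Instant timestampUtc, int sensorId, int siteId, double reading) {}

    // Average per local calendar day at the given UTC offset, sorted by date.
    public static Map<LocalDate, Double> averageByDay(List<Reading> readings, ZoneOffset offset) {
        return readings.stream().collect(Collectors.groupingBy(
                (Reading r) -> r.timestampUtc().atOffset(offset).toLocalDate(),
                TreeMap::new,
                Collectors.averagingDouble(Reading::reading)));
    }
}
```

For example, a reading at 22:00 UTC on 1 January falls on 2 January in UTC+3, so the two daily averages differ exactly as the question expects.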
So far I have been using Java to query the database and do all this processing locally. But this ends up a bit slow, and the code is a mess to write and maintain (too many loops, generic functions for repeated tasks, a large and verbose code base, etc.).
To make things worse, the readings table is huge (hence the importance of the primary key, which is also a performance index), and maybe I should be using a time-series database for this (are there any good ones?). I am using SQL Server.
What is the best way to do this? I feel I am reinventing the wheel, because all of this is kind of an analytics app...
I know these queries sound simple, but when you try to parametrize all this you can end up with a monster like this:
-- Sums all device readings, returns timestamps in localtime according to utcOffset (if utcOffset = 00:00, then timestamps are in UTC)
CREATE PROCEDURE upranking.getSumOfReadingsForDevices
    @facilityId int,
    @deviceIds varchar(MAX),
    @beginTS datetime2,
    @endTS datetime2,
    @utcOffset varchar(6),
    @resolution varchar(6) -- NO, HOURS, DAYS, MONTHS, YEARS
AS BEGIN
    SET NOCOUNT ON -- http://stackoverflow.com/questions/24428928/jdbc-sql-error-statement-did-not-return-a-result-set
    DECLARE @deviceIdsList TABLE (
        id int NOT NULL
    );
    DECLARE @beginBoundary datetime2,
            @endBoundary datetime2;
    SELECT @beginBoundary = DATEADD(day, -1, @beginTS);
    SELECT @endBoundary = DATEADD(day, 1, @endTS);
    -- We flip the sign of the offset because we convert the zone for the entire table, not beginTS and endTS themselves
    SELECT @utcOffset = CASE WHEN LEFT(@utcOffset, 1) = '+' THEN STUFF(@utcOffset, 1, 1, '-') ELSE STUFF(@utcOffset, 1, 1, '+') END
    INSERT INTO @deviceIdsList
    SELECT convert(int, value) FROM string_split(@deviceIds, ',');
    SELECT SUM(reading) as reading,
           timestamp_local
    FROM (
        SELECT reading,
               upranking.add_timeoffset_to_datetime2(timestamp_utc, @utcOffset, @resolution) as timestamp_local
        FROM upranking.readings
        WHERE
            device_id IN (SELECT id FROM @deviceIdsList)
            AND facility_id = @facilityId
            AND timestamp_utc BETWEEN @beginBoundary AND @endBoundary
    ) as innertbl
    WHERE timestamp_local BETWEEN @beginTS AND @endTS
    GROUP BY timestamp_local
    ORDER BY timestamp_local
END
GO
This procedure receives the site id (facilityId here), the list of sensor ids (deviceIds here), the beginning and ending timestamps, their UTC offset as a string like "+xx:xx" or "-xx:xx", and finally the resolution, which determines how the results are aggregated by SUM (taking the UTC offset into account).
And since I am using Java, at first glance I could use Hibernate or something similar, but I feel Hibernate wasn't made for these types of queries.
Your structure looks good at first glance, but looking at your queries, there are some tweaks you may want to try. Performance is never an easy subject, and there is rarely a "one size fits all" answer. Here are some considerations:
Do you want better read or write performance? If you want better read performance, reconsider your indexes. You do have a primary key, but most of your queries don't use all three of its columns. Try creating an index on [sensor_id], [site_id].
Can you use caching? If some searches recur and your app is the single point of entry to your database, evaluate whether your use cases would benefit from caching.
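A minimal in-process cache sketch, assuming the results for a given set of query parameters can safely be reused (for example, readings for date ranges that are already closed). QueryCache and its loader function are hypothetical names; the loader stands in for whatever JDBC call actually runs the query. There is no eviction or expiry, so as written it only suits data that does not change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class QueryCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // runs the real query on a cache miss

    public QueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Runs the loader only the first time a key is requested; later calls reuse the result.
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Drops everything, e.g. after new readings arrive for a cached range.
    public void invalidateAll() {
        cache.clear();
    }
}
```

The key would typically be a small immutable value (for instance a record of site id and date range) so that equal queries map to the same entry. The loader must not call back into the same cache, because computeIfAbsent does not allow recursive updates.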
If the readings table is huge, consider some sort of partitioning strategy. Check out the MSSQL documentation on partitioned tables.
If you don't need real-time data, try a search engine such as Elasticsearch.

Is Siddhi unable to group by more than one variable?

I have the following stream definition:
String eventStreamDefinition =
"define stream cdrEventStream (nodeId string, phone string, timeStamp long, isOutgoingCall bool); ";
And the query:
String query = "@info(name = 'query1') from cdrEventStream#window.externalTime(timeStamp,5 sec) select nodeId, phone, timeStamp, isOutgoingCall, count(nodeId) as callCount group by phone,isOutgoingCall insert all events into outputStream;";
But when I try to compile them I get:
org.wso2.siddhi.query.compiler.exception.SiddhiParserException: You have an error in your SiddhiQL at line 1:267, extraneous input ',' expecting {'#', STREAM, DEFINE, TABLE, FROM, PARTITION, WINDOW, SELECT, GROUP, BY, HAVING, INSERT, DELETE, UPDATE, RETURN, EVENTS, INTO, OUTPUT, EXPIRED, CURRENT, SNAPSHOT, FOR, RAW, OF, AS, OR, AND, ON, IS, NOT, WITHIN, WITH, BEGIN, END, NULL, EVERY, LAST, ALL, FIRST, JOIN, INNER, OUTER, RIGHT, LEFT, FULL, UNIDIRECTIONAL, YEARS, MONTHS, WEEKS, DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS, FALSE, TRUE, STRING, INT, LONG, FLOAT, DOUBLE, BOOL, OBJECT, ID_QUOTES, ID}
at org.wso2.siddhi.query.compiler.internal.SiddhiErrorListener.syntaxError(SiddhiErrorListener.java:34)
at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:65)
at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:558)
at org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377)
at org.antlr.v4.runtime.DefaultErrorStrategy.sync(DefaultErrorStrategy.java:275)
at org.wso2.siddhi.query.compiler.SiddhiQLParser.group_by(SiddhiQLParser.java:3783)
at org.wso2.siddhi.query.compiler.SiddhiQLParser.query_section(SiddhiQLParser.java:3713)
at org.wso2.siddhi.query.compiler.SiddhiQLParser.query(SiddhiQLParser.java:1903)
at org.wso2.siddhi.query.compiler.SiddhiQLParser.execution_element(SiddhiQLParser.java:619)
at org.wso2.siddhi.query.compiler.SiddhiQLParser.execution_plan(SiddhiQLParser.java:550)
at org.wso2.siddhi.query.compiler.SiddhiQLParser.parse(SiddhiQLParser.java:152)
at org.wso2.siddhi.query.compiler.SiddhiCompiler.parse(SiddhiCompiler.java:61)
at org.wso2.siddhi.core.SiddhiManager.createExecutionPlanRuntime(SiddhiManager.java:59)
The only way I can get the query to compile is by removing isOutgoingCall from the group by clause. The Siddhi docs state that grouping by more than one variable should be possible. Is this a bug?
This is on version 3.0.0-alpha.
Grouping by several variables is supported by Siddhi 3.0.0. I just checked your query with Siddhi 3.0.0 and was able to compile it, but I used the released 3.0.0 rather than the alpha. Please give it a try.
Tip: You can use Siddhi Try It to try out your queries easily.

How to return rows that are "missing" from table - Employee Absent Report [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet.
Closed 10 years ago.
I have two tables, like this:
master
---------
empcode INT PRIMARY KEY
name VARCHAR
dept VARCHAR
emp_tx
----------
empcode INT references MASTER(empcode)
s_date DATETIME
The emp_tx table records the employee "in" and "out" transactions. The column s_date stores the time (as a DATETIME value) when the "in" or "out" event occurred. The transactions are recorded on site by a fingerprint biometric system.
Example data from the emp_tx table:
empcode s_date
------- ------------------
1110 2012-12-12 09:31:42 (employee in time to the office)
1110 2012-12-12 13:34:17 (employee out time for lunch)
1110 2012-12-12 14:00:17 (employee in time after lunch)
1110 2012-12-12 18:00:12 (employee out time after working hours)
1112
etc.
Note:
If an employee is absent from the office on a given day, then no row will be inserted into the emp_tx transaction table for that date. An absence of an employee on a given date will be indicated by a row "missing" for that employee and that date.
Can anyone help me to get a SQL Query that returns the dates that employees were absent, to produce an Employee Absent Report?
The input to the query will be two DATE values, a "from" date and a "to" date, which specify a range of dates. The query should return every occurrence of an "absence" (or rather, a non-occurrence: a date between the "from" and "to" dates on which no row is found in the EMP_TX table for an empcode).
Expected output:
If we input '2012-12-12' as the "from" date, and '2012-12-20' as the "to" date, the query should return rows something like this:
Empcode EmpName Department AbsentDate TotalNoofAbsent days
------- ------- ---------- ----------- --------------------
1110 ABC Accounts 2012-12-12
1110 ABC Accounts 2012-12-14 2
1112 xyz Software 2012-12-19
1112 xyz Software 2012-12-17 2
I've tried this query, and I am sure it is not returning the rows I want:
select tx.date from Emp_TX as tx where Date(S_Date) not between '2012-12-23' and '2012-12-30'
Thanks.
If an "absence" is defined as the non-appearance of a row in the emp_tx table for a particular empcode for a particular date (date=midnight to midnight 24 hour period), and ...
If it's acceptable to not show an "absence" for a date when there are NO transactions in the emp_tx table for that date (i.e. exclude a date when ALL empcodes are absent on that date), then ...
You can get the first four columns of the specified result set with a query like this: (untested)
SELECT m.empcode AS `EmpCode`
, m.name AS `EmpName`
, m.dept AS `Department`
, d.dt AS `AbsentDate`
FROM ( SELECT DATE(t.s_date) AS dt
FROM emp_tx t
WHERE t.s_date >= '2012-12-12'
AND t.s_date < DATE_ADD( '2012-12-20' ,INTERVAL 1 DAY)
GROUP BY DATE(t.s_date)
ORDER BY DATE(t.s_date)
) d
CROSS
JOIN master m
LEFT
JOIN emp_tx p
ON p.s_date >= d.dt
AND p.s_date < d.dt + INTERVAL 1 DAY
AND p.empcode = m.empcode
WHERE p.empcode IS NULL
ORDER
BY m.empcode
, d.dt
Getting that fifth column TotalNoofAbsent returned in the same resultset is possible, but it's going to make that query really messy. This detail might be more efficiently handled on the client side, when processing the returned resultset.
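If you compute TotalNoofAbsent on the client as suggested, a Java sketch could simply group the returned rows by empcode and count them. AbsenceRow is a hypothetical record holding the EmpCode and AbsentDate columns of the result set above; reading it from the ResultSet is left out.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class AbsenceTotals {
    // Hypothetical holder for the EmpCode and AbsentDate columns of one result row.
    public record AbsenceRow(int empCode, LocalDate absentDate) {}

    // Client-side equivalent of TotalNoofAbsent: number of absence rows per empcode,
    // keyed in ascending empcode order.
    public static Map<Integer, Long> totalsPerEmployee(List<AbsenceRow> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                AbsenceRow::empCode, TreeMap::new, Collectors.counting()));
    }
}
```

Because the query already orders by empcode, you could also keep a running count while iterating and emit the total on each employee's last row, which matches the layout in the question.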
How the query works
The inline view aliased as d gets us a set of "date" values that we are checking. Using the emp_tx table as a source of these "date" values is a convenient way to do this. Note that the DATE() function returns just the "date" portion of the DATETIME argument; we're using a GROUP BY to get a distinct list of dates (i.e. no duplicate values). (What we're after, with this inline view query, is a distinct set of DATE values between the two values passed in as arguments. There are other, more involved, ways of generating a list of DATE values.)
As long as every "date" value that you will consider as an "absence" appears somewhere in the table (that is, at least one empcode had one transaction on each date that is of interest), and as long as the number of rows in the emp_tx table isn't excessive, the inline view query will work reasonably well.
(NOTE: The query in the inline view can be run separately, to verify that the results are correct and as we expect.)
The next step is to take the results from the inline view and perform a CROSS JOIN operation (to generate a Cartesian product) to match EVERY empcode with EVERY date returned from the inline view. The result of this operation represents every possible occurrence of "attendance".
The final step in the query is to perform an "anti-join" operation, using a LEFT JOIN and a WHERE IS NULL predicate. The LEFT JOIN (outer join) returns every possible attendance occurrence (from the left side), INCLUDING those that don't have a matching row (attendance record) from the emp_tx table.
The "trick" is to include a predicate (in the WHERE clause) that discards all of the rows where a matching attendance record was found, so that what we are left with is all combinations of empcode and date (possible attendance occurrences) where there was NO MATCHING attendance transaction.
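The same CROSS JOIN plus anti-join logic can be mirrored in memory, which may help when checking the query's output by hand. In this hypothetical sketch, days plays the role of the inline view d, empCodes the master table, and attended the (empcode, date) pairs present in emp_tx; the database does this far more efficiently, so it is only meant to show the logic.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AntiJoinSketch {
    // Hypothetical (empcode, day) pair; records provide equals/hashCode for set lookups.
    public record Attendance(int empCode, LocalDate day) {}

    public static List<Attendance> absences(List<Integer> empCodes, List<LocalDate> days,
                                            Set<Attendance> attended) {
        List<Attendance> result = new ArrayList<>();
        // CROSS JOIN: every empcode paired with every date.
        for (int emp : empCodes) {
            for (LocalDate day : days) {
                Attendance candidate = new Attendance(emp, day);
                // Anti-join: keep only pairs with no matching emp_tx record
                // (the LEFT JOIN ... WHERE p.empcode IS NULL step).
                if (!attended.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }
}
```

With sorted inputs, the nested loops also reproduce the ORDER BY m.empcode, d.dt of the query.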
(NOTE: I've purposefully left the references to the s_date (DATETIME) column "bare" in the predicates, and used range predicates. This will allow MySQL to make effective use of an appropriate index that includes that column.)
If we were to wrap the column references in the predicates inside a function e.g. DATE(p.s_date), then MySQL won't be able to make effective use of an index on the s_date column.
As one of the comments (on your question) points out, we're not making any distinction between transactions that mark an employee either as "coming in" or "going out". We are ONLY looking for the existence of a transaction for that empcode in a given 24-hour "midnight to midnight" period.
There are other approaches to getting the same result set, but the "anti-join" pattern usually turns out to give the best performance with large sets.
For best performance, you'll likely want covering indexes:
... ON master (empcode, name, dept)
... ON emp_tx (s_date, empcode)
Unfortunately, your query is going to get you a ton of results... It will always return all dates for an employee outside the range you gave. You want to check for NOT EXISTS a record BETWEEN your dates.
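It may be possible to do this in pure SQL... I can't think of a way offhand without using cursors or something DB-specific. This Java sketch (plain JDBC, one query per day) will give you one employee's absences:
import java.sql.*;
import java.time.LocalDate;
import java.util.*;
// Returns every day in [from, to] on which empCode has no emp_tx row.
List<LocalDate> findAbsences(Connection conn, int empCode, LocalDate from, LocalDate to) throws SQLException {
    List<LocalDate> result = new ArrayList<>();
    // Half-open range [day 00:00, next day 00:00), so an index on s_date can be used
    String sql = "SELECT 1 FROM emp_tx WHERE empcode = ? AND s_date >= ? AND s_date < ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            ps.setInt(1, empCode);
            ps.setTimestamp(2, Timestamp.valueOf(day.atStartOfDay()));
            ps.setTimestamp(3, Timestamp.valueOf(day.plusDays(1).atStartOfDay()));
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) result.add(day); // no transaction that day: absent
            }
        }
    }
    return result;
}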

How can I insert common data into a temp table from disparate schemas?

I am not sure how to solve this problem:
We import order information from a variety of online vendors (Amazon, Newegg, etc.). Each vendor has its own terminology and structure for orders, which we have mirrored into a database. Our data imports into the database without issues; however, the problem I face is writing a method that extracts the required fields from the database, regardless of the schema.
For instance assume we have the following structures:
Newegg structure:
"OrderNumber" integer NOT NULL, -- The Order Number
"InvoiceNumber" integer, -- The invoice number
"OrderDate" timestamp without time zone, -- Create date.
Amazon structure:
"amazonOrderId" character varying(25) NOT NULL, -- Amazon's unique, displayable identifier for an order.
"merchant-order-id" integer DEFAULT 0, -- A unique identifier optionally supplied for the order by the Merchant.
"purchase-date" timestamp with time zone, -- The date the order was placed.
How can I select these items and place them into a temporary table for me to query against?
The temporary table could look like:
"OrderNumber" character varying(25) NOT NULL,
"TransactionId" integer,
"PurchaseDate" timestamp with time zone
I understand that some of the databases represent an order number with an integer and others a character varying; to handle that I plan on casting the datatypes to String values.
Does anyone have a suggestion for me to read about that will help me figure this out?
I don't need an exact answer, just a nudge in the right direction.
The data will be consumed by Java, so if any particular Java classes will help, feel free to suggest them.
First, you can create a VIEW to provide this functionality:
CREATE VIEW orders AS
SELECT '1'::int AS source -- or any other tag to identify source
,"OrderNumber"::text AS order_nr
,"InvoiceNumber" AS transaction_id -- no cast .. is int already
,"OrderDate" AT TIME ZONE 'UTC' AS purchase_date -- !! see explanation
FROM tbl_newegg
UNION ALL -- not UNION!
SELECT 2
,"amazonOrderId"
,"merchant-order-id"
,"purchase-date"
FROM tbl_amazon;
You can query this view like any other table:
SELECT * FROM orders WHERE order_nr = '123' AND source = 2; -- order_nr is text, so compare with a string literal
The source is necessary if the order_nr is not unique. How else would you guarantee unique order-numbers over different sources?
A timestamp without time zone is ambiguous in a global context; it is only meaningful together with its time zone. If you mix timestamp and timestamptz, you need to place the timestamp at a certain time zone with the AT TIME ZONE construct to make this work. For more explanation read this related answer.
I use UTC as time zone, you might want to provide a different one. A simple cast "OrderDate"::timestamptz would assume your current time zone. AT TIME ZONE applied to a timestamp results in timestamptz. That's why I did not add another cast.
Although you can, I advise never using camel-case identifiers in PostgreSQL; it avoids many kinds of confusion. Note the lower-case identifiers (without the now unnecessary double quotes) I supplied.
Don't use varchar(25) as type for the order_nr. Just use text without arbitrary length modifier if it has to be a string. If all order numbers consist of digits exclusively, integer or bigint would be faster.
Performance
One way to make this fast would be to materialize the view. I.e., write the result into a (temporary) table:
CREATE TEMP TABLE tmp_orders AS
SELECT * FROM orders;
ANALYZE tmp_orders; -- temp tables are not auto-analyzed!
ALTER TABLE tmp_orders
ADD constraint orders_pk PRIMARY KEY (order_nr, source);
You need an index. In my example, the primary key constraint provides the index automatically.
If your tables are big, make sure you have enough temporary buffers to handle this in RAM before you create the temp table. Else it will actually slow you down.
SET temp_buffers = 1000MB;
This has to be set before the first use of temporary objects in your session. Don't set it high globally, just for your session. A temp table is dropped automatically at the end of your session anyway.
To get an estimate how much RAM you need, create the table once and measure:
SELECT pg_size_pretty(pg_total_relation_size('tmp_orders'));
More on object sizes under this related question on dba.SE.
All the overhead only pays if you have to process a number of queries within one session. For other use cases there are other solutions. If you know the source table at the time of the query, it would be much faster to direct your query to the source table instead. If you don't, I would question the uniqueness of your order_nr once more. If it is, in fact, guaranteed to be unique you can drop the column source I introduced.
For only one or a few queries, it might be faster to use the view instead of the materialized view.
I would also consider a plpgsql function that queries one table after the other until the record is found. Might be cheaper for a couple of queries, considering the overhead. Indexes for every table needed of course.
Also, if you stick to text or varchar for your order_nr, consider COLLATE "C" for it.
Sounds like you need to create an abstract class that will define the basics of interacting with the data, then derive a class per database schema you need to access. This will allow the core code to operate on a single object type, and each implementation can then specify the queries in a form specific to that database schema.
Something like:
public class Order
{
private String orderNumber;
private BigDecimal orderTotal;
... etc ...
}
public abstract class AbstractOrderInformation
{
public abstract ArrayList<Order> getOrders();
...
}
with a Newegg class:
public class NeweggOrderInformation extends AbstractOrderInformation
{
public ArrayList<Order> getOrders() {
... do the work of getting the newegg order
}
...
}
Then you can have an arbitrarily large number of formats and when you need information, you can just iterate over all the implementations and get the Orders from each.
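Here is a small runnable sketch of that idea. It uses an interface instead of the abstract class (either works here), and the two vendor classes return hard-coded, hypothetical sample orders where real implementations would run their vendor-specific queries and cast columns as needed.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderSourcesSketch {
    // Common shape shared by all vendors (mirrors the temp-table columns in the question).
    public record Order(String source, String orderNumber, Integer transactionId) {}

    // One implementation per vendor schema; each knows its own tables and column names.
    public interface OrderSource {
        List<Order> getOrders();
    }

    // Hypothetical stand-in: Newegg's integer OrderNumber is converted to a String.
    public static class NeweggOrders implements OrderSource {
        public List<Order> getOrders() {
            return List.of(new Order("newegg", String.valueOf(1001), 555));
        }
    }

    // Hypothetical stand-in: Amazon's order id is already a string.
    public static class AmazonOrders implements OrderSource {
        public List<Order> getOrders() {
            return List.of(new Order("amazon", "114-1234567-7654321", 0));
        }
    }

    // Core code only sees OrderSource, so supporting a new vendor means adding a class.
    public static List<Order> collectAll(List<OrderSource> sources) {
        List<Order> all = new ArrayList<>();
        for (OrderSource source : sources) {
            all.addAll(source.getOrders());
        }
        return all;
    }
}
```

Keeping a source tag on each Order plays the same role as the source column in the view-based answer, in case order numbers are not unique across vendors.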
