I am working on Spark SQL and I am trying to get the records using following queries:
/*Select all open tasks which are not unscheduled*/
Dataset<Row> scheduledOpenTasks = sqlContext.sql(
"SELECT * "
+ "FROM OpenTaskTable "
+ "WHERE due_date < cast('" + unscheduledDate + "' as timestamp)");
scheduledOpenTasks.createOrReplaceTempView("ScheduledOpenTaskTable");
/*Select scheduled tasks with max due_date for each csg_order_id*/
Dataset<Row> scheduledTasks = sqlContext.sql(
"SELECT TS1.* from ScheduledOpenTaskTable AS TS1 "
+ "INNER JOIN "
+ " (SELECT csg_order_id, MAX(due_date) AS MaxDD"
+ " FROM ScheduledOpenTaskTable"
+ " GROUP BY csg_order_id) AS TS2 "
+ "ON TS1.csg_order_id = TS2.csg_order_id AND TS1.due_date = TS2.MaxDD");
The unscheduled _date has value 4444-12-30.
In the OpenTaskTable, each csg_order_id can have multiple due_dates including unscheduled_date. I need the csg_order_ids with corresponding highest due_dates except unscheduled_date.
Now, with first query, I am removing all the records which have due_date as unscheduled_date. In second query, I am retrieving all the records with max due_date for each csg_order_id.
Now comes the problem: is there any way to combine these queries as one?
Well, after struggling for a while, finally found a way to combine the above two queries like this:
sqlContext.sql("SELECT OT1.* from OpenTaskTable AS OT1 INNER JOIN "
+ "(SELECT OT2.csg_order_id, MAX(OT2.due_date) AS MaxDD FROM "
+ "(SELECT csg_order_id, due_date from OpenTaskTable WHERE due_date < cast('"+unscheduledDate+"' as timestamp)) AS OT2 "
+ "GROUP BY OT2.csg_order_id) AS OT3 "
+ "ON OT1.csg_order_id = OT3.csg_order_id AND OT1.due_date = OT3.MaxDD");
Explanation:
Previously, in the first query, I was retrieving data from OpenTaskTable and then feeding it to the second query. Logically, in the second query also, I am just applying more filters over the retrieved data. At the end, we are trying to get all the attributes from OpenTaskTable only.
So, for this solution I simply used the first query, as the innermost query, and then selected MAX over the records grouped by csg_order_id. And, for the outermost query, just performed an inner join to get all matching csg_order_id records from OpenTaskTable.
Related
I wrote this SQL Query and using it as a native query in hibernate.
#Query(
value = "SELECT DISTINCT tp.* FROM TWITTER_POST AS tp " +
"JOIN TWITTER_LIST AS tl " +
"ON tl.owner_id = ?1 " +
"JOIN REL_TWITTER_LIST__ACCOUNTS_TRACKED_BY_LIST AS atbl " +
"ON tl.id = atbl.twitter_list_id " +
"JOIN TWITTER_ACCOUNT AS ta " +
"ON ta.id = atbl.accounts_tracked_by_list_id " +
"LEFT OUTER JOIN REL_TWITTER_POST__TWITTER_USERS_HIDING_POST uhp " +
"ON tp.id = uhp.twitter_post_id " +
"AND uhp.TWITTER_USERS_HIDING_POST_ID = ?1 " +
"WHERE uhp.twitter_post_id is NULL AND ta.id = tp.author_id",
countQuery = "SELECT DISTINCT count(tp.*) FROM TWITTER_POST AS tp " +
"JOIN TWITTER_LIST AS tl " +
"ON tl.owner_id = ?1 " +
"JOIN REL_TWITTER_LIST__ACCOUNTS_TRACKED_BY_LIST AS atbl " +
"ON tl.id = atbl.twitter_list_id " +
"JOIN TWITTER_ACCOUNT AS ta " +
"ON ta.id = atbl.accounts_tracked_by_list_id " +
"LEFT OUTER JOIN REL_TWITTER_POST__TWITTER_USERS_HIDING_POST uhp " +
"ON tp.id = uhp.twitter_post_id " +
"AND uhp.TWITTER_USERS_HIDING_POST_ID = ?1 " +
"WHERE uhp.twitter_post_id is NULL AND ta.id = tp.author_id",
nativeQuery = true
)
Page<TwitterPost> findAllNonHiddenForListsFromTwitterAccountId(Long twitterAccountId, Pageable pageable);
I noticed that the query executes very slowly when I'm running it through hibernate as opposed to a SQL tool. I assumed it was because I am using a native query as opposed to JQPL, which (from what I read) immediately does caching and pagination without requiring a definition for "count". Trying to convert it to JQPL failed, because I cannot find a good tutorial for more complicated queries on JQPL across join tables.
#Query(
value = "SELECT DISTINCT twitterPost " +
"FROM TwitterPost twitterPost " +
"JOIN TwitterList twitterList " +
"ON twitterList.owner.id = ?1 " +
"JOIN TwitterAccount tweetAuthorFromList " +
"ON tweetAuthorFromList IN twitterList.accountsTrackedByLists " +
"WHERE twitterPost.author = tweetAuthorFromList " +
"AND twitterList.owner NOT IN twitterPost.twitterUsersHidingPosts"
)
Page<TwitterPost> findAllNonHiddenPostsFromListsForTwitterAccountId(Long twitterAccountId, Pageable pageable);
Apparently my Syntax is off
org.hibernate.exception.SQLGrammarException: could not prepare
statement
but the compiler only shows me problems with the generated SQL, not the JQPL so I'm left in the dark.
Also checked for typical bad performance culprits i.e. eager fetching of entities which I set to lazy everywhere.
Any help regarding whether my performance problem assumptions are correct, or converting the query, are highly appreciated - thanks in advance!
There are many things that are wrong here:
Using SELECT DISTINCT with JOIN indicates that you should have used a Semi Join instead.
The ON tl.owner_id = ?1 is done for filtering, not for the projection, hence you are better off doing an EXISTS query.
Assuming why the query runs slow instead of profiling it. The reason why it runs faster in the DB tool is that DB tools usually truncate the result set while Spring Data consumes the entire result set. Or, if you run EXPLAIN, the output might come from the Optimizer without even running the query.
So, here's what you can do:
Use Semi Joins instead of Joins for filtering.
Use Blaze Persistence to write better entity queries dynamically.
Configure Statement Caching at the JDBC Driver level.
Use the slow query log to log the execution plan when the query is slower than N seconds.
I'm trying to get result from 3 inner join in java code with sql.
I tried this
BeanHandler rsh=new BeanHandler(evaluation.class);
evaluation eval=(evaluation) runner.query(query,new Object[]{userid}, rsh);
This code return the result from query, but with "join" statement it's not working, I want to get result from 3 tables.
any ideas?
Edit: (the query)
String query=" select users.username, sum(evaluation.mark)\n" +
"from users inner JOIN user_has_eval \n" +
"on users.id= user_has_eval.user_id inner JOIN\n" +
"evaluation ON evaluation.id=user_has_eval.eval_id\n" +
"where evaluation.types=\"Secretaria\" and users.id = ? \n" +
"\n" ;
Edit-
I'll add the use case to clear up the function of this.
The user will select two dates - a start date and an end date - these are then passed on and used to select the tables (each year has its own table). In one use case where the two given dates lie in the same year it's a simple query on that table alone.
However, if the two dates are different years I will need to join all tables (so 2011-2013 will be three tables connected, to search through) and thus, I want a dynamic fix to this. I know building up a query like below is against security - just thought something similar would work. As the system will get new tables each year I also dont want to have to manually add however many new queries for each case (2011-2016, 2014-2018, 2011-2019.. etc)
I have a question about whether it is possible to create a dynamic query as a String like below and then pass that through to service -> repository, and use that as a query?
for (int i = 0; i < yearCondition; i++) {
if (i == 0) {
query += "SELECT md.Device_ID, l.locationRef, " + reportRunForm.getStartDate() + " as 'From Date', "
+ reportRunForm.getEndDate() + " as 'To Date' "
+ "from mData.meterdata" + iDateStart.substring(0, 4)
+ " join MOL2.meters m on device_ID = m.meterUI "
+ "join MOL2.locations l on m.locationID = l.locationID "
+ "join MOL2.meterreg mr on m.meterID = mr.meterID "
+ "where mr.userID = ?1";
}
query += "UNION SELECT md.Device_ID, l.locationRef, " + reportRunForm.getStartDate() + " as 'From Date', "
+ reportRunForm.getEndDate() + " as 'To Date' "
+ "from mData.meterdata" + (Integer.parseInt(iDateStart.substring(0, 4))+i)
+ " join MOL2.meters m on device_ID = m.meterUI "
+ "join MOL2.locations l on m.locationID = l.locationID "
+ "join MOL2.meterreg mr on m.meterID = mr.meterID "
+ "where mr.userID = ?1";
}
I may have the wrong idea with how this works, and I know I could create and persist a query through entitymanager, but wanted to know whether doing it through the repository would be possible?
My thought was I'd build up the query like above, pass it through to service and then to repository, and bind it as value in #Query annotation but this doesn't seem possible. I'm likely approaching this wrong so any advice would be appreciated.
Thanks.
Edit -
Had a goof. Understand doing it at all like that is stupid, an approach to build up something similar is what I'm looking for that is still secure.
Maybe this annotations before your POJO can help
#org.hibernate.annotations.Entity(dynamicInsert = true)
for example two tables district and constituency ...
Dynamic query
query += "select *from constituency c where 1=1";
if(constituencyNumber!=null)
query +=" and c.constituency_number like '"+constituencyNumber+"%'";
query += " group by c.district_id";
OR
select *from constituency c where (c.constituency_number is null or c.constituency_number like '1%') group by c.district_id;
I'm having an issue with a SQL query in Android - it doesn't seem to be returning the correct result, which I think might be due to my query structure possibly.
This is a running app. I have an initial query that works correctly, that gets me the most common out of the last three runs, which is:
Cursor workout = db.rawQuery("SELECT ActivityID, DistanceID FROM Workout WHERE _id IN(Select _id FROM Workout ORDER BY _id DESC LIMIT 3) Group By ActivityID, DistanceID HAVING COUNT(*) > 1", null);
So now I have the most common distance + activity (running/biking ect..) out of the last three runs, so I wanted to run a query that got me the activity + distance combination on the last three with the fasted time, so I have this:
Cursor fastestWorkout = db.rawQuery
("Select _id, RunTimeSeconds, Distance FROM Workout WHERE ActivityID = " + workout.getInt(0) +
" AND DistanceID = " + workout.getInt(1) +
" AND RunTimeSeconds IN " +
"(Select Min(RunTimeSeconds) From Workout WHERE ActivityID = " + workout.getInt(0) +
" AND DistanceID = " + workout.getInt(1) +
" ORDER BY _id DESC LIMIT 3)" +
" ORDER BY _id DESC LIMIT 3", null);
So the only thing special here is the sub-query that gets me the fastest time, which I imagine might be at fault, however I can't work out why. The cursor has three results in it at the end of this, when it should be one (There are two activities+distances that are the same and one unique - with different RunTimeSeconds, so it should be pulling back that one that is the fastest.
Any help would be appreciated!
If I understand correctly you want the row with the fastest run time. Instead of using a subquery to find this value, why not just order by this run time and pick the first result :
Cursor fastestWorkout = db.rawQuery
("Select _id, RunTimeSeconds, Distance FROM Workout WHERE ActivityID = " + workout.getInt(0) +
" AND DistanceID = " + workout.getInt(1) +
" ORDER BY RunTimeSeconds LIMIT 1", null);
Thanks for the help! I managed to figure this one out in the end! Bwt's suggestion with the ordering by run-time is what got me there!
I just used a sub-query to select from the last three runs, then I could order by the run-time and limit it to just one result, getting the fastest run!
Cursor fastestWorkout = db.rawQuery
("Select _id, RunTimeSeconds, DistanceID FROM (Select _id, RunTimeSeconds, DistanceID, ActivityID FROM Workout ORDER BY _id DESC LIMIT 3) WHERE ActivityID = " + workout.getInt(0) +
" AND DistanceID = " + workout.getInt(1) +
" ORDER BY RunTimeSeconds LIMIT 1", null);
Thanks for your help ans answers!
Trying to join 3 tables within a query returns an empty result. Strange enough, having one table removed (two tables join) returns some set. Here is what I do:
String sql = "SELECT\n" +
" tc.constraint_name, tc.table_name, kcu.column_name, \n" +
" ccu.table_name AS foreign_table_name,\n" +
" ccu.column_name AS foreign_column_name, constraint_type \n" +
"FROM \n" +
" information_schema.table_constraints AS tc \n" +
" JOIN information_schema.key_column_usage AS kcu\n" +
" ON tc.constraint_name = kcu.constraint_name\n" +
" JOIN information_schema.constraint_column_usage AS ccu\n" +
" ON ccu.constraint_name = tc.constraint_name\n" +
"WHERE constraint_type = 'FOREIGN KEY'";
List<Map<String, Object>> foreignTable1 = jdbcTemplate(getShardId(sku)).queryForList(sql);
Would always return an empty set.
Try using outer joins and check whether there are rows which don't have corresponding IDs so that the join removes the non-matching rows. Especially that you write, that two tables result in a non-empty result set seems to indicate, that the join with the third table does not result in matching rows of the result set of the first two.