Joining multiple tables returns no results - Java

I have a SQL query joining multiple tables.
SELECT a.fourbnumber,a.fourbdate,a.taxcollector,b.cashcheque,c.propertycode
from tbl_rphead a
inner join tbl_rpdetail b on a.rpid = b.rpid
inner join tbl_assessmregister c on b.assessmid = c.assessmid
I can execute that query quickly in my SQL editor (about 3 seconds). When I execute the same query using Java (JDBC), it doesn't return any results and throws no exceptions.
I don't know how to fix this problem.
Each table has about 200k records.

Your SQL editor is probably limiting the result to some row count before showing the records in its view. Look at the editor; you may find a hint such as "showing 500 of XXXXXX".
When you call the query from JDBC, the database may return the results just as fast, but the driver needs to populate result set objects for those hundreds of thousands of records, which takes much more time and memory.
If you are working with an Oracle DB, try limiting the records in your query with ROWNUM < 100, so you can get results in Java/JDBC. If that works, move on to a SQL pagination technique with ROWNUM bounds (rownum < x and rownum > y), as sketched below.
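For illustration, here is a minimal JDBC sketch of that ROWNUM pagination applied to the poster's query; the page bounds, connection handling, and the missing ORDER BY are assumptions, not tested code:
import java.sql.*;

public class RownumPagination {
    // Fetch one page of the three-table join using Oracle ROWNUM bounds.
    // In practice you would add an ORDER BY inside the innermost query so pages are deterministic.
    public static void fetchPage(Connection connection, int lower, int upper) throws SQLException {
        String sql =
            "SELECT * FROM ("
            + "  SELECT q.*, ROWNUM rn FROM ("
            + "    SELECT a.fourbnumber, a.fourbdate, a.taxcollector, b.cashcheque, c.propertycode"
            + "    FROM tbl_rphead a"
            + "    INNER JOIN tbl_rpdetail b ON a.rpid = b.rpid"
            + "    INNER JOIN tbl_assessmregister c ON b.assessmid = c.assessmid"
            + "  ) q WHERE ROWNUM <= ?"
            + ") WHERE rn > ?";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setInt(1, upper);   // upper bound, e.g. 100
            ps.setInt(2, lower);   // lower bound, e.g. 0
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process one row of the page
                    System.out.println(rs.getString("fourbnumber"));
                }
            }
        }
    }
}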

Related

SQL LIMIT vs. JDBC Statement setMaxRows. Which one is better?

I want to select the Top 10 records for a given query. So, I can use one of the following options:
Using the JDBC Statement.setMaxRows() method
Using LIMIT and OFFSET in the SQL query
What are the advantages and disadvantages of these two options?
SQL-level LIMIT
To restrict the SQL query result set size, you can use the SQL:2008 syntax:
SELECT title
FROM post
ORDER BY created_on DESC
OFFSET 50 ROWS
FETCH NEXT 50 ROWS ONLY
which works on Oracle 12, SQL Server 2012, or PostgreSQL 8.4 or newer versions.
For MySQL, you can use the LIMIT and OFFSET clauses:
SELECT title
FROM post
ORDER BY created_on DESC
LIMIT 50
OFFSET 50
The advantage of using the SQL-level pagination is that the database execution plan can use this information.
So, if we have an index on the created_on column:
CREATE INDEX idx_post_created_on ON post (created_on DESC)
And we execute the following query that uses the LIMIT clause:
EXPLAIN ANALYZE
SELECT title
FROM post
ORDER BY created_on DESC
LIMIT 50
We can see that the database engine uses the index since the optimizer knows that only 50 records are to be fetched:
Execution plan:
Limit (cost=0.28..25.35 rows=50 width=564)
(actual time=0.038..0.051 rows=50 loops=1)
-> Index Scan using idx_post_created_on on post p
(cost=0.28..260.04 rows=518 width=564)
(actual time=0.037..0.049 rows=50 loops=1)
Planning time: 1.511 ms
Execution time: 0.148 ms
JDBC Statement maxRows
According to the setMaxRows Javadoc:
If the limit is exceeded, the excess rows are silently dropped.
That's not very reassuring!
So, if we execute the following query on PostgreSQL:
try (PreparedStatement statement = connection
    .prepareStatement("""
        SELECT title
        FROM post
        ORDER BY created_on DESC
        """)
) {
    statement.setMaxRows(50);
    ResultSet resultSet = statement.executeQuery();
    int count = 0;
    while (resultSet.next()) {
        String title = resultSet.getString(1);
        count++;
    }
}
We get the following execution plan in the PostgreSQL log:
Execution plan:
Sort (cost=65.53..66.83 rows=518 width=564)
(actual time=4.339..5.473 rows=5000 loops=1)
Sort Key: created_on DESC
Sort Method: quicksort Memory: 896kB
-> Seq Scan on post p (cost=0.00..42.18 rows=518 width=564)
(actual time=0.041..1.833 rows=5000 loops=1)
Planning time: 1.840 ms
Execution time: 6.611 ms
Because the database optimizer has no idea that we need to fetch only 50 records, it assumes that all 5000 rows need to be scanned. If a query needs to fetch a large number of records, the cost of a full-table scan is actually lower than if an index is used, hence the execution plan will not use the index at all.
I ran this test on Oracle, SQL Server, PostgreSQL, and MySQL, and it looks like the Oracle and PostgreSQL optimizers don't use the maxRows setting when generating the execution plan.
However, on SQL Server and MySQL, the maxRows JDBC setting is taken into consideration, and the execution plan is equivalent to an SQL query that uses TOP or LIMIT. You can run the tests for yourself, as they are available in my High-Performance Java Persistence GitHub repository.
Conclusion
Although it looks like setMaxRows is a portable solution for limiting the size of the ResultSet, SQL-level pagination is much more efficient when the database server optimizer doesn't use the JDBC maxRows property.
For most cases, you want to use the LIMIT clause, but at the end of the day both will achieve what you want. This answer is targeted at JDBC and PostgreSQL, but is applicable to other languages and databases that use a similar model.
The JDBC documentation for Statement.setMaxRows says
If the limit is exceeded, the excess rows are silently dropped.
i.e. the database server may return more rows, but the client will simply ignore them. The PostgreSQL JDBC driver enforces the limit on both the client and the server side. For the client side, have a look at the usage of maxRows in AbstractJdbc2ResultSet. For the server side, have a look at maxRows in QueryExecutorImpl.
Server side, the PostgreSQL LIMIT documentation says:
The query optimizer takes LIMIT into account when generating a query
plan
So as long as the query is sensible, it will load only the data it needs to fulfill the query.
setFetchSize: Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement.
setMaxRows: Sets the limit for the maximum number of rows that any ResultSet object generated by this Statement object can contain to the given number.
Using the above two JDBC APIs, I guess you can first try setFetchSize and see if it works for your 100K records. Otherwise, you can fetch in batches, build an ArrayList, and return it to your Jasper report; a rough sketch follows.
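A rough sketch of that approach, assuming a plain JDBC Connection; the query, fetch size, and row mapping are placeholders, not the poster's actual report query:
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class FetchWithFetchSize {
    // Read a large result set with a modest fetch size and collect the rows into a list
    // that can be handed to the report. Names and sizes are illustrative assumptions.
    public static List<String[]> fetchAll(Connection connection) throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(
                "SELECT col1, col2 FROM big_table")) {
            ps.setFetchSize(1000);            // hint: pull 1000 rows per round trip
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new String[] { rs.getString(1), rs.getString(2) });
                }
            }
        }
        return rows;                          // e.g. hand this list to the Jasper report
    }
}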
I'm not sure if I am right, but I remember being involved in a big project in the past where we changed all queries that were expected to return one row to use 'TOP 1' or numrows=1. The reason was that the DB would stop searching for 'next possible matches' once this 'hint' was used, and in high-volume environments this really made a difference. The remark that you can 'ignore' superfluous records in the client or in the result set is not enough: you should avoid unnecessary reads as early as possible. I have no idea whether the JDBC methods add those DB-specific hints to the query, so I would need to test it to see. I am not a DB specialist and can imagine I am not right, but "speed-wise it seems like no difference" can be a wrong assumption. For example, if you are asked to search a box for red balls and you only need one, it adds no value to keep searching for all of them when one is enough; that is when specifying 'TOP 1' matters.
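As a small illustration of pushing that intent into the SQL itself (rather than discarding rows in the client), here is a hypothetical single-row lookup; the table, columns, and the SQL Server TOP syntax are assumptions:
import java.sql.*;
import java.util.Optional;

public class SingleRowLookup {
    // The limit is expressed in the SQL (TOP 1 on SQL Server; LIMIT 1 or
    // FETCH FIRST 1 ROW ONLY elsewhere), so the server can stop searching
    // as soon as the first match is found.
    public static Optional<Long> latestOrderId(Connection connection, long customerId) throws SQLException {
        String sql = "SELECT TOP 1 order_id FROM orders WHERE customer_id = ? ORDER BY created_on DESC";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? Optional.of(rs.getLong(1)) : Optional.empty();
            }
        }
    }
}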

How to fetch limited data from MySQL in Java

Can I limit the number of rows in result set?
My table contains some 800,000 rows; if I fetch them all into a result set, this will definitely lead to an OOM exception. Each row has 40 columns.
I do not want to work on them all at the same time, but each row needs to be filtered for some data.
Thank you in advance.
Something like the following could be a SQL solution, though it is somewhat clumsy, since you have to keep track of where the previous batch ended.
Assuming that your ORDER BY is based on a unique int column and that you will be fetching 1000 rows at a time:
SET @currentid = 0;

SELECT * FROM YourTable t
WHERE t.ID > @currentid
ORDER BY t.ID
LIMIT 1000;

-- after each batch, set @currentid to the largest t.ID returned and re-run the SELECT
Of course, you can instead track that variable from your Java code, as in the sketch below.
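A rough Java sketch of driving that loop from JDBC; the table name, key column, and batch size are assumed for illustration:
import java.sql.*;

public class KeysetPagination {
    // Walk the whole table in batches of 1000, keeping track of the last ID seen.
    public static void processAll(Connection connection) throws SQLException {
        String sql = "SELECT * FROM YourTable t WHERE t.ID > ? ORDER BY t.ID LIMIT 1000";
        long lastId = 0;
        boolean more = true;
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            while (more) {
                ps.setLong(1, lastId);
                more = false;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("ID");   // remember the key for the next batch
                        more = true;
                        // filter / process the current row here
                    }
                }
            }
        }
    }
}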
You could use the JDBC fetch size to limit how much of the result is held in the result set at once. It is more portable than the SQL LIMIT, as it works for other databases as well without changing the query. The JDBC driver will not read the whole result from the database at once; each time, it will retrieve the number of records specified by the fetch size, so there should be no memory issue anymore.
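Note that for MySQL's Connector/J the fetch size is, as far as I know, only honoured when the result is streamed: either a forward-only, read-only statement with fetch size Integer.MIN_VALUE, or useCursorFetch=true on the connection together with a positive fetch size. A sketch under that assumption, with an invented query:
import java.sql.*;

public class MySqlStreaming {
    // Stream rows from MySQL instead of buffering the whole table in memory.
    public static void streamRows(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            statement.setFetchSize(Integer.MIN_VALUE);  // ask the driver to stream row by row
            try (ResultSet rs = statement.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // filter / process one row at a time; only one row is held in memory
                }
            }
        }
    }
}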
You can use the LIMIT keyword in your SQL query, see the following:
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
You can read more about using limit here
Cheers !!
You have a couple of options. First, add a limit to the SQL query. Second, you could use JdbcTemplate.query() with a RowCallbackHandler to process one row at a time; the template will then avoid the memory issues of a large result set. A sketch of the second option follows.
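A minimal sketch of the RowCallbackHandler approach with Spring's JdbcTemplate; the DataSource wiring, table, and column names are assumptions for illustration:
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;

public class RowByRowProcessor {
    private final JdbcTemplate jdbcTemplate;

    public RowByRowProcessor(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
        this.jdbcTemplate.setFetchSize(1000);  // keep only a window of rows in memory
    }

    // Process each row as it is read, without materializing the full result set.
    public void filterAllRows() {
        RowCallbackHandler handler = rs -> {
            // called once per row; apply the filtering logic here
            String value = rs.getString("some_column");
        };
        jdbcTemplate.query("SELECT * FROM big_table", handler);
    }
}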

Deletion of bulk records using spring jdbc

I have two tables, table1 and table2, and I am joining them using an inner join on one column.
There is a possibility that the child table can have more than 50 million records.
It took 30 minutes to delete 17 million records using Spring JDBC update().
Is there an optimized way to reduce the deletion time?
Use batchUpdate with some manageable batch size, e.g. 5000; a sketch follows.
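Something along these lines, assuming the ids to delete are already known; the table name, key column, and batch size are illustrative:
import java.util.List;
import java.util.stream.Collectors;
import org.springframework.jdbc.core.JdbcTemplate;

public class BatchDeleter {
    private static final int BATCH_SIZE = 5000;

    // Delete child rows in batches of ids instead of issuing one huge statement.
    public static void deleteInBatches(JdbcTemplate jdbcTemplate, List<Long> idsToDelete) {
        for (int from = 0; from < idsToDelete.size(); from += BATCH_SIZE) {
            int to = Math.min(from + BATCH_SIZE, idsToDelete.size());
            List<Object[]> params = idsToDelete.subList(from, to).stream()
                    .map(id -> new Object[] { id })
                    .collect(Collectors.toList());
            jdbcTemplate.batchUpdate("DELETE FROM table2 WHERE id = ?", params);
        }
    }
}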
EDIT: The problem is probably not in Spring JDBC but in your query.
Would this work for you?
DELETE res
FROM RESULT res
INNER JOIN POSITION pos
    ON res.POSITION_ID = pos.POSITION_ID
WHERE pos.AS_OF_DATE = '2012-11-29 11:11:11'
This removes entries from the RESULT table. Simplified SQL Fiddle demo: http://www.sqlfiddle.com/#!3/4a71e/15

Hibernate limit result inquiry

How does the maxResults property of a Hibernate query work in the example below?
Query query = session.createQuery("from MyTable");
query.setMaxResults(10);
Does this get all rows from the database but display only 10 of them, or is this the same as LIMIT in SQL?
It's the same as LIMIT, but it is database-independent. For example, MS SQL Server does not have LIMIT, so Hibernate takes care of translating it. For MySQL it appends LIMIT 10 to the query.
So, always use query.setMaxResults(..) and query.setFirstResult(..) instead of native SQL clauses, as sketched below.
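For example, a minimal pagination sketch with those two calls; the entity name comes from the question, the page parameters are assumptions:
import java.util.List;
import org.hibernate.Query;
import org.hibernate.Session;

public class PageLoader {
    // Load one page of MyTable rows; Hibernate translates the limit/offset
    // into the dialect-specific SQL (LIMIT, TOP, ROWNUM, FETCH FIRST ...).
    public static List<?> loadPage(Session session, int pageNumber, int pageSize) {
        Query query = session.createQuery("from MyTable");
        query.setFirstResult(pageNumber * pageSize);  // offset of the first row
        query.setMaxResults(pageSize);                // maximum rows to return
        return query.list();
    }
}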
setMaxResults retrieves all the rows and only displays the number of rows that is set. In my case I didn't manage to find a method that really retrieves only a limited number of rows.
My recommendation is to write the query yourself and put ROWNUM or LIMIT in it directly.

Change of design of queries to improve performance

This is more like a design question but related to SQL optimization as well.
My project has to import a large number of records into the database (more than 100k records). At the same time, the project has logic to check each record to make sure it meets criteria that are configurable. It then marks the record as "no warning" or "has warning" in the database. The inserting and the warning checking are done within one import process.
For each criterion it has to query the database. The query needs to join two other tables and sometimes add an additional nested query inside the conditions, such as:
select * from TableA a
join TableB on ...
join TableC on ...
where
(select count(*) from TableA
where TableA.Field = Bla) > 100
Although each individual query takes an unnoticeable amount of time, querying for the entire record set takes a considerable amount of time, which may be 4 to 5 hours on a server, especially if there are many criteria. In the end the project will stop running the import and roll back.
I've tried changing "SELECT * FROM" to "SELECT TableA.ID FROM", but it seems to have no effect at all. Is there a better design to improve the performance of this process?
How about making a temp table (or more than one) that stores the aggregated results of the sub-queries, then indexing that/those with covering indexes?
From your code above, we'd make a temp table grouping on TableA.Field1 and including a count, then index it on (Field1, theCount). On SQL Server the fastest approach would then be:
select * from TableA a
join TableB on ...
join TableC on ...
join (select Field1 from #temp1 where theCount > 100) t on...
The reason this works is that we are doing the same trick twice.
First, we pre-aggregate into the temp table, which is a simple operation and very easy for SQL Server to optimize. So we have taken a piece of the problem and solved it in an optimizable way.
Then we repeat this trick by joining to a subquery, putting the filter inside the subquery, so that the join acts as a filter. A sketch of the pre-aggregation step is shown below.
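A sketch of that pre-aggregation step run over JDBC against SQL Server; the column name Field1 follows the snippet above, the threshold of 100 comes from the question, and the rest is assumed:
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class PreAggregation {
    // Build the temp table on the same connection that will run the main import query,
    // since a SQL Server #temp table is scoped to that session.
    public static void buildAggregateTempTable(Connection connection) throws SQLException {
        try (Statement st = connection.createStatement()) {
            // 1. Pre-aggregate: one row per Field1 value with its count.
            st.execute("SELECT Field1, COUNT(*) AS theCount INTO #temp1 FROM TableA GROUP BY Field1");
            // 2. Covering index so the (theCount > 100) filter and the join are index-only.
            st.execute("CREATE INDEX ix_temp1 ON #temp1 (Field1, theCount)");
            // The main query can now join to (SELECT Field1 FROM #temp1 WHERE theCount > 100).
        }
    }
}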
I would suggest you batch your records together (500 or so at a time) and send them to a stored proc which can do the calculation.
Use simple statements instead of joins in there. That saves as well. This link might help as well.
A good choice is using an indexed view.
http://msdn.microsoft.com/en-us/library/dd171921(SQL.100).aspx
