I have 5000 records as a search result, and based on the product number I have to pull the related data associated with each product number. That means separating out the 5000 product numbers and sending them to the database to pull the data. Creating one query and hitting the database for each product number is not efficient.
I'm looking for some ideas to handle this situation.
Note: I'm using Hibernate, Oracle and Java.
You got that search result with some query, so it might be simpler to reuse that query with a join to retrieve the related data.
Instead of 5000 queries to get the result, you may use the IN clause.
You should probably split it into chunks, however, because such long SQL queries can throw errors (Oracle allows at most 1000 expressions in an IN list). Alternatively, use a temporary table and do a JOIN. Take a look at this.
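Here is a minimal sketch of the chunking idea. It assumes Oracle's limit of 1000 expressions per IN list (exceeding it raises ORA-01795). The table and column names (product_detail, product_no) are hypothetical placeholders, not taken from the question:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ChunkedInQuery {
    // Oracle rejects IN lists with more than 1000 expressions (ORA-01795).
    static final int ORACLE_MAX_IN_LIST = 1000;

    // Splits ids into consecutive sublists of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            chunks.add(ids.subList(start, end));
        }
        return chunks;
    }

    // Builds "... in (?, ?, ?)" with one bind placeholder per id in a chunk.
    // product_detail / product_no are hypothetical names.
    static String buildSql(int placeholders) {
        return "select * from product_detail where product_no in ("
                + String.join(", ", Collections.nCopies(placeholders, "?"))
                + ")";
    }
}
```

You would then bind each chunk to its placeholders and execute it once, so 5000 product numbers take 5 round trips instead of 5000. With Hibernate, you could instead write "in (:ids)" and bind each chunk with setParameterList.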
Maybe you could use a materialized view and some basic paging? http://docs.oracle.com/cd/A97630_01/server.920/a96567/repmview.htm
I am looking for a way to process a large amount of data loaded from the database in a reasonable time.
The problem I am facing is that I have to read all the data from the database (currently around 30M rows) and then process it in Java. The processing itself is not the problem, but fetching the data from the database is. The fetching generally takes 1-2 minutes, and I need it to be much faster than that. I am loading the data from the DB straight into DTOs using the following query:
select id, id_post, id_comment, col_a, col_b from post_comment
where id is the primary key, id_post and id_comment are foreign keys to the respective tables, and col_a and col_b are columns of the smallint data type. The foreign-key columns have indexes.
The tools I am using for the job currently are Java, Spring Boot, Hibernate and PostgreSQL.
So far, the only options that came to my mind were:
Ditch Hibernate for this query and try a plain JDBC connection, hoping that it will be faster.
Completely rewrite the processing algorithm from Java as an SQL procedure.
Did I miss something, or are these my only options? I am open to any ideas.
Note that I only need to read the data, not change them in any way.
EDIT: The explain analyze of the used query
"Seq Scan on post_comment (cost=0.00..397818.16 rows=21809216 width=28) (actual time=0.044..6287.066 rows=21812469 loops=1), Planning Time: 0.124 ms, Execution Time: 8237.090 ms"
Do you need to process all rows at once, or can you process them one at a time?
If you can process them one at a time, you should try using a scrollable result set.
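org.hibernate.Query query = ...;
query.setReadOnly(true);
query.setFetchSize(1000); // let the JDBC driver stream rows in batches instead of loading them all
ScrollableResults sr = query.scroll(ScrollMode.FORWARD_ONLY);
try {
    while (sr.next()) {
        MyClass myObject = (MyClass) sr.get()[0];
        // ... process row for myObject ...
    }
} finally {
    sr.close(); // release the underlying JDBC result set
}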
This will still keep every object in the entity manager, so it will get progressively slower. To avoid that issue, you can detach each object from the entity manager once you're done with it. Only do this if the objects are not modified: if they are modified, the changes will NOT be persisted.
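org.hibernate.Query query = ...;
query.setReadOnly(true);
query.setFetchSize(1000); // let the JDBC driver stream rows in batches instead of loading them all
ScrollableResults sr = query.scroll(ScrollMode.FORWARD_ONLY);
try {
    while (sr.next()) {
        MyClass myObject = (MyClass) sr.get()[0];
        // ... process row for myObject ...
        entityManager.detach(myObject); // drop it from the persistence context
    }
} finally {
    sr.close(); // release the underlying JDBC result set
}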
If I were in your shoes, I would definitely bypass Hibernate and go directly to JDBC for this query. Hibernate is not made for dealing with large result sets, and it adds overhead for benefits that do not apply to cases like this one.
When you use JDBC, do not forget to set autocommit to false and to set a large fetch size (on the order of thousands). Otherwise, the PostgreSQL JDBC driver will first fetch all 21 million rows into memory before starting to yield them to you. (See https://stackoverflow.com/a/10959288/773113)
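Here is a minimal JDBC sketch of that advice. It assumes Java 16+ (for the record type), and the fetch size of 5000 is an arbitrary choice. PostCommentRow is a hypothetical DTO that mirrors the query's columns:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.Consumer;

class PostCommentStreamer {
    static final int FETCH_SIZE = 5000; // assumption; tune for your data
    static final String SQL =
            "select id, id_post, id_comment, col_a, col_b from post_comment";

    // Hypothetical DTO matching the selected columns.
    record PostCommentRow(long id, long idPost, long idComment, short colA, short colB) {}

    // Streams every row to the consumer and returns the number of rows read.
    static long stream(Connection connection, Consumer<PostCommentRow> consumer) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        // The PostgreSQL driver only uses a server-side cursor (and honours the
        // fetch size) when autocommit is off and the result set is forward-only.
        connection.setAutoCommit(false);
        long count = 0;
        try (PreparedStatement ps = connection.prepareStatement(
                SQL, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(FETCH_SIZE);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    consumer.accept(new PostCommentRow(
                            rs.getLong(1), rs.getLong(2), rs.getLong(3),
                            rs.getShort(4), rs.getShort(5)));
                    count++;
                }
            }
            connection.commit(); // end the read-only transaction
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
        return count;
    }
}
```

Only FETCH_SIZE rows are held in the driver at a time, so memory stays flat no matter how many rows the table has.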
Since you asked for ideas, here are the approaches I have seen used for this problem, depending on how they fit your environment:
1) First try plain JDBC and Java. The code is simple, and you can do a test run on your database and data to see whether the improvement is enough. Here you will need to give up the other benefits of Hibernate.
2) Building on point 1, use multithreading with multiple connections pulling data into one queue, then consume that queue to process further or print as needed. You may also consider Kafka.
3) If the data will keep growing, you can consider Spark, which can keep it all in memory and can be much faster.
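Here is a rough sketch of option 2 that uses only java.util.concurrent. It assumes the table can be split into id ranges. readRange is a hypothetical stand-in that just emits ids. A real version would open its own connection per producer and select one id range, putting DTOs rather than Longs on the queue:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.LongConsumer;

class ParallelRangeReader {
    // Sentinel each producer enqueues when its range is exhausted.
    private static final long POISON = Long.MIN_VALUE;

    // Hypothetical stand-in for "select ... from post_comment where id >= ? and id < ?"
    // run on the producer's own connection; here it just emits the ids.
    static void readRange(long fromInclusive, long toExclusive, BlockingQueue<Long> queue)
            throws InterruptedException {
        for (long id = fromInclusive; id < toExclusive; id++) {
            queue.put(id); // blocks while the queue is full (back-pressure)
        }
    }

    // Reads ids [0, maxId) with `producers` threads; the calling thread consumes.
    static void run(long maxId, int producers, LongConsumer consumer) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(10_000);
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        long rangeSize = (maxId + producers - 1) / producers; // ceiling division
        for (int p = 0; p < producers; p++) {
            long from = Math.min(p * rangeSize, maxId);
            long to = Math.min(from + rangeSize, maxId);
            pool.submit(() -> {
                try {
                    readRange(from, to, queue);
                } finally {
                    queue.put(POISON); // always signal completion, even on failure
                }
                return null; // a Callable, so InterruptedException may propagate
            });
        }
        try {
            int finished = 0;
            while (finished < producers) {
                long id = queue.take();
                if (id == POISON) {
                    finished++;
                } else {
                    consumer.accept(id);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        } finally {
            // On the normal path all producers are done; on failure this
            // interrupts any producer still blocked on put().
            pool.shutdownNow();
        }
    }
}
```

The bounded queue provides back-pressure: if processing is slower than reading, producers block instead of filling the heap. Exceptions thrown inside a producer end up in its ignored Future, so a real version should check them.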
Why do you keep 30M rows in memory?
It's better to rewrite it in pure SQL and use pagination based on id.
For example, if 5 is the id of the last comment you processed, you issue
select id, id_post, id_comment, col_a, col_b from post_comment where id > 5 order by id limit 20
If you need to update the entire table, put the task in a cron job, but process it in parts there too.
Memory is expensive, and downloading 30M rows at once is very costly. You need to process the table in parts: 0-20, 20-40, and so on, with each page starting after the last id seen.
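Here is a sketch of that keyset loop. It assumes id is a unique, indexed key. PageFetcher is a hypothetical interface; with JDBC, fetchAfter would bind lastId and the page size into KEYSET_SQL:

```java
import java.util.List;
import java.util.function.Consumer;

class KeysetPager {
    // ORDER BY is required: without it, LIMIT returns an arbitrary subset.
    static final String KEYSET_SQL =
            "select id, id_post, id_comment, col_a, col_b from post_comment"
            + " where id > ? order by id limit ?";

    // Hypothetical abstraction: returns up to `limit` ids greater than lastId, ascending.
    interface PageFetcher {
        List<Long> fetchAfter(long lastId, int limit);
    }

    // Walks the whole table page by page and returns how many rows were processed.
    static long processAll(PageFetcher fetcher, int pageSize, Consumer<Long> process) {
        long lastId = Long.MIN_VALUE; // start before the smallest possible id
        long total = 0;
        while (true) {
            List<Long> page = fetcher.fetchAfter(lastId, pageSize);
            if (page.isEmpty()) {
                return total;
            }
            page.forEach(process);
            total += page.size();
            lastId = page.get(page.size() - 1); // the next page starts after this id
            if (page.size() < pageSize) {
                return total; // a short page means we reached the end
            }
        }
    }
}
```

Unlike OFFSET-based paging, each page costs roughly the same however deep into the table it is, because the database can seek straight to id > lastId through the primary-key index.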
I am executing a custom-built DML statement using the
namedParameterJdbcTemplate.update(sql, valueMap);
call, where the SQL is built based on the values in the map. My map could get very large, and thus the SQL might also get very long. I understand that in Oracle there is no fixed limit on how long a query can be, and that many factors, including the database configuration, may affect this value. Even so, I would like to limit the query length to a fixed number.
What is the best way to limit the query length? Would the Spring Batch API be of any use here?
Thanks in advance for any pointers.
I would choose one of these approaches:
Temporary table: insert the data in batches into a temporary table, then run a MERGE INTO statement against that table.
Create an SQL type for your rows and bind just that one parameter (search for OraData; it is a bit tricky, but it works).
Both let you keep a small, static query and therefore avoid the problems of an overly large one (parsing cost, polluting the library cache, etc.).
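Here is a sketch of the temporary-table approach with plain JDBC batching. It assumes a global temporary table created once with ON COMMIT DELETE ROWS. The names tmp_values, target_table, key_col and val_col are all hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;

// Assumed one-time DDL (hypothetical names):
//   create global temporary table tmp_values (key_col number, val_col varchar2(100))
//   on commit delete rows;
class TempTableMerge {
    static final String INSERT_SQL = "insert into tmp_values (key_col, val_col) values (?, ?)";
    static final String MERGE_SQL =
            "merge into target_table t"
            + " using tmp_values s on (t.key_col = s.key_col)"
            + " when matched then update set t.val_col = s.val_col"
            + " when not matched then insert (key_col, val_col) values (s.key_col, s.val_col)";
    static final int BATCH_SIZE = 1000;

    // Loads the map into the temporary table in batches, then runs one static MERGE.
    static int mergeAll(Connection connection, Map<Long, String> values) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false); // temporary rows must survive until the MERGE
        try {
            try (PreparedStatement insert = connection.prepareStatement(INSERT_SQL)) {
                int pending = 0;
                for (Map.Entry<Long, String> e : values.entrySet()) {
                    insert.setLong(1, e.getKey());
                    insert.setString(2, e.getValue());
                    insert.addBatch();
                    if (++pending == BATCH_SIZE) {
                        insert.executeBatch();
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    insert.executeBatch();
                }
            }
            int merged;
            try (Statement merge = connection.createStatement()) {
                merged = merge.executeUpdate(MERGE_SQL);
            }
            connection.commit(); // ON COMMIT DELETE ROWS also empties the temporary table
            return merged;
        } catch (SQLException ex) {
            connection.rollback();
            throw ex;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The statement text never grows with the size of the map; only the number of batched executions does. NamedParameterJdbcTemplate.batchUpdate could perform the insert step as well.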
I want to use the Salesforce Bulk API to run queries of this format:
Select Id from Object where field='<value>'.
I have thousands of such field values and want to retrieve the Ids of those objects. AFAIK, a Salesforce Bulk query supports only one SOQL statement as input.
One option could be to form a query like
Select Id, field from Object where field in (<all field values>)
but the problem is that SOQL has a 10,000-character limit.
Any suggestions here?
Thanks
It seems like you are attempting to perform some kind of search query. If so you might look into using a SOSL query as opposed to SOQL as long as the fields you are searching are indexed by SFDC.
Otherwise, I agree with Born2BeMild. Your second approach is better, and breaking your list of values into batches would help you get around the limits.
It would also help if you described your use case in more detail. Queries on a dynamic set of fields and values don't always yield the best performance, even with the Bulk API. You may be better off downloading the data to a local database and exploring it there.
You could break those down into batches of 200 or so values and iteratively query Salesforce to build up a result set in memory or process subsets of the data.
You would have to check the governor limits for the maximum number of SOQL queries though. You should be able to track your usage via the API at runtime to avoid going over the maximum.
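Here is a sketch of that batching for the query from the question. It caps both the number of values per query (200 or so, as suggested above) and the total query length (the asker's 10,000 characters). Object and field are the question's placeholders, and quotes are escaped with SOQL's backslash escape sequences:

```java
import java.util.ArrayList;
import java.util.List;

class SoqlBatcher {
    // Quotes a value as a SOQL string literal, escaping backslashes and single quotes.
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Splits values into "... in (...)" queries with at most maxValues values
    // and at most maxChars characters each.
    static List<String> buildQueries(List<String> values, int maxValues, int maxChars) {
        if (maxValues <= 0) {
            throw new IllegalArgumentException("maxValues must be positive");
        }
        String prefix = "Select Id, field from Object where field in (";
        String suffix = ")";
        List<String> queries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int count = 0;
        for (String v : values) {
            String literal = quote(v);
            if (prefix.length() + literal.length() + suffix.length() > maxChars) {
                throw new IllegalArgumentException("value too long for one query: " + v);
            }
            // Would adding ",literal" to the current query still fit both caps?
            boolean fits = count < maxValues
                    && prefix.length() + current.length() + 1 + literal.length() + suffix.length() <= maxChars;
            if (count > 0 && !fits) {
                queries.add(prefix + current + suffix); // flush the full query
                current.setLength(0);
                count = 0;
            }
            if (count > 0) {
                current.append(',');
            }
            current.append(literal);
            count++;
        }
        if (count > 0) {
            queries.add(prefix + current + suffix);
        }
        return queries;
    }
}
```

Each generated query is a separate call, so the number of batches still has to stay within the governor limits mentioned above.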
The problem is that you are hitting the governor limits. Salesforce can only process 200 records at a time if they are coming from a database. Therefore, to work with all these records, first add them all to a list, for example:
List<Account> accounts = [SELECT Id, Name FROM Account];
Then work with the accounts list and do everything you need to do with it. When you are done, update the database using:
update accounts;
This link might be helpful:
https://help.salesforce.com/apex/HTViewSolution?id=000004410&language=en_US
I have a table desc_data in Oracle, and I need to retrieve two columns from it,
e.g.:
select ticket_id, date_logged from desc_data;
I will have around 10,000 records in this table. Suppose I do this operation from Java: I put these values in a list, filter the data based on some conditions, and insert the result into some other table. Would that be possible? And if so, would it add overhead?
I think it is better to use a stored procedure in the database and just call it from Java. But what you are considering is also a possible solution.
It depends on what type of filtering you wish to do on your 10,000 records. If the filtering is simple, such as filtering the records by a date range, then you can do it with just SQL. If your processing is more complex, you could also use a stored procedure. As you are running on Oracle, these can be written in Java. See here for an example.
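Here is a sketch of both options for the desc_data case. It assumes the filter is a simple date range. target_table is a hypothetical name, and Ticket is a hypothetical holder for the two selected columns (Java 16+ for the record):

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

class TicketFilter {
    // Option A: let Oracle filter and copy in one statement; bind the two dates.
    // target_table is hypothetical.
    static final String INSERT_SELECT_SQL =
            "insert into target_table (ticket_id, date_logged)"
            + " select ticket_id, date_logged from desc_data"
            + " where date_logged >= ? and date_logged < ?";

    // Hypothetical holder for "select ticket_id, date_logged from desc_data".
    record Ticket(long ticketId, LocalDate dateLogged) {}

    // Option B: filter in Java after loading the rows (half-open range [from, to)).
    static List<Ticket> inRange(List<Ticket> tickets, LocalDate fromInclusive, LocalDate toExclusive) {
        return tickets.stream()
                .filter(t -> !t.dateLogged().isBefore(fromInclusive)
                        && t.dateLogged().isBefore(toExclusive))
                .collect(Collectors.toList());
    }
}
```

Option A moves no rows to Java at all. Option B is workable for around 10,000 rows, but it pays for the round trip out of the database and the insert back into it.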
I am able to insert, filter and order records, but I can't use a simple count!
Is there a way to get the total number of rows in a table?
Is there a way to use GROUP BY in query?
You can see statistics about your data in the admin interface or by using DatastoreService, and there you can find the total count of rows ("entities of kind").
There is no way to use GROUP BY; it's unsupported by Google Bigtable. See "Unsupported Features".
Before using storage like this, you need to read about NoSQL and understand how it works, and why there can't be GROUP BY, JOIN, etc.
The simplest way is to use cursors to iterate over all records that match the query and count them, but this will give you a big overhead if you have many entities.
If your filter is not too complicated, the smarter way is to use sharded counters to count the results of every possible filter. This will also give you a headache if the filter has too many options.
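Here is an in-memory simulation of the sharded-counter pattern. It shows the idea rather than the App Engine API. In the datastore, each shard would be its own entity (for example, a hypothetical CounterShard kind keyed by counter name and shard index). Here, an AtomicLongArray stands in for those entities:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

class ShardedCounter {
    // Each slot stands in for one shard entity in the datastore.
    private final AtomicLongArray shards;

    ShardedCounter(int shardCount) {
        shards = new AtomicLongArray(shardCount);
    }

    // Increments a randomly chosen shard, so concurrent writers rarely collide.
    void increment() {
        int shard = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(shard);
    }

    // Reading the total means fetching every shard and summing them.
    long count() {
        long total = 0;
        for (int i = 0; i < shards.length(); i++) {
            total += shards.get(i);
        }
        return total;
    }
}
```

Writes are spread across shards so they don't contend on a single entity, while each read has to fetch and sum every shard. You would need one such counter per filter combination, which is where the headache comes from.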
Why is it so important to show the number to the user? That brings me to the old joke:
Q: How do you count 8 million rows?
A: Why/who needs to know that there are 8M rows?