Efficiently fetch a huge number of entities from the DB meeting criteria - Java

Let's say I have about half a million records in table A. Of course, table A is joined with tables B, C, etc. Now I have to fetch the entities from table A which meet my criteria. The criteria consist of about 20-30 rules, e.g. the name in table A has to be like 'something', or the date from table B for the record joined to table A with ID=1 should be earlier than today. I see three solutions:
Write a native query and pass in parameters from my criteria. But in this case I will join so many tables that it doesn't seem like a good solution to me.
Fetch all records from table A and then check every rule for each record in Java. But this seems to me like the worst possible solution.
Use the JPA Criteria API. But to be honest, I do not know how efficient it is when there are so many records and so many joined tables. What's more, working with Criteria can be a little irritating when I have so many rules to match.
Maybe there is another (better) solution to my problem, but I cannot see it now. I should add that I need the fetched entities stored in a Java collection, because once they are all fetched I have to work with them (e.g. generate a report or create some updates in the DB based on this information).
I hope I described my problem clearly, and I will be thankful for every tip on how to optimize such a query.
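If you do go the Criteria route, it stays manageable when you collect the rules as a dynamic list of predicates, and the filtering still happens in the database. A minimal sketch, where EntityA, EntityB, the join attribute b, and the field names are assumptions for illustration:

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Join;
import javax.persistence.criteria.Predicate;
import javax.persistence.criteria.Root;

public class EntityAFinder {

    // EntityA and EntityB are assumed mapped entities; "name", "b" and
    // "date" are illustrative attribute names, not from the question.
    public List<EntityA> findMatching(EntityManager em, String namePattern) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<EntityA> cq = cb.createQuery(EntityA.class);
        Root<EntityA> a = cq.from(EntityA.class);
        Join<EntityA, EntityB> b = a.join("b");

        // Build the 20-30 rules dynamically; only add the ones that apply.
        List<Predicate> rules = new ArrayList<>();
        if (namePattern != null) {
            rules.add(cb.like(a.<String>get("name"), namePattern));
        }
        rules.add(cb.lessThan(b.<LocalDate>get("date"), LocalDate.now()));

        cq.select(a).where(rules.toArray(new Predicate[0]));
        return em.createQuery(cq).getResultList();
    }
}

Whether the generated SQL is efficient for half a million rows is then mostly a question of indexes, which you can verify by running EXPLAIN on the query Hibernate emits.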

Related

Spring JPA - Reading data along with relations - performance improvement

I am reading data from a table using Spring JPA.
This entity object has one-to-many relationships to six other tables.
All tables together have 20,000 records in them.
I am using the query below to fetch data from the DB.
SELECT * FROM A WHERE ID IN (SELECT ID FROM B WHERE COL1 = ?)
Table A has relationships to the other 6 tables.
Spring JPA is taking around 30 seconds to read this data from the DB.
Any ideas to improve the data fetch time here?
I am using native queries here, and I am looking for a query rewrite that will optimize the data fetch time.
Please suggest. Thanks.
You might need to consider the points below to identify the root cause:
Check if you are ending up with the n+1 query issue. Your query might end up issuing n additional queries, one per associated table, where n is the number of associations. You can check this by setting spring.jpa.show-sql=true.
If you do see the n+1 issue, then you need to set an appropriate FetchMode; refer to https://www.baeldung.com/hibernate-fetchmode for a detailed explanation of the different FetchModes.
If it is not the n+1 query issue, you might need to check the performance of the generated queries using the EXPLAIN command. Usually an IN clause on non-indexed columns has a performance impact.
So set spring.jpa.show-sql=true, check the queries that are generated and run, and use that to debug and optimize your code or query.
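If it does turn out to be the n+1 issue, a JPQL fetch join is one common fix. Here is a minimal sketch, assuming a Parent entity with a children collection (both names are placeholders, not from the question):

import java.util.List;
import javax.persistence.EntityManager;

public class ParentLoader {

    // Load each Parent together with its children collection in a single
    // SQL query, instead of one extra query per parent (the n+1 pattern).
    public List<Parent> loadWithChildren(EntityManager em, String col1) {
        return em.createQuery(
                "SELECT DISTINCT p FROM Parent p "
              + "JOIN FETCH p.children "
              + "WHERE p.col1 = :col1", Parent.class)
            .setParameter("col1", col1)
            .getResultList();
    }
}

Note that Hibernate will refuse to fetch-join several List collections at once (MultipleBagFetchException), so with six associations you would fetch-join the most expensive one and use subselect or batch fetching for the rest.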

Retrieve Hibernate associated entity with pagination

I am stuck with an issue. I have 3 tables that are associated with a table in a one-to-many relationship.
An employee may have one or more degrees.
An employee may have one or more departments in the past.
An employee may have one or more jobs.
I am trying to fetch results using a named query in such a way that I fetch all the results from the Degree table and the Department table, but only 5 results from the Jobs table, because I want to apply pagination on the Jobs table.
But all these entities are in the User table as sets. Secondly, I don't want to change the mapping file, because of other usages of the same file and due to some architectural restrictions.
Otherwise, in the mapping file I could use the BatchSize annotation, which I am not willing to do.
The best approach is to write three queries:
userRepository.getDegrees(userId);
userRepository.getDepartments(userId);
userRepository.getJobs(userId, pageIndex);
Spring Data is very useful for pagination, as well as simplifying your data access code.
Hibernate cannot fetch multiple Lists in a single query, and even for Sets, you don't want to run a Cartesian product. So use separate queries instead of a single JPQL query.
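For the paginated jobs query, a Spring Data sketch of what getJobs(userId, pageIndex) could look like (the Job entity, its user association, and the page size of 5 are assumptions):

import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface JobRepository extends JpaRepository<Job, Long> {

    // Spring Data translates the Pageable into LIMIT/OFFSET (or the
    // database-specific equivalent) on the generated SQL.
    @Query("SELECT j FROM Job j WHERE j.user.id = :userId")
    Page<Job> findJobs(@Param("userId") Long userId, Pageable pageable);
}

Calling jobRepository.findJobs(userId, PageRequest.of(pageIndex, 5)) then fetches exactly one page of 5 jobs, while the degrees and departments are loaded whole by their own queries.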

Is it faster to programmatically join tables or use SQL Join statements when one table is much smaller?

More specifically, how does grabbing a string from a HashMap<Integer, String> built from the smaller table, and setting it on the objects returned from the larger table, compare to pre-joining the tables on the database? Do the relative sizes of the two tables make a difference?
Update: to rephrase my question: does grabbing the subset of the larger table (the 5,000 - 20,000 records I care about) and then programmatically joining in the smaller table (which I would cache locally) outperform an SQL join? Does the SQL join apply to the whole table or just to the subset of the larger table that would be returned?
SQL Join Statement:
SELECT id, description
FROM values v, descriptions d
WHERE v.descID=d.descID
AND v.something = thingICareAbout;
Individual Statements:
SELECT id, descID
FROM values v
WHERE v.something = thingICareAbout;
SELECT descID, description
FROM descriptions d;
Programmatic join:
for (Value value : values) {
    value.setDescription(descriptions.get(value.getDescID()));
}
Additional info: there are a total of 800,000,000 records in the larger table, corresponding to 3,000 values in the smaller table. Most searches return between 5,000 and 20,000 results. This is an Oracle DB.
Don't even think about it. The database can do things locally at least as fast as you can, and without having to ship all the data over the network.
In general, joining tables like this is the sort of operation that SQL databases are optimized for, so there is a good chance that they're fairly hard to beat on this sort of operation.
The relative size of the two tables might make a difference if you attempt to do the join "manually" as you have to factor in the additional memory consumption to hold the bigger table data in memory while you're doing your processing.
While this example is pretty easy to get right, by doing the join yourself you also lose a built-in data integrity check that the database would give you if you let it do the join.
SQL would probably do the work faster. From my understanding, if you do it in your program, it would have to load the 800,000,000 records from the database into your application's memory, then the 3,000 records of the small table, then match each record, discard almost all of them (you're only expecting a couple of thousand results) and display the rest to the user.
If you put indexes on the right columns in Oracle (descID in both tables), then it will be able to find the joining records very quickly and load up only the 5,000-20,000 that you're expecting.
That said, the easiest way to find out is to test both and take numbers!
If you do the join in memory, you will need to download 800,000,000 + 3,000 records. If you do the join in the database, you will need to download the 5,000 - 20,000 results each time. Which sounds faster to you? Hint: if you do 100,000 searches, the first option might be faster.
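As a concrete baseline for such a test, here is a plain JDBC sketch of the database-side join; the table and column names follow the question, while the Connection and the type of thingICareAbout are assumptions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DbJoinExample {

    // Let the database do the join and ship back only the 5,000-20,000
    // matching rows, instead of the whole 800M-row table.
    public Map<Long, String> fetch(Connection conn, long thingICareAbout) throws SQLException {
        String sql = "SELECT v.id, d.description "
                   + "FROM values v JOIN descriptions d ON v.descID = d.descID "
                   + "WHERE v.something = ?";
        Map<Long, String> result = new LinkedHashMap<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, thingICareAbout);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.put(rs.getLong("id"), rs.getString("description"));
                }
            }
        }
        return result;
    }
}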

Fetching records one by one from PostgreSQL DB

There's a DB that contains approximately 300-400 records. I can make a simple query for fetching 30 records like:
SELECT * FROM table
WHERE isValidated = false
LIMIT 30
Some more words about the content of the DB table. There's a column named isValidated that can (as you correctly guessed) take one of two values: true or false. After a query, some of the records are marked as validated (isValidated=true), approximately 5-6 records from each batch of 30. Consequently, each query will fetch some of the still-unvalidated records (isValidated=false) from the previous query, and in fact I'll never get to the end of the table with such an approach.
The validation process is implemented with Java + Hibernate. I'm new to Hibernate, so I use the Criteria API for making this simple query.
Are there any best practices for such a task? The variant with adding a flag field (that marks records which have already been fetched) is inappropriate (over-engineering for this DB).
Maybe there's an opportunity to create some virtual table where records that have already been processed will be stored, or something like this. BTW, after all the records are processed, it is planned to start processing them again (it is possible that some of them will need to be validated again).
Thank you for your help in advance.
I can imagine several solutions:
store everything in memory. You only have 400 records, and it could be a perfectly fine solution given this small number
use an order by clause (which you should do anyway) on a unique column (the PK, for example), store the ID of the last loaded record, and make sure the next query uses where ID > :lastId (a sketch of this follows below)
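The second option is keyset pagination; a sketch with JPA, where the Item entity and its fields stand in for the poster's table:

import java.util.List;
import javax.persistence.EntityManager;

public class ValidationBatchReader {

    private long lastId = 0L;  // highest ID seen in the previous batch

    // Fetch the next 30 unvalidated records after the last seen ID.
    // Ordering by the primary key keeps the paging stable even though
    // some records flip to isValidated = true between batches.
    public List<Item> nextBatch(EntityManager em) {
        List<Item> batch = em.createQuery(
                "SELECT i FROM Item i "
              + "WHERE i.isValidated = false AND i.id > :lastId "
              + "ORDER BY i.id", Item.class)
            .setParameter("lastId", lastId)
            .setMaxResults(30)
            .getResultList();
        if (batch.isEmpty()) {
            lastId = 0L;  // end of table reached: wrap around and reprocess
        } else {
            lastId = batch.get(batch.size() - 1).getId();
        }
        return batch;
    }
}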

Partitioning with Hibernate

We have a requirement to delete data in the range of 200K rows from the database every day. Our application is Java/Java EE based, using an Oracle DB and the Hibernate ORM tool.
We explored various options like
Hibernate batch processing
Stored procedure
Database partitioning
Our DBA suggests database partitioning is the best way to go, so that we can easily recreate and drop the partitioned table every day. Now the issue is that we have 2 kinds of data: one kind which we want to delete every day, and the other which we want to keep. Suppose this data is stored in the table "Trade". With partitioning, we then have 2 "Trade" tables. We already have an existing Hibernate-based DAO layer to fetch/store trades from/to the DB. When we decide to partition the database, how can we control which of the two tables the trades go into through Hibernate? Basically I want the trades that need to be deleted by end of day to go into the partitioned table, and the trades I want to keep to go into the main table. Please suggest how this can be done with Hibernate. We may add an additional column to identify the trades to be deleted, but how can we ensure these trades go to the partitioned trade table using Hibernate?
I would appreciate it if someone could suggest a better approach in case we are on the wrong path.
When we decide to partition the database, how can we control which of the two tables the trades go into through Hibernate?
That's what Hibernate Shards is for.
You could use a Hibernate inheritance strategy.
If you know at object creation time that it will be deleted by the end of the day, you can create a VolatileTrade that is a subclass of Trade (with no other attributes). Use the 'table per concrete class' strategy (section 9.1.5 of the Hibernate 3.3 reference documentation) for the mapping.
(I think I would make Trade an abstract superclass with two concrete subclasses, PersistentTrade and VolatileTrade, so that if you have other classes that you know should reference only PersistentTrade (or VolatileTrade), you can enforce that in your code. If you used the Trade superclass as the PersistentTrade, you wouldn't be able to enforce that.)
The volatile trades will go in one table and the 'persistent' trades will go in another table.
Be aware that you won't be able to set a FK constraint on all Trades (persistent and volatile) from other tables in the DB.
Then you just have to clear the table whenever you want.
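A sketch of that mapping with JPA annotations (the original answer refers to the hbm.xml mapping; the attribute shown is illustrative, and in real code each class lives in its own file):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

// Abstract superclass: with TABLE_PER_CLASS it gets no table of its own.
@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
abstract class Trade {
    @Id
    @GeneratedValue(strategy = GenerationType.TABLE)  // IDENTITY cannot be used with TABLE_PER_CLASS
    Long id;
    String instrument;
}

@Entity
class PersistentTrade extends Trade { }  // maps to the table that is kept

@Entity
class VolatileTrade extends Trade { }    // maps to the table that is dropped/truncated daily

The DAO layer then controls which table a trade lands in simply by instantiating PersistentTrade or VolatileTrade.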
Be careful to define a locking mechanism so that no other thread tries to write data to the table during the drop and create (if you use that approach). That won't be an easy task, and doing it right might impact the performance of all operations that insert data into the table (as they will require acquiring the lock).
Wouldn't it be easier to truncate the table?
