I am facing an issue while inserting 100 000 records at once using a Spring Data JPA repository. When we execute repo.save(List<Objs>), it takes a lot of time if we use a sequence generator, because it queries the database for the nextval of every record. I am using Oracle; which ID generation strategy is best here?
Sequence generator is probably a good choice, but you have to tweak its parameters.
In your particular case, I'd start experimenting with allocation size, and then with strategy.
See for example: JPA/Hibernate bulk inserts slow
Take a look at the optimizers configuration:
https://vladmihalcea.com/hibernate-hidden-gem-the-pooled-lo-optimizer/
Note that your configuration resolves to:
SequenceHiLoGenerator on Hibernate 4
SequenceStyleGenerator on Hibernate 5 (where hibernate.id.new_generator_mappings defaults to true)
You cannot use the identity generator (see Hibernate disabled insert batching when using an identity identifier generator)
The table generator is not the best-performing option (https://vladmihalcea.com/why-you-should-never-use-the-table-identifier-generator-with-jpa-and-hibernate/)
Additionally, make sure that the number of nextval() calls is the actual problem.
Maybe changing batch size or statement ordering will help (see https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/)
allocationSize=1 is the real issue here. With this configuration Hibernate calls nextVal() for each insert, so if you have 1000 inserts, Hibernate calls nextVal() 1000 times.
For more information refer to this article by Vlad Mihalcea
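To make the effect of the allocation size concrete, here is a minimal, self-contained sketch in plain Java. It is a simplified model of block allocation, not Hibernate's optimizer code: FakeSequence, BlockIdAllocator and SequenceRoundTrips are hypothetical names, and the "sequence" is just an in-memory counter that is assumed to increment by the allocation size.

```java
/** Compares how many simulated nextval calls 1000 inserts need for two allocation sizes. */
final class SequenceRoundTrips {
    static int roundTripsFor(int inserts, int allocationSize) {
        FakeSequence sequence = new FakeSequence(allocationSize);
        BlockIdAllocator allocator = new BlockIdAllocator(sequence, allocationSize);
        for (int i = 0; i < inserts; i++) {
            allocator.nextId();
        }
        return sequence.calls();
    }

    public static void main(String[] args) {
        System.out.println("allocationSize=1:  " + roundTripsFor(1000, 1) + " nextval calls");
        System.out.println("allocationSize=50: " + roundTripsFor(1000, 50) + " nextval calls");
    }
}

/** Stand-in for a database sequence created with INCREMENT BY n; counts how often it is queried. */
final class FakeSequence {
    private final int incrementBy;
    private long lastValue;
    private int calls;

    FakeSequence(int incrementBy) {
        this.incrementBy = incrementBy;
        this.lastValue = 1L - incrementBy; // so that the first nextVal() returns 1
    }

    long nextVal() {
        calls++; // each call models one database round trip
        lastValue += incrementBy;
        return lastValue;
    }

    int calls() {
        return calls;
    }
}

/** Hands out ids from an in-memory block and only asks the sequence when the block is used up. */
final class BlockIdAllocator {
    private final FakeSequence sequence;
    private final int allocationSize;
    private long next;  // next id to hand out
    private long limit; // first id that is not part of the current block

    BlockIdAllocator(FakeSequence sequence, int allocationSize) {
        this.sequence = sequence;
        this.allocationSize = allocationSize;
    }

    long nextId() {
        if (next >= limit) {           // block exhausted, or nothing fetched yet
            next = sequence.nextVal(); // lowest id of the new block
            limit = next + allocationSize;
        }
        return next++;
    }
}
```

In this model an allocation size of 1 costs one round trip per insert, while a size of 50 costs one per 50 inserts. The database sequence's increment has to match the allocation size for the blocks not to overlap.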
I need to insert many entities into the database via Hibernate. So, I want to find the most effective algorithm for Id generation.
According to the Hibernate documentation, there are four widely used generation strategies:
IDENTITY
SEQUENCE
TABLE
AUTO
I have to use a MySQL database, so I cannot apply the SEQUENCE generation strategy. What about the other strategies? Which is the most efficient from a performance point of view?
The best id generators in Hibernate are enhanced-table and enhanced-sequence, coupled with an appropriate optimizer, such as hilo. I have experience with enhanced-table + hilo, inserting over 10,000 records per second.
BTW the statement that "hilo needs an additional query per generated entity" is patently false: the whole point of the optimizer is to prevent this.
As you can't use SEQUENCE, and AUTO just automatically selects a supported generator algorithm out of the existing ones, you are left with IDENTITY and TABLE.
TABLE: uses a hi/lo algorithm to efficiently generate identifiers of type long, short or int, given a table and column as a source of hi values. The hi/lo algorithm generates identifiers that are unique only for a particular database. -> Means an extra query per generated entity. (This is not true if you use optimizers. Unfortunately, using no optimizer generally is the default, if no optimizer was specified.)
IDENTITY: supports identity columns in DB2, MySQL, MS SQL Server, Sybase and HypersonicSQL. -> Performance-wise, this is the way to go, the same way you would do without Hibernate normally. Database generated, almost no overhead.
There exist more Hibernate specific generators, but they won't beat performance-wise the database generated ID. (See 5.1.2.2.1. Various additional generators in your linked document.)
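The hi/lo idea mentioned above can be sketched in plain Java. This is the textbook form of the algorithm (id = hi * blockSize + lo), which is an assumption about the layout rather than Hibernate's exact arithmetic. HiLoGenerator and HiLoDemo are hypothetical names, and an AtomicLong stands in for the table row that stores the next hi value.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Two generators share one hi source, as two application nodes would share one table row. */
final class HiLoDemo {
    public static void main(String[] args) {
        AtomicLong hiTable = new AtomicLong(); // stands in for the row holding the next hi value
        HiLoGenerator nodeA = new HiLoGenerator(hiTable::getAndIncrement, 100);
        HiLoGenerator nodeB = new HiLoGenerator(hiTable::getAndIncrement, 100);

        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < 250; i++) {
            // Set.add returns false when the id was already handed out
            if (!seen.add(nodeA.next()) || !seen.add(nodeB.next())) {
                throw new IllegalStateException("duplicate id");
            }
        }
        System.out.println(seen.size() + " unique ids, " + hiTable.get() + " hi fetches");
    }
}

/** Textbook hi/lo: fetch a hi value once per block, then count lo up locally. */
final class HiLoGenerator {
    private final LongSupplier hiSource; // each call models one database round trip
    private final int blockSize;
    private long hi;
    private int lo;

    HiLoGenerator(LongSupplier hiSource, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.hiSource = hiSource;
        this.blockSize = blockSize;
        this.lo = blockSize; // forces a hi fetch on first use
    }

    synchronized long next() {
        if (lo == blockSize) { // current block is used up
            hi = hiSource.getAsLong();
            lo = 0;
        }
        return hi * blockSize + lo++;
    }
}
```

Because every hi value is handed out only once, each generator owns a disjoint block of ids, and the source is consulted once per block rather than once per entity.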
I am using Hibernate 3.0 in my application with Postgres database. It is a monitoring application and gathers data every minute. So we have thousands of rows in some tables every month.
Currently I am using a sequence for generating ids in Hibernate. Is there any better option for this scenario?
Any suggestion will be appreciated.
IMHO a sequence is the best approach because it gives you more flexibility, although you may also use an identity (auto-increment) column; I think in Postgres it is called serial. There is also a way to store ids in a separate table. These three approaches correspond to the following annotations:
@GeneratedValue(strategy=GenerationType.TABLE)
@GeneratedValue(strategy=GenerationType.SEQUENCE)
@GeneratedValue(strategy=GenerationType.IDENTITY)
As for your previous question about whether it is good to use a single sequence for all tables: I wouldn't recommend this approach, because the db must ensure that all sequence numbers are unique, which is why each generated sequence value needs to be synchronized by the db server. If you have a single sequence per db, it may cause performance issues when multiple requests from multiple tables ask for the next id value. I would rather recommend having a single sequence per table.
While I am not sure if there is a better alternative than using a sequence, I am pretty sure that you would want to look at using StatelessSession if this is just for gathering data. You can get rid of all the overhead of e.g. the 1st level cache, transactional write-behind, etc.
Is it OK to truncate tables while at the same time using Hibernate to insert data?
We parse a big XML file with many relationships into Hibernate POJO's and persist to the DB.
We are now planning on purging existing data at certain points in time by truncating the tables. Is this OK?
It seems to work fine. We don't use Hibernate's second level cache. One thing I did notice, which is fine, is that when inserting we generate primary keys using Hibernate's @GeneratedValue, where Hibernate just uses a key value one greater than the highest value in the table. Even though we are truncating the tables, Hibernate remembers the prior value and uses prior value + 1 as opposed to starting over at 1. This is fine, just unexpected.
Note that the reason we do truncate as opposed to calling delete() on the Hibernate POJO's is for speed. We have gazillions of rows of data, and truncate is just so much faster.
We are now planning on purging existing data at certain points in time by truncating the tables. Is this OK?
If you're not using the second level cache and if you didn't load Entities from the table you're going to truncate in the Session, the following should work (assuming it doesn't break integrity constraints):
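Session s = sf.openSession();
// Session.connection() is deprecated in Hibernate 3.x and was removed in 4.0; on newer versions use Session.doWork(...) to reach the JDBC connection
PreparedStatement ps = s.connection().prepareStatement("TRUNCATE TABLE XXX");
ps.executeUpdate();
ps.close();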
And you should be able to persist entities after that, either in the same transaction or another one.
Of course, such a TRUNCATE won't generate any Hibernate event or trigger any callback, if this matters.
(...) when inserting we generate primary keys using Hibernate's @GeneratedValue (...)
If you are using the default strategy for @GeneratedValue (i.e. AUTO), then it should default to a sequence with Oracle, and a sequence won't be reset if you truncate a table or delete records.
We truncate tables like jdbcTemplate.execute("TRUNCATE TABLE abc")
This should be equivalent (you'll end up using the same underlying JDBC connection as Hibernate).
What sequence would Hibernate use for the inserts?
AFAIK, Hibernate generates a default "hibernate_sequence" sequence for you if you don't declare your own.
I thought it was just doing a max(field) + 1 on the table?
I don't think so, and the fact that Hibernate doesn't start over from 1 after the TRUNCATE seems to confirm that it doesn't. I suggest activating SQL logging to see the exact statements performed against your database on INSERT.
The generator we specify for @GeneratedValue is just a "dummy" generator (doesn't correspond to any sequence that we've created).
I'm not 100% sure, but if you didn't declare any @SequenceGenerator (or @TableGenerator), I don't think that specifying a generator changes anything.
Depends on your application. If deleting rows in the database is okay, then truncate is okay, too.
As long as you don't have any Pre- or PostRemove listeners on your entities, there should be no problems.
On the other hand: is it possible that there are still entities loaded in an EntityManager at truncate time, or is this a write-only table (like a logging table)? In the latter case you won't have any problem at all.
I have 5 MySQL InnoDB tables: Test,InputInvoice,InputLine,OutputInvoice,OutputLine and each is mapped and functioning in Hibernate. I have played with using StatelessSession/Session, and JDBC batch size. I have removed any generator classes to let MySQL handle the id generation- but it is still performing quite slow.
Each of those tables is represented in a java class, and mapped in hibernate accordingly. Currently when it comes time to write the data out, I loop through the objects and do a session.save(Object) or session.insert(Object) if I'm using StatelessSession. I also do a flush and clear (when using Session) when my line count reaches the max jdbc batch size (50).
Would it be faster if I had these in a 'parent' class that held the objects and did a session.save(master) instead of each one?
If I had them in a master/container class, how would I map that in hibernate to reflect the relationship? The container class wouldn't actually be a table of its own, but a relationship based on two indexes, run_id (int) and line (int).
Another direction would be: How do I get Hibernate to do a multi-row insert?
The ID generation strategy is critical for batch insertion in Hibernate. In particular, IDENTITY generation will usually not work (note that on MySQL AUTO typically maps to IDENTITY as well). This is because during batch insert Hibernate has a flag called "requiresImmediateIdAccess" that says whether generated IDs are required immediately; if so, batch processing is disabled.
You can easily spot this in the DEBUG-level logs when it says "executing identity-insert immediately" - this means it has skipped batch processing because it was told that generated IDs are required immediately after insertion.
Generation strategies that typically do work are TABLE and SEQUENCE, because Hibernate can pre-generate the IDs, thereby allowing for batch insertion.
A quick way to spot whether your batch insertion works is to activate DEBUG-level logs as BatchingBatcher will explicitly tell you the batch size it's executing ("Executing batch size: " + batchSize ).
Additionally, the following properties are important for achieving batch insertion. I daren't say they're required as I'm not enough of a Hibernate-expert to do so - perhaps it's just my particular configuration - but in my experience they were needed nonetheless:
hibernate.order_inserts = true
hibernate.order_updates = true
These properties are pretty poorly documented, but I believe what they do is let Hibernate group the SQL INSERT and UPDATE statements properly for batch execution; I think this might be the multi-row inserts you're after. Don't shoot me if I'm wrong on this, I'm recalling from memory.
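To illustrate why statement ordering matters for batching, here is a small stand-alone model in plain Java. It rests on a simplifying assumption: a JDBC batch can only contain consecutive inserts into the same table, and "ordering" is a plain sort by table name. Hibernate's real ordering also has to respect dependencies between entities, which this sketch ignores. InsertOrderingDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class InsertOrderingDemo {
    /** Counts runs of consecutive inserts that target the same table; each run is one batch in this model. */
    static int countBatches(List<String> tablePerInsert) {
        int batches = 0;
        String previous = null;
        for (String table : tablePerInsert) {
            if (!table.equals(previous)) { // a different table ends the current batch
                batches++;
                previous = table;
            }
        }
        return batches;
    }

    /** Returns a copy with the inserts grouped by table name. */
    static List<String> orderByTable(List<String> tablePerInsert) {
        List<String> ordered = new ArrayList<>(tablePerInsert);
        Collections.sort(ordered);
        return ordered;
    }

    public static void main(String[] args) {
        // Saving invoice, line, invoice, line, ... yields interleaved statements
        List<String> interleaved = Arrays.asList(
                "InputInvoice", "InputLine", "InputInvoice", "InputLine", "InputInvoice", "InputLine");
        System.out.println("unordered: " + countBatches(interleaved) + " batches");
        System.out.println("ordered:   " + countBatches(orderByTable(interleaved)) + " batches");
    }
}
```

With the interleaved input every insert ends up in its own batch, whereas the grouped input collapses to one batch per table.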
I'll also go ahead and assume that you set the following property; if not, this should serve as a reminder:
hibernate.jdbc.batch_size = xx
Where xx is your desired batch size, naturally.
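The usual companion to the batch size setting is to flush and clear the session every time that many rows have been saved. The pattern itself can be sketched without Hibernate: in the block below, BatchedWriter is a hypothetical helper, and the flushAction callback stands in for whatever the real code does at a batch boundary (for example session.flush() followed by session.clear()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchedWriterDemo {
    public static void main(String[] args) {
        List<Integer> flushedSizes = new ArrayList<>();
        // 50 mirrors a hibernate.jdbc.batch_size of 50; the callback only records batch sizes here
        BatchedWriter<String> writer = new BatchedWriter<>(50, batch -> flushedSizes.add(batch.size()));
        for (int i = 0; i < 120; i++) {
            writer.add("row-" + i);
        }
        writer.flush(); // push the final, partially filled batch
        System.out.println(flushedSizes);
    }
}

/** Collects items and hands them to flushAction in chunks of at most batchSize. */
final class BatchedWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flushAction;
    private final List<T> pending = new ArrayList<>();

    BatchedWriter(int batchSize, Consumer<List<T>> flushAction) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    void add(T item) {
        pending.add(item);
        if (pending.size() == batchSize) { // batch boundary reached
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return; // nothing to write
        }
        flushAction.accept(new ArrayList<>(pending)); // pass a copy so the callback may keep it
        pending.clear();
    }
}
```

Keeping the flush interval equal to the JDBC batch size means each flush sends full batches, and the explicit final flush covers the remainder.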
The final solution for me was to use voetsjoeba's response as a jumping off point.
My hibernate config uses the following options:
hibernate.order_inserts = true
hibernate.order_updates = true
I changed from using Session to StatelessSession.
Re-ordered the Java code to process all the elements in a batch a table at a time. So all of table x, then table y, etc.
Removed the <generator> from each class. Java now creates it and assigns it to the object.
Created logic that allowed me to determine if just an id was being set and not write 'empty' lines to the database.
Finally, I turned on dynamic-insert for my classes in their hibernate definitions like so: <class name="com.my.class" table="MY_TABLE" dynamic-insert="true">
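The answer does not show how the ids are created in Java. One possible shape, offered purely as an assumption, is an in-process counter that is seeded once and then incremented for every new object. AssignedIdSource, InputLineRow and the starting value 1001 are hypothetical, and the approach is only safe when a single process writes to the table.

```java
import java.util.concurrent.atomic.AtomicLong;

final class AssignedIdDemo {
    public static void main(String[] args) {
        // 1001 is an arbitrary seed; real code would have to derive it, e.g. from the highest existing id
        AssignedIdSource ids = new AssignedIdSource(1001L);
        for (int line = 1; line <= 3; line++) {
            InputLineRow row = new InputLineRow(ids.nextId(), line);
            System.out.println(row.id + " -> line " + row.line);
        }
    }
}

/** Hands out consecutive ids without touching the database; thread-safe within one JVM. */
final class AssignedIdSource {
    private final AtomicLong next;

    AssignedIdSource(long firstId) {
        this.next = new AtomicLong(firstId);
    }

    long nextId() {
        return next.getAndIncrement();
    }
}

/** Hypothetical stand-in for a mapped class whose id is assigned by the application. */
final class InputLineRow {
    final long id;
    final int line;

    InputLineRow(long id, int line) {
        this.id = id;
        this.line = line;
    }
}
```

Since the id is known before the insert is issued, nothing has to be read back from the database, which keeps the inserts eligible for batching.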
How can I generate insert statements like insert into table (sequence.nextval, 'b0) using hibernate?
Hibernate currently selects the sequence.nextval value and only then it uses the value to insert the entry in the table.
Note: I'm not very fond of custom id generators.
Hibernate selects sequence.nextval because it has to return that value back to you (e.g. set ID on your entity). Unless you're doing something very esoteric I strongly doubt this has a big impact on performance (e.g. it's nothing compared to the actual insert). That said, you can look at Hibernate's sequence hi-lo generator - it would only access the sequence once in a while instead of every insert.
If you're using Oracle 10 client or above, check out sequence-identity in the most recent Hibernate versions to do what you're asking for.