Is it OK to truncate tables while at the same time using Hibernate to insert data?
We parse a big XML file with many relationships into Hibernate POJOs and persist them to the DB.
We are now planning on purging existing data at certain points in time by truncating the tables. Is this OK?
It seems to work fine. We don't use Hibernate's second-level cache. One thing I did notice, which is fine, is that when inserting we generate primary keys using Hibernate's @GeneratedValue, where Hibernate just uses a key value one greater than the highest value in the table. Even though we are truncating the tables, Hibernate remembers the prior value and uses prior value + 1, as opposed to starting over at 1. This is fine, just unexpected.
Note that the reason we truncate, as opposed to calling delete() on the Hibernate POJOs, is speed. We have gazillions of rows of data, and truncate is just so much faster.
We are now planning on purging existing data at certain points in time by truncating the tables. Is this OK?
If you're not using the second-level cache, and if you haven't loaded entities from the table you're going to truncate into the Session, the following should work (assuming it doesn't break integrity constraints):
Session s = sf.openSession();
// Obtain the Session's underlying JDBC connection (Hibernate 3.x API)
PreparedStatement ps = s.connection().prepareStatement("TRUNCATE TABLE XXX");
ps.executeUpdate();
ps.close();
And you should be able to persist entities after that, either in the same transaction or another one.
Of course, such a TRUNCATE won't generate any Hibernate event or trigger any callback, if this matters.
(...) when inserting we generate primary keys using Hibernate's @GeneratedValue (...)
If you are using the default strategy for @GeneratedValue (i.e. AUTO), then it should default to a sequence with Oracle, and a sequence won't be reset if you truncate a table or delete records.
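For illustration, a minimal mapping sketch (the Item class is hypothetical; with AUTO on Oracle, Hibernate typically picks a sequence-backed generator):
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Item {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO) // with Oracle: sequence-backed
    private Long id;
    // other fields omitted
}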
We truncate tables like jdbcTemplate.execute("TRUNCATE TABLE abc")
This should be equivalent (you'll end up using the same underlying JDBC connection as Hibernate).
What sequence would Hibernate use for the inserts?
AFAIK, Hibernate generates a default "hibernate_sequence" sequence for you if you don't declare your own.
I thought it was just doing a max(field) + 1 on the table?
I don't think so, and the fact that Hibernate doesn't start over from 1 after the TRUNCATE seems to confirm that it doesn't. I suggest activating SQL logging to see the exact statements performed against your database on INSERT.
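For example, these standard Hibernate settings echo every statement to the log (a minimal sketch; format_sql merely pretty-prints the output):
hibernate.show_sql = true
hibernate.format_sql = true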
The generator we specify for @GeneratedValue is just a "dummy" generator (it doesn't correspond to any sequence that we've created).
I'm not 100% sure, but if you didn't declare any @SequenceGenerator (or @TableGenerator), I don't think that specifying a generator changes anything.
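For comparison, a sketch of what an explicit declaration would look like (the generator and sequence names here are made up):
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "mySeqGen")
@SequenceGenerator(name = "mySeqGen", sequenceName = "MY_SEQUENCE")
private Long id;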
Depends on your application. If deleting rows in the database is okay, then truncate is okay, too.
As long as you don't have any Pre- or PostRemove listeners on your entities, there should be no problems.
On the other hand: is it possible that there are still entities loaded in an EntityManager at truncate time, or is this a write-only table (like a logging table)? In the latter case you won't have any problem at all.
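If entities from the table might still be loaded, a minimal precaution (a sketch; em stands for whatever EntityManager is in play, and this is unnecessary for a write-only table):
em.clear(); // detach any loaded entities so none go stale after the TRUNCATE
// (with a plain Hibernate Session, session.clear() does the same)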
I'm not sure exactly where the error is coming from, unfortunately, but I have a guess and would like to know the best way to solve it.
Problem
Suppose we have the following table in the database:

ID | Field A | Field B | Field C
---+---------+---------+---------------
 1 | A       | C       | Something
 2 | B       | C       | Something else
And we have two unique indexes on the table
Unique-Index1 (ID)
Unique-Index2 (FieldA, FieldB)
Now I am loading both entities
Session session = ...();
Transaction tx = session.beginTransaction();
TestTable dataset1 = (TestTable) session.get(TestTable.class, 1);
TestTable dataset2 = (TestTable) session.get(TestTable.class, 2);
And now I want to do something like this
update testtable set fielda = 'B' where id = 1;
update testtable set fielda = 'A' where id = 2;
So at the end the unique key is not violated, but after the first statement, the unique index is violated.
In my Java application it looks like this:
dataset1.setFieldA("B");
dataset2.setFieldA("A");
session.saveOrUpdate(dataset1);
session.saveOrUpdate(dataset2);
tx.commit();
After executing the application I get the following exception
Could not execute JDBC batch update
Unfortunately, the error is not really meaningful. Also, I don't get any information on whether it might be a duplicate or not. But if I delete the unique index, it works. So my guess is that the index is the cause.
Used frameworks / systems
Java 17 SE application, using Hibernate 3.2 (very old version) with the legacy mapping XML files (so still without annotations). The database is an IBM Informix database.
The database model, as well as the indexes, is not generated by Java but by regular SQL scripts.
I can't change anything about the versions of Hibernate or the database either, unfortunately. Also I cannot influence how the index was created. This all happens outside the application.
Idea
The only idea I had was to first change all records that need to be changed to fictitious values and then set the correct values again. But that would mean that two update statements are triggered per record, right?
Something like this:
dataset1.setFieldA("XXX");
dataset2.setFieldA("YYY");
session.saveOrUpdate(dataset1);
session.saveOrUpdate(dataset2);
dataset1.setFieldA("B");
dataset2.setFieldA("A");
session.saveOrUpdate(dataset1);
session.saveOrUpdate(dataset2);
tx.commit();
However, I am not even sure whether I would need to commit in between; maybe a flush, as sketched above, is enough. But the solution is not really nice. I can kind of understand the problem, but I would also have thought that this would be legitimate within a transaction: only at the end of the transaction do the constraints have to be correct.
Best regards, and thanks for your help,
Hauke
You have two options. Either you configure the unique constraint to be "deferrable" and also mark it as "initially deferred" so that the constraint is only enforced at transaction commit time, or you delete and re-insert the entries.
I would suggest using the first option if your database supports it. You didn't specify which database you are using, but PostgreSQL supports it. You'd only have to run alter table test_table alter constraint your_unique_constraint deferrable initially deferred.
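If deferred constraints are not available, a rough sketch of the second option (assuming an assigned-id mapping and a hypothetical constructor TestTable(id, fieldA, fieldB, fieldC)):
// Delete both rows and flush so the unique index entries are actually freed,
// then insert fresh instances carrying the swapped values.
session.delete(dataset1);
session.delete(dataset2);
session.flush();
session.save(new TestTable(1, "B", "C", "Something"));
session.save(new TestTable(2, "A", "C", "Something else"));
tx.commit();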
I have a table like (id INTEGER, sometext VARCHAR(255), ...) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists; otherwise a new row gets inserted.
This is the only operation on this table. There are no updates and no other operations on this table. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext. Then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what are the options?
AFAIK, Hibernate's merge works on the PK only, so it's inappropriate. I guess a native query might or might not help, as it may or may not be committed when the second INSERT takes place.
Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the new row.
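A sketch of that pattern, assuming a hypothetical SomeText entity and a findIdBySometext helper wrapping the Criteria lookup described above:
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.exception.ConstraintViolationException;

Long findOrCreateId(SessionFactory sf, String sometext) {
    Session s = sf.openSession();
    Transaction tx = s.beginTransaction();
    try {
        SomeText row = new SomeText(sometext); // hypothetical entity
        s.save(row);                           // may hit the UNIQUE constraint
        tx.commit();
        return row.getId();
    } catch (ConstraintViolationException e) {
        tx.rollback();                         // another request inserted it first
        return findIdBySometext(sf, sometext); // hypothetical helper: re-read the winner's row
    } finally {
        s.close();
    }
}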
I'm not sure this scales well if the likelihood of a duplicate is high; it's a lot of extra work if some percentage of transactions (depending on the database) have to fail the insert and then reselect.
A secondary transaction minimizes the length of time the transaction to add the new text takes. Assuming the database supports it correctly, it might be possible for the thread 1 transaction to cause the thread 2 select/insert to hang until the thread 1 transaction is committed or rolled back. Overall database design might also affect transaction throughput.
I don't necessarily question why sometext can't be a PK; I'm wondering why you need to break it out at all. Of course, with large volumes you might substantially save space if the sometext records are large. It almost seems like you're trying to emulate a Lucene index to give you a complete list of text values.
I am developing an application that supports multiple databases, and Hibernate is fulfilling that requirement.
Now the issue is with the auto-generated primary key. Some databases support auto-increment, and some require a sequence to increment the identity. To solve this issue I use the following strategy:
strategy = GenerationType.TABLE (javax.persistence)
This fulfills my requirement.
In this post, a user commented that
it's always better to use increment or sequence instead of table generation if you need the ids to be in sequence
If I use auto-increment or sequence, it means some changes are required at the annotation level when I move from one database to another (an extra burden).
Can you update me: is it really better to use increment or sequence instead of table generation, or is that just a statement?
Auto-increment drawbacks: you don't know the id until the INSERT has actually been executed (which can be a problem in JPA, since some EntityManager operations rely on IDs being available). Not all databases support auto-increment fields.
Sequence drawbacks: not all databases have sequences.
Table drawbacks: IDs are not necessarily consecutive.
Since it is very unlikely that you run out of IDs, using table generation remains a good option. You can even tweak the id allocation size in order to get more consecutive IDs (the default size is 50):
@TableGenerator(name="myGenerator", allocationSize=1)
However, this will result in at least two queries to the id allocation table for each insert: one to step the value of the latest id, and one to retrieve it.
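For reference, a sketch of the full mapping (the generator and table names here are made up; a larger allocationSize trades consecutive IDs for fewer round trips):
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.TableGenerator;

@Id
@GeneratedValue(strategy = GenerationType.TABLE, generator = "myGenerator")
@TableGenerator(name = "myGenerator",
                table = "ID_GEN",        // portable id allocation table
                allocationSize = 50)     // ids reserved per database round trip
private Long id;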
I have 5 MySQL InnoDB tables: Test, InputInvoice, InputLine, OutputInvoice, OutputLine, and each is mapped and functioning in Hibernate. I have played with using StatelessSession/Session and the JDBC batch size. I have removed any generator classes to let MySQL handle the id generation, but it is still performing quite slowly.
Each of those tables is represented in a Java class, and mapped in Hibernate accordingly. Currently, when it comes time to write the data out, I loop through the objects and do a session.save(Object), or session.insert(Object) if I'm using StatelessSession. I also do a flush and clear (when using Session) when my line count reaches the max JDBC batch size (50).
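That flush/clear pattern looks roughly like this (a sketch; InputLine and the lines collection stand in for the actual mapped classes):
int count = 0;
for (InputLine line : lines) {
    session.save(line);
    if (++count % 50 == 0) {   // 50 = hibernate.jdbc.batch_size
        session.flush();       // execute the queued batch of INSERTs
        session.clear();       // detach entities to keep the session small
    }
}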
Would it be faster if I had these in a 'parent' class that held the objects and did a session.save(master) instead of each one?
If I had them in a master/container class, how would I map that in Hibernate to reflect the relationship? The container class wouldn't actually be a table of its own, but a relationship all based on two indexes: run_id (int) and line (int).
Another direction would be: How do I get Hibernate to do a multi-row insert?
The ID generation strategy is critical for batch insertion in Hibernate. In particular, IDENTITY generation will usually not work (note that AUTO typically maps to IDENTITY as well). This is because during batch insert Hibernate has a flag called "requiresImmediateIdAccess" that says whether generated IDs are required immediately; if so, batch processing is disabled.
You can easily spot this in the DEBUG-level logs when it says "executing identity-insert immediately" - this means it has skipped batch processing because it was told that generated IDs are required immediately after insertion.
Generation strategies that typically do work are TABLE and SEQUENCE, because Hibernate can pre-generate the IDs, thereby allowing for batch insertion.
A quick way to spot whether your batch insertion works is to activate DEBUG-level logs as BatchingBatcher will explicitly tell you the batch size it's executing ("Executing batch size: " + batchSize ).
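Assuming a log4j setup (the configuration style is an assumption about your project), turning on DEBUG for the Hibernate 3 batcher classes would look like:
log4j.logger.org.hibernate.jdbc = DEBUG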
Additionally, the following properties are important for achieving batch insertion. I daren't say they're required, as I'm not enough of a Hibernate expert to do so; perhaps it's just my particular configuration, but in my experience they were needed nonetheless:
hibernate.order_inserts = true
hibernate.order_updates = true
These properties are pretty poorly documented, but I believe what they do is allow the SQL INSERT and UPDATE statements to be properly grouped for batch execution; I think this might be the multi-row inserts you're after. Don't shoot me if I'm wrong on this; I'm recalling from memory.
I'll also go ahead and assume that you set the following property; if not, this should serve as a reminder:
hibernate.jdbc.batch_size = xx
Where xx is your desired batch size, naturally.
The final solution for me was to use voetsjoeba's response as a jumping off point.
My hibernate config uses the following options:
hibernate.order_inserts = true
hibernate.order_updates = true
I changed from using Session to StatelessSession.
Re-ordered the Java code to process all the elements in a batch a table at a time: so all of table x, then table y, etc.
Removed the <generator> from each class. Java now creates the id and assigns it to the object.
Created logic that allowed me to determine if just an id was being set, and not write 'empty' lines to the database.
Finally, I turned on dynamic-insert for my classes in their Hibernate definitions, like so: <class name="com.my.class" table="MY_TABLE" dynamic-insert="true">
How can I generate insert statements like insert into table values (sequence.nextval, 'b') using Hibernate?
Hibernate currently selects the sequence.nextval value first, and only then uses that value to insert the entry into the table.
Note: I'm not very fond of custom id generators.
Hibernate selects sequence.nextval because it has to return that value back to you (e.g. to set the ID on your entity). Unless you're doing something very esoteric, I strongly doubt this has a big impact on performance (it's nothing compared to the actual insert). That said, you can look at Hibernate's sequence hi-lo generator: it would only access the sequence once in a while instead of on every insert.
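A sketch of such a mapping using Hibernate's seqhilo strategy (the generator and sequence names are made up; the sequence is then hit only once per max_lo + 1 ids):
import org.hibernate.annotations.GenericGenerator;
import org.hibernate.annotations.Parameter;
// @Id and @GeneratedValue come from javax.persistence as usual

@Id
@GeneratedValue(generator = "hiloGen")
@GenericGenerator(name = "hiloGen", strategy = "seqhilo",
    parameters = {
        @Parameter(name = "sequence", value = "MY_SEQUENCE"),
        @Parameter(name = "max_lo", value = "100")
    })
private Long id;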
If you're using Oracle 10 client or above, check out sequence-identity in the most recent Hibernate versions to do what you're asking for.