Hibernate Select on merge() seems useless

I have a single table and I am trying to understand the logic behind session.merge, but it seems somehow useless to me. I will try to explain with some code.
public static void main(String[] args)
{
    final Merge clazz = new Merge();
    Ragonvalia ragonvalia = clazz.load();//LOADED FROM DATABASE...
    System.out.println("ORIGINAL: "+ragonvalia);
    //Prints c02=1953
    clazz.session.evict(ragonvalia);//WE EVICT HERE TO FORCE MERGE TO RELOAD FROM DB
    //HERE I MAKE SOME MODIFICATIONS TO THE RECORD IN THE DB DIRECTLY.....
    try{Thread.sleep(20000);}catch(final Exception e){e.printStackTrace();}
    //now c02=2000
    final Ragonvalia merge = ragonvalia = (Ragonvalia)clazz.session.merge(ragonvalia);//MERGE: THE SELECT IS FIRED HERE
    System.out.println("MERGING");
    System.out.println("merge: "+merge);
    System.out.println("ragonvalia: "+ragonvalia);
    System.out.println(merge.toString().equals(ragonvalia.toString()));//PRINTS EQUAL
    ragonvalia.setC01("PUEBLO LINDO");//I MODIFY THE C01 FIELD
    System.out.println(ragonvalia);
    final Transaction tx = clazz.session.beginTransaction();
    clazz.session.update(merge);//WE UPDATE
    tx.commit();
    //At this point I can see that c02 was reset to 1953 again
    clazz.shutDown();
}
Yes, I know that merge is used for detached objects and all that, but what is really behind the select? Here is what I expected.
When I first retrieved the record, the field was c02=1953; later it was changed in the database to c02=2000. I thought that when the merge was made, it would keep the newly changed value c02=2000, and that if I did not modify the field in my session, the update would write 2000 instead of the original 1953, so that nobody's work gets hurt. Instead it keeps 1953 and updates the field to 1953, which replaces the 2000 in the database, and of course the other person's work is lost.
I have read some material on the internet and found something like this: "Essentially, if you do not have a version or timestamp field, Hibernate must check the existing record before updating it so that concurrent modifications do not occur. You would not want a record updated that someone else modified after you read it. There are a couple of solutions, outlined in the link above. But it makes life much easier if you can add a version field on each table." That sounds great, but the part about "before updating it so that concurrent modifications do not occur" is not happening: Hibernate is just updating the fields I have in my class, even when they are not the same as in the current DB record.
"Hibernate must check the existing record before updating": checking for what? What does Hibernate check?
In fact I am not using any version field in my models, but it seems the merge only checks that the record exists in the database.
I know this question is somewhat simple or a duplicate, but I just cannot see the logic or the benefit of firing a select.
Summary
After the merge, Hibernate updates all the properties, even the unmodified ones, and I do not know why. I thought Hibernate would update only the modified ones to gain performance. Since the values of those properties are the same as when the class was first loaded (or modified by hand), I think the merge was useless.
update
ragonvalia
set
.....
.....
.....
c01=?,
c02=?, -- HERE IS 1953 EVEN THOUGH THE VALUE IN THE DB WAS 2000 WHEN THE MERGE WAS FIRED
c03=?
where
ID=?

Your question mixes what appears to be 2 concerns.
Dynamic Update Statements
Hibernate has had support for @DynamicUpdate since 4.1.
For updating, should this entity use dynamic sql generation where only changed
columns get referenced in the prepared sql statement.
Note, for re-attachment of detached entities this is not possible without the
@SelectBeforeUpdate annotation being used.
This simply means that within the bounds of an open session, any entity attached to the session and flagged with @DynamicUpdate will track field-level changes and issue DML statements that include only the altered fields.
Should your entity be detached from the session and you issue a merge, the @SelectBeforeUpdate annotation forces Hibernate to refresh the entity from the database, attach it to the session, and then determine the dirty attributes in order to write the DML statement with only the altered fields.
It's worth pointing out that this will not guard you against concurrent updates to the same database records in a highly concurrent environment. This is simply a means to minimize the DML statement for legacy or wide tables where a majority of the columns aren't changed.
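To make the effect of the select concrete, here is a minimal sketch in plain Java. It is not Hibernate's internal code, and the class and method names are made up for illustration. It assumes the merge compares the detached object's values against the row that was just selected and puts every differing column into the UPDATE. Under that assumption, the stale c02=1953 from the question counts as a change and overwrites the 2000 in the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;

// Plain-Java model of a dynamic update; hypothetical names, not Hibernate internals.
class DynamicUpdateSketch {

    // Returns only the columns whose value differs from the freshly selected row.
    static Map<String, Object> dirtyColumns(Map<String, Object> freshRow,
                                            Map<String, Object> detachedState) {
        Map<String, Object> dirty = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : detachedState.entrySet()) {
            if (!Objects.equals(freshRow.get(e.getKey()), e.getValue())) {
                dirty.put(e.getKey(), e.getValue());
            }
        }
        return dirty;
    }

    // Builds an UPDATE that references only the dirty columns.
    static String updateSql(String table, Map<String, Object> dirty) {
        StringJoiner set = new StringJoiner(", ");
        for (String column : dirty.keySet()) {
            set.add(column + "=?");
        }
        return "update " + table + " set " + set + " where ID=?";
    }

    public static void main(String[] args) {
        Map<String, Object> freshRow = new LinkedHashMap<>();
        freshRow.put("c01", "OLD NAME");
        freshRow.put("c02", 2000);           // changed directly in the database
        Map<String, Object> detached = new LinkedHashMap<>();
        detached.put("c01", "PUEBLO LINDO"); // changed in the application
        detached.put("c02", 1953);           // stale value held by the detached object
        System.out.println(updateSql("ragonvalia", dirtyColumns(freshRow, detached)));
    }
}
```

Both c01 and c02 end up in the statement, which is why the select on its own does not protect the other person's change.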
Concurrent Changes
In order to deal with concurrent operations on the same data, you can approach this using two types of locking mechanics.
Pessimistic
In this situation, you would want to apply a lock at read time which basically forces the database to prevent any other transaction from reading/altering that row until the lock is released.
Since this type of locking can have severe performance implications, particularly on a highly concurrent table, it's generally not preferred. Most modern databases will reach a point of row level locks and eventually escalate them to the data page or worse the table; causing all other sessions to block until the locks are released.
public void alterEntityWithLockExample() {
// Read row, apply a row level update/write lock.
Entity entity = entityManager.find(Entity.class, 1L, LockModeType.PESSIMISTIC_WRITE);
// update entity and save it.
entity.setField(someValue);
entityManager.merge(entity);
}
It is probably worth noting that had any other session queried the entity with id 1 prior to the write lock being applied in the code above, the two sessions would still step on one another. All operations on Entity that would result in state changes would need to query using a lock in order to prevent concurrency issues.
Optimistic
This is a much more desirable approach in a highly concurrent environment. This is where you'd want to annotate a new field with @Version on your entity.
This has the benefit that you can query an entity, leave it detached, reattach it later, perhaps in another request or thread, alter it, and merge the changes. If the record had changed since it was originally fetched at the start of your process, an OptimisticLockException will be thrown, allowing your business case to handle that scenario however you need.
In a web application as an example, you may want to inform the user to requery the page, make their changes, and resave the form to continue.
Pessimistic locking is a proactive locking mechanism, whereas optimistic locking is reactive.
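The version check behind optimistic locking can be modelled in a few lines of plain Java. This is only a sketch with hypothetical names and an in-memory map standing in for the table. It assumes the update behaves like "update ... set value=?, version=version+1 where id=? and version=?", and it reports a stale writer as zero updated rows instead of throwing an OptimisticLockException.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of a version check; the class and method names are hypothetical.
class VersionedTableSketch {

    static final class Row {
        final String value;
        final long version;
        Row(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new Row(value, 0));
    }

    synchronized Row read(long id) {
        return rows.get(id);
    }

    // Returns the number of rows updated: 0 means someone else changed the row first.
    synchronized int update(long id, String newValue, long expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return 0;
        }
        rows.put(id, new Row(newValue, expectedVersion + 1));
        return 1;
    }

    public static void main(String[] args) {
        VersionedTableSketch table = new VersionedTableSketch();
        table.insert(1L, "1953");
        Row mine = table.read(1L);    // both readers see version 0
        Row theirs = table.read(1L);
        System.out.println(table.update(1L, "2000", theirs.version)); // first writer succeeds
        System.out.println(table.update(1L, "1953", mine.version));   // stale writer is rejected
    }
}
```

The rejected second update is the point at which the application would tell the user to reload and redo the change.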

Related

How to optimize one big insert with hibernate

For my website, I'm creating a book database. I have a catalog with a root node; each node has subnodes, each subnode has documents, each document has versions, and each version is made of several paragraphs.
In order to create this database as fast as possible, I first create the entire tree model in memory, and then I call session.save(rootNode).
This single save populates my entire database (at the end, a mysqldump of the database weighs 1 GB).
The save costs a lot (more than an hour), and since the database grows with new books and new versions of existing books, it costs more and more. I would like to optimize this save.
I've tried to increase the batch_size, but it changes nothing since it's a single save. When I mysqldump a script and insert it back into MySQL, the operation costs 2 minutes or less.
And when I run "htop" on the Ubuntu machine, I can see that MySQL is only using 2 or 3% CPU, which means that it's Hibernate that is slow.
If someone could give me possible techniques that I could try, or possible leads, it would be great... I already know some of the reasons why it takes time. If someone wants to discuss it with me, thanks for the help.
Here are some of my problems (I think): for example, I have self-assigned ids for most of my entities. Because of that, Hibernate checks each time whether the row exists before it saves it. I don't need this because the batch I'm executing is executed only once, when I create the database from scratch. The best would be to tell Hibernate to ignore the primary key rules (like mysqldump does) and re-enable the key checking once the database has been created. It's just a one-shot batch to initialize my database.
The second problem is again about the foreign keys. Hibernate inserts rows with null values, then makes an update in order to make the foreign keys work.
About using another technology: I would like to make this batch work with Hibernate because afterwards my whole website works very well with Hibernate, and if it's Hibernate that creates the database, I'm sure the naming rules and every foreign key will be created properly.
Finally, it's a read-only database. (I have a user database, which uses InnoDB, where I do updates and inserts while my website is running, but the document database is read-only and MyISAM.)
Here is an example of what I'm doing:
TreeNode rootNode = new TreeNode();
recursiveLoadSubNodes(rootNode); // This method creates my big tree, in memory only.
hibernateSession.beginTransaction();
hibernateSession.save(rootNode); // takes more than an hour and saves 1 GB of data: hundreds of sub treeNodes, thousands of documents, tens of thousands of paragraphs.
hibernateSession.getTransaction().commit();

It's a little hard to guess what could be the problem here but I could think of 3 things:
Increasing batch_size alone might not help because, depending on your model, inserts might be interleaved (i.e. A B A B ...). You can allow Hibernate to reorder inserts and updates so that they can be batched (i.e. A A ... B B ...). Depending on your model this might still not work because the inserts might not be batchable. The necessary properties are hibernate.order_inserts and hibernate.order_updates, and a blog post that describes the situation can be found here: https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/
If the entities don't already exist (which seems to be the case) then the problem might be the first level cache. This cache will cause Hibernate to get slower and slower because each time it wants to flush changes it will check all entries in the cache by iterating over them and calling equals() (or something similar). As you can see, that will take longer with each new entity that's created. To fix that you could either try to disable the first level cache (I'd have to look up whether that's possible for write operations and how this is done - or you do that :) ) or try to keep the cache small, e.g. by inserting the books yourself and evicting each book from the first level cache after the insert (you could also go deeper and do that on the document or paragraph level).
It might not actually be Hibernate (or at least not alone) but your DB as well. Note that restoring dumps often removes/disables constraint checks and indices along with other optimizations so comparing that with Hibernate isn't that useful. What you'd need to do is create a bunch of insert statements and then just execute those - ideally via a JDBC batch - on an empty database but with all constraints and indices enabled. That would provide a more accurate benchmark.
Assuming that comparison shows that the plain SQL insert isn't that much faster then you could decide to either keep what you have so far or refactor your batch insert to temporarily disable (or remove and re-create) constraints and indices.
Alternatively you could try not to use Hibernate at all or change your model - if that's possible given your requirements which I don't know. That means you could try to generate and execute the SQL queries yourself, use a NoSQL database or NoSQL storage in a SQL database that supports it - like Postgres.
We're doing something similar, i.e. we have Hibernate entities that contain some complex data which is stored in a JSONB column. Hibernate can read and write that column via a custom usertype but it can't filter (Postgres would support that but we didn't manage to enable the necessary syntax in Hibernate).
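The point about interleaved inserts (A B A B versus A A B B) can be illustrated with a small plain-Java model. This is a sketch under assumptions, not Hibernate's batching code: it assumes a batch can only hold consecutive statements for the same table, and it uses a simple sort by table name as a stand-in for what hibernate.order_inserts does (real ordering also has to respect parent/child dependencies). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Models why interleaved inserts batch poorly; hypothetical names, not Hibernate code.
class InsertOrderingSketch {

    // Counts the batches needed when one batch can only hold consecutive
    // statements for the same table, up to batchSize rows.
    static int countBatches(List<String> tables, int batchSize) {
        int batches = 0;
        int inCurrent = 0;
        String currentTable = null;
        for (String table : tables) {
            if (!table.equals(currentTable) || inCurrent == batchSize) {
                batches++;
                inCurrent = 0;
                currentTable = table;
            }
            inCurrent++;
        }
        return batches;
    }

    // Stable sort by table name, a simplification of ordered inserts.
    static List<String> ordered(List<String> tables) {
        List<String> copy = new ArrayList<>(tables);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<String> interleaved = List.of("document", "paragraph", "document",
                                           "paragraph", "document", "paragraph");
        System.out.println("interleaved: " + countBatches(interleaved, 50));
        System.out.println("ordered: " + countBatches(ordered(interleaved), 50));
    }
}
```

With six interleaved statements the model needs six batches, and two once they are grouped, which is why raising batch_size alone may change nothing.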

Audit Using Hibernate Envers

I am using hibernate envers for making history of my data, it's working fine as well. The problem here is, it's creating duplicate data in history table i.e. creating data in history table whether there is any change in audited table or not. I want only changed fields stored in my history table. I am new to hibernate envers. What can I do?

If I understand your question correctly, Envers doesn't work that way, at least not out of the box.
Envers is a commit-snapshot auditing solution where just before commit, it examines audited entity state and determines whether any attributes have been modified or not and records a snapshot of all audited fields of that entity at that point in time. This means that the only time an audit entry isn't created is when no attributes have been modified.
But it also uses the snapshot approach because it fits really well with the Query API.
Consider the inefficiency that would occur if a query to find an entity at a given revision had to read all rows from that revision back to the beginning of time, iterating each row and merging the column state captured to just instantiate a single row result-set.
With the snapshot approach, it boils down to the following query, no loops or iterative work.
SELECT e FROM AuditedEntity e WHERE e.revisionNumber = :revisionNumber
This is far more efficient from an I/O perspective, both for the database reading the data pages and for the network streaming a single-row result set rather than a multi-row result set to the client.
I'd say in this case, the saying "space is cheap" really holds true when you compare that against the cost and inefficiencies your application would face doing it any other way.
If this is something you'd like Envers to support, perhaps via some user-configured strategy, then you're welcome to log a new feature request in JIRA for hibernate-envers and I can take a look at its feasibility.
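The trade-off between snapshots and changed-fields-only rows can be sketched in plain Java. This is a simplified model with hypothetical names, not Envers code, and it assumes revisions are numbered 0, 1, 2, ... so that a list index can stand in for the revision number. The snapshot read touches one row, while the delta read has to merge every row from the first revision onward.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Compares reading a full snapshot with replaying per-revision deltas.
// Hypothetical names; a plain-Java model rather than Envers code.
class AuditReadSketch {

    // Snapshot style: the row stored for the revision already holds every audited field.
    static Map<String, String> fromSnapshots(List<Map<String, String>> snapshots, int revision) {
        return snapshots.get(revision);
    }

    // Delta style: every row from revision 0 up to the requested one must be merged.
    static Map<String, String> fromDeltas(List<Map<String, String>> deltas, int revision) {
        Map<String, String> state = new LinkedHashMap<>();
        for (int i = 0; i <= revision; i++) {
            state.putAll(deltas.get(i));
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map<String, String>> snapshots = List.of(
                Map.of("name", "a", "city", "x"),
                Map.of("name", "b", "city", "x"),
                Map.of("name", "b", "city", "y"));
        List<Map<String, String>> deltas = List.of(
                Map.of("name", "a", "city", "x"),
                Map.of("name", "b"),
                Map.of("city", "y"));
        // Both styles describe the same entity state; only the read cost differs.
        System.out.println(fromSnapshots(snapshots, 2).equals(fromDeltas(deltas, 2)));
    }
}
```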

I had a similar problem.
In my case the error was that the audited field had higher precision than the database field. Please see my reply to another thread: https://stackoverflow.com/a/65844949/13381019

Hibernate INSERT, delayed SQL error (DATA TRUNCATION)

My application parses a CSV file, about 100 - 200 records per file, does database CRUD features and commits them all in the end.
public static void main(String[] args) {
    Transaction t = null; // declared outside the try block so the catch block can see it
    try {
        List<Row> rows = parseCSV();
        t = openHibernateTransaction();
        // doCrudStuff INSERTs some records in the database
        for (Row r : rows)
            doCrudStuff(r);
        t.commit();
    } catch (Exception ex) {
        // log error
        if (t != null) t.rollback();
    }
}
When I was about to doCrudStuff on the 78th Row, I suddenly got this error:
Data truncation: Data too long for column 'SOME_COLUMN_UNRELATED_TO_78TH_ROW' at row 1.
I read the stack trace and the error was triggered by a SELECT statement to a table unrelated to the 78th row. Huh, weird right?
I checked the CSV file and found that on the 77th row, some field was indeed too long for the database column. But Hibernate didn't catch the error during the INSERT of the 77th row and threw the error when I was doing a SELECT for the 78th row. Why is it delayed?
Does Hibernate really behave like this? I commit only once at the very end because I want to make sure that everything succeeded, otherwise, rollback.

Actually not really if you take into account what hibernate is doing behind the scenes for you.
Hibernate does not actually execute your write statements (update,insert) until it needs to, thus in your case I assume your "doCrudStuff" executes a select and then executes an update or insert right?
This is what is happening:
You tell hibernate to execute "UPDATE my_table SET something = value;" which causes hibernate to cache this in the session and return right away.
You may do more writes, which Hibernate will likely continue to cache in the session until either 1) you manually flush the session or 2) hibernate decides its time to flush the session.
You then execute a SELECT statement to get some data from the database. At this point, the state of the database is not consistent with the state of the session since there is data waiting to be written. Hibernate will then start executing your writes to catch up the database state to the session state.
If one of the writes fails, the stack trace will not map to the exact point where you asked Hibernate to execute the write (this is an important distinction between an ORM and using JDBC directly); rather, it will fail at the point where the session had to be flushed (either manually or automatically).
At the expense of performance, you can always tell Hibernate to flush your session after your writes. But as long as you are aware of the lifecycle of the Hibernate session and how it caches those queries, you should be able to debug these more easily.
By the way, if you want to see this in practice, you can tell Hibernate to log the queries.
Hope this helps!
EDIT: I understand how this can be confusing, let me try to augment my answer by highlighting the difference between a Transaction and a Hibernate Session.
A transaction is a sequence of atomic operations performed on the database. Until a transaction is committed, it is typically not visible to other clients of the database. The state of the transaction is fully managed by the database, i.e. you can start a transaction and send your operations to the database, and it will ensure consistency of these operations within the transaction.
A Hibernate Session is a session managed by Hibernate, outside the database, mostly for performance reasons. Hibernate will queue operations whenever possible to improve performance, and only go to the database when it deems necessary.
Imagine you have 50 marbles that are all different colors and need to be stored in their correct buckets, but these buckets are 100 feet away and you need someone to sort them into their rightful buckets. You ask your friend Bob to store the blue marbles, then the red marbles, then the green marbles. Your friend is smart and anticipates that you will ask him to make multiple round trips, so he waits until your last request to walk those 100 feet and store them in their proper buckets, which is much faster than making 3 round trips.
Now imagine that you ask him to store the yellow marbles, and then you ask him how many total marbles you have across all the buckets. He is then forced to go to the buckets (since he needs to gather information) and store the yellow marbles (so he can accurately count all buckets) before he can give you an answer. This is in essence what Hibernate is doing with your data.
Now, in your case, imagine there is NO yellow bucket. Bob unfortunately is not going to find that out until he tries to answer your query about how many total marbles you have. Thus, in the sequence of events, he will come back to tell you he couldn't complete your request only after he tries to count the marbles (as opposed to when you asked him to store the yellow ones, which is what he was actually unable to do).
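The same sequence of events can be written as a toy model in plain Java. This is only an illustration with hypothetical names, not Hibernate code: save() queues the write without validating it, and the failure appears later, when a query forces the flush.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of write-behind; the names are hypothetical and this is not Hibernate code.
class WriteBehindSketch {

    private static final int MAX_LENGTH = 5;   // stands in for the column size
    private final Deque<String> pendingInserts = new ArrayDeque<>();
    private final List<String> table = new ArrayList<>();

    // Queues the insert and returns immediately; nothing is validated yet.
    void save(String value) {
        pendingInserts.add(value);
    }

    // A query must see pending writes, so it flushes first.
    int count() {
        flush();
        return table.size();
    }

    // Executes queued inserts; this is where the "database" finally rejects bad data.
    void flush() {
        while (!pendingInserts.isEmpty()) {
            String value = pendingInserts.poll();
            if (value.length() > MAX_LENGTH) {
                throw new IllegalStateException("Data too long: " + value);
            }
            table.add(value);
        }
    }

    public static void main(String[] args) {
        WriteBehindSketch session = new WriteBehindSketch();
        session.save("ok");
        session.save("much too long");   // like row 77: accepted without complaint
        try {
            session.count();             // like row 78: the query triggers the flush and fails
        } catch (IllegalStateException e) {
            System.out.println("Failed during query: " + e.getMessage());
        }
    }
}
```

Calling flush() right after each save() would move the exception back to the write that caused it, at the cost of more round trips.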
Hope this helps clear things a little bit!

Prevent violating of UNIQUE constraint with Hibernate

I have a table like (id INTEGER, sometext VARCHAR(255), ....) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists, otherwise a new row gets inserted.
This is the only operation on this table. There are no updates and no other operations on this table. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext. Then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what are the options?
AFAIK Hibernate merge works on PK only, so it's inappropriate. I guess, a native query might help or not, as it may or may not be committed when the second INSERT takes place.

Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the new row.
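The control flow of this approach can be sketched in plain Java with an in-memory map standing in for the table and its UNIQUE constraint. All names here are hypothetical, and the UniqueViolation exception only plays the role of ConstraintViolationException; a real version would run the insert in its own transaction and roll it back in the catch block.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for a table with a UNIQUE constraint on sometext.
// Hypothetical names; no database or Hibernate involved.
class FindOrInsertSketch {

    static final class UniqueViolation extends RuntimeException {
        UniqueViolation(String text) {
            super("duplicate: " + text);
        }
    }

    private final Map<String, Integer> idByText = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1);

    Integer find(String text) {
        return idByText.get(text);
    }

    // Behaves like an INSERT against a unique index: the loser of a race gets an exception.
    int insert(String text) {
        int id = nextId.getAndIncrement();
        if (idByText.putIfAbsent(text, id) != null) {
            throw new UniqueViolation(text);
        }
        return id;
    }

    // Try the read, then the insert; on a violation, read the row the other request inserted.
    int findOrInsert(String text) {
        Integer existing = find(text);
        if (existing != null) {
            return existing;
        }
        try {
            return insert(text);
        } catch (UniqueViolation e) {
            return find(text); // rows are never deleted, so the winner's row is there
        }
    }

    public static void main(String[] args) {
        FindOrInsertSketch table = new FindOrInsertSketch();
        int first = table.findOrInsert("hello");
        int second = table.findOrInsert("hello");
        System.out.println(first == second);
    }
}
```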

I'm not sure this scales well if the likelihood of a duplicate is high: it is a lot of extra work if some percentage of transactions (how many depends on the database) have to fail the insert and then reselect.
A secondary transaction minimizes the length of time that the transaction adding the new text takes. Assuming the database supports it correctly, it might still be possible for the thread 1 transaction to cause the thread 2 select/insert to hang until the thread 1 transaction is committed or rolled back. Overall database design might also affect transaction throughput.
I don't necessarily question why sometext can't be a PK; I'm wondering why you need to break it out at all. Of course, with large volumes you might save substantial space if the sometext records are large. It almost seems like you're trying to emulate a Lucene index to give you a complete list of text values.

How does hibernate handle collisions?

I hope someone can clarify the scenario below for me.
From what I understand, when you request a 'row' from hibernate, for example:
User user = UserDao.get(1);
I now have the user with id=1 in memory.
In a web application, if 2 web pages request and load the user at the same time, and then both update a property on the user's object, what will happen? e.g.:
user.pageViews += 1; // the value is currently 10 before the increment
UserDao.update(user);
Will this use the value that is in-memory (both requests have the value 10), or will it use the value in the database?

You must use two hibernate sessions for the two users. This means there are two instances of the object in the memory. If you use only one hibernate session (and so one instance of the object in memory), then the result is unpredictable.
In the case of a concurrent update the second update wins: the value of the first update is overwritten by the second. To avoid losing the first update you normally use a version column (see the Hibernate documentation). The second update then gets an error, which you can catch and react to, for example with an error message such as "Your record was modified in the meantime. Please reload.", which allows the second user to redo the modification on the updated record and ensures that neither change gets lost.
In the case of a page view counter, like in your example, a different solution would be to write a synchronized method that counts the page views sequentially.
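Both points can be seen in a small plain-Java sketch. It uses hypothetical names and a single int field standing in for the database column, with no Hibernate involved: two requests that each load 10 and store 11 lose one page view, while a synchronized increment does not.

```java
// Shows a lost update with two in-memory copies, then a synchronized counter.
// Plain Java with hypothetical names; the int field stands in for the database column.
class PageViewSketch {

    private int storedViews = 10;

    synchronized int load() {
        return storedViews;
    }

    // Writes whatever the caller holds in memory, like a plain update.
    synchronized void store(int views) {
        storedViews = views;
    }

    // Reads and writes in one critical section, so no increment is lost.
    synchronized int increment() {
        storedViews = storedViews + 1;
        return storedViews;
    }

    public static void main(String[] args) {
        PageViewSketch db = new PageViewSketch();
        int first = db.load();     // request 1 sees 10
        int second = db.load();    // request 2 sees 10
        db.store(first + 1);       // writes 11
        db.store(second + 1);      // also writes 11: one page view is lost
        System.out.println(db.load());
        db.increment();
        db.increment();            // two increments, both counted
        System.out.println(db.load());
    }
}
```

A synchronized method only serializes callers inside one JVM, so with several application servers the counting would have to be serialized in the database instead.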

By default the in memory value is used for the update.
In the following I assume you want to implement an automatic page view counter, not to let users modify the User in a web user interface. If you want the latter, take a look at Hibernate optimistic locking.
So, supposing you need 100% accuracy when counting the page views, you can lock your User entity while you modify their pageView value to obtain exclusivity on the table row:
Session session = ...
Transaction tx = ...
session.lock(user, LockMode.UPGRADE);
user.increasePageViews();
tx.commit();
session.close();
LockMode.UPGRADE translates into a SELECT ... FOR UPDATE in your database, so be careful to hold the lock as briefly as possible so that it does not hurt application scalability.
