Hibernate INSERT, delayed SQL error (DATA TRUNCATION) - java

My application parses a CSV file, about 100 - 200 records per file, performs database CRUD operations, and commits them all at the end.
public static void main(String[] args) {
    Transaction t = null;
    try {
        List<Row> rows = parseCSV();
        t = openHibernateTransaction();
        // doCrudStuff INSERTs some records into the database
        for (Row r : rows)
            doCrudStuff(r);
        t.commit();
    } catch (Exception ex) {
        // log error
        if (t != null) t.rollback();
    }
}
When I was about to doCrudStuff on the 78th Row, I suddenly got this error:
Data truncation: Data too long for column 'SOME_COLUMN_UNRELATED_TO_78TH_ROW' at row 1.
I read the stack trace and the error was triggered by a SELECT statement to a table unrelated to the 78th row. Huh, weird right?
I checked the CSV file and found that on the 77th row, some field was indeed too long for the database column. But Hibernate didn't catch the error during the INSERT of the 77th row and threw the error when I was doing a SELECT for the 78th row. Why is it delayed?
Does Hibernate really behave like this? I commit only once at the very end because I want to make sure that everything succeeded; otherwise, I roll back.

Actually, it is not that weird if you take into account what Hibernate is doing behind the scenes for you.
Hibernate does not actually execute your write statements (UPDATE, INSERT) until it needs to, so in your case I assume your "doCrudStuff" executes a SELECT and then an UPDATE or INSERT, right?
This is what is happening:
You tell hibernate to execute "UPDATE my_table SET something = value;" which causes hibernate to cache this in the session and return right away.
You may do more writes, which Hibernate will likely continue to cache in the session until either 1) you manually flush the session or 2) Hibernate decides it's time to flush the session.
You then execute a SELECT statement to get some data from the database. At this point, the state of the database is not consistent with the state of the session since there is data waiting to be written. Hibernate will then start executing your writes to catch up the database state to the session state.
If one of the writes fails, the stack trace will not point to the exact place where you asked Hibernate to execute the write (this is an important distinction between an ORM and using JDBC directly); instead, the failure surfaces when the session has to be flushed (either manually or automatically).
At the expense of performance, you can always tell hibernate to flush your session after your writes. But as long as you are aware of the lifecycle of the hibernate session and how it caches those queries, you should be able to more easily debug these.
By the way, if you want to see this in practice, you can tell Hibernate to log the queries.
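For example, a minimal sketch of per-write flushing (assuming the Hibernate Session used by doCrudStuff is available as session, and reusing the loop from the question):

for (Row r : rows) {
    doCrudStuff(r);   // queues the INSERT/UPDATE statements in the session
    session.flush();  // sends the pending SQL now, so a truncation error
                      // surfaces at the row that actually caused it
}
// To watch when Hibernate really sends SQL, enable query logging,
// e.g. the hibernate.show_sql=true property in your Hibernate configuration.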
Hope this helps!
EDIT: I understand how this can be confusing, let me try to augment my answer by highlighting the difference between a Transaction and a Hibernate Session.
A transaction is a sequence of atomic operations performed on the database. Until a transaction is committed, it is typically not visible to other clients of the database. The state of the transaction is fully managed by the database; that is, you can start a transaction and send your operations to the database, and it will ensure consistency of these operations within the transaction.
A Hibernate Session is a session managed by Hibernate, outside the database, mostly for performance reasons. Hibernate will queue operations whenever possible to improve performance, and only go to the database when it deems necessary.
Imagine you have 50 marbles that are all different colors and need to be stored in their correct buckets, but these buckets are 100 feet away and you need someone to sort the marbles into their rightful buckets. You ask your friend Bob to store the blue marbles, then the red marbles, then the green marbles. Your friend is smart and anticipates that you will ask him to make multiple round trips, so he waits until your last request to walk those 100 feet and store them in their proper buckets, which is much faster than making 3 round trips.
Now imagine that you ask him to store the yellow marbles, and then you ask him how many total marbles you have across all the buckets. He is then forced to go to the buckets (since he needs to gather information), store the yellow marbles (so he can accurately count all buckets) before he can give you an answer. This is in essence what hibernate is doing with your data.
Now, in your case, imagine there is NO yellow bucket. Bob unfortunately is not going to find that out until he tries to answer your query about how many total marbles you have. So, in the sequence of events, he will come back to tell you he couldn't complete your request only after he tries to count the marbles (as opposed to when you asked him to store the yellow ones, which is what he was actually unable to do).
Hope this helps clear things a little bit!

Related

Hibernate Select on merge() seems useless

I have a single table and I am trying to understand the logic behind session.merge, but I think it is somehow useless. I will try to explain with some code.
public static void main(String[] args) {
    final Merge clazz = new Merge();
    Ragonvalia ragonvalia = clazz.load(); // loaded from the database...
    System.out.println("ORIGINAL: " + ragonvalia);
    // prints c02=1953
    clazz.session.evict(ragonvalia); // we evict here to force merge to reload from the DB
    // here I make some modifications to the record in the DB directly.....
    try { Thread.sleep(20000); } catch (final Exception e) { e.printStackTrace(); }
    // now c02=2000
    final Ragonvalia merge = ragonvalia = (Ragonvalia) clazz.session.merge(ragonvalia); // the merge; in fact the SELECT is fired here
    System.out.println("MERGING");
    System.out.println("merge: " + merge);
    System.out.println("ragonvalia: " + ragonvalia);
    System.out.println(merge.toString().equals(ragonvalia.toString())); // prints equals
    ragonvalia.setC01("PUEBLO LINDO"); // I modify the C01 field
    System.out.println(ragonvalia);
    final Transaction tx = clazz.session.beginTransaction();
    clazz.session.update(merge); // we update
    tx.commit();
    // at this point I can see that c02 was reset back to 1953
    clazz.shutDown();
}
Yes, I know that merge is used for detached objects and all that, but what is really behind the SELECT? I had a few thoughts.
When I retrieved the record the first time, the field was c02=1953; later it was changed in the database to c02=2000. I thought that when the merge was made, Hibernate would keep the newly changed value c02=2000, and that if I did not modify that field in my session, the UPDATE would write 2000 rather than the original 1953, so as not to hurt anybody else's work. Instead it keeps the 1953, the UPDATE writes 1953, and 1953 replaces the 2000 in the database; of course the other person's work is lost.
I have read some material on the internet that says something like this: "Essentially, if you do not have a version or timestamp field, Hibernate must check the existing record before updating it so that concurrent modifications do not occur. You would not want a record updated that someone else modified after you read it. There are a couple of solutions, outlined in the link above. But it makes life much easier if you can add a version field on each table." Sounds great, but the "before updating it so that concurrent modifications do not occur" part is not happening: Hibernate is just updating the fields I have in my class, even when they are not the same as in the current DB record.
"Hibernate must check the existing record before updating": checking for what? What does Hibernate check?
In fact I am not using any version field in my models, but it seems the merge only serves to check that the record exists in the database.
I know this question is somewhat simple, and possibly a duplicate, but I just can't see the logic or the benefit of firing a SELECT.
Summary
After the merge, Hibernate updates all the properties, even the unmodified ones. I don't know why this is; I thought Hibernate would update only the modified ones to gain performance. And since the values of those properties are the same as when the class was first loaded, I think the merge was useless.
update
    ragonvalia
set
    .....
    .....
    .....
    c01=?,
    c02=?, -- bound to 1953, even though after the merge was fired the value in the DB was 2000
    c03=?
where
    ID=?
Your question mixes what appears to be 2 concerns.
Dynamic Update Statements
Hibernate has had support for @DynamicUpdate since 4.1.
For updating, should this entity use dynamic sql generation where only changed
columns get referenced in the prepared sql statement.
Note, for re-attachment of detached entities this is not possible without the
@SelectBeforeUpdate annotation being used.
This simply means that within the bounds of an open session, any entity attached to the session and flagged with @DynamicUpdate will track field-level changes and issue UPDATE statements that include only the altered columns.
Should your entity be detached from the session and you issue a merge, the @SelectBeforeUpdate annotation forces Hibernate to refresh the entity from the database, attach it to the session, and then determine the dirty attributes in order to write the UPDATE statement with only the altered fields.
It's worth pointing out that this will not guard you against concurrent updates to the same database records in a highly concurrent environment. It is simply a means to minimize the UPDATE statement for legacy or wide tables where a majority of the columns aren't changed.
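As a minimal sketch, the two annotations might be combined like this on the Ragonvalia entity from the question (the field types here are assumptions):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.DynamicUpdate;
import org.hibernate.annotations.SelectBeforeUpdate;

@Entity
@DynamicUpdate        // generate UPDATEs containing only the dirty columns
@SelectBeforeUpdate   // on merge of a detached instance, re-select first to compute dirtiness
public class Ragonvalia {
    @Id
    private Long id;
    private String c01;
    private Integer c02;
    private Integer c03;
    // getters and setters omitted for brevity
}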
Concurrent Changes
In order to deal with concurrent operations on the same data, you can approach this using two types of locking mechanisms.
Pessimistic
In this situation, you would want to apply a lock at read time which basically forces the database to prevent any other transaction from reading/altering that row until the lock is released.
Since this type of locking can have severe performance implications, particularly on a highly concurrent table, it's generally not preferred. Most modern databases will reach a point where row-level locks are escalated to the data page or, worse, the table, causing all other sessions to block until the locks are released.
public void alterEntityWithLockExample() {
    // Read the row, applying a row-level update/write lock.
    Entity entity = entityManager.find(Entity.class, 1L, LockModeType.PESSIMISTIC_WRITE);
    // Update the entity and save it.
    entity.setField(someValue);
    entityManager.merge(entity);
}
It is probably worth noting that had any other session queried the entity with id 1 prior to the write lock being applied in the code above, the two sessions would still step on one another. All operations on Entity that would result in state changes would need to query using a lock in order to prevent concurrency issues.
Optimistic
This is a much more desirable approach in a highly concurrent environment. This is where you'd want to annotate a new field with @Version on your entity.
This has the benefit that you can query an entity, leave it detached, reattach it later, perhaps in another request or thread, alter it, and merge the changes. If the record had changed since it was originally fetched at the start of your process, an OptimisticLockException will be thrown, allowing your business case to handle that scenario however you need.
In a web application as an example, you may want to inform the user to requery the page, make their changes, and resave the form to continue.
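A minimal sketch of the @Version approach (the entity name, its fields, and the calling code are hypothetical):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class VersionedEntity {
    @Id
    private Long id;

    @Version          // Hibernate appends "AND version = ?" to every UPDATE
    private long version;

    private String field;
    // getters and setters omitted
}

// Later, when reattaching a detached instance:
// try {
//     entityManager.merge(detachedEntity);
// } catch (javax.persistence.OptimisticLockException e) {
//     // the row changed since it was read: reload, reapply the changes, retry or inform the user
// }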
Pessimistic locking is a proactive locking mechanism, whereas optimistic locking is more reactive.

How to force commit Spring - hibernate transaction safely

We are using Spring and Hibernate for a web application:
The application has a shopping cart where a user can place items. In order to keep the items viewable across different logins, the cart items are stored in tables. When the shopping cart is submitted, the items are saved into a different table where we need to generate the order number.
To generate the order number when inserting the values into that table, we take the max order number and add 1 to it. We are using the Spring transaction manager and Hibernate; in the code flow we get the order number and update the Hibernate object to hold the order number value. When I debug, I notice that the order number entity bean is inserted only when the complete transaction is committed.
The issue is that when two requests are submitted to the server at the same time, the same order number is used, and only one request's data gets inserted; the other request's value, which must also be unique, cannot be inserted.
The order number column in the table is unique.
I noticed when debugging that the data is not inserted into the database even after issuing a session flush:
session.flush()
It just updates memory and inserts the data into the DB only at the end of the Spring transaction. I tried explicitly issuing a commit on the transaction:
session.getTransaction().commit();
This inserted the values into the database immediately, but further down the code flow a message was displayed that a transaction could not be started.
Any help is highly appreciated.
Added:
I am using an Oracle database.
There is a sequence number that is unique for that table, and the order number maps to it.
Follow these steps:
1) Create a service method with propagation REQUIRES_NEW in a different service class.
2) Move your code (whatever code you want to flush to the DB) into this new method.
3) Call this new method from the existing API; see the sketch after this list. (Because of Spring's proxying, the new service method has to be called from a different class, otherwise REQUIRES_NEW will not take effect; that is what makes sure your data is flushed and committed.)
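A minimal sketch of what step 1 could look like (class and method names are hypothetical):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderNumberService {

    // Runs in its own transaction, which is committed independently of
    // (and before) the caller's surrounding transaction.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public long allocateOrderNumber() {
        // ... read/increment and persist the order number here ...
        return 0L; // placeholder
    }
}

Inject OrderNumberService into the existing service (a different Spring bean) and call allocateOrderNumber() from there; the proxy then starts and commits the new transaction before the caller continues.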
I would set the order number with a trigger that runs in the same transaction as the shopping cart insert.
After you save the shopping cart, to see the updated order count, you'll have to call:
session.refresh(cart);
The count shouldn't be managed by Hibernate (insertable/updatable = false, or @Transient).
Your first problem is that of serializing access to the number-generation logic when multiple threads execute it. If you could use Oracle sequences, this would be taken care of automatically at the database level, as sequences are guaranteed to return unique values no matter how many times they are called. However, since this now has to be managed on the server side, you need a synchronization mechanism around your number-generation logic (select max and increment by one) across the transaction boundary. You can make the service method synchronized (your service class would be a Spring-managed singleton) and declare the transaction boundary around it. Please note, however, that this has performance implications and is usually bad for scalability.
Another option is a variation of this: store the id to be allocated in a separate table with one column, "currentVal", and use a pessimistic lock to get the next number; see the sketch after this paragraph. This way the main table does not carry any big lock; the lock is held on the sequence-generator row for as long as the main entity-creation transaction takes to complete. The main idea behind these techniques is to serialize access to the sequence generator and hold the lock until the main entity transaction commits. Also, delay the number generation as late as possible.
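A hedged sketch of that separate sequence-table idea (the OrderSequence entity, its String key, and the getter/setter names are hypothetical):

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;
import org.springframework.transaction.annotation.Transactional;

public class OrderNumberGenerator {

    // Assumes an OrderSequence entity mapped to a table with one row per
    // sequence name and a "currentVal" column.
    @Transactional
    public long nextOrderNumber(EntityManager em) {
        // Lock only the sequence row, so concurrent transactions queue here
        // instead of contending on the main order table.
        OrderSequence seq = em.find(OrderSequence.class, "ORDER_NO",
                LockModeType.PESSIMISTIC_WRITE);
        long next = seq.getCurrentVal() + 1;
        seq.setCurrentVal(next);
        return next; // the row lock is held until the surrounding transaction ends
    }
}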
The solution suggested by @Vlad is a good one if using triggers is fine in your design.
Regarding your question about the flush behaviour: the SQL is sent to the database at the flush call, but the data is not committed until the transaction is committed, either declaratively or via a manual commit. The flushing transaction can see the data it proposes to change, but other transactions cannot, depending on the transaction isolation level.

Ensuring unique serial numbers in a Hibernate session

I am writing a system that holds a hibernate-managed entity called Voucher that has a field named serialNumber, which holds a unique number for the only-existing valid copy of the voucher instance. There may be old, invalid copies in the database table as well, which means that the database field may not be declared unique.
The operation that saves a new valid voucher instance (that will need a new serial number) is, first of all, synchronized on an appropriate entity. Thereafter the whole procedure is encapsulated in a transaction, the new value is fetched by the JPQL
SELECT MAX(serialNumber) + 1 FROM Voucher
the field gets the result from the query, the instance is thereafter saved, the session is flushed, the transaction is committed and the code finally leaves the synchronized block.
In spite of all this, the database sometimes (though seldom) ends up with Vouchers that have duplicate serial numbers.
My question is: Considering that I am rather confident in the synchronization and transaction handling, is there anything more or less obvious that I should know about hibernate that I have missed, or should I go back to yet another debugging session, trying to find anything else causing the problem?
The service running the save process is a web application running on tomcat6 and is managed by Spring's HttpRequestHandlerServlet. The db connections are pooled by C3P0, running a very much default-based configuration.
I'd appreciate any suggestion
Thanks
You can use a MultipleHiLoPerTableGenerator: it generates the @Id outside the current transaction.
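For illustration, here is a sketch using the JPA-standard @TableGenerator, a close relative of that table-based generator; it assigns the generated value as the entity's @Id, and every name below is hypothetical:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.TableGenerator;

@Entity
public class Voucher {
    @Id
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "voucher_serial")
    @TableGenerator(name = "voucher_serial",
                    table = "id_generator",          // dedicated generator table
                    pkColumnName = "gen_name",
                    valueColumnName = "gen_value",
                    pkColumnValue = "voucher_serial",
                    allocationSize = 1)
    private Long serialNumber;
    // other fields omitted
}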
You do not need to debug to find the cause. In a multi-threaded environment it is likely to happen. You are selecting max from your table. So suppose that TX1 reads the max value which is a and inserts a row with serial number a+1; at this stage if any TX2 reads DB, the max value is still a as TX1 has not committed its data. So TX2 may insert a row with serial number of a+1 as well.
To avoid this issue you might decide to change the isolation level of your database or change the way you are getting serial numbers (it depends entirely on the circumstances of your project). Generally, though, I do not recommend changing isolation levels, as it is too much effort for such an issue.
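If you did want to experiment with the isolation-level route anyway, here is a minimal sketch with Spring's @Transactional (the service and method names are hypothetical):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class VoucherService {

    // SERIALIZABLE makes the SELECT MAX(serialNumber) + INSERT pair behave as
    // if transactions ran one after another, at a real concurrency cost.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void saveNewVoucher(Voucher voucher) {
        // ... SELECT MAX(serialNumber) + 1, set it on the voucher, persist ...
    }
}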

Multiple Prepared Statements or a Batch

My question is very simple and in the title. Google and stack overflow are giving me nothing so I figured it was time to ask a question.
I am currently in the process of writing an SQL query for when users register to my site. I have ALWAYS only used prepared statements, because the extra coding in callable statements and the performance hit of regular statements are both turn-offs. However, this query is causing me to think of possible alternatives to my previous one-size-fits-all (prepared statements) ways.
This query has a total of 4 round trips to the database. The steps are
Insert a user into the database, get back the generated key (their user id) within a result set.
Take the user id and insert a row into the album table. Get back a generated key (album id)
Take the album id and insert a row into the images table. Get back a generated key (image id)
Take the image id and update the user tables current default column with the image id
Aside: For anyone interested in the way I am getting the keys back after my inserts it is with Statement.RETURN_GENERATED_KEYS and you can read a great article about this here - IBM Article
So anyway I'd like to know if the use of 4 round trip (but cacheable) prepared statements is okay or if I should go with batched (but not cacheable) statements?
JDBC batch statements let you reduce the number of round trips under the condition that there is no data dependency among the rows you are inserting or updating. Your scenario fails this condition because the changes depend on each other's data: each of statements 2 through 4 must pick up an ID generated by the statement before it.
On the other hand, four round-trips is definitely suboptimal. That is why scenarios like yours call for stored procedures: you can put all this logic into a create_user_proc, and return the user ID back to the caller. All insertions from 1 to 4 would happen inside your SQL code, letting you manage ID dependencies in SQL. You would be able to call this stored procedure in a single roundtrip, which is definitely faster, especially if you process multiple user registrations per minute.
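As a hedged sketch, a single round trip from the Java side could look like this, calling the create_user_proc mentioned above with plain JDBC (the parameter list and the OUT parameter are assumptions):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

public class UserRegistrationDao {

    public long registerUser(Connection conn, String username, String email)
            throws SQLException {
        // The procedure performs the four dependent INSERT/UPDATE steps
        // internally and returns the new user id through an OUT parameter.
        try (CallableStatement cs =
                conn.prepareCall("{call create_user_proc(?, ?, ?)}")) {
            cs.setString(1, username);
            cs.setString(2, email);
            cs.registerOutParameter(3, Types.BIGINT);
            cs.execute();
            return cs.getLong(3);
        }
    }
}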
I would advise writing one stored procedure that does all four operations, passing all the required parameters from the application to it at once; inside the stored procedure you can get the generated keys for the result set.
To increase performance and reduce database round trips, I agree with dasblinkenlight and ajduke - stored procedures will achieve this.
But is this really a performance bottleneck in your application?
How often do users register on your site?
Compare this to how often information is read from these tables (once per page access?)
If information in these tables are being read thousands of times more than being written via new registrations, then it might not be worth going for the stored procedure approach.
Why you might not want to use stored procedures and stick to prepared statements:
not as portable as using prepared statements (a different syntax/language for each database, some simpler databases don't even support them)
will not work with ORM solutions such as JPA* - you mentioned using PreparedStatements directly so this probably does not apply to you, at least not now but it might limit you later on if you wanted to use ORM in the future
*JPA 2.1 might actually support stored procedures, but as of writing it has not yet been released.

Java persistence memory leaks

I have 1M rows in a MySQL table and I am using the Java Persistence API. When I execute the following code I get a Java heap error:
int counter = 0;
while (counter < 1000000) {
    java.util.Collection<MyEntityClass> data = myQuery.setFirstResult(counter)
            .setMaxResults(1000).getResultList();
    for (MyEntityClass obj : data) {
        System.out.println(obj);
    }
    counter += 1000;
}
I'd wonder if JTable is really hanging onto all those old references when you click "next". I don't believe it's a persistence problem. Whatever backing data structure you have behind the JTable, I'd make sure that I cleared it before adding the next batch of records. That way the old values can be GC'd.
Your JTable shouldn't have a ResultSet. It'd be better to have a persistence tier that hid such details from clients. Make the query for a batch of values (not the entire data set), load it from the ResultSet into a data structure, and close the ResultSet and Statement in a finally block. You need to close those resources in the scope of the method in which they were created or you're asking for trouble.
The problem is almost certainly that your resultSet object is caching the entire result set, which will eat up a lot of memory for such a large query.
Rather than resetting the index on the resultSet as you do at present (which doesn't clear the cached result), I would suggest you write a query that retrieves the appropriate rows for the given page and execute it each time the page changes. Throw away the old result set each time to ensure you're not caching anything.
Depending on the database you are using, you would either use the rownum pseudo-column (Oracle), the row_number() (DB2, MSSQL) function or the limit x offset y syntax (MySql).
Is this a Java EE or Java SE application?
How are you handling your entity manager?
The entity manager is typically associated with a persistence context. During a transaction, every entity that you retrieve is placed in it, and it acts as a cache for all entities; when the transaction commits, JPA searches the context for modifications and commits the changes to the database.
This implies that if you retrieve 1 million rows you will have 1 million entities in your context, and they will not be garbage-collectable until you close your entity manager.
Since you are referring to a JTable I can only assume this is a JSE application. In this type of application you are in total control of the context. In this type of application there is a one-to-one relationship between context and the entity manager (which is not always the case in Java EE environment).
This implies that you can either create an entity manager per request (i.e. transaction or conversation) or an entity manager for the entire life of the application.
If you are using the second approach, your context is never garbage collected, and the more objects you read from the database the bigger it becomes, until you may eventually reach a memory problem like the one you describe.
I am not saying this is the cause of your problem, but it could certainly be a good lead on finding the root cause, don't you think?
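For illustration, a sketch of the paging loop from the question with the persistence context cleared between batches (assuming the EntityManager is available as em):

int counter = 0;
while (counter < 1000000) {
    java.util.Collection<MyEntityClass> data = myQuery.setFirstResult(counter)
            .setMaxResults(1000)
            .getResultList();
    for (MyEntityClass obj : data) {
        System.out.println(obj);
    }
    em.clear();       // detach the batch just read so it can be garbage collected
    counter += 1000;
}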
It looks like your resultSet is not eligible for GC in this particular case. Inspect your code and see where the reference to this resultSet actually goes, so you can find where the memory leak occurs.
