I am writing a system that holds a Hibernate-managed entity called Voucher with a field named serialNumber, which holds a unique number for the only valid copy of the voucher. There may also be old, invalid copies in the database table, which means that the database column cannot be declared unique.
The operation that saves a new valid voucher instance (which will need a new serial number) is, first of all, synchronized on an appropriate entity. The whole procedure is then wrapped in a transaction, and the new value is fetched with the JPQL query
SELECT MAX(serialNumber) + 1 FROM Voucher
The field is set to the query result, the instance is saved, the session is flushed, the transaction is committed, and the code finally leaves the synchronized block.
In spite of all this, the database sometimes (though rarely) ends up with Vouchers that have duplicate serial numbers.
My question is: given that I am fairly confident in my synchronization and transaction handling, is there anything more or less obvious about Hibernate that I have missed, or should I go back to yet another debugging session to find whatever else is causing the problem?
The service running the save process is a web application on Tomcat 6 and is managed by Spring's HttpRequestHandlerServlet. The database connections are pooled by C3P0, using a largely default configuration.
I'd appreciate any suggestions.
Thanks
You can use a MultipleHiLoPerTableGenerator: it generates the @Id outside the current transaction.
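To illustrate why a hi/lo generator avoids the collision, here is a minimal, self-contained sketch of the hi/lo algorithm. It is not Hibernate's MultipleHiLoPerTableGenerator itself: the shared "hi" value, which the real generator keeps in a database table and updates in its own short transaction, is simulated here by an `AtomicLong`. The names `HiSource`, `HiLoGenerator` and `maxLo` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for the table row holding the next "hi" value.
// A real generator would increment it in a separate, short transaction.
class HiSource {
    private final AtomicLong nextHi = new AtomicLong(0);

    long allocateHi() {
        return nextHi.getAndIncrement();
    }
}

// Hands out ids from a locally reserved block of maxLo values per "hi" fetch.
class HiLoGenerator {
    private final HiSource source;
    private final int maxLo;
    private long hi;
    private int lo;

    HiLoGenerator(HiSource source, int maxLo) {
        this.source = source;
        this.maxLo = maxLo;
        this.lo = maxLo; // forces a hi allocation on the first call
    }

    synchronized long next() {
        if (lo >= maxLo) {
            hi = source.allocateHi(); // reserve a fresh, disjoint block of ids
            lo = 0;
        }
        return hi * maxLo + lo++;
    }
}
```

Two generators (think: two application nodes) sharing one `HiSource` never collide, because each one reserves a disjoint block before handing out numbers, and no `SELECT MAX(...)` is involved.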
You do not need to debug to find the cause; in a multi-threaded environment this is likely to happen. You are selecting the max from your table. Suppose TX1 reads the max value a and inserts a row with serial number a+1. If TX2 reads the table at this stage, the max value it sees is still a, because TX1 has not committed its data yet. So TX2 may insert a row with serial number a+1 as well.
To avoid this, you might change the isolation level of your database or change the way you obtain serial numbers (which one fits depends entirely on the circumstances of your project). In general, though, I do not recommend changing isolation levels, as it is too much effort for an issue like this.
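The interleaving described above can be reproduced deterministically with a toy model. This is a simulation, not Hibernate or JDBC code: `CommittedTable` and `Tx` are hypothetical classes in which each transaction sees only committed rows and buffers its own insert until `commit()`, roughly like a READ COMMITTED database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Rows visible to every transaction (i.e. committed data).
class CommittedTable {
    final List<Integer> serials = new ArrayList<>();

    int max() {
        return serials.isEmpty() ? 0 : Collections.max(serials);
    }
}

// A transaction that computes MAX(serial) + 1 and inserts it only on commit.
class Tx {
    private final CommittedTable table;
    private Integer pending;

    Tx(CommittedTable table) {
        this.table = table;
    }

    int reserveNext() {
        pending = table.max() + 1; // sees only committed rows
        return pending;
    }

    void commit() {
        table.serials.add(pending); // the insert becomes visible only now
    }
}
```

For example, starting from a table containing 41, `tx1.reserveNext()` and `tx2.reserveNext()` both return 42 when they run before either commit, and after both commits the table holds `[41, 42, 42]`.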
My application parses a CSV file of about 100-200 records per file, performs database CRUD operations, and commits them all at the end.
public static void main(String[] args) {
    Transaction t = null; // declared outside try so the catch block can see it
    try {
        List<Row> rows = parseCSV();
        t = openHibernateTransaction();
        // doCrudStuff INSERTS some records in the database
        for (Row r : rows)
            doCrudStuff(r);
        t.commit(); // single commit at the very end
    } catch (Exception ex) {
        // log error
        if (t != null) t.rollback();
    }
}
When I was about to run doCrudStuff on the 78th row, I suddenly got this error:
Data truncation: Data too long for column 'SOME_COLUMN_UNRELATED_TO_78TH_ROW' at row 1.
I read the stack trace, and the error was triggered by a SELECT statement on a table unrelated to the 78th row. Huh, weird, right?
I checked the CSV file and found that in the 77th row, one field was indeed too long for the database column. But Hibernate didn't report the error during the INSERT for the 77th row; it threw the error when I was doing a SELECT for the 78th row. Why is it delayed?
Does Hibernate really behave like this? I commit only once, at the very end, because I want to make sure that everything succeeded and otherwise roll back.
Actually, it is not that weird once you take into account what Hibernate is doing for you behind the scenes.
Hibernate does not actually execute your write statements (UPDATE, INSERT) until it needs to. In your case, I assume your doCrudStuff executes a SELECT and then an UPDATE or INSERT, right?
This is what is happening:
You tell Hibernate to execute "UPDATE my_table SET something = value;", which causes Hibernate to cache this in the session and return right away.
You may do more writes, which Hibernate will likely keep caching in the session until either 1) you manually flush the session or 2) Hibernate decides it's time to flush the session.
You then execute a SELECT statement to get some data from the database. At this point, the state of the database is not consistent with the state of the session, since there is data waiting to be written. Hibernate will therefore start executing your pending writes to bring the database state up to date with the session state.
If one of the writes fails, the stack trace will not point to the place where you asked Hibernate to execute the write (this is an important distinction between an ORM and using JDBC directly); instead, the failure shows up where the session had to be flushed (either manually or automatically).
At the expense of performance, you can always tell Hibernate to flush the session right after your writes. But as long as you are aware of the lifecycle of the Hibernate session and how it queues those statements, you should be able to debug these issues more easily.
By the way, if you want to see this in practice, you can tell Hibernate to log the SQL it executes (for example by setting hibernate.show_sql to true).
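The following toy model mimics the write-behind behaviour described above: writes are queued, a query triggers a flush, and a "data too long" failure therefore surfaces at the query rather than at the write that caused it. `WriteBehindSession` and its methods are hypothetical and greatly simplified; in real Hibernate the call for flushing eagerly is `Session.flush()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical session that queues INSERTs and only executes them on flush.
class WriteBehindSession {
    private final int maxColumnLength;
    private final Deque<String> pendingInserts = new ArrayDeque<>();
    private final List<String> database = new ArrayList<>();

    WriteBehindSession(int maxColumnLength) {
        this.maxColumnLength = maxColumnLength;
    }

    // Like session.save(): nothing reaches the database yet.
    void save(String value) {
        pendingInserts.add(value);
    }

    // Executes the queued writes; this is where "Data too long" is detected.
    void flush() {
        while (!pendingInserts.isEmpty()) {
            String value = pendingInserts.poll();
            if (value.length() > maxColumnLength) {
                throw new IllegalStateException("Data too long for column: " + value);
            }
            database.add(value);
        }
    }

    // In this toy, every query flushes pending writes first.
    int count() {
        flush();
        return database.size();
    }
}
```

With a limit of 5 characters, `save("much too long")` returns normally and the exception only appears at the next `count()`; calling `flush()` right after each `save()` moves the failure to the offending write, at the cost of more round trips.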
Hope this helps!
EDIT: I understand how this can be confusing, so let me try to expand my answer by highlighting the difference between a transaction and a Hibernate Session.
A transaction is a sequence of operations performed as one atomic unit on the database. Until a transaction is committed, its changes are typically not visible to other clients of the database. The state of the transaction is fully managed by the database; that is, you can start a transaction and send your operations to the database, and it will ensure the consistency of these operations within the transaction.
A Hibernate Session is a session managed by Hibernate, outside the database, mostly for performance reasons. Hibernate will queue operations whenever possible to improve performance, and only go to the database when it deems necessary.
Imagine you have 50 marbles that are all different colors and need to be stored in their correct buckets, but these buckets are 100 feet away and you need someone to sort them into their rightful buckets. You ask your friend Bob to store the blue marbles, then the red marbles, then the green marbles. Your friend is smart and anticipates that you will ask him to make multiple round trips, so he waits until your last request before walking those 100 feet to store them in their proper buckets, which is much faster than making 3 round trips.
Now imagine that you ask him to store the yellow marbles, and then you ask him how many total marbles you have across all the buckets. He is then forced to go to the buckets (since he needs to gather information), store the yellow marbles (so he can accurately count all buckets) before he can give you an answer. This is in essence what hibernate is doing with your data.
Now, in your case, imagine there is NO yellow bucket. Unfortunately, Bob is not going to find that out until he tries to answer your question about how many marbles you have in total. So, in the sequence of events, he will come back to tell you he couldn't complete your request only after he tries to count the marbles (as opposed to when you asked him to store the yellow ones, which is what he was actually unable to do).
Hope this helps clear things up a little!
I have a table like (id INTEGER, sometext VARCHAR(255), ....) with id as the primary key and a UNIQUE constraint on sometext. It gets used in a web server, where a request needs to find the id corresponding to a given sometext if it exists, otherwise a new row gets inserted.
This is the only operation on this table: there are no updates and no other operations on it. Its sole purpose is to persistently number the encountered values of sometext. This means that I can't drop the id and use sometext as the PK.
I do the following:
First, I consult my own cache in order to avoid any DB access. Nearly always, this works and I'm done.
Otherwise, I use Hibernate Criteria to find the row by sometext. Usually, this works and again, I'm done.
Otherwise, I need to insert a new row.
This works fine, except when there are two overlapping requests with the same sometext; then a ConstraintViolationException results. I'd need something like INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE (MySQL syntax) or MERGE (Firebird syntax).
I wonder what the options are.
AFAIK, Hibernate merge works on the PK only, so it's inappropriate here. I guess a native query might or might not help, since the first INSERT may or may not have been committed when the second one takes place.
Just let the database handle the concurrency. Start a secondary transaction purely for inserting the new row. If it fails with a ConstraintViolationException, just roll that transaction back and read the new row.
I'm not sure this scales well if the likelihood of a duplicate is high: it means a lot of extra work if some percentage of transactions (depending on the database) have to fail the insert and then re-select.
A secondary transaction minimizes how long the transaction adding the new text takes. Assuming the database supports it correctly, it is still possible for the thread 1 transaction to make the thread 2 select/insert block until the thread 1 transaction is committed or rolled back. Overall database design might also affect transaction throughput.
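Here is a minimal sketch of that select / insert / re-select pattern, with the database simulated in memory so the control flow is visible. `TextTable`, `DuplicateKeyException` and `idFor` are hypothetical; in a real Hibernate application the insert would run in its own short transaction (for example a Spring method with `Propagation.REQUIRES_NEW`) and the exception would be Hibernate's `ConstraintViolationException`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in for the failure the database reports on a UNIQUE violation.
class DuplicateKeyException extends RuntimeException {
    DuplicateKeyException(String message) {
        super(message);
    }
}

// In-memory table (id, sometext) with a UNIQUE constraint on sometext.
class TextTable {
    private final Map<String, Integer> idsByText = new HashMap<>();
    private int nextId = 1;

    synchronized Optional<Integer> find(String text) {
        return Optional.ofNullable(idsByText.get(text));
    }

    // Each call models one short, separate insert transaction.
    synchronized int insert(String text) {
        if (idsByText.containsKey(text)) {
            throw new DuplicateKeyException("duplicate sometext: " + text);
        }
        idsByText.put(text, nextId);
        return nextId++;
    }

    // Select; if missing, try to insert; if someone beat us to it, select again.
    int idFor(String text) {
        Optional<Integer> existing = find(text);
        if (existing.isPresent()) {
            return existing.get();
        }
        try {
            return insert(text);
        } catch (DuplicateKeyException e) {
            // The competing insert has committed, so the row is now visible.
            return find(text).orElseThrow(() -> e);
        }
    }
}
```

However many threads race on the same text, they all end up with the same id; the loser of the race simply pays for one failed insert and one extra select.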
I don't necessarily question why sometext can't be the PK, but I wonder why you need to break it out at all. Of course, with large volumes this might save substantial space if the sometext values are large; it almost seems like you're trying to emulate a Lucene index to get a complete list of text values.
We are using Spring and Hibernate for a web application.
The application has a shopping cart where users can place items. To keep the items viewable across different logins, the shopping cart items are stored in tables. When the shopping cart is submitted, the items are saved into a different table, where we need to generate the order number.
When we insert the values into the table, we get the order number by taking the max order number and adding 1 to it. We are using the Spring transaction manager and Hibernate; in the code flow, we get the order number and update the Hibernate object to hold the order number value. When I debugged, I noticed that the order number entity bean is inserted only when the complete transaction is committed.
The issue is that when two requests are submitted to the server at the same time, the same order number is used and only one request's data gets inserted. The other request's value, which should also be unique, could not be inserted.
The order number in the table is unique.
While debugging, I noticed that the persistent state is not inserted into the database even after calling session flush:
session.flush()
It just updates memory and inserts the data into the DB only at the end of the Spring transaction. I tried explicitly committing the transaction:
session.getTransaction().commit();
This inserted the values into the database immediately, but later in the code flow a message said that the transaction could not be started.
Any help is highly appreciated.
Added:
I am using an Oracle database.
There is a sequence number that is unique for that table, and the order number maps to it.
Follow these steps:
1) Create a service method with propagation REQUIRES_NEW in a different service class.
2) Move your code (whatever you want flushed to the DB) into this new method.
3) Call this method from the existing API. Because Spring applies transactions through proxies, you have to call the new service method from a different class; otherwise REQUIRES_NEW will not take effect and your data will not be flushed.
I would set the order number with a trigger, which will run in the same transaction as the shopping cart insert.
After you save the shopping cart, you'll have to call the following to see the updated order count:
session.refresh(cart);
The count shouldn't be managed by Hibernate (insertable = false, updatable = false, or @Transient).
Your first problem is serializing access to the number generation when multiple threads are executing the same logic. If you could use Oracle sequences, this would be taken care of automatically at the database level, as sequences are guaranteed to return unique values no matter how many times they are called. However, since this now needs to be managed on the server side, you would need a synchronization mechanism around your number-generation logic (select max and increment by one) that spans the transaction boundary. You can make the service method synchronized (your service class would be a Spring-managed singleton) and declare the transaction boundary around it. Please note, however, that this has performance implications and is usually bad for scalability.
Another option is a variation of this: store the id to be allocated in a separate table with one column, "currentVal", and use a pessimistic lock to get the next number. This way, the main table does not hold any big lock; instead, the lock on the sequence generator row is held until the main entity creation transaction completes. The main idea behind these techniques is to serialize access to the sequence generator and hold the lock until the main entity transaction commits. Also, generate the number as late as possible.
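As a rough illustration of the second option, the sketch below models the one-column counter table as an object guarded by a `ReentrantLock`, which plays the role of the row lock a `SELECT ... FOR UPDATE` would take. The lock is acquired as late as possible and released only after the simulated main transaction has committed. `CounterRow`, `OrderStore` and `placeOrder` are hypothetical names, and the "commit" is just an append to an in-memory list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Models the single-row table with column "currentVal".
class CounterRow {
    final ReentrantLock rowLock = new ReentrantLock(); // stands in for FOR UPDATE
    long currentVal = 0;
}

class OrderStore {
    private final CounterRow counter = new CounterRow();
    private final List<Long> committedOrderNumbers =
            Collections.synchronizedList(new ArrayList<>());

    long placeOrder() {
        // ... the rest of the transaction's work would happen here, before locking ...
        counter.rowLock.lock(); // pessimistic lock, taken as late as possible
        try {
            long orderNumber = ++counter.currentVal;
            committedOrderNumbers.add(orderNumber); // insert + commit of the main entity
            return orderNumber;
        } finally {
            counter.rowLock.unlock(); // released only after the commit
        }
    }

    List<Long> orders() {
        return committedOrderNumbers;
    }
}
```

Only the short critical section around the counter is serialized; the main table itself never needs a table-wide lock.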
The solution suggested by @Vlad is a good one if using triggers is fine in your design.
Regarding your question about the flush behaviour: the SQL is sent to the database when flush is called, but the data is not committed until the transaction is committed, either declaratively or by a manual commit. The transaction itself can see the data it has changed, but other transactions may not, depending on the isolation level.
I hope someone can clarify the scenario below for me.
From what I understand, when you request a 'row' from hibernate, for example:
User user = UserDao.get(1);
I now have the user with id=1 in memory.
In a web application, if 2 web pages request and load the user at the same time, and then both update a property on the user's object, what will happen? E.g.:
user.pageViews += 1; // the value is currently 10 before the increment
UserDao.update(user);
Will this use the value that is in-memory (both requests have the value 10), or will it use the value in the database?
You must use two Hibernate sessions for the two users, which means there are two instances of the object in memory. If you use only one Hibernate session (and so one instance of the object in memory), the result is unpredictable.
In the case of a concurrent update, the second update wins: the value of the first update is overwritten by the second. To avoid losing the first update, you normally use a version column (see the Hibernate docs); the second update then gets an error, which you can catch and react to (for example with the error message "Your record was modified in the meantime. Please reload."). This allows the second user to redo the modification on the updated record, so that it does not get lost.
In the case of a page view counter, as in your example, an alternative solution is to write a synchronized method that counts the page views sequentially.
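A minimal, in-memory sketch of the version-column idea follows. `VersionedRow`, `Snapshot` and `StaleDataException` are hypothetical; with Hibernate you would instead annotate a field with `@Version` and catch the optimistic-locking exception it throws (for example `StaleObjectStateException`), but the check it performs is essentially an `UPDATE ... WHERE id = ? AND version = ?` that fails if the version has moved on.

```java
// Thrown when someone else updated the row after we read it.
class StaleDataException extends RuntimeException {
    StaleDataException(String message) {
        super(message);
    }
}

// What a request holds in memory after loading the user.
class Snapshot {
    final int pageViews;
    final int version;

    Snapshot(int pageViews, int version) {
        this.pageViews = pageViews;
        this.version = version;
    }
}

// The database row, with a version column checked on every update.
class VersionedRow {
    private int pageViews;
    private int version;

    VersionedRow(int pageViews) {
        this.pageViews = pageViews;
    }

    synchronized Snapshot load() {
        return new Snapshot(pageViews, version);
    }

    // Roughly: UPDATE ... SET page_views = ?, version = version + 1
    //          WHERE id = ? AND version = ?
    synchronized void update(Snapshot read, int newPageViews) {
        if (read.version != version) {
            throw new StaleDataException("row modified since version " + read.version);
        }
        pageViews = newPageViews;
        version++;
    }

    synchronized int pageViews() {
        return pageViews;
    }
}
```

If two requests both load the row at 10 page views, the first `update` writes 11 and the second throws `StaleDataException` instead of silently writing 11 again; the second request can then reload and retry.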
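A synchronized counter of that kind can be sketched as below. `PageViewCounter` and `recordView` are hypothetical, and the counts are kept in an in-memory map for illustration; this only serializes requests inside a single JVM, so several application servers would still need a database-level lock or an atomic `UPDATE`.

```java
import java.util.HashMap;
import java.util.Map;

// Counts page views per user; synchronized so increments never interleave.
// Only safe within one JVM: multiple servers would still need a DB-level lock.
class PageViewCounter {
    private final Map<Integer, Integer> viewsByUser = new HashMap<>();

    synchronized int recordView(int userId) {
        return viewsByUser.merge(userId, 1, Integer::sum);
    }

    synchronized int views(int userId) {
        return viewsByUser.getOrDefault(userId, 0);
    }
}
```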
By default the in memory value is used for the update.
In the following, I assume you want to implement an automatic page view counter rather than modify the User in a web user interface. If you want the latter, take a look at Hibernate optimistic locking.
So, supposing you need 100% accuracy when counting page views, you can lock your User entity while you modify its pageViews value, to obtain exclusive access to the table row:
Session session = ... // obtain the Hibernate session
Transaction tx = ... // begin a transaction
session.lock(user, LockMode.UPGRADE); // SELECT ... FOR UPDATE on the user's row
user.increasePageViews();
tx.commit(); // flushes the update and releases the row lock
session.close();
LockMode.UPGRADE translates to a SELECT ... FOR UPDATE in your database, so be careful to hold the lock for as short a time as possible, so as not to impact application scalability.
I have 1M rows in a MySQL table and I am using the Java Persistence API. When I execute the following code, I get a Java heap space error:
int counter = 0;
while (counter < 1000000) {
java.util.Collection<MyEntityClass> data = myQuery.setFirstResult(counter)
.setMaxResults(1000).getResultList();
for(MyEntityClass obj : data){
System.out.println(obj);
}
counter += 1000;
}
I wonder whether the JTable is really hanging onto all those old references when you click "next". I don't believe it's a persistence problem. Whatever data structure backs the JTable, I'd make sure to clear it before adding the next batch of records. That way the old values can be GC'd.
Your JTable shouldn't hold a ResultSet. It would be better to have a persistence tier that hides such details from clients. Make the query for a batch of values (not the entire data set), load it from the ResultSet into a data structure, and close the ResultSet and Statement in a finally block. You need to close those resources in the scope of the method in which they were created, or you're asking for trouble.
The problem is almost certainly that your resultSet object is caching the entire result set, which will eat up a lot of memory for such a large query.
Rather than resetting the index on the resultSet as you do at present (which doesn't clear the cached result), I would suggest you write a query that retrieves the appropriate rows for the given page and execute it each time the page changes. Throw away the old result set each time to ensure you're not caching anything.
Depending on the database you are using, you would use the rownum pseudo-column (Oracle), the row_number() function (DB2, MSSQL), or the LIMIT x OFFSET y syntax (MySQL).
Is this a Java EE or Java SE application?
How are you handling your entity manager?
The entity manager is typically associated with a persistence context. During a transaction, every entity you retrieve is placed in it, and it acts as a cache for all entities. When the transaction commits, JPA looks for modifications in the context and commits the changes to the database.
This implies that if you retrieve 1 million rows, you will have 1 million entities in your context, and they will not be garbage-collectable until you close your entity manager.
Since you are referring to a JTable, I can only assume this is a Java SE application. In this type of application you are in total control of the context, and there is a one-to-one relationship between the context and the entity manager (which is not always the case in a Java EE environment).
This implies that you can either create an entity manager per request (i.e. transaction or conversation) or an entity manager for the entire life of the application.
If you are using the second approach, your context is never garbage-collected, and the more objects you read from the database, the bigger it becomes, until you eventually run into a memory problem like the one you describe.
I am not saying this is the cause of your problem, but it could certainly be a good lead on finding the root cause, don't you think?
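To make the point concrete, here is a toy persistence context (an identity map keyed by id) that, like a long-lived entity manager, keeps every entity it has loaded until it is cleared. `ToyPersistenceContext` and `loadPage` are hypothetical; with JPA the corresponding real call is `EntityManager.clear()`, invoked after each batch has been processed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an entity manager's first-level cache.
class ToyPersistenceContext {
    private final Map<Integer, String> managed = new HashMap<>();

    // "Loads" rows [first, first + max) and keeps them managed, as JPA does.
    List<String> loadPage(int first, int max) {
        List<String> page = new ArrayList<>();
        for (int id = first; id < first + max; id++) {
            String entity = "entity-" + id;
            managed.put(id, entity);
            page.add(entity);
        }
        return page;
    }

    // Detaches everything so it becomes eligible for garbage collection.
    void clear() {
        managed.clear();
    }

    int managedCount() {
        return managed.size();
    }
}
```

In the paging loop from the question, the equivalent change is to process each batch and then clear the context, so that at most one page of entities is managed at any time instead of all 1,000,000.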
It looks like your resultSet is not eligible for GC in this particular case. Inspect your code to see where references to this resultSet are held, causing the memory leak.