This is probably more a "design" or style question: I have just been considering how complex a Hibernate transaction should or could be. I am working with an application that persists messages to a database using Hibernate.
Building the message POJO involves factoring out one-to-many relationships from the message into their respective persistent objects. For example the message contains a "city" field. The city is extracted from the message, the database searched for an equivalent city object and the resulting object added to the message POJO. All of this is done within a single transaction:
Start transaction
  test for duplicate
  retrieve city object
  setCity(cityObject) in message object
  retrieve country object
  setCountry(countryObject) in message object
  persist message object
  commit
End transaction
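A minimal Hibernate sketch of that flow, just to make the structure concrete (the entity names, fields and duplicate check are assumptions, not taken from the actual application):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
try {
    // test for duplicate (externalId is a hypothetical natural key)
    Long count = (Long) session.createQuery(
            "select count(m) from Message m where m.externalId = :id")
        .setParameter("id", message.getExternalId())
        .uniqueResult();
    if (count == 0) {
        // retrieve the city and country objects and attach them to the message
        City city = (City) session.createQuery("from City c where c.name = :name")
            .setParameter("name", cityName)
            .uniqueResult();
        Country country = (Country) session.createQuery("from Country c where c.code = :code")
            .setParameter("code", countryCode)
            .uniqueResult();
        message.setCity(city);
        message.setCountry(country);
        // persist the message
        session.save(message);
    }
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();   // any failure rolls back the whole unit of work
    throw e;
} finally {
    session.close();
}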
In fact the actual transactions are considerably more complex. Is this a reasonable structure or should each task be completed within a single transaction (rather than all tasks in one transaction)? I guess the second question relates to best practice in designing the tasks within a transaction. I understand that some tasks need to be grouped for referential integrity, however this is not always the case.
Whatever you put within your outer transaction boundary, the question is whether you can successfully roll back each action.
Bundle related actions within a boundary, and keep it as simple as possible.
Transactions should be grouped according to the business requirements, not technical complexity. If you have N operations that must succeed or fail together as a unit, then that's what the code should support. It should be more of a business consideration than a technical one.
Multiple transactions only make sense if the database is not left in an inconsistent state between them, because any single transaction could fail. Nested transactions may make sense if a block of activity must be atomic on its own and the overall transaction depends on each of those atomic units.
I am extracting the following lines from the famous book - Mastering Enterprise JavaBeans™ 3.0.
Concurrent Access and Locking: Concurrent access to data in the database is always protected by transaction isolation, so you need not design additional concurrency controls to protect your data in your applications if transactions are used appropriately. Unless you make specific provisions, your entities will be protected by container-managed transactions using the isolation levels that are configured for your persistence provider and/or EJB container's transaction service. However, it is important to understand the concurrency control requirements and semantics of your applications.
Then it talks about Java Transaction API, Container Managed and Bean Managed Transaction, different TransactionAttributes, different Isolation Levels. It also states that -
The Java Persistence specification defines two important features that can be tuned for entities that are accessed concurrently:
1. Optimistic locking using a version attribute
2. Explicit read and write locks
OK - I read everything and understood it well. But the question is: in which scenarios do I need to use all these techniques? If I use container-managed transactions and they do everything for me, why do I need to bother about all these details? I know the significance of the TransactionAttributes (REQUIRED, REQUIRES_NEW) and in which cases to use them, but what about the others? More specifically -
Why do I need bean-managed transactions?
Why do we need read and write locks on Entity classes?
Why do we need a version attribute?
For Q2 and Q3 - I think Entity classes are not thread-safe and hence we need locking there. But database access is managed at the EJB layer by the JTA API (as stated in the first paragraph), so why do we need to manage the Entity classes separately? I know how Lock and Version work and why they are required. But why do they come into the picture when JTA is already present?
Can you please provide any answer to them? If you give me some URLs even that will be very highly appreciated.
Many thanks in advance.
You don't need locking because entity classes are not thread-safe. Entities must not be shared between threads, that's all.
Your database comes with ACID guarantees, but that is not always sufficient, and you sometimes need to explicitly lock rows to get what you need. Imagine the following scenario:
transaction A reads employee 1 from database
transaction B reads employee 1 from database
transaction A sets employee 1 salary to 3000
transaction B sets employee 1 salary to 4000
transaction A commits
transaction B commits
The end result is that the salary is 4000. The user that started transaction A is completely unaware that even though he set the salary to 3000, another user, concurrently, set it to 4000. Depending on which transaction writes last, the end result is different (and thus unpredictable). That's the kind of situation that can be avoided using optimistic locking.
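Optimistic locking catches exactly this lost update at commit time. A minimal JPA sketch, assuming a hypothetical Employee entity:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Employee {

    @Id
    private Long id;

    private int salary;

    // JPA increments this column on every update; if another transaction has
    // committed a change in the meantime, the version check fails and the
    // commit throws an OptimisticLockException instead of overwriting the data.
    @Version
    private int version;

    // getters and setters omitted
}

With this in place, transaction B in the scenario above would fail on commit rather than silently overwrite transaction A's update, and the application could reload the entity and retry or report the conflict.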
Next scenario: you want to generate purely sequential invoice numbers, without lost values and without duplicates. You could imagine reading and incrementing a value in the database to do that. But two transactions might both read the same value concurrently and then increment it. You would thus have a duplicate. Using a lock on the table row holding the next number avoids this situation.
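A pessimistic lock fits that second scenario better: the row holding the next number is locked so that only one transaction at a time can read and increment it. A rough sketch, assuming a hypothetical InvoiceSequence entity keyed by a sequence name:

// inside a transaction: lock the sequence row so a concurrent transaction
// blocks until this one commits, instead of reading the same value
InvoiceSequence seq = em.find(InvoiceSequence.class, "INVOICE",
                              LockModeType.PESSIMISTIC_WRITE);
int next = seq.getNextNumber();
seq.setNextNumber(next + 1);

invoice.setNumber(next);   // unique and gap-free
em.persist(invoice);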
The EntityManager maintains a first-level cache for retrieved objects, but if you want a thread-safe application you end up creating and closing an EntityManager for each transaction.
So what's the point of the level-1 cache if those EntityManagers are created and closed for every transaction? Or is the EntityManager cache only useful if you're working in a single thread?
The point is to have an application that works like you expect it to work, and that wouldn't be slow as hell. Let's take an example:
Order order = em.find(Order.class, 3L);
Customer customer = em.find(Customer.class, 5L);
for (Order o : customer.getOrders()) {      // line A
    if (o.getId().longValue() == 3L) {
        o.setComment("hello");              // line B
        o.setModifier("John");
    }
}
System.out.println(order.getComment());     // line C
for (Order o : customer.getOrders()) {      // line D
    System.out.println(o.getComment());     // line E
}
At line A, JPA executes a SQL query to load all the orders of the customer.
At line C, what do you expect to be printed? null or "hello"? You expect "hello" to be printed, because the order you modified at line B has the same ID as the one loaded in the first line. That wouldn't be possible without the first-level cache.
At line D, you don't expect the orders to be loaded again from the database, because they have already been loaded at line A. That wouldn't be possible without the first-level cache.
At line E, you expect once again "hello" to be printed for the order 3. That wouldn't be possible without the first-level cache.
At line B, you don't expect an update query to be executed, because there might be many subsequent modifications (like in the next line), to the same entity. So you expect these modifications to be written to the database as late as possible, all in one go, at the end of the transaction. That wouldn't be possible without the first-level cache.
The first level cache serves other purposes. It is basically the context in which JPA places the entities retrieved from the database.
Performance
So, to start by stating the obvious: it avoids having to retrieve a record when it has already been retrieved, serving as a form of cache during transaction processing and improving performance. Also, think about lazy loading. How could you implement it without a cache to record which entities have already been lazily loaded?
Cyclic Relationships
This caching purpose is vital to the implementation of appropriate ORM frameworks. In object-oriented languages it is common that the object graph has cyclic relationships. For instance, a Department that has Employee objects and those Employee objects belong to a Department.
Without a context (also known as a Unit of Work) it would be difficult to keep track of which records you have already mapped, and you would end up creating new objects; in a case like this, you might even end up in an infinite loop.
Keep Track of Changes: Commit and Rollback
Also, this context keeps track of the changes you make to the objects so that they can be persisted or rolled back at some later point when the transaction ends. Without a cache like this you would be forced to flush your changes to the database immediately as they happen, and then you could neither roll back nor choose the best moment to flush them to the store.
Object Identity
Object identity is also vital in ORM frameworks. That is, if you retrieve the Employee with ID 123, then any time you need that Employee within the same context you should get back the same object, not some new object containing the same data.
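A short sketch of what that identity guarantee looks like in practice within a single persistence context (the Employee entity is just for illustration):

// both lookups hit the same persistence context (first-level cache),
// so they return the very same instance, not two copies of the row
Employee a = em.find(Employee.class, 123L);
Employee b = em.find(Employee.class, 123L);

System.out.println(a == b);        // true: same object reference
a.setName("Alice");
System.out.println(b.getName());   // "Alice": there is only one object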
This type of cache is not to be shared by multiple threads; if it were, you would compromise performance and force everyone to pay that penalty even when they could be just fine with a single-threaded solution. Besides, you would end up with a much more complex solution that would be like killing a fly with a bazooka.
That is the reason why if what you need is a shared cache, then you actually need a 2nd-level cache, and there are implementations for that as well.
I've got two lists of entities: One that is the current state of the rows in the DB, the other is the changes that were made to the list. How do I audit the rows that were deleted, added, and the changes made to the entities? My audit table is used by all the entities.
Entity listeners and Callback methods look like a perfect fit, until you notice the sentence that says: A callback method must not invoke EntityManager or Query methods! Because of this restriction, I can collect audits, but I can't persist them to the database :(
My solution has been a complex algorithm to discover the audits.
If the entity is in the change list and has no key, it's an add
If the entity is in the db but not the changes list, it's a delete
If the entity is in both lists, recursively compare their fields to find differences to audit (if any)
I collect these and insert them into the DB in the same transaction I merge the changes list. But I hate the fact that I'm writing this by hand. It seems like JPA should be able to do this logic for me.
One solution we've come up with is to use an Entity Listener that posts the audits to a JMS queue. The queue then inserts the audits into the database. But I don't like this solution because I think setting up a JMS queue is a pain. It's currently the best solution we've got though.
I'm using eclipselink (ideally, that's not relevant) and have found these two things that look helpful but the JMS queue is a better solution than them:
http://wiki.eclipse.org/EclipseLink/FAQ/JPA#How_to_access_what_changed_in_an_object_or_transaction.3F This looks really difficult to use. You search for the fields by a string. So if I refactor my entity and forget to update this, it'll throw a runtime error.
http://wiki.eclipse.org/EclipseLink/Examples/JPA/History This isn't consistent with the way we currently audit. It expects a special entity_history table.
The EntityListener looks like a good approach since you are able to collect the audit information.
Have you tried persisting the information in a different transaction than the one persisting the changes? Perhaps obtain a reference to a stateless EJB (assuming you are using EJBs) and use methods marked with @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW). In this way the transaction persisting the original changes is put on hold while the transaction of the audit completes. Note that you will not be able to access the updated information in this separate audit transaction, since the original one has not committed yet.
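Something along these lines, as a sketch (the bean and method names are made up for illustration):

import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class AuditService {

    @PersistenceContext
    private EntityManager em;

    // Runs in its own transaction, suspending the caller's transaction
    // (which holds the not-yet-committed business changes) until it completes.
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void writeAudit(AuditRecord record) {
        em.persist(record);
    }
}

Be aware that the audit transaction commits independently, so if the business transaction rolls back afterwards the audit rows will still be there.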
I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel and I'm unsure how to go around solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrite the old calculated value.
The problem I currently have is in the following scenario:
User creates entity
Task is started
User notices an error in his initial entry and quickly updates the entity
Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty method the only way to update a single field of an entity and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking (should the user enable it) for transactions, which would prevent/reduce the chance of such things. Optimistic concurrency is equally applicable with relational datastores, so you likely know what it does.
Google's JDO plugin obviously uses the low-level API setProperty() method under the hood. The log even tells you what low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits to transactions: no queries without ancestors, and only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (i.e., look up the keys of entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
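A rough sketch of that retry loop against the low-level datastore API, where a conflicted commit surfaces as java.util.ConcurrentModificationException (the UnitOfWork callback is just an illustration of "the thing you retry"):

import java.util.ConcurrentModificationException;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Transaction;

public class TransactionRetry {

    private static final int MAX_RETRIES = 5;

    public interface UnitOfWork {
        // must be idempotent: it may run several times before it commits once
        void run(DatastoreService ds, Transaction txn);
    }

    public static void runWithRetries(UnitOfWork work) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            Transaction txn = ds.beginTransaction();
            try {
                work.run(ds, txn);
                txn.commit();
                return;                               // success
            } catch (ConcurrentModificationException e) {
                // another transaction touched the same entity group; retry
            } finally {
                if (txn.isActive()) {
                    txn.rollback();
                }
            }
        }
        throw new RuntimeException("Gave up after " + MAX_RETRIES + " attempts");
    }
}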
The only bad thing about collisions here is that it slows the system down and wastes effort during retries. However, you will get at least one completed transaction per second (maybe a bit less if you have XG transactions) throughput.
Objectify4 handles the retries for you; just define your unit of work as a run() method and run it with ofy().transact(). Just make sure your work is idempotent.
The way I see it, you can either prevent the first task from updating the object because certain values have changed since the task was first launched, or you can embed the object's values within the task request so that the second calc task restores the object state with consistent value and calculated members.
We are building a product, so from performance point of view I need some help.
We are using complete Spring (MVC, JPA, Security etc..)
We have a requirement where say for a particular flow there can be 100 Business Rules getting executed at the same time. There can be n number of such flows and business rules.
These rules, when executed, fetch records from tables in the database, and these records also contain a few LAZILY INITIALIZED ENTITIES.
I used Futures/Callables for multi-threading purposes, but the problem is that the LAZY variables fail to load. It throws a Hibernate LazyInitializationException, probably because the TRANSACTION does not get propagated to the different threads.
Please let me know if there is any other way to approach this.
If an entity or entity collection is lazily fetched and you access it in another thread, you will face a LazyInitializationException, as lazily loaded entities can only be accessed within a transaction, and a transaction won't span threads.
You can use the DTO pattern, or if you are sharing an entity across threads, call the getters of its lazily initialized collections within the transaction so that they are fetched inside the transaction itself.
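For example, something along these lines inside the service method that still owns the transaction (the entity and repository names are assumptions):

// inside the transactional service method, before handing off to other threads
Order order = orderRepository.findById(orderId);

// touch the lazy associations while the session is still open so they are
// loaded now; org.hibernate.Hibernate.initialize() does the same thing explicitly
order.getItems().size();
Hibernate.initialize(order.getCustomer());

// 'order' can now be handed to a Callable without triggering a
// LazyInitializationException, as long as nothing else lazy is touched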
If you really need async processing then I suggest you use the Java EE specified way, i.e. JMS/MDB.
Anyway, it seems like you want all the data to be loaded for your processing anyway. So eagerly fetch all the required data and then submit it for parallel processing, or let each of the tasks (Callables) fetch the data it requires. Essentially, I am asking you to change your design so that transactional boundaries don't cross multiple threads; localize each boundary within a single thread.
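One way to localize the boundary, sketched under the assumption of a RESOURCE_LOCAL EntityManagerFactory (with JTA you would call a REQUIRES_NEW service method instead); the FlowRecord entity and the rule logic are placeholders:

import java.util.concurrent.Callable;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

public class RuleTask implements Callable<String> {

    private final EntityManagerFactory emf;
    private final Long recordId;

    public RuleTask(EntityManagerFactory emf, Long recordId) {
        this.emf = emf;
        this.recordId = recordId;
    }

    @Override
    public String call() {
        // each task gets its own EntityManager and transaction, so every lazy
        // association it touches is loaded inside this thread's own boundary
        EntityManager em = emf.createEntityManager();
        try {
            em.getTransaction().begin();
            FlowRecord record = em.find(FlowRecord.class, recordId);
            String result = applyRules(record);       // placeholder for the rule logic
            em.getTransaction().commit();
            return result;
        } finally {
            if (em.getTransaction().isActive()) {
                em.getTransaction().rollback();
            }
            em.close();
        }
    }

    private String applyRules(FlowRecord record) {
        // ... the business rules would run here ...
        return "done";
    }
}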