I'm trying to use Objectify in my App Engine project. It works, but I have several "paths" where a single entity can be read and written by a single servlet. Now, if I understood the architecture correctly, the servlet container can instantiate my servlet multiple times depending on the load, right? So the question is: do I need to use Objectify transactions in this case? My doubt is quite basic, because I think this kind of situation happens 99% of the time in this context. So at this point the other question is: when can I use a simple Objectify load and save? I hope someone can clarify a bit.
From the Objectify Wiki: "If you operate on the datastore without an explicit transaction, each datastore operation is treated like a separate little transaction which is retried separately" (link: https://github.com/objectify/objectify/wiki/Concepts#transactionless).
So each save() or delete() is executed in a separate transaction, and it doesn't matter even if GAE starts multiple instances of your servlet.
You would want to start a transaction explicitly when you want to perform multiple operations as one atomic unit (either all succeed or none do), e.g. a select followed by a modify, or modifying multiple objects together.
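For illustration, here is a minimal sketch of such an explicit transaction, assuming a hypothetical Counter entity (ofy(), transact() and VoidWork are the standard Objectify API):

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import com.googlecode.objectify.VoidWork;

public class CounterService {
    // "Counter" is a hypothetical @Entity with a long "count" field,
    // used purely for illustration.
    public void increment(final long counterId) {
        ofy().transact(new VoidWork() {
            @Override
            public void vrun() {
                // The load and the save happen atomically: a concurrent
                // increment causes a retry instead of a lost update.
                Counter counter = ofy().load().type(Counter.class).id(counterId).now();
                counter.setCount(counter.getCount() + 1);
                ofy().save().entity(counter);
            }
        });
    }
}
```

Without the transact() wrapper, two servlet instances could both load the same count and write back the same incremented value, losing one update.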
I have a Java-based web server, and I also have a DAO singleton object with a method whose SQL operations must be synchronized in some way to guarantee data integrity (the method can be accessed from several Java threads simultaneously).
I was wondering whether wrapping the operations in a DB transaction (at the serializable isolation level) is better than explicitly synchronizing the DAO's method on the server side?
Yes, using transactions is better. If you synchronize in your code by locking on the class, the scope of that lock is your classloader, and standing up a second instance of your application will invalidate the locking, because the two instances are using different locks.
With database transactions you can have multiple instances of your application and the database treats all the transactions the same.
Also, with databases you have options like dialing the isolation level down to no higher than what that transaction needs, or using row-level locking. Those are harder to implement in code, and you're still stuck with not being able to deploy a second instance.
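As a sketch of what that looks like with Spring's declarative transaction management (the DAO and its method are hypothetical; @Transactional and the Isolation enum are standard Spring):

```java
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Repository
public class AccountDao {
    // The database serializes concurrent executions of this method, so no
    // synchronized keyword is needed, and a second application instance
    // remains safe because the lock lives in the database, not the JVM.
    @Transactional(isolation = Isolation.SERIALIZABLE)
    public void debit(long accountId, long amount) {
        // ... JDBC or JPA operations here run inside one serializable transaction
    }
}
```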
It depends deeply on what you want to synchronize; synchronization is about resources. If your code touches more than one database and the data-integrity problem is distributed, you need a transaction context, and not only declaring it but knowing how to manage it properly.
Assuming you have a single database, and assuming your problem is an integrity issue caused by a possible inconsistency between a SELECT clause and an UPDATE or INSERT clause happening later in the method, the right solution would be a DB transaction together with a SELECT FOR UPDATE clause.
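A minimal plain-JDBC sketch of that approach, assuming a hypothetical stock(id, quantity) table:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class StockDao {
    public void reserve(Connection con, long id, int amount) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement select = con.prepareStatement(
                "SELECT quantity FROM stock WHERE id = ? FOR UPDATE")) {
            select.setLong(1, id);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next() || rs.getInt(1) < amount) {
                    con.rollback(); // nothing to reserve; release the row lock
                    return;
                }
            }
        }
        // The SELECT FOR UPDATE keeps the row locked until commit, so the
        // UPDATE below cannot race with another transaction reading the row.
        try (PreparedStatement update = con.prepareStatement(
                "UPDATE stock SET quantity = quantity - ? WHERE id = ?")) {
            update.setInt(1, amount);
            update.setLong(2, id);
            update.executeUpdate();
        }
        con.commit();
    }
}
```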
If your problem is about UPDATEs/INSERTs on different tables in the same operation, you have two options. One is adding CONSTRAINTS; this is the preferred method, but in some cases it is not possible.
If a CONSTRAINT is not possible, consider redesigning your data model, since managing this kind of problem by synchronizing application code is the worst solution, though even so it is a solution.
Consider a Spring MVC Java web application that provides a REST API.
Let's say it has many methods; one of them is DELETE /api/foo/{id}, which deletes the foo entity with the given id from the DB.
The problem is that, due to the amount of data in the DB, this operation is not immediate, so if a client performs multiple simultaneous delete operations on the same entity, say
DELETE /api/foo/123 N times (by mistake in the client software, of course),
it causes some unpleasant side effects in the DB (trying to delete the same entity in several transactions is generally not nice).
My question is: what is the best practice in Spring MVC to prevent such situations?
I can certainly introduce synchronisation on the Foo id in each such update method (PUT/DELETE). I would need to do it for all entities and all PUT/DELETE API methods though, which I really don't want to do. I suppose there should be some elegant and nice solution for performing this kind of synchronisation at the interceptor/servlet level, i.e. not at the service or controller level.
I can also create a specific interceptor that makes duplicated requests (requests with the same URL and parameters) wait. But again, that doesn't sound like an elegant solution (unless I can be sure that Spring MVC offers no prettier way to configure this).
That is a concurrency problem that should be handled by using the appropriate transaction and locking level. Unfortunately, there is no one-size-fits-all answer here; depending on your actual requirements, you may have to implement optimistic or pessimistic locking, as well as one of the possible transaction isolation levels (from no transaction at all to serializable transactions).
In general, handling such questions at the web level is a bad idea, because you will end up facing questions like: what should happen when one request wants to delete data that another request is displaying at the same time? In Spring MVC, the common way is to use transactional methods in the service layer. Additionally, you should declare an optimistic or pessimistic locking scheme in the persistence layer.
Optimistic locking normally gives higher throughput, at the cost of some transactions ending in exceptions. In that case, current best practice is to report the problem to the user and ask him/her to send the request again.
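For example, JPA's built-in optimistic locking only needs a version attribute on the entity; here is a minimal sketch using the Foo entity from the question:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Foo {
    @Id
    private Long id;

    // JPA increments this column on every update; a concurrent modification
    // makes the losing transaction fail with an OptimisticLockException,
    // which the service layer can translate into an error for the client.
    @Version
    private long version;

    // ... other fields, getters and setters
}
```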
I'm using JDO to access Datastore entities. I'm currently running into issues because different processes access the same entities in parallel, and I'm unsure how to go about solving this.
I have entities containing values and calculated values: (key, value1, value2, value3, calculated)
The calculation happens in a separate task queue.
The user can edit the values at any time.
If the values are updated, a new task is pushed to the queue that overwrites the old calculated value.
The problem I currently have is in the following scenario:
1. User creates entity
2. Task is started
3. User notices an error in his initial entry and quickly updates the entity
4. Task finishes based on the old data (from step 1) and overwrites the entire entity, also removing the newly entered values (from step 3)
5. User is not happy
So my questions:
Can I make the task fail on update in step 4? Wrapping the task in a transaction does not seem to solve this issue for all cases due to eventual consistency (or, quite possibly, my understanding of datastore transactions is just wrong)
Is using the low-level setProperty() method the only way to update a single field of an entity, and will this solve my problem?
If none of the above, what's the best way to deal with a use case like this?
Background:
At the moment, I don't mind trading performance for consistency. I will care about performance later.
This was my first AppEngine application, and because it was a learning process, it does not use some of the best practices. I'm well aware that, in hindsight, I should have thought longer and harder about my data schema. For instance, none of my entities use ancestor relationships where they would be appropriate. I come from a relational background and it shows.
I am planning a major refactoring, probably moving to Objectify, but in the meantime I have a few urgent issues that need to be solved ASAP. And I'd like to first fully understand the Datastore.
Obviously JDO comes with optimistic concurrency checking for transactions (should the user enable it), which would prevent or reduce the chance of such things. Optimistic concurrency is equally applicable to relational datastores, so you likely know what it does.
Google's JDO plugin obviously uses the low-level setProperty() method; the log even tells you which low-level calls are made (in terms of PUT and GET). Moving to some other API will not, on its own, solve such problems.
Whenever you need to handle write conflicts in GAE, you almost always need transactions. However, it's not just as simple as "use a transaction":
First of all, make sure each logical unit of work can be defined in a transaction. There are limits on transactions: no queries without ancestors, and only a certain number of entity groups can be accessed. You might find you need to do some extra work prior to the transaction starting (i.e., look up the keys of the entities that will participate in the transaction).
Make sure each unit of work is idempotent. This is critical. Some units of work are automatically idempotent, for example "set my email address to xyz". Some units of work are not automatically idempotent, for example "move $5 from account A to account B". You can make transactions idempotent by creating an entity before the transaction starts, then deleting the entity inside the transaction. Check for existence of the entity at the start of the transaction and simply return (completing the txn) if it's been deleted.
When you run a transaction, catch ConcurrentModificationException and retry the process in a loop. Now when any txn gets conflicted, it will simply retry until it succeeds.
The only bad thing about collisions here is that they slow the system down and waste effort during retries. However, you will still get a throughput of at least one completed transaction per second (maybe a bit less if you have XG transactions).
Objectify4 handles the retries for you: just define your unit of work as a run() method and execute it with ofy().transact(), making sure the work is idempotent.
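A rough sketch of the idempotency trick described above, using Objectify's VoidWork variant of transact() and a hypothetical TransferMarker entity that is created before the transaction starts:

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import com.googlecode.objectify.VoidWork;

public class TransferWork {
    // "TransferMarker" is a hypothetical @Entity created before the
    // transaction starts, purely to make the unit of work idempotent.
    public void transferOnce(final long markerId) {
        ofy().transact(new VoidWork() {
            @Override
            public void vrun() {
                TransferMarker marker =
                        ofy().load().type(TransferMarker.class).id(markerId).now();
                if (marker == null) {
                    return; // already applied on an earlier attempt: do nothing
                }
                // ... perform the actual money movement here ...
                ofy().delete().entity(marker); // consumed inside the same txn
            }
        });
        // Objectify retries vrun() on ConcurrentModificationException, which
        // is safe precisely because the work above is idempotent.
    }
}
```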
The way I see it, you can either prevent the first task from updating the object when certain values have changed since the task was first launched,
or you can embed the object's values within the task request, so that the second calculation task restores the object state with consistent value and calculated members.
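A sketch of that second option with the App Engine Task Queue API; the handler URL and parameter names are made up for illustration:

```java
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class CalcTaskEnqueuer {
    // The values the calculation depends on travel with the task itself,
    // so the worker never reads a possibly stale entity from the datastore.
    public void enqueue(long entityId, long value1, long value2, long value3) {
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder.withUrl("/tasks/recalculate")
                .param("id", Long.toString(entityId))
                .param("value1", Long.toString(value1))
                .param("value2", Long.toString(value2))
                .param("value3", Long.toString(value3)));
    }
}
```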
We are building a product, so I need some help from a performance point of view.
We are using the complete Spring stack (MVC, JPA, Security, etc.).
We have a requirement where, for a particular flow, 100 business rules may be executed at the same time, and there can be any number of such flows and business rules.
When executed, these rules fetch records from tables in the database; these will include a few LAZILY INITIALIZED ENTITIES as well.
I used Futures/Callables for multithreading, but the problem is that the lazy variables fail to load; Hibernate throws a LazyInitializationException, probably because the transaction does not propagate across the different threads.
Please let me know if there is another way to approach this.
If some entity or entity collection is lazily fetched and you access it in another thread, you will face a LazyInitializationException, because lazily loaded entities can be accessed only within a transaction, and a transaction won't span threads.
You can use the DTO pattern or, if you are sharing an entity across threads, call the getters of its lazily initialized collections within the transaction, so that they are fetched inside the transaction itself.
If you really need async processing, then I suggest you use the Java EE specified way, i.e. JMS/MDB.
Anyway, it seems like you want all the data loaded for your processing. So either eagerly fetch all the required data and then submit it for parallel processing, or let each of the tasks (Callables) fetch the data it requires. Essentially, I am asking you to change your design so that transactional boundaries don't cross multiple threads: localize each boundary within a single thread.
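A sketch of the first approach: force the lazy associations inside the transaction before handing the entities to worker threads. Rule and RuleRepository are hypothetical names; Hibernate.initialize() is the standard Hibernate utility for triggering a lazy load:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.hibernate.Hibernate;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class RuleRunner {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    @Autowired
    private RuleRepository repository; // hypothetical DAO

    @Transactional(readOnly = true)
    public void runRules() {
        List<Rule> rules = repository.findActiveRules();
        // Force the lazy collections while the session is still open.
        for (Rule rule : rules) {
            Hibernate.initialize(rule.getConditions());
        }
        // The worker threads only ever see fully initialized objects,
        // so no transaction needs to span a thread boundary.
        for (final Rule rule : rules) {
            pool.submit(new Callable<Void>() {
                @Override
                public Void call() {
                    rule.execute();
                    return null;
                }
            });
        }
    }
}
```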
I'm asking this question given my chosen development frameworks of JPA (Hibernate implementation of), Spring, and <insert MVC framework here - Struts 1, Struts 2, Spring MVC, Stripes...>.
I've been thinking a bit about relationships in my entity layer. For example, I have an order entity that has many order lines. I've set up my app so that it eagerly loads the order lines for every order. Do you think this is a lazy way to get around the lazy initialization problems I would run into if I set the fetch type to lazy?
The way I see it, I have the following alternatives when retrieving entities and their associations:
Use the Open Session In View pattern to create the session on each request and commit the transaction before returning the response.
Implement a DTO (Data Transfer Object) layer such that every DAO query I execute returns the correctly initialized DTO for my purposes. I don't really like this option much because in my experience I've found that it creates a lot of boilerplate copying code and becomes messy to maintain.
Don't map any associations in JPA so that every query I execute returns only the entities I'm interested in - this will probably require me to have DTOs anyway, will be a pain to maintain, and I think defeats the purpose of having an ORM in the first place.
Eagerly fetch all (or most) associations - in the example above, always fetch all order lines when I retrieve an order (see the mapping sketch after this list).
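For reference, option 4 is just the fetch type on the mapping; a sketch of the Order entity described above, with OrderLine as the (assumed) child entity:

```java
import java.util.List;

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class Order {
    @Id
    @GeneratedValue
    private Long id;

    // Option 4: every load of an Order also loads all of its lines,
    // whether or not the caller needs them.
    @OneToMany(mappedBy = "order", fetch = FetchType.EAGER, cascade = CascadeType.ALL)
    private List<OrderLine> orderLines;
}
```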
So my question is, when and under what circumstances would you use which of these options? Do you always stick with one way of doing it?
I would ask a colleague but I think that if I even mentioned the term 'Open Session in View' I would be greeted with blank stares :( What I'm really looking for here is some advice from a senior or very experienced developer.
Thanks guys!
Open Session in View has some problems.
For example, if the transaction fails, you might learn it too late, at commit time, when you are nearly done rendering your page (possibly with the response already committed, so you can't change the page!). If you had known about the error before, you would have followed a different flow and ended up rendering a different page.
Another example: reading data on demand can turn into many "N+1 select" problems that kill your performance.
Many projects use the following path:
Maintain transactions at the business layer; load everything you are supposed to need at that point.
The presentation layer then runs the risk of LazyInitializationExceptions: each one is considered a programming error, caught during tests, and corrected by loading more data in the business layer (where you have the opportunity to do it efficiently, avoiding "N+1 select" problems).
To avoid creating extra classes for DTOs, you can load the data into the entity objects themselves. This is the whole point of the POJO approach (used by modern data-access layers, and even integration technologies like Spring).
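For instance, a single JPQL fetch join in the business layer loads an order together with its lines in one query; a sketch assuming the Order/orderLines mapping discussed in the question:

```java
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Repository;

@Repository
public class OrderDao {
    @PersistenceContext
    private EntityManager em;

    // One query loads the order and its lines together, so the view never
    // triggers a lazy load and there is no "N+1 select" problem.
    public Order findWithLines(long id) {
        return em.createQuery(
                "select distinct o from Order o join fetch o.orderLines where o.id = :id",
                Order.class)
                .setParameter("id", id)
                .getSingleResult();
    }
}
```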
I've successfully solved all my lazy initialization problems with the Open Session In View pattern (i.e. the Spring implementation of it). The technologies I used were exactly the same as yours.
Using this pattern allows me to fully map the entity relationships and, mostly, not worry about fetching child entities in the DAO. In 90% of cases the pattern solves the lazy initialization needs in the view. In some cases you'll have to initialize relationships "manually"; those cases were rare for me and always involved very, very complex mappings.
When using the Open Entity Manager In View pattern, it's important to define the entity relationships, and especially the propagation and transactional settings, correctly. If these are not configured properly, you will see errors about closed sessions when some entity is lazily initialized in the view, because the session has already been closed by then.
I would definitely go with option 1. Option 2 might be needed sometimes, but I see absolutely no reason to use option 3, and option 4 is also a no-no: eagerly fetching everything kills the performance of any view that needs to list just a few properties of some parent entities (orders, in this case).
N+1 Selects
During development there will be N+1 selects as a result of initializing some relationships in the view, but this is not a reason to discard the pattern. Just fix these problems as they arise, before delivering the code to production. These problems are as easy to fix with the OEMIV pattern as with any other: add the proper DAO or service methods, change the controller to call a different finder method, maybe add a view to the database, etc.
I have successfully used the Open Session In View pattern on a project. However, I recently read in "Spring in Practice" about an interesting potential problem with non-repeatable reads if you manage your transactions at a lower layer while keeping the Hibernate session open in the view layer.
We managed most of our transactions in the service layer but kept the Hibernate session open in the view layer, which meant that lazy reads in the view resulted in separate read transactions.
We managed our transactions in the service layer to minimize transaction duration. For instance, some of our service calls resulted in both a database transaction and a call to an external web service, and we did not want our transaction to stay open while waiting for the web service to respond.
As our system never went into production, I am not sure whether there were any real problems with it, but I suspect there was the potential for the view to attempt to lazily load an object that had been deleted by someone else.
There are some benefits to the DTO approach, though. You have to think beforehand about what information you need; in some cases this will prevent you from generating N+1 select statements, and it also helps you see where to use eager fetching and/or optimized views.
I'll also throw my weight behind the Open-Session-in-View pattern, having been in the exact same boat before.
I work with Stripes without Spring, and have created a manual filter before that tends to work well. Coding transaction logic on the backend turns messy really quickly, as you've mentioned, and eagerly fetching everything becomes TERRIBLE as you map more and more objects to each other.
One thing I want to add that you may not have come across is Stripersist and Stripernate (Stripersist being the more JPA-flavoured of the two): auto-hydration filters that take a lot of the work off your shoulders.
With Stripersist you can request things like /appContextRoot/actions/view/3, and it will auto-hydrate the JPA entity with id 3 on the ActionBean before the event is executed.
Stripersist is in the stripes-stuff package on SourceForge. I now use this for all new projects, as it's clean and easily supports multiple data sources if necessary.
Do the Order and Order Lines comprise a high volume of data? Do they take part in online processes where real-time response is required? If so, you might consider not using eager fetching, as it makes a huge difference in performance. If the amount of data is small, there is no problem with eager fetching.
As for using DTOs, they might be a viable implementation.
If your business layer is used internally by your own application (i.e. a small web app and its business logic), it is probably best to use your own entities in your view with the Open Session In View pattern, since it's simpler.
If your entities are used by many applications (i.e. a backend application providing a service in your corporation), it is interesting to use DTOs, since you would not expose your model to your clients. Exposing it could mean a harder time refactoring your model, because that could break contracts with your clients; a DTO makes this easier, since you have another layer of abstraction. This can seem a bit strange, since EJB3 theoretically eliminates the need for DTOs.