Hibernate Second Level Cache In Case of Soft Delete

Read operations vastly outnumber insert/update/delete operations in our master data module. We have been using JDBC for read, write and update operations until now. Deletes are soft deletes (we mark an IS_DELETED column as 'Y'). All write/update methods are synchronized to handle concurrency. We use Oracle and have no plans to support multiple databases.
Now we are planning to cache the data, and we also plan to move to a clustered environment.
The easiest option we have is to change the insert/update/delete methods, use something like Ehcache to manage the cache as per our requirements, handle concurrency in the clustered environment with a version column in the table, and remove the synchronized keyword.
The other option that people around me are suggesting (in fact, asking me to do) is to move to Hibernate (I don't know much about Hibernate), which would take care of caching and concurrency automatically.
Here are my doubts:
1) Is it worth changing the complete DAO code, given that we have around 200 tables of master data to manage?
2) Would the Hibernate second-level cache help in this case, given that we would need to filter the cached data again to discard deleted rows? Or is there a mechanism in Hibernate (or any other way) by which we can perform an update operation in the database but a delete operation in the cached data?
3) We expose data transfer objects to other modules. Each DO contains all the fields of the table, with the primary key fields stored in a separate PK object, and there are no references to other DOs (no composite DOs). Given that we can't afford to change the exposed methods and the DO structure, do we have to repack the data from Hibernate's cached entities into our DOs again? Or can we reuse the old DO structure as the Hibernate entity (as per my understanding, the PK columns should be directly in the Hibernate entity rather than in a separate composite object)? I mention composite DOs because we also have a dependent-dropdown requirement which could have used Hibernate lazy loading for the child objects if we had composite DOs in the first place. The counter-argument is to provide new methods that use the cached data and deprecate the old ones; other modules would migrate slowly as their caching needs arise, but then we have a maintenance problem because we would have to keep both sets of methods in sync whenever the database changes. And doubts 1 and 2 still remain.
I am fairly sure that Hibernate is not the way to go for us at this stage, and I have to convince the people around me, but I would like your views on the long-term advantages of moving to Hibernate, other than automatic management of the second-level cache, concurrency handling (which we could get with a small code change in one common place) and database independence (which we are not interested in), weighed against the cost of changing the complete code.

If you plan to migrate to Hibernate you should take into account:
1) You'll need to map all your structures to POJOs (if you have not already).
2) You'll need to rewrite all DAOs to use Hibernate (bear in mind that the Hibernate QL/Criteria API has certain limitations).
3) Be ready to fight lazy initialization problems and so on...
Personally I don't think it's worth migrating to Hibernate with a working model, unless the current model is extremely painful to maintain.
Concerning your questions 2 and 3:
2) The second-level cache holds only loaded instances, accessed by primary key. That is, if you call hibernateSession.load(User.class, 10), Hibernate will look up the User object in the second-level cache using id=10. If I understand correctly, that's not your case: most of the time you will want to load your data using a more complex query. For that you need the StandardQueryCache, which maps your query string to a list of loaded IDs, which are in turn retrieved from the second-level cache (a small sketch follows below). But if you have a lot of queries with low similarity, both the StandardQueryCache and the second-level cache will be totally useless (take a look at http://darren.oldag.net/2008/11/hibernate-query-cache-dirty-little_04.html).
3) You can use components and such, but I'm not sure about your DTO structure.
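To make point 2 concrete, here is a minimal sketch, assuming one entity per master table, a configured cache provider such as Ehcache, and hibernate.cache.use_query_cache enabled (the MasterData name and the isDeleted column are only illustrative):
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Entity instances are held in the second-level cache, keyed by primary key.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class MasterData {
    @Id
    private Long id;
    private String name;
    private String isDeleted; // soft-delete flag, 'Y' or 'N'
    // getters/setters omitted
}
And in a DAO method (session being an open Hibernate Session):
// Lookup by primary key goes straight to the second-level cache.
MasterData md = (MasterData) session.get(MasterData.class, 10L);

// A filtered query additionally needs the query cache; it stores only the list of
// matching ids, and the entity data itself comes from the second-level cache.
List activeRows = session
    .createQuery("from MasterData m where m.isDeleted = 'N'")
    .setCacheable(true)
    .list();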
Hope that helps

Related

Auditing using Data tables vs Separate Audit tables

I am in the process of designing a new Java application which has very strict requirements for auditing. Here is a brief context:
I have a complex entity with multiple nested one-to-many relationships. If any field changes, I need to consider it a new version of the object, and all of this needs to be audited as well. Now I have two options:
1.) Do not do any update operation; just insert a new entity whenever anything changes. This would require me to create all the related objects (even if they have not been changed), as I do not want to hold references to any previous-version objects. My data tables become my auditing tables as well.
OR
2.) Always do an update operation and maintain the auditing information in separate tables. That would add some more complexity in terms of implementation.
I would like to know whether either of these two approaches is considered good or bad practice.
Thanks,
-csn
What should define your choice is your insert/update/read patterns for both the "live" data and the audits.
Most commonly these patterns are very different for the two kinds of data.
- Concerning the "live" data, it depends a lot on your application, but I imagine you have significant inserts, significant updates and lots of reads. Live data also requires transactionality and has lots of relationships between tables for which you need to keep consistency. It may require fast and complex searches, and indexes on many columns.
- Audits have lots of inserts, almost no updates and few reads. Reads and searches don't require complex queries (e.g. you only consult audits and sort them by date) or indexes on many columns.
So with increased load and data size you will probably need to split the data and optimize tables for your use cases.
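If it helps to picture option 2, here is a minimal sketch using JPA lifecycle callbacks; AuditListener and AuditWriter are hypothetical classes (the writer would insert a row into a separate audit table, ideally through its own connection or a REQUIRES_NEW transaction):
import javax.persistence.PostPersist;
import javax.persistence.PostUpdate;

// Attach to the audited entity with @EntityListeners(AuditListener.class).
public class AuditListener {

    @PostPersist
    @PostUpdate
    public void recordChange(Object entity) {
        // Hand the snapshot off to the audit table; keep this write outside
        // the entity's own persistence context to avoid recursive flushes.
        AuditWriter.write(entity);
    }
}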

JDBC Query Caching and Precaching

Scenario:
I have a need to cache the results of database queries in my web service. There are about 30 tables queried during the cycle of a service call. I am confident that data in a certain date range will be accessed frequently by the service, and I would like to pre-cache that data. This would mean caching around 800,000 rows at application startup; the data is read-only. The data does not need to be dynamically refreshed, as this is reference data. The cache can't be loaded on each service call; there's simply too much data for that. Data outside of this 'frequently used' window is not time-critical and can be lazy-loaded. Most queries would return one row, and none of the tables have a parent/child relationship to each other, though there will be a few joins. There is no need for dynamic SQL support.
Options:
I intended to use myBatis, but there isn't a good method to warm up the cache. myBatis can't understand that the service query select * from table where key = ? is already covered by the startup pre-cache query select * from table.
As far as I understand it (documentation overload), Hibernate has the same problem. Additionally, these tables were designed with composite keys and no primary key, which is an extra hassle for Hibernate.
Question:
Preferred: Is there a myBatis solution for this problem? I'd very much like to use it (familiarity, simplicity, performance, funny name, etc.).
Alternatively: Is there an ORM or DB-friendly cache that offers what I'm looking for?
You can use a distributed caching solution like NCache or Tayzgrid, which provide indexing and query features along with a cache startup loader.
You can configure indexes on attributes of your entities in the cache. A cache startup loader can be configured to load all data from the database into the cache at startup. While loading the data, the cache will create indexes for all entities in memory.
The Object Query Language (OQL) feature, which provides SQL-like queries, can then be used to query the in-memory data.
The variety of options for third-party products (free and paid) is too broad and too dependent on your particular requirements and operational capabilities to try to "answer" here.
However, I will suggest an alternative to an explicit cache of your read-only data.
You clearly believe that the memory footprint of your dataset will fit into RAM on a reasonably sized server. My suggestion is that you use your database engine directly (no additional external cache), but configure the database with an internal cache large enough to hold your whole dataset. If all of your data resides in the database server's RAM, it will be accessed very quickly.
I have used this technique successfully with MySQL, but I expect the same applies to all major database engines. If you cannot figure out how to configure your chosen database appropriately, I suggest that you ask a separate, detailed question.
You can warm the cache by executing representative queries when you start your system. These queries will be relatively slow because they have to actually do the disk I/O to pull the relevant blocks of data into the cache. Subsequent queries that access the same blocks of data will be much faster.
This approach should give you a huge performance boost with no additional complexity in your code or your operational environment.
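A rough sketch of that kind of warm-up at application startup, assuming a plain JDBC DataSource; the table name and date range are made up for illustration:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class CacheWarmer {

    // Run once at startup: the query is representative of real traffic and its
    // results are discarded; the point is to pull the hot date range into the
    // database engine's own buffer cache.
    public void warmUp(DataSource ds) throws Exception {
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "select * from reference_table where ref_date between ? and ?")) {
            ps.setDate(1, java.sql.Date.valueOf("2014-01-01")); // illustrative range
            ps.setDate(2, java.sql.Date.valueOf("2014-12-31"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // intentionally empty: just touch the rows
                }
            }
        }
    }
}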
Sormula may do what you want. You would need to annotate each POJO to be cached like:
@Cached(type=ReadOnlyCache.class)
public class SomePojo {
...
}
Pre-populate the cache by invoking the selectAll method for each:
Database db = new Database(one of the JNDI constructors);
Table<SomePojo> t = db.getTable(SomePojo.class);
t.selectAll();
The key is that the cache is stored in the Table object, t. So you would need to keep a reference to t and use it for subsequent queries. Or, in the case of many tables, keep a reference to the database object, db, and use db.getTable(...) to get the tables to query.
See javadoc and tests in org.sormula.tests.cache.readonly package.

JPA merge in a RESTful web application with DTOs and Optimistic Locking?

My question is this: Is there ever a role for JPA merge in a stateless web application?
There is a lot of discussion on SO about the merge operation in JPA. There is also a great article on the subject which contrasts JPA merge with a more manual Do-It-Yourself process (where you find the entity via the entity manager and make your changes).
My application has a rich domain model (a la domain-driven design) that uses the @Version annotation in order to make use of optimistic locking. We have also created DTOs to send over the wire as part of our RESTful web services. The creation of this DTO layer also allows us to send to the client everything it needs and nothing it doesn't.
So far, I understand this is a fairly typical architecture. My question is about the service methods that need to UPDATE (i.e. HTTP PUT) existing objects. In this case we have these two approaches: 1) JPA merge, and 2) DIY.
What I don't understand is how JPA merge can even be considered an option for handling updates. Here's my thinking and I am wondering if there is something I don't understand:
1) In order to properly create a detached JPA entity from a wire DTO, the version number must be set correctly...else an OptimisticLockException is thrown. But the JPA spec says:
An entity may access the state of its version field or property or export a method for use by the application to access the version, but must not modify the version value[30]. Only the persistence provider is permitted to set or update the value of the version attribute in the object.
2) Merge doesn't handle bi-directional relationships ... the back-pointing fields always end up as null.
3) If any fields or data are missing from the DTO (due to a partial update), then JPA merge will delete those relationships or null out those fields. Hibernate can handle partial updates, but JPA merge cannot. DIY can handle partial updates.
4) The first thing the merge method will do is query the database for the entity ID, so there is no performance benefit over DIY to be had.
5) In a DIY update, we load the entity and make the changes according to the DTO; there is no call to merge, or to persist for that matter, because the JPA context implements the unit-of-work pattern out of the box.
Do I have this straight?
Edit:
6) Merge behavior with regards to lazy loaded relationships can differ amongst providers.
Using Merge does require you to either send and receive a complete representation of the entity, or maintain server side state. For trivial CRUD-y type operations, it is easy and convenient. I have used it plenty in stateless web apps where there is no meaningful security hazard to letting the client see the entire entity.
However, if you've already reduced operations to only passing the immediately relevant information, then you need to also manually write the corresponding services.
Just remember that when doing your 'DIY' update you still need to pass a Version number around on the DTO and manually compare it to the one that comes out of the database. Otherwise you don't get the Optimistic Locking that spans 'user think-time' that you would have if you were using the simpler approach with merge.
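A minimal sketch of that manual comparison, assuming made-up ItemDto/Item classes whose version field is a primitive long:
import javax.persistence.EntityManager;
import javax.persistence.EntityNotFoundException;
import javax.persistence.OptimisticLockException;

public class ItemService {

    // Called inside a transaction; the persistence context flushes the changes
    // at commit, so there is no explicit merge() or persist() call.
    public void update(EntityManager em, ItemDto dto) {
        Item item = em.find(Item.class, dto.getId());
        if (item == null) {
            throw new EntityNotFoundException("Item " + dto.getId() + " no longer exists");
        }
        // Compare the version the client last saw with the current one, so the
        // optimistic check also spans the user's think-time.
        if (item.getVersion() != dto.getVersion()) {
            throw new OptimisticLockException(item);
        }
        item.setName(dto.getName());
        item.setPrice(dto.getPrice());
    }
}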
You can't change the version on an entity created by the provider, but when you have made your own instance of the entity class with the new keyword it is fine and expected to set the version on it.
It will make the persistent representation match the in-memory representation you provide, this can include making things null. Remember when an object is merged that object is supposed to be discarded and replaced with the one returned by merge. You are not supposed to merge an object and then continue using it. Its state is not defined by the spec.
True.
Most likely, as long as your DIY solution is also using the entity ID and not an arbitrary query. (There are other benefits to using the 'find' method over a query.)
True.
I would add:
7) Merge translates to an insert or an update depending on whether the record exists in the DB, hence it does not deal correctly with update-vs-delete optimistic concurrency. That is, if another user concurrently deletes the record and you update it, it must (1) throw a concurrency exception... but it does not; it just inserts the record as a new one.
(1) At least, in most cases, in my opinion, it should. I can imagine some cases where I would want this use case to trigger a new insert, but they are far from usual. At least, I would like the developer to think twice about it, not just accept that "merge() == updateWithConcurrencyControl()", because it is not.

How to increase a version field on save in Hibernate regardless if dirty or not?

I'm using Hibernate with a version column to implement optimistic concurrency control.
The question: Is it possible to increment the version number of an entity every time I save it to database, regardless if it was changed or not?
Currently, as long as some field changes in the entity, the version number gets increased; but if no field changed, the version number stays the same.
The reason behind this question is that I've got a logical master-detail relationship between two tables, and I'd like to increase the version number in the master table whenever something changes in the details, even if the master data itself didn't change. This master-detail relationship is not mapped in Hibernate; I just always save them together in a single transaction.
You can use Hibernate interceptors to update the version number of the master record when you identify that a detail has changed.
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/events.html
One limitation is that this solution is specific to Hibernate. JPA also allows event-driven logic using annotations (e.g. @PostPersist, @PostUpdate, etc.), but these methods don't give you access to the underlying session (and, more importantly, the documentation cautions you against using these methods to modify session data). I've typically used interceptors to perform auditing, but they could easily be extended to update a version number when a record is altered.
You can call lock() (or use other methods that take LockMode) with LockMode.OPTIMISTIC_FORCE_INCREMENT.
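With the JPA API the same idea looks roughly like this (Master is an illustrative entity carrying the @Version column):
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class MasterService {

    // Forces the master's version to be incremented at flush/commit time,
    // even though none of the master's own columns changed.
    public void touchMaster(EntityManager em, Master master) {
        em.lock(master, LockModeType.OPTIMISTIC_FORCE_INCREMENT);
    }
}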

Saving tree-structures in Databases

I use Hibernate/Spring and a MySQL Database for my data management.
Currently I display a tree structure in a JTable. A tree can have several branches; a branch in turn can have several branches (up to nine levels) or leaves. Lately I have had performance problems as soon as I want to create new branches at deeper levels.
At the moment a branch has a foreign key to its parent. The domain object has access to its parent by calling getParent(), which returns the parent branch. The deeper the level, the longer it takes to create a new branch.
Microbenchmark results for creating a new branch are like:
Level 1: 32 ms.
Level 3: 80 ms.
Level 9: 232 ms.
Obviously the level (which means the number of parents) is responsible for this. So I wanted to ask whether there are any approaches to work around this kind of problem. I don't understand why Hibernate needs to know about the whole object tree (all parents up to the root) when creating a new branch, but as far as I know this can be the only reason for the delay, because a branch doesn't have any other relations to any other objects.
I would be very thankful for any workarounds or suggestions.
greets,
ymene
Basically you have some sort of many-to-one relationship structure, right?
In Hibernate everything depends on the mapping. Tweak your mapping: use a one-to-many relationship from parent to child with a java.util.Set.
Do not use an ArrayList, because a List is ordered, so Hibernate will add an extra column just for that ordering.
Also check your lazy property. If you load a parent and you have set lazy="false" on its child set property, then all of its children will be loaded from the DB, which can affect performance.
Also check the 'inverse' property for the children. If inverse is true on the child side, it means you can manage the child entity separately; otherwise you have to do it through the parent only.
Google around for inverse; it will surely help you. Thanks.
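For reference, the same idea expressed with annotations might look like this (Branch is the name from the question; the rest of the mapping is assumed):
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

@Entity
public class Branch {
    @Id
    @GeneratedValue
    private Long id;

    // The child side owns the foreign key; keeping it lazy means saving a new
    // branch does not force the whole chain of parents to be initialized.
    @ManyToOne(fetch = FetchType.LAZY)
    private Branch parent;

    // Inverse side, mapped as a Set to avoid the extra index column a List needs.
    @OneToMany(mappedBy = "parent", fetch = FetchType.LAZY)
    private Set<Branch> children = new HashSet<Branch>();

    // getters/setters omitted
}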
I don't know how Hibernate handles this internally. However, there are different ways to store tree structures in a database. One which is quite efficient for many queries done on the tree is using a "nested set" approach - but this would basically yield the performance issues that you're seeing (e.g. expensive insertion). If you need fast insertion or removal I'd go with what you have, e.g. a simple parent-ID, and try to see what Hibernate is doing all this time.
If you don't need to report on your data in SQL, you could just serialize your JTable to the database instead (perhaps using something like XStream). That way you wouldn't have to worry about expensive database queries that deal with trees.
One thing you can do is use the XML support in MySQL. This will give you native ability to support hierarchies. I've never used XML support in MySQL, so I don't know if it is as full-featured as other DBMSes (SQL Server and DB2 I know have great support, probably Oracle too I would guess).
Note that I have never used Hibernate, so I don't know if you could interface it with that, or if you would have to write your own DB code in this case (my guess is you're going to be writing your own queries).
