I'm using ehcache with Hibernate and I'd like to use the BlockingCache or SelfPopulatingCache decorators to avoid the issues presented in http://ehcache.org/documentation/constructs-0_5.html#mozTocId722946
An expensive operation is required, say rendering a large web page, which takes 30 seconds. The page is not considered stale until it is 5 minutes old. The page is hit very heavily, averaging 20 hits per minute over each 5-minute window.
Do I have to do this programmatically as http://ehcache.org/documentation/cache_decorators.html suggests or is there a declarative (in xml) way to do so?
thanks a lot
There is no way to do this in ehcache.xml since you must register the class with the CacheManager before the cache config is read.
So you must use the code mentioned in the docs, and you must run this code before you do anything with Hibernate. A simple way to do this is to use the hibernate.cache.provider_class property, which tells Hibernate which factory class to use for the cache. Have a look at the source of an existing implementation, which should give you an idea of what you need to do.
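To illustrate what the SelfPopulatingCache decorator buys you, here is a minimal pure-Java sketch of the blocking self-populating pattern. The class and names are hypothetical (this is not the ehcache API) and it assumes Java 8+: the first thread to miss a key computes the value while other threads asking for the same key block until it is ready, so the expensive 30-second render runs only once per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch of the self-populating pattern, not the ehcache API.
class SelfPopulatingMap<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> factory;

    SelfPopulatingMap(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        // computeIfAbsent invokes the factory at most once per key and
        // blocks concurrent callers for that key while it runs
        return map.computeIfAbsent(key, factory);
    }
}
```

In ehcache itself the equivalent wiring is done in code (not XML) by wrapping a configured cache in a SelfPopulatingCache with your own CacheEntryFactory, which is why the registration has to happen before Hibernate starts using the CacheManager.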
Related
The data in db changes every 14 seconds but there may be many calls from clients to fetch same data in that duration. So, in Servlet I implemented a logic like
// lastFetchTime and lastFetchedData are servlet fields
if (System.currentTimeMillis() - lastFetchTime > 3000) {
    lastFetchedData = fetchData();
    lastFetchTime = System.currentTimeMillis();
}
return lastFetchedData;
But when I measured, fetching the data from the DB only takes a few milliseconds. So the thing I did is probably already done by MySQL.
Is this an unnecessary optimization? Because with my "optimization", in some rare cases a client may receive data that is 2-3 seconds more out of date than it should be.
From an architecture point of view, caching is a good idea in your case: you have data which does not change constantly, so it makes no sense to bother the DB; it is an expensive resource, so caching makes a lot of sense.
This is very important for scalability: if you have hundreds of requests it's OK, but if your app gets millions of requests, the lack of caching could lead to many problems.
How to implement the cache properly is a good question. The code you provided is the simplest approach; I would probably implement a separate caching mechanism that takes care of updating in a separate thread and share it at application level, so the servlets don't need to care about the update period.
Actually, you can determine in your app when the change happened: first compare new data and old data with the cache turned off, and when you see a change you can start waiting for 13 seconds, then turn the cache off until you see the data change again. In this case you wouldn't have wrong data.
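The separate-thread idea above can be sketched as follows. This is a minimal illustration with hypothetical names; it assumes you supply the loader (your fetchData()) and pick the refresh period yourself.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Application-scoped cache refreshed on a fixed schedule in a background
// thread; servlets only read the AtomicReference and never hit the DB.
class RefreshingCache<T> {
    private final AtomicReference<T> value = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    RefreshingCache(Supplier<T> loader, long periodSeconds) {
        value.set(loader.get()); // populate once up front
        scheduler.scheduleAtFixedRate(
            () -> value.set(loader.get()),
            periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    T get() { return value.get(); }

    void shutdown() { scheduler.shutdownNow(); }
}
```

An instance of this would typically live in the ServletContext so all servlets share one refresher thread rather than each doing its own staleness check.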
I'm using EclipseLink with Derby DB or MaxDB. When I did performance testing to persist data for 10 entities with 1000 records each, it took 12.9 sec: my code takes 0.9 sec and the commit of the entity manager using JPA takes about 12 sec.
1. Is it OK that for 10,000 records the time is 12 seconds?
2. I read that there is an option to use
<property name="eclipselink.jdbc.batch-writing" value="JDBC" />
What is the drawback of using it? How does logging work with it?
3. What about using a thread for the commit, is that OK?
It's OK if it's sufficiently fast for you. Only you know that. You could compare it with code written by hand using JDBC. But don't forget to also take into account the maintainability and the correctness of the code and the time it takes to write and test it. Hardware is cheap. Developers are not. Note that the use-case you tested (inserting lots of rows in lots of tables) is not a very frequent use-case in most typical applications, and not well-suited for JPA, which is typically used to implement short transactions (like buying a book on Amazon, or adding a message in a blog, things like that).
No idea.
JPA entitymanagers are not thread-safe, and the current transaction is typically associated to the current thread. You can't start a transaction in a thread and commit it in another one.
You should definitely enable batch writing (and confirm that your database/driver supports it).
See,
http://java-persistence-performance.blogspot.com/2011/06/how-to-improve-jpa-performance-by-1825.html
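For reference, batch writing is typically enabled in persistence.xml with properties along these lines (the size value here is an illustrative tuning knob, not a recommendation; check what your driver supports):

```xml
<property name="eclipselink.jdbc.batch-writing" value="JDBC" />
<property name="eclipselink.jdbc.batch-writing.size" value="1000" />
```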
Technology Versions:
Hibernate 3.6.5
Hibernate Search 3.4.0
Lucene (lucene-core-*.jar) 3.1.0
Spring 3.1.0.M2
Hibernate (Search) Configuration:
<prop key="hibernate.search.default.indexBase">/some/dir</prop>
<prop key="hibernate.search.default.directory_provider">org.hibernate.search.store.FSDirectoryProvider</prop>
<prop key="hibernate.search.default.locking_strategy">native</prop>
<prop key="hibernate.search.default.exclusive_index_use">false</prop>
The issue:
Our application is deployed in Amazon (AWS cloud), but we've faced this issue in our local clusters as well:
The application design is such that a thread is spawned from within the main (web) application, and from that thread we need to update an indexed entity. Basically it's a status-monitor thread which reads a .status file and updates the database every 30 seconds or so. This keeps happening for about 10 minutes to half an hour on average.
The issue we see is that every few days we need to regenerate the indexes because Hibernate Search stops returning anything for the entity in question (the one discussed above).
I went through a few forums and it seems it is suggested that only a single thread should be updating the Lucene indexes. But it is also stated that index writing is thread-safe. So even if multiple threads are writing to the same index, I still expect that it should not cause this issue (of nothing being returned by the search). That is to say, I may get a stale status of the entity in question, but still, something should be returned.
We're using the default IndexReader/Writer implementation of Hibernate Search.
Any help would be highly appreciated. Thanks.
Here are some thoughts.
I went through a few forums and it seems it is suggested that only a single
thread should be updating the Lucene indexes.
That's not generally true. Lucene and Hibernate Search allow multiple index writers, BUT access to the index must be properly synchronized, which happens via Lucene's org.apache.lucene.store.LockFactory. The lock factory is configurable, and you are using the native one via the property *hibernate.search.default.locking_strategy*. The problem might be that this strategy is file-based. I don't know much about the internal workings of Amazon's distributed file system, but I would imagine that file locks just don't work in this case. You might need to implement a custom lock strategy.
But it is also stated that index writing is thread-safe. So even if multiple threads are
writing to the same index, I still expect that it should not cause the issue
Correct, provided the locking works.
An alternative is to work without locks (setting *hibernate.search.default.locking_strategy* to none), provided you can guarantee that only your update thread will ever write to the index. Aside from the update thread, do you have automatic indexing enabled? If so, try turning it off (provided your use case allows you to).
Ok, for those who have landed here looking for a solution -- here's what we did finally, which seems to have addressed the issue (we've load-tested this fix on AWS and have had no issues so far):
We wrote our own implementation of a lock factory based on Lucene's NativeFSLockFactory. We modified the obtain method so as to retry obtaining the lock a couple of times before giving up. We added a short lag (sleep) between retries to handle NFS latencies.
We tested this fix with Lucene's LockVerifyServer and observed that under high load, even though a few of the lock-obtain requests have to wait, eventually every lock-obtain request is served. In practice, this translates to a successful update of the index files.
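The retry wrapper described above can be sketched generically like this. The names are hypothetical; the real version overrides the obtain method of a subclass of Lucene's NativeFSLockFactory rather than using this standalone interface.

```java
// Generic retry-with-sleep sketch of the "obtain" override described above.
class RetryingLock {
    interface Lock {
        boolean obtain(); // attempt to acquire; true on success
    }

    static boolean obtainWithRetries(Lock lock, int maxRetries, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (lock.obtain()) {
                return true;           // lock acquired
            }
            Thread.sleep(sleepMillis); // short lag to ride out NFS latency
        }
        return false;                  // give up after maxRetries attempts
    }
}
```

The retry count and sleep interval are the two knobs to tune against your file system's latency, which is why we later had to raise the count (see the update below).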
Thank you Hardy for showing us the path. :)
Update: Yesterday, we had to bump up the retry count to a higher number (~30) because we faced an index-update slippage with the previous value of 3. Things have seemed fine thereafter.
In order to minimize the number of database queries I need some sort of cache to store pairs of data. My approach now is a hashtable (with Strings as keys, Integers as value). But I want to be able to detect updates in the database and replace the values in my "cache". What I'm looking for is something that makes my stored pairs invalid after a preset timespan, perhaps 10-15 minutes. How would I implement that? Is there something in the standard Java package I can use?
I would use some existing solution (there are many cache frameworks).
Ehcache is great; it can expire values after a given timespan, and I bet it can do much more (it's the only one I've used).
You can either use existing solutions (see previous reply),
or, if you want a challenge, make your own simple cache class (not recommended for a production project, but it's a great learning experience).
You will need at least 3 members:
The cache data, stored as a hashtable object
The next cache expiration date
The cache expiration interval, set via the constructor
Then simply have public data getter methods which verify the cache expiration status:
if not expired, call the hashtable's accessors;
if expired, first call the "data load" method (which is also called in the constructor to pre-populate), then call the hashtable's accessors.
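The three members and the getter logic above can be sketched like this (hypothetical class and names, just to show the shape):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simple expiring cache: the data table, the next expiration time,
// and the expiration interval set via the constructor.
class ExpiringCache<K, V> {
    private final Map<K, V> table = new HashMap<>();
    private final long intervalMillis;
    private final Supplier<Map<K, V>> loader; // the "data load" method
    private long expiresAt;

    ExpiringCache(Supplier<Map<K, V>> loader, long intervalMillis) {
        this.loader = loader;
        this.intervalMillis = intervalMillis;
        reload(); // pre-populate in the constructor
    }

    synchronized V get(K key) {
        if (System.currentTimeMillis() >= expiresAt) {
            reload(); // expired: reload before answering
        }
        return table.get(key);
    }

    private void reload() {
        table.clear();
        table.putAll(loader.get());
        expiresAt = System.currentTimeMillis() + intervalMillis;
    }
}
```

Note the whole-table reload here is the crudest variant; the per-key expiration and partial reloads described below refine exactly this reload() step.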
For an even cooler cache class (I have implemented one in Perl at my job), there is additional functionality you can implement:
Individual per-key cache expiration (coupled with overall total cache expiration)
Auto, semi-auto, and single-shot data reload (e.g., reload the entire cache at once; reload a batch of data defined by some predefined query; or reload individual data elements piecemeal). The latter approach is very useful when your cache has many hits on the same exact keys - that way you don't need to reload the universe every time the 3 keys that are always accessed expire.
You could use a caching framework like OSCache, EHCache, JBoss Cache, JCS... If you're looking for something that follows a "standard", choose a framework that supports the JCache standard interface (javax.cache) aka JSR-107.
For simple needs like what you are describing, I'd look at EHCache or OSCache (I'm not saying they are basic, but they are simple to start with), they both support expiration based on time.
If I had to choose one solution, I'd recommend Ehcache which has my preference, especially now that it has joined Terracotta. And just for the record, Ehcache provides a preview implementation of JSR107 via the net.sf.cache.jcache package.
I have a web application that receives messages through an HTTP interface, e.g.:
http://server/application?source=123&destination=234&text=hello
This request contains the ID of the sender, the ID of the recipient and the text of the message.
This message should be processed like:
finding the matching User object for both the source and the destination from the database
creating a tree of objects: a Message that contains a field for the message text and two User objects for the source and the destination
persisting this tree to a database.
The tree will be loaded by other applications that I can't touch.
I use Oracle as the backing database and JPA with Toplink for the database handling tasks. If possible, I'd stay with these.
Without much optimization I can achieve ~30 requests/sec throughput in my environment. That's not much; I'd require ~300 requests/sec. So I measured where the performance bottleneck is and found that the calls to em.persist() take most of the time. If I simply comment out that line, the throughput goes well over 1000 requests/sec.
I tried to write a small test application that used simple JDBC calls to persist 1 million messages to the same database. I used batching, meaning I did 100 inserts then a commit, and repeated until all the records were in the database. I measured ~500 requests/sec throughput in this scenario, which would meet my needs.
It is clear that I need to optimize insert performance here. However as I mentioned earlier I would like to keep using JPA and Toplink for this, not pure JDBC.
Do you know a way to create batch inserts with JPA and Toplink? Can you recommend any other technique for improving JPA persist performance?
ADDITIONAL INFO:
"requests/sec" means here: total number of requests / total time from beginning of test to last record written to database.
I tried to make the calls to em.persist() asynchronous by creating an in-memory queue between the servlet stuff and the persister. It helped performance greatly. However, the queue grew really fast, and as the application will receive ~200 requests/second continuously, it is not an acceptable solution for me.
In this decoupled approach I collected requests for 100 msec and called em.persist() on all collected items before committing the transaction. The EntityManagerFactory is cached between transactions.
You should decouple from the JPA interface and use the bare TopLink API. You can probably chuck the objects you're persisting into a UnitOfWork and commit the UnitOfWork on your schedule (sync or async). Note that one of the costs of em.persist() is the implicit clone that happens of the whole object graph. TopLink will work rather better if you uow.registerObject() your two user objects yourself, saving itself the identity tests it has to otherwise do. So you'll end up with:
UnitOfWork uow = sess.acquireUnitOfWork();
for (Job job : batch) {
    Thingy thingyCl = (Thingy) uow.registerObject(new Thingy());
    User user1Cl = (User) uow.registerObject(user1);
    User user2Cl = (User) uow.registerObject(user2);
    thingyCl.setUsers(user1Cl, user2Cl);
}
uow.commit();
This is very old school TopLink btw ;)
Note that the batching will help a lot, because batch writing, and especially batch writing with parameter binding, will kick in, which for this simple example will probably have a very large impact on your performance.
Other things to look for: your sequencing size. A lot of the time spent writing objects in TopLink is actually spent reading sequencing information from the database, especially with the small defaults (I would probably have several hundred or even more as my sequence size).
What is your measure of "requests/sec"? In other words, what happens for the 31st request? What resource is being blocked? If it is the front-end/servlet/web portion, can you run em.persist() in another thread and return immediately?
Also, are you creating transactions each time? Are you creating EntityManagerFactory objects with each request?