Eagerly repopulate EhCache instead of waiting for a read

Eagerly repopulate EhCache instead of waiting for a read - java

In my scenario, getting the fresh (non-cached) values is a very expensive operation so it is imperative pre-calculated cached values exist at all times instead of refreshing them on read, like EhCache seems to do.
For this it sounds reasonable to have a thread firing on TTL expiration repopulating the cache with fresh values, so no reads are ever waiting.
Is there a way to achieve this using Ehcache? Listening for OnElementExpired/Evicted events to repopulate the cache seems like a no-go (by the time I receive the event, a read would already be waiting).
I guess I could make the cache itself eternal and have my own scheduled task that repopulates, but then I get nothing from EhCache over dumb maps that I have now. Is this really how it is? Is there no way to have EhCache help me in this situation?

Ehcache provides a way of doing what you want with scheduled refresh.
You will need two things in order to make this work with Ehcache:
Use a cache loader - that is move to a cache read-through pattern. This is required as otherwise Ehcache has no idea how to get to the data mapped to a key.
Configure scheduled refresh - this works by launching a quartz scheduler instance.

Take a look at RefreshAheadCache, provided by EHCache.
However, I cannot find any examples of its use and indicators that this is mature.
The comment of the class says:
A cache decorator which implements read ahead refreshing. Read ahead occurs when a cache entry is accessed prior to its expiration, and triggers a reload of the value in the background.
This does not directly solve the problem as you mention below:
My problem is how to repopulate the cache without waiting for a read to trigger it
As far as I know there is no standard way to do it. The reason for it, is that the expiry is not timer based.
(Shameless) hint: Since I think this is quite useful, I implemented this in cache2k. The feature is called background refresh, enabled by CacheBuilder.backgroundRefresh(true).

may be this code could help :
https://github.com/jsr107/jsr107spec/issues/328

Related

Is there any third party java cache which provides control over the expiration event?

My use-case is that I need to implement a cache on top of a service should expire entries after a certain amount of time (from their time of creation).
And if the entry is getting expired, then service look up should be done to get the latest entry. lets call is service refresh.
But, lets say if service refresh fails, then I should be able to use the stale data in the cache.
But since the cache is already expired, I don't have that entry.
So, I am thinking of controlling the expiration of the cache and cache entry would only be expired only if service is available to get the latest data, otherwise don't remove that entry.
I was looking into Google Guava cache, but it only provides a removalListener which would just notify me with the event but I am not able to control the expiration event with this.
Is there any third party cache implementation which can serve my purpose?

This kind of resilience and robustness semantics are implemented in cache2k. We use this in production for quite some time. The setup looks like
CacheBuilder.newCache(Key.class, Value.class)
.name("myCache")
.source(new YourSourceImplementation())
.backgroundRefresh(true)
.suppressExceptions(true)
.expiryDuration(60, TimeUnit.SECONDS)
.build();
With exceptionExpiryDuration you can actually set a shorter interval for the retry. There was a similar question on SO, which I also answered, see: Is it possible to configure Guava Cache (or other library) behaviour to be: If time to reload, return previous entry, reload in background (see specs) There you find some more details about it.
Regardless what cache you use, you will run into a lot of issues, since exception handling in general and building robust and resilient applications needs some thoughts in the details.
That said, I am not totally happy with the solution yet, since I think we need more control, e.g. how long stale data should be served. Furthermore, the cache needs to have some alerting if there is stale data in it. I put some thoughts on how to improve this here: https://github.com/cache2k/cache2k/issues/26
Feedback is very welcome.

Avoid history table explosion for long-running cyclic processes?

I have a process which has a conditional loop iterating each 1 minute. The process itself can run for weeks, but I expect 99% of it's the history to be a repeating entries reflecting invocation of executions which are the part of the said cycle. Example:
|15:34:30.167|15:34:30.238|TimerCatchEvent|
|15:34:30.258|15:34:30.323|CheckConditionServiceTask|
|15:34:30.371|15:34:30.410|ExclusiveGateway1|
|15:34:30.457|15:34:30.501|ReturningtoTimerEventServiceTask|
|15:35:30.167|15:35:30.238|TimerCatchEvent|
|15:35:30.258|15:35:30.323|CheckConditionServiceTask|
|15:35:30.371|15:35:30.410|ExclusiveGateway1|
|15:35:30.457|15:35:30.501|ReturningtoTimerEventServiceTask|
|15:36:30.167|15:36:30.238|TimerCatchEvent|
|15:36:30.258|15:36:30.323|CheckConditionServiceTask|
|15:36:30.371|15:36:30.410|ExclusiveGateway1|
|15:36:30.457|15:36:30.501|ReturningtoTimerEventServiceTask|
Is there any way to somehow collapse these repeating history entries on the camunda level? Or maybe someone had come up with other solutions to this problem?
P.S. This is a follow-up for the question on cross-process synchronization: Cross-process synchronization in Camunda? - I have implemented what I need using post-fact timer-based "are all ready to sync?"-check.

In Camunda the History is Event Driven. It is possible to
implement a custom HistoryLevel controlling the amount of events produced and the data they contain,
implement a custom History Backend which allows you to logg the events in a different way than the default handler does.
Maybe that is useful to you?

You could also delete history entries from the database using some clean-up-script. It is safe to delete information from the history - it will not affect the runtime behavior.
Cheers
Bernd

How to know when updates to the Google AppEngine HRD datastore are complete?

I have a long running job that updates 1000's of entity groups. I want to kick off a 2nd job afterwards that will have to assume all of those items have been updated. Since there are so many entity groups, I can't do it in a transaction, so i've just scheduled the 2nd job to run 15 minutes after the 1st completes using task queues.
Is there a better way?
Is it even safe to assume that 15 minutes gives a promise that the datastore is in sync with my previous calls?
I am using high replication.
In the google IO videos about HRD, they give a list of ways to deal with eventual consistency. One of them was to "accept it". Some updates (like twitter posts) don't need to be consistent with the next read. But they also said something like "hey, we're only talking miliseconds to a couple of seconds before they are consistent". Is that time frame documented anywhere else? Is it safe assuming that waiting 1 minute after a write before reading again will mean all my preivous writes are there in the read?
The mention of that is at the 39:30 mark in this video http://www.youtube.com/watch?feature=player_embedded&v=xO015C3R6dw

I don't think there is any built in way to determine if the updates are done. I would recommend adding a lastUpdated field to your entities and updating it with your first job, then check for the timestamp on the entity you're updating with the 2nd before running... kind of a hack but it should work.
Interested to see if anybody has a better solution. Kinda hope they do ;-)

This is automatic as long as you are getting entities without changing the consistency to Eventual. The HRD puts data to a majority of relevant datastore servers before returning. If you are calling the asynchronous version of put, you'll need to call get on all the Future objects before you can be sure it's completed.
If however you are querying for the items in the first job, there's no way to be sure that the index has been updated.
So for example...
If you are updating a property on every entity (but not creating any entities), then retrieving all entities of that kind. You can do a keys-only query followed by a batch get (which is approximately as fast/cheap as doing a normal query) and be sure that you have all updates applied.
On the other hand, if you're adding new entities or updating a property in the first process that the second process queries, there's no way to be sure.

I did find this statement:
With eventual consistency, more than 99.9% of your writes are available for queries within a few seconds.
at the bottom of this page:
http://code.google.com/appengine/docs/java/datastore/hr/overview.html
So, for my application, a 0.1% chance of it not being there on the next read is probably OK. However, I do plan to redesign my schema to make use of ancestor queries.

Ehcache -- Expiring on certain time

When using ehcache, is there a way to expire cache on certain time of the day?
Thanks,
Lawardy

There is no such functionality out-of-the-box. You need an external solution like Quartz, also from Terracotta umbrella.
In fact, even normal timeToLive parameter does not remove element in question after this time elapses, because this would required additional thread. Instead the item is removed when new one is to be added which takes its place.

Session management using Hibernate in a multi-threaded Swing application

I'm currently working on a (rather large) pet project of mine , a Swing application that by it's very nature needs to be multi-threaded. Almost all user interactions might fetch data from some remote servers over the internet , since I neither control these servers nor the internet itself, long response times are thus inevitable. A Swing UI obviously cannot repaint itself while the EDT is busy so all remote server calls need to be executed by background thread(s).
My problem:
Data fetched by the background threads gets 'enriched' with data from a local (in-memory) database (remote server returns IDs/references to data in the local database). This data later eventually gets passed to the EDT where it becomes part of the view model. Some entities are not completely initialized at this point (lazy-fetching enabled) so the user might trigger lazy-fetching by e.g. scrolling in a JTable. Since the hibernate session is already closed this will trigger a LazyInitializationException. I can't know when lazy-fetching might be triggered by the user so creating a session on demand/attaching the detached object will not work here.
I 'solved' this problem by:
using a single (synchronized , since Session instances are not thread-safe) Session for the whole application
disabling lazy-fetching completely
While this works, the application's performance has suffered greatly (sometimes being close to unusable). The slowdown is mainly caused by the large number of objects that are now fetched by each query.
I'm currently thinking about changing the application's design to 'Session-per-thread' and migrating all entities fetched by non-EDT threads to the EDT thread's Session (similar to this posting on the Hibernate forums).
Side-note: Any problems related to database updates do not apply since all database entities are read-only (reference data).
Any other ideas on how to use Hibernate with lazy-loading in this scenario ?

Don't expose the Session itself in your data API. You can still do it lazily, just make sure that the hydration is being done from the 'data' thread each time. You could use a block (runnable or some kind of command class is probably the best Java can do for you here unfortunately) that's wrapped by code that performs the load async from the 'data' thread. When you're in UI code, (on the UI thread of course) field some kind of a 'data is ready' event that is posted by the data service. You can then get the data from the event use in the UI.

You could look have a look at Ebean ORM. It is session-less and lazy loading just works. This doesn't answer your question but really proposes an alternative.
I know Ebean has built in support for asynchronous query execution which may also be interesting for your scenario.
Maybe worth a look.
Rob.

There are two distinct problems, that should get resolved seperately:
Handling of Hibernate Sessions in Swing Applications. Let me recommend my own article, regarding this problem: http://blog.schauderhaft.de/2008/09/28/hibernate-sessions-in-two-tier-rich-client-applications/
The basic idea is to have a session for every frame, excluding modal frames which use the session of the spawning frame. It is not easy but it works. Meaning, you won't get any LLEs anymore.
How to get your GUI thread separated from the back end.
I recommend to keep the hibernate objects strictly on the back end thread they originate from. Only give wrapper objects to the ETD. If these wrapper objects are asked for a value, they create a request which gets passed to the backend thread, which eventually will return the value.
I'd envision three kinds of wrapper Implementations:
Async: requests the value, and gets notified when the value is available. It would return immediately with some dummy value. On notification it will fire a PropertyChange event i.O. to inform the GUI about the 'changed' value (changed from unknown to a real value).
Sync: requests the value and waits for it to be available.
Timed: a mixture between the two, waiting for a short time (0.01) seconds, before returning. This would avoid plenty change events, compared to the async version.
As a basis for these wrappers a recommend the ValueModel of the JGoodies Binding library: http://www.jgoodies.com/downloads/libraries.html
Obviously You need to take care that any action is only performed on actually loaded values, but since you don't plan on doing updates this shouldn't be to much of an issue.
Let me end with a warning: I have thought about it a lot, but never actually tried it, so move with care.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.