Can this be solved with Ehcache?

I would like to use Ehcache for the following task:
There's a routine that can be executed only n times a day. Each time it's invoked, a counter in the database is decremented. When it reaches 0, this fact is recorded in a shared hash map (filed under the current date), and there's no need to contact the database until the end of the day. At midnight an asynchronous task resets the database counter to n; the hash map then has no entry for the new date, so database polling resumes.
Now I'd like to implement this behaviour in Ehcache, because we use it already for other caches, and because I'd like to be able to turn off all caching in one place. This poses the following problems:
The condition for cache activation is known only inside the @Cacheable method (when it's discovered that the DB counter is zero). This probably rules out declarative cache specification, correct?
The time to live needs to be specified as a point in time, not as duration. Is this possible?

You will probably have to implement your own eviction/expiry policy if you want to do that with Ehcache; a sketch of one workaround follows.
See here: http://ehcache.org/documentation/apis/cache-eviction-algorithms#plugging-in-your-own-eviction-algorithm
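As an illustration of the point-in-time expiry: Ehcache 2 only understands durations, but a per-element TTL computed as the time remaining until midnight gives the same effect. The cache name and key below are made up; the Element API is standard Ehcache 2:

import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class MidnightExpiry {
    public static void markCounterExhausted() {
        // Assumes a cache named "dailyCounterCache" is defined in ehcache.xml.
        Cache cache = CacheManager.getInstance().getCache("dailyCounterCache");

        // Translate "expire at midnight" into a per-element duration.
        long secondsUntilMidnight = Duration.between(
                LocalDateTime.now(),
                LocalDate.now().plusDays(1).atStartOfDay()).getSeconds();

        Element element = new Element("counter-exhausted", Boolean.TRUE);
        element.setTimeToLive((int) secondsUntilMidnight);
        cache.put(element);
    }
}

Since the marking happens inside the method (when the DB counter hits zero), an imperative put like this also sidesteps the declarative @Cacheable limitation from the first question.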

Related

Caching in Apache Beam: Static variable vs Stateful processing

I'm adding caching functionality to one of the DoFns inside a Dataflow pipeline in Java. The DoFn currently uses a REST client to send a request to an endpoint (which charges per request, and whose response changes roughly every hour) for every input element. What I want to achieve is to cache the response from the endpoint and have it expire every 15 minutes.
I found two ways to do this. One, from a similar Stack Overflow question, suggested using a static variable to host a cache service (I used Guava for caching). However I wasn't sure how to update the expiry from outside of the DoFn.
The other approach I found is to use stateful processing to store a hash that keeps track of the requests and responses, and use a TimerSpec to clear the "cache" every 15 minutes. Although it appears that there is no way to set a timer for each element in the cache.
I haven't tried the second approach yet. Before I implement it, I wonder whether anyone has run into a similar situation and has suggestions or better approaches.
However I wasn't sure how to update the expiry from outside of the DoFn.
Do you need to? The DoFn does a lookup first, and if the entry has already expired it issues a request and updates the cache. Cache reads and writes need to be thread-safe.
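A minimal sketch of that lookup-then-refresh pattern with a static Guava cache, assuming a String-keyed endpoint (callEndpoint is a placeholder for the real REST call):

import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import org.apache.beam.sdk.transforms.DoFn;

public class EnrichFn extends DoFn<String, String> {

    // One cache per worker JVM. LoadingCache is thread-safe, and
    // expireAfterWrite gives the 15-minute refresh without any timer.
    private static final LoadingCache<String, String> CACHE =
            CacheBuilder.newBuilder()
                    .expireAfterWrite(15, TimeUnit.MINUTES)
                    .build(new CacheLoader<String, String>() {
                        @Override
                        public String load(String key) {
                            return callEndpoint(key); // placeholder REST call
                        }
                    });

    @ProcessElement
    public void processElement(@Element String key, OutputReceiver<String> out)
            throws Exception {
        // Expired entries are reloaded on lookup, so nothing outside
        // the DoFn needs to touch the expiry.
        out.output(CACHE.get(key));
    }

    private static String callEndpoint(String key) {
        return "response-for-" + key; // stand-in for the real client
    }
}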
Although it appears that there is no way to set a timer for each element in the cache.
You can probably set a timer that scans the entire cache every X minutes and refreshes expired entries. However, if your state is not keyed, using a global state like this will limit the parallelism of your pipeline.

Reliably tracking changes made by Hibernate

I'm using a PostUpdateEventListener registered via
registry.appendListeners(EventType.POST_COMMIT_UPDATE, listener)
and a few other listeners in order to track changes made by Hibernate. This works perfectly, however, I see a problem there:
Let's say, for tracking some amount by id, I simply execute
amountByIdConcurrentMap.put(id, amount);
on every POST_COMMIT_UPDATE (let's ignore other operations). The problem is that this call happens some time after the commit. So with two commits writing the same entity shortly one after the other, I can receive the events in the wrong order, ending up with the older amount stored.
Is this really possible or are the operations synchronized somehow?
Is there a way to prevent, or at least detect, such a situation?
Two questions first, then a proposal:
Are you sure that you need this optimization? Why not fetch the amount as it is written to the database by querying there? What makes you reach for caching?
How do you make sure that the calculation of the amount before writing it to the database is properly synchronized, so that multiple threads (or possibly nodes) do not use old data to calculate the amount and thereby overwrite the result of a later calculation?
Assuming you handle question number 2 correctly, you have two options:
Pessimistic locking: immediately before the commit you can exclusively update your cache without concurrency issues.
Optimistic locking: in that case you have a kind of timestamp or counter in your database record that you can put into the cache together with the amount. You can use this value to find out which value is more recent (see the sketch below).
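A sketch of the optimistic variant, assuming the entity carries a version column (e.g. a Hibernate @Version field) that the listener can read; all names here are illustrative:

import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class VersionedAmountCache {

    // Pairs the amount with the entity's version so stale events can be detected.
    record VersionedAmount(long version, BigDecimal amount) {}

    private final ConcurrentMap<Long, VersionedAmount> amountById = new ConcurrentHashMap<>();

    // Called from the POST_COMMIT_UPDATE listener. merge() is atomic per key,
    // so the entry with the higher version wins regardless of event ordering.
    public void onPostCommitUpdate(Long id, long version, BigDecimal amount) {
        amountById.merge(id, new VersionedAmount(version, amount),
                (current, candidate) ->
                        candidate.version() > current.version() ? candidate : current);
    }
}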
No, there are no ordering guarantees, so you'll have to take care to ensure proper synchronization manually.
If the real problem you are solving is caching of entity state and if it is suitable to use second-level cache for the entity in question, then you would get everything out of the box by enabling the L2 cache.
Otherwise, instead of updating the map from the update listeners directly, you could submit tasks to an Executor or messaging system that would asynchronously start a new transaction and select for update the amount for the given id from the database. Then update the map in the same transaction while holding the corresponding row lock in the db, so that map updates for the same id are done serially.

Set up both TTL and TTI in Ehcache 3 XML configuration

What I am trying to accomplish is to set up both TTL (time to live) and TTI (time to idle) for a cache, so that a key either expires after the TTL or can expire earlier in case it hasn't been accessed for the TTI period.
In Ehcache 2 it was possible with the following configuration:
<cache name="my.custom.Cache"
       timeToIdleSeconds="10"
       timeToLiveSeconds="120">
</cache>
In Ehcache 3 the analogous configuration block looks like the following:
<cache alias="my.custom.Cache">
  <expiry>
    <tti unit="seconds">10</tti>
    <ttl unit="minutes">2</ttl>
  </expiry>
</cache>
The problem is that such a configuration is considered invalid, since ehcache.xsd states that there should be only one option under the expiry tag (either tti or ttl, but not both).
As mentioned by Louis Jacomet on the mailing list:
In order to achieve what you want, you need to create a custom Expiry, which you can do with the Expirations.builder() that was introduced in 3.3.1, or with a custom implementation of the Expiry interface.
Note however that your explanation of what the expiration did in Ehcache 2 is slightly incorrect. When you combine TTL and TTI, the element remains valid for the whole TTL whether it is accessed or not. However, if it is accessed close to the end of the TTL period, the last access time + TTI can make it stay longer in the cache. And if it is accessed again during that period, the last access time is updated again, thus extending the life of the mapping.
The way Expiry works in Ehcache 3 is slightly different, as effectively we compute an expiration time each time the mapping is created, accessed or updated. This is done to reduce overhead in stored mappings.
So if you configure your Expiry with getExpiryForCreation returning 120 seconds but getExpiryForAccess returning 10 seconds, a created but never accessed element will be considered expired after 120 seconds, while a created but accessed element will be considered expired 10 seconds after the last access, even if that time is still within the 120 seconds.
TTI is really a weird concept when you think about it; we kept it for JCache compatibility, but it is effectively closer to eviction than expiration. Because what does it mean for the freshness of a value that it is being read? What a read does indicate is that the value is useful in the cache and should not be evicted.
And in XML, you cannot use the tti and ttl shortcuts in combination. But you can configure an expiry by fully qualified class name. We should consider extending the XML system so that you can express some equivalent of the builder added in code.
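For illustration, here is what such a custom expiry can look like with the ExpiryPolicy interface of Ehcache 3.5+ (the quoted answer predates that interface and refers to the older Expiry/Expirations API, but the shape is the same). Note that this sketch reproduces the Ehcache 3 semantics described above, not the Ehcache 2 combination:

import java.time.Duration;
import java.util.function.Supplier;

import org.ehcache.expiry.ExpiryPolicy;

public class TtlWithTtiPolicy<K, V> implements ExpiryPolicy<K, V> {

    @Override
    public Duration getExpiryForCreation(K key, V value) {
        return Duration.ofSeconds(120); // the "TTL" part
    }

    @Override
    public Duration getExpiryForAccess(K key, Supplier<? extends V> value) {
        return Duration.ofSeconds(10); // the "TTI" part: 10s from the last access
    }

    @Override
    public Duration getExpiryForUpdate(K key, Supplier<? extends V> oldValue, V newValue) {
        return Duration.ofSeconds(120); // treat an update like a fresh creation
    }
}

In XML the class is then referenced by its fully qualified name inside the expiry tag, e.g. <expiry><class>com.example.TtlWithTtiPolicy</class></expiry>.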

How to know when updates to the Google AppEngine HRD datastore are complete?

I have a long-running job that updates thousands of entity groups. I want to kick off a second job afterwards that will have to assume all of those items have been updated. Since there are so many entity groups, I can't do it in a transaction, so I've just scheduled the second job to run 15 minutes after the first completes, using task queues.
Is there a better way?
Is it even safe to assume that 15 minutes guarantees the datastore is in sync with my previous calls?
I am using high replication.
In the Google I/O videos about HRD, they give a list of ways to deal with eventual consistency. One of them was to "accept it". Some updates (like Twitter posts) don't need to be consistent with the next read. But they also said something like "hey, we're only talking milliseconds to a couple of seconds before they are consistent". Is that time frame documented anywhere else? Is it safe to assume that waiting 1 minute after a write before reading again means all my previous writes are there in the read?
The mention of that is at the 39:30 mark in this video http://www.youtube.com/watch?feature=player_embedded&v=xO015C3R6dw
I don't think there is any built-in way to determine if the updates are done. I would recommend adding a lastUpdated field to your entities and updating it with your first job, then checking the timestamp on the entities with the second job before running... kind of a hack, but it should work.
Interested to see if anybody has a better solution. Kinda hope they do ;-)
This is automatic as long as you are getting entities without changing the read consistency to Eventual. The HRD writes data to a majority of the relevant datastore servers before returning. If you are calling the asynchronous version of put, you'll need to call get on all the Future objects before you can be sure the writes have completed.
If however you are querying for the items in the first job, there's no way to be sure that the index has been updated.
So for example...
If you are updating a property on every entity (but not creating any entities) and then retrieving all entities of that kind, you can do a keys-only query followed by a batch get (which is approximately as fast/cheap as doing a normal query) and be sure that you have all updates applied (sketched below).
On the other hand, if you're adding new entities or updating a property in the first process that the second process queries, there's no way to be sure.
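A sketch of the keys-only query followed by a batch get, using the low-level Datastore API (the kind name is up to you):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

public class KeysOnlyThenGet {
    public static Map<Key, Entity> fetchFresh(String kind) {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

        // The keys-only query may lag behind recent writes
        // (indexes are eventually consistent)...
        List<Key> keys = new ArrayList<>();
        PreparedQuery pq = datastore.prepare(new Query(kind).setKeysOnly());
        for (Entity e : pq.asIterable()) {
            keys.add(e.getKey());
        }

        // ...but the batch get by key is strongly consistent, so the
        // returned entities reflect all committed property updates.
        return datastore.get(keys);
    }
}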
I did find this statement:
With eventual consistency, more than 99.9% of your writes are available for queries within a few seconds.
at the bottom of this page:
http://code.google.com/appengine/docs/java/datastore/hr/overview.html
So, for my application, a 0.1% chance of it not being there on the next read is probably OK. However, I do plan to redesign my schema to make use of ancestor queries.

Ehcache -- Expiring on certain time

When using ehcache, is there a way to expire cache on certain time of the day?
Thanks,
Lawardy
There is no such functionality out of the box. You need an external solution like Quartz, also from the Terracotta umbrella (see the sketch below).
In fact, even the normal timeToLive parameter does not remove the element in question the moment the time elapses, because that would require an additional thread. Instead, the item is removed lazily, when a new element is about to take its place.
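A sketch of the Quartz approach, clearing a cache at midnight every day (the cache name is assumed; Quartz 2.x API):

import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

import net.sf.ehcache.CacheManager;

public class MidnightCacheFlushJob implements Job {

    @Override
    public void execute(JobExecutionContext context) {
        // Assumes a cache named "dailyCache" exists in the default CacheManager.
        CacheManager.getInstance().getCache("dailyCache").removeAll();
    }

    public static void schedule() throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(MidnightCacheFlushJob.class)
                .withIdentity("flushDailyCache")
                .build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(CronScheduleBuilder.dailyAtHourAndMinute(0, 0)) // midnight
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}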
