I need to cache images (anywhere from 5 up to 100) from the web and display them in a ListView. If the user selects a row of the ListView, the cache can be cleared. I had a look at some examples: some use external storage, some use internal and external, some use an object cache...
So what are the advantages/disadvantages of internal storage ( http://developer.android.com/guide/topics/data/data-storage.html#filesInternal via getCacheDir()) versus an object cache (something like WeakHashMap or HashMap < String, SoftReference < Drawable > )?
A problem with SoftReferences seems to be that they may get gc'ed too fast ( SoftReference gets garbage collected too early). What about Android's internal storage? The reference says: "These files will be ones that get deleted first when the device runs low on storage."
Does it make any difference whether I use an object cache or the temporary internal storage, other than that the object cache should be a bit faster?
Here are the main differences between the two:
An object cache is faster than internal storage, but has lower capacity.
An object cache is transient in nature, while internal storage has a longer life span.
An object cache takes actual space in the heap; internal storage doesn't. This is an important point, as making your object cache too large can cause an OutOfMemoryError even with SoftReference.
Now, given those differences, the two are not mutually exclusive. A lot of what we implemented uses multi-layer caching, especially for image loading. Here are the steps we use:
If the image hasn't been cached, fetch it from the URL and cache it in the first-level cache, which is the SoftReference/WeakHashMap, or even a hard cache of limited size using a LinkedHashMap.
We then implement removeEldestEntry() in the LinkedHashMap. Upon hitting the hard cache's capacity, we move entries to the secondary cache, which is the internal storage. Using this method, you don't have to refetch the image from the URL, lookups are still fast, and it frees up your memory.
We run a cleanup for the internal storage on a timely basis in the background, using an LRU algorithm. You shouldn't rely on Android to clean this up for you.
We have made this multi-layer caching a common component and have used it in many of our projects for our clients. The technique pretty much follows the L1/L2 cache design in computer architecture.
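The first two steps above can be sketched in plain Java. This is only an illustrative sketch (class and field names are made up): the first level is an access-ordered LinkedHashMap whose removeEldestEntry() demotes entries to a second level, which here is just another Map standing in for the internal-storage layer; in the real component the second level would write files under getCacheDir() instead.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative two-level cache: a bounded, access-ordered LinkedHashMap
// (the "hard cache") that demotes its eldest entry to a second level
// instead of dropping it. The second level is a plain Map here, standing
// in for disk-backed internal storage.
public class TwoLevelCache<K, V> {
    private final Map<K, V> secondLevel = new HashMap<>(); // stands in for internal storage
    private final Map<K, V> firstLevel;

    public TwoLevelCache(final int capacity) {
        // accessOrder=true gives LRU iteration order for free
        firstLevel = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > capacity) {
                    // demote the least recently used entry to the second level
                    secondLevel.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    public V get(K key) {
        V value = firstLevel.get(key);
        if (value == null) {
            value = secondLevel.remove(key); // promote on a second-level hit
            if (value != null) {
                firstLevel.put(key, value);
            }
        }
        return value;
    }

    public void put(K key, V value) {
        firstLevel.put(key, value);
    }
}
```

A hit in the second level promotes the entry back into the first level, so frequently used images gravitate toward the fast layer, just like an L1/L2 hierarchy.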
You should also look for the question "How do I lazy load images in ListView" and related questions.
There are 20 production servers. Whenever the team makes a config change, I am requested to reload the configuration / restart the services to refresh the cache stored in a HashMap.
When actual transactions hit the server, it picks the configuration values from the map to process them, instead of hitting the DB on every transaction.
I used the following code to connect to each server. I have a couple of questions about this approach:
1) Is the logic fine, and will storing large data in memory create any performance degradation?
2) Is there a better approach you could suggest for this logic?
// POST a reload request to each server
httpurlcon = (HttpURLConnection) url.openConnection();
httpurlcon.setDoOutput(true);
httpurlcon.setRequestMethod("POST");
httpurlcon.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
httpurlcon.connect();
// check the response code and call disconnect() when finished
int responseCode = httpurlcon.getResponseCode();
If you are implementing a cache on the server side using just a HashMap, it might be a better idea to look at using an LRU (Least Recently Used) cache, which is easy to implement in Java using a LinkedHashMap.
Where the HashMap approach could take up too much memory and eventually give you an out-of-memory error, a least-recently-used cache evicts elements to stay at a certain size, keeping the most recently used entries. Also, with just a HashMap, you'll want to synchronize all operations on the data structure in order to ensure thread safety.
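Here is a minimal sketch of what that could look like (the class name and the MAX_ENTRIES value are made up for illustration): an access-ordered LinkedHashMap that overrides removeEldestEntry() for LRU eviction, wrapped with Collections.synchronizedMap so several request threads can share it.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative thread-safe LRU cache for config values. MAX_ENTRIES is an
// arbitrary tuning value; pick it based on your memory budget.
public final class ConfigCache {
    private static final int MAX_ENTRIES = 1000;

    private static final Map<String, String> CACHE = Collections.synchronizedMap(
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    // evict the least recently used entry once the cap is hit
                    return size() > MAX_ENTRIES;
                }
            });

    public static void put(String key, String value) {
        CACHE.put(key, value);
    }

    public static String get(String key) {
        return CACHE.get(key);
    }
}
```

Note that the access-ordered map mutates its internal order on get(), which is exactly why the synchronizedMap wrapper (which also locks reads) is needed rather than leaving reads unsynchronized.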
It would also be worth looking at tuning some parameters of the database itself. In MySQL, for example, you can tune parameters of InnoDB (MySQL's storage engine) such as the buffer pool size, or the query cache. InnoDB's buffer pool is itself an LRU cache, and if the database has a server to itself, you can set the buffer pool to be quite large and increase performance, since the data will be cached in memory.
I've gone through javax.cache.Cache to understand its usage and behavior. It states:
JCache is a Map-like data structure that provides temporary storage of
application data.
Both JCache and HashMap store their elements in local heap memory and have no persistence behavior by default. By implementing a custom CacheLoader and CacheWriter we can achieve persistence. Other than that, when should JCache be used?
Caches usually have more management logic than a map, which is nothing more than a (more or less simple) data structure.
Some concepts JCache implementations may provide:
Expiration: entries may expire and get removed from the cache after a certain period of time, or a certain time since last use
Eviction: elements get removed from the cache if space is limited. There can be different eviction strategies, e.g. LRU, FIFO, ...
Distribution: e.g. across a cluster, while Maps are local to a JVM
Persistence: elements in the cache can be persistent and still present after a restart, while the contents of a Map are simply lost
More memory: cache implementations may use more memory than the JVM heap provides, using a technique sometimes called BigMemory where objects are serialized into a separately allocated byte buffer. This JVM-external memory is managed by the OS (paging) and not by the JVM
The option to store keys and values either by value or by reference (with Maps you have to handle this yourself)
The option to apply security
Some of these are general concepts of JCache; some are specific implementation details of cache providers.
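To make the expiration concept concrete, here is a toy sketch in plain Java (deliberately not using the javax.cache API, so it is self-contained): each entry remembers when it was written and is treated as absent once its time-to-live has passed. Real providers implement this, plus eviction, statistics, and so on, behind the standard Cache interface.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of cache-style expiration on top of a plain Map.
// Entries are expired lazily: a stale entry is discarded on the next read.
public class ExpiringMap<K, V> {
    private final long ttlMillis;
    private final Map<K, Entry<V>> store = new HashMap<>();

    private static final class Entry<V> {
        final V value;
        final long writtenAt;
        Entry(V value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    public ExpiringMap(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void put(K key, V value) {
        store.put(key, new Entry<>(value, System.currentTimeMillis()));
    }

    public V get(K key) {
        Entry<V> e = store.get(key);
        if (e == null) {
            return null;
        }
        if (System.currentTimeMillis() - e.writtenAt > ttlMillis) {
            store.remove(key); // stale: expire lazily on read
            return null;
        }
        return e.value;
    }
}
```

This kind of lifecycle logic is precisely what distinguishes a cache from a plain Map, which never forgets anything on its own.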
Here are the five main differences between the two.
Unlike java.util.Map, a Cache:
does not allow null keys or values; attempts to use null will result in a java.lang.NullPointerException
provides the ability to read values from a javax.cache.integration.CacheLoader (read-through caching) when a requested value is not in the cache
provides the ability to write values to a javax.cache.integration.CacheWriter (write-through caching) when a value is created/updated/removed from the cache
provides the ability to observe cache entry changes
may capture and measure operational statistics
Source : GrepCode.com
Many caching implementations can keep cached objects off heap (outside the reach of the GC). The GC keeps track of every object allocated in Java; imagine you have millions of objects in memory. If those objects are not off heap, the GC can make your application's performance horrible.
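The off-heap idea can be illustrated in a few lines (a deliberately minimal sketch, not how any particular library does it): the value is serialized into a direct ByteBuffer, whose backing memory lives outside the Java heap, so the GC only tracks the thin wrapper object rather than the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;

// Minimal off-heap slot: serialize a value into a direct ByteBuffer so the
// payload bytes live outside the GC-managed heap. Production libraries add
// allocators, indexes, and eviction on top of this basic trick.
public class OffHeapSlot {
    private final ByteBuffer buffer;

    public OffHeapSlot(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        byte[] payload = bytes.toByteArray();
        buffer = ByteBuffer.allocateDirect(payload.length); // off-heap allocation
        buffer.put(payload);
        buffer.flip();
    }

    public Object read() throws IOException, ClassNotFoundException {
        byte[] payload = new byte[buffer.remaining()];
        buffer.duplicate().get(payload); // duplicate so the slot stays readable
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return in.readObject();
        }
    }
}
```

The trade-off is visible in the code: every read pays a deserialization cost, which is why off-heap storage makes sense for large, rarely mutated payloads rather than hot, tiny ones.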
I have a simple country-states HashMap: a simple static final unmodifiable concurrent HashMap.
Now we have implemented a memcached cache in our application.
My question is: is it beneficial to get the values from the cache instead of from such a simple map?
What benefits will I get, or not get, if I move this map to the cache?
This really depends on the size of the data and how much memory you've allocated for your JVM.
For simple data like the states of a country, which amount to a few hundred entries, a simple HashMap suffices; using memcached is overkill and in fact slower.
For large amounts of data that keep growing (typically tens or hundreds of MBs, or larger) and require frequent access, memcached (or another external cache) would be better than purely in-process storage.
A HashMap lookup will be much faster because the data is stored in the JVM's own memory and is reached directly by reference. A lookup from memcached requires extra work: a network round trip plus serialization of the value.
If your application is hosted on only one server, you don't need memcached's distributed feature, and a HashMap will be very fast.
But that is rarely the case for web applications: in the vast majority of cases you host them on multiple servers and want distributed caching, and memcached is best in such cases.
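For the few-hundred-entries case the question describes, the whole "cache" can be just this (a sketch; the entries shown are placeholders for the real startup load): a static, unmodifiable map built once. A lookup here is a single hash probe in process memory, with no network hop or serialization.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the static read-only map from the question. Built once at class
// load time and then only read, so an unmodifiable wrapper is enough.
public final class CountryStates {
    private static final Map<String, String> STATES;

    static {
        Map<String, String> m = new ConcurrentHashMap<>();
        m.put("CA", "California");
        m.put("TX", "Texas");
        // ... remaining entries loaded once at startup
        STATES = Collections.unmodifiableMap(m);
    }

    private CountryStates() {
    }

    public static String nameOf(String code) {
        return STATES.get(code);
    }
}
```

Moving this to memcached only pays off if the data must be shared and kept consistent across many servers, or if it grows far beyond what you want to hold in every JVM.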
I'm trying to improve the performance of a Java 7, Spring 3.1, Web Flow 3.1, Hibernate 3.6 application. The main issue is that each user's Web Flow persistence context is very large, and the application runs out of RAM quickly. Most of the data in the context is read-only entity data: static data like labels, lists, etc. that could be shared across these Web Flow persistence contexts.
After reading a number of posts on caching and performance, I believe the best approach is to use a second-level cache and mark these objects as read-only. This would share these objects across each persistence context.
However, my question is: when these objects are loaded from the shared cache, does the Web Flow persistence context share a reference to a global read-only entity, or will there be a copy of the read-only entity in each Web Flow persistence context, i.e. the lookup will be faster but the size of the contexts will stay the same?
Thanks
Hi, what do you think about this problem?
We have too much information in the HttpSession, because much of it is computed, and a few large object graphs ultimately need to be stored between requests.
Is it appropriate to use a cache like memcached here? Or is that the same as just increasing the JVM's memory?
There is some fear of storing the data in the DB between requests. What would you use if we are getting an OutOfMemory error?
Thank you.
I think the real point is the lifespan of your data.
Think about these two characteristics of the HttpSession:
When in a cluster, the container is responsible for replicating the HttpSession. This is good (you don't have to manage it yourself), but it can be dangerous in terms of performance if it leads to too many exchanges... If your application is not clustered, forget about this point.
The lifespan of the HttpSession can be a few minutes or a few hours, that is, as long as the user stays active. This is perfect for information with that lifespan (connection information, preferences, authorizations...). But it is not appropriate for data that is only useful from one screen to the next; let's call it transient+ data.
If you have clustering needs, the database takes care of them. But beware: you can't cache anything in memory then.
Storing in the database gives an even longer lifespan (persistent between sessions, and even between reboots!), so the problem would be even worse (except that you trade a memory problem for a performance problem).
I think this is the wrong approach for data whose lifespan is not expected to be persistent.
Transient data
If data is useful for only one request, then it is typically stored in the HttpRequest; fine.
But if it is used across a few requests (interactions within one screen, or within a screen sequence like a wizard), the HttpRequest is too short-lived to store it, while the HttpSession lives too long. The data needs to be cleaned up regularly.
And many memory problems in the HttpSession are related to such transient data that was not cleaned up (forgotten entirely, not cleaned after an exception, or not cleaned when the user doesn't follow the regular flow: hits Back, uses an old bookmark, clicks on a different menu, or whatever).
Caching library to have the correct lifespan
To avoid this cleaning effort altogether (and avoid the risk of OutOfMemory when things go wrong), you can store the information in a data structure that has the right lifespan. As the container doesn't provide this (it is application-related anyway), you need to implement it yourself using a cache library (like the ones mentioned; we use EhCache).
The idea is that you have technical code (not tied to one functional page, but implemented globally, e.g. in a ServletFilter) that ensures cleaning is always done once the objects are no longer needed.
You can design this cache using one (or several, as needed) of the following cleaning policies. Each policy corresponds to a functional lifespan:
for data related to only one screen (but several requests: reloading of the screen, Ajax requests...), the cache can store data for only one screen at a time (per session); call it the "currentScreenCache". That guarantees that if the user goes to another screen (even in an unmanaged way), the new screen will override the "currentScreenCache" information, and the previous information becomes garbage-collectable.
Implementation idea: each request must carry its screenId, and the technical code responsible for clearing the cache detects when, for the current HttpSession id, the current screenId doesn't match the one in the cache; it then cleans or resets that item in the cache.
for data only used in a series of connected screens (call it a functional module), the same applies at the level of the module.
Implementation: same as before, every request has to carry the module id...
for data that is expensive to recompute, the cache library can be configured to store only the last X computed values (older ones are considered less likely to be useful in the near future). In typical usage the same values are requested regularly, so you get many cache hits. Under intensive use the X limit is reached and memory doesn't inflate, preventing OutOfMemory errors (at the expense of recomputation the next time).
Implementation: cache libraries natively support this limiting factor, and several more...
for data that is only valid for a few minutes, the cache library can natively be configured to discard it after that delay...
... many more, see the caching library configuration for other ideas.
Note: Each cache can be application-wide, or specific to a user, a HttpSession id, a Company id or other functional value...
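The "currentScreenCache" policy above can be sketched independently of the servlet API (this is a simplified illustration; the class and method names are invented, and in a real application the technical code such as a filter would call onRequest() with the screenId carried by each request):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative per-session "currentScreenCache": holds data for exactly one
// screen at a time. When a request arrives for a different screen, the old
// screen's data is dropped and becomes garbage-collectable.
public class CurrentScreenCache {
    private String currentScreenId;
    private Map<String, Object> data = new HashMap<>();

    // Called at the start of each request with the screen the request belongs to.
    public void onRequest(String screenId) {
        if (!screenId.equals(currentScreenId)) {
            data = new HashMap<>(); // previous screen's data is released
            currentScreenId = screenId;
        }
    }

    public void put(String key, Object value) {
        data.put(key, value);
    }

    public Object get(String key) {
        return data.get(key);
    }
}
```

The point is that no explicit cleanup code is scattered across pages: even if the user jumps to another screen via Back or a bookmark, the next request's screenId mismatch resets the cache automatically.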
It's true that the HttpSession doesn't scale well, but that's mainly in relation to clustering. It's a convenience, but at some point, yes, you are better off using something like memcached, Terracotta, or EhCache to persist data between requests (or between users).