Query Results caching using Java: Any better approaches?

I have a class ClassA which calls another class (a DAO) to fetch query results. In a specific business scenario,
ClassA invokes the DAO with query parameters about 20,000 times.
Out of these, about 10,000 calls send the same set of query parameters to the DAO. Obviously the result set will be the same and can be cached.
The following is the code I implemented.
class ClassA
{
    ...

    Map<String, CachData> cachDataMap = new HashMap<String, CachData>();

    private void getQueryResults(String queryParam)
    {
        try {
            CachData cachData = null;
            if (!cachDataMap.containsKey(queryParam))
            {
                dao.getResults(queryParam);
                cachData = new CachData();
                cachData.setResult0(__getStringResult(0));
                cachData.setResult1(__getStringResult(1));
                cachData.setResult2(__getStringResult(2));
                cachData.setResult3(__getStringResult(3));
                cachData.setResult4(__getStringResult(4));
                cachData.setResult5(__getStringResult(5));
                cachDataMap.put(queryParam, cachData);
            }
            else
            {
                cachData = cachDataMap.get(queryParam);
            }
        } catch (Exception e) {
            // handle here
        }
    }
}
Is there any better solution, other than using a framework? A better data structure or method, for good performance?

You could use ehcache.
Whatever you do, don't use a Map as the interface for your cache. A good cache interface allows the implementation to clean up the cache; the Map contract doesn't allow this.
Depending on the implementation, cleanup can be based on time in the cache, usage statistics, memory availability, and so on.
The Map approach you're using here seems prone to running out of memory over a longer period of usage.
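As an illustration, here is a minimal sketch of such a cache interface, with a size-bounded LRU implementation built on LinkedHashMap's access-order eviction (the interface and class names are hypothetical):

import java.util.LinkedHashMap;
import java.util.Map;

// A minimal cache contract: unlike Map, it makes no promise that an entry
// put in earlier is still present later, so the implementation is free to evict.
interface Cache<K, V> {
    V get(K key);
    void put(K key, V value);
}

// Size-bounded LRU cache: a LinkedHashMap in access order evicts the
// eldest entry once the capacity is exceeded.
class LruCache<K, V> implements Cache<K, V> {
    private final Map<K, V> map;

    LruCache(final int capacity) {
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public synchronized V get(K key) { return map.get(key); }
    public synchronized void put(K key, V value) { map.put(key, value); }
}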

You could use a Table from the Guava library, and use Ehcache to save the object as-is with the query as the key.
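If you are already pulling in Guava, note that it also ships its own cache (a separate facility from Table) with size- and time-based eviction; a minimal sketch, assuming a hypothetical dao.getResults(queryParam) call that returns the cacheable CachData:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

// Query-keyed cache with bounded size and a TTL; the loader runs only on a
// cache miss, so repeated query parameters are served from memory.
LoadingCache<String, CachData> cache = CacheBuilder.newBuilder()
        .maximumSize(10000)
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .build(new CacheLoader<String, CachData>() {
            @Override
            public CachData load(String queryParam) {
                return dao.getResults(queryParam); // hypothetical DAO call
            }
        });

CachData data = cache.getUnchecked(queryParam);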

Related

How to create a reusable Map

Is there a way to populate a Map once from the DB (through a Mongo repository) and reuse it when required from multiple classes, instead of hitting the database through the repository every time?
As per your comment, what you are looking for is a caching mechanism. Caches are components which keep data in memory, as opposed to files, databases or other media, so as to allow fast retrieval of information (at the cost of a higher memory footprint).
There are probably various tutorials online, but usually caches all have the following behaviour:
1. They are key-value pair structures.
2. Each entity living in the cache also has a Time To Live (TTL), that is, how long it will be considered valid.
You can implement this in the repository layer, so the cache mechanism will be transparent to the rest of your application (but you might want to consider exposing functionality that allows you to clear/invalidate part or all of the cache).
So basically, when a query comes to your repository layer, check the cache. If the key exists in there, check the time to live; if it is still valid, return the cached data.
If the key does not exist or the TTL has expired, add/overwrite the data in the cache. Keep in mind that when you update the data model yourself, you should also invalidate the cache accordingly, so that fresh data will be pulled from the DB on the next call.
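A minimal sketch of that check-then-load logic in a repository wrapper (the class and the loader hook are hypothetical):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical repository wrapper: a TTL-based cache in front of the DB.
class CachedRepository<K, V> {
    private static final long TTL_MILLIS = 30_000; // 30s time-to-live

    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> dbLoader; // the real repository call

    CachedRepository(Function<K, V> dbLoader) { this.dbLoader = dbLoader; }

    V find(K key) {
        Entry<V> entry = cache.get(key);
        if (entry == null || System.currentTimeMillis() - entry.loadedAt > TTL_MILLIS) {
            V fresh = dbLoader.apply(key); // hits the database
            cache.put(key, new Entry<>(fresh, System.currentTimeMillis()));
            return fresh;
        }
        return entry.value; // still within TTL: serve from memory
    }

    void invalidate(K key) { cache.remove(key); } // call this after you update the data model
}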
You can declare the map field as public static; this would allow application-wide access via ClassLoadingData.mapField.
I think a better solution, if I understood the problem, would be a memoized function, that is, a function that stores the result of its call. Here is a sketch of how this could be done (note this does not handle possible synchronization problems in a multi-threaded environment):
class ClassLoadingData {
    private static Map<KeyType, ValueType> memoizedData = new HashMap<>();

    public static Map<KeyType, ValueType> getMyData() {
        if (memoizedData.isEmpty()) { // you can use a more complex check to handle data refresh
            populateData(memoizedData);
        }
        return memoizedData;
    }

    private static void populateData(Map<KeyType, ValueType> data) {
        // do your query, and put the results into data
    }
}
Premise: I suggest you use an object-relational mapping tool like Hibernate on your Java project to map the object-oriented domain model to a relational database, and let the tool handle the cache mechanism implicitly. Hibernate specifically implements a multi-level caching scheme (take a look at the following link for more information: https://www.tutorialspoint.com/hibernate/hibernate_caching.htm ).
Regardless of my suggestion in the premise, you can also manually create a singleton class that will be used by every class in the project that interacts with the DB:
public class MongoDBConnector {
    private static final Logger LOGGER = LoggerFactory.getLogger(MongoDBConnector.class);

    private static MongoDBConnector instance;

    //Cache period in seconds
    public static int DB_ELEMENTS_CACHE_PERIOD = 30;

    //Latest cache update time
    private DateTime latestUpdateTime;

    //The cache data layer from DB
    private Map<KType, VType> elements;

    private MongoDBConnector() {
    }

    public static synchronized MongoDBConnector getInstance() {
        if (instance == null) {
            instance = new MongoDBConnector();
        }
        return instance;
    }
}
Here you can then define a load method that updates the map with the values stored in the DB, and also a write method that writes values to the DB, with the following characteristics:
1- These methods should be synchronized in order to avoid issues if multiple calls are performed concurrently.
2- The load method should apply the cache-period logic (maybe with a configurable period) to avoid loading the data from the DB on every method call.
Example: Suppose your cache period is 30s. This means that if 10 reads are performed from different points of the code within 30s, you will load data from the DB only on the first call, while the others will read from the cached map, improving performance.
Note: The greater the cache period, the better the performance of your code; but if the DB is managed externally you'll create inconsistency with the cache if an insertion is performed from outside (from another tool or manually). So choose the best value for your case.
public synchronized Map<KType, VType> getElements() throws ConnectorException {
    final DateTime currentTime = new DateTime();
    if (latestUpdateTime == null || (Seconds.secondsBetween(latestUpdateTime, currentTime).getSeconds() > DB_ELEMENTS_CACHE_PERIOD)) {
        LOGGER.debug("Cache is expired. Reading values from DB");
        // Read from DB and update cache
        // ....
        latestUpdateTime = currentTime;
    }
    return elements;
}
3- The store method should automatically update the cache if the insert is performed correctly, regardless of whether the cache period has expired:
public synchronized void storeElement(final VType object) throws ConnectorException {
    // Insert object in the DB (throws a ConnectorException if the insert fails)
    // ...
    // Update the cache regardless of the cache period
    loadElementsIgnoreCachePeriod();
}
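The loadElementsIgnoreCachePeriod helper referenced above is not shown; a minimal sketch of what it could do, under the same assumptions:

private synchronized void loadElementsIgnoreCachePeriod() throws ConnectorException {
    // Read all values from the DB and replace the cached map
    // elements = ...;
    latestUpdateTime = new DateTime(); // restart the cache period
}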
Then you can get the elements from any point in your code as follows:
Map<KType, VType> liveElements = MongoDBConnector.getInstance().getElements();

Should I always have a separate "DataService" that invokes another service?

I am building a new RESTful service that interacts with other microservices.
The routine task is to fetch some data from another RESTful service, filter it, match it against existing data and return a response.
My question is: is it a good design pattern to always separate the steps "get data" and "filter it" into two different classes, naming one EntityDataService and the other simply EntityService?
For instance, I can make a call to a service that returns a list of countries that has to be filtered against some conditions, such as inclusion in the EU or date of creation, etc.
In this case, which option is better?
1. a separate CountryDataService class that has only one method, getAllCountries, plus an EUCountryService that filters the results
2. one class CountryService with public methods getEUCountries and getCountriesCreatedInDateRange and a private getAllCountries
I'm trying to follow the KISS principle but also want to keep my solution maintainable and extensible.
In systems with lots of data, having a getAllSomething method is not that good an idea.
If you don't have lots of data it's OK to have it, but still be careful.
If you have 50 records it's not that bad, but if you have millions of records it would be a problem.
Having a Service or Repository with getBySomeCriteria methods is the better way to go.
If you have lots of different queries that you want to perform, you may end up with lots of methods: getByCriteria1, getByCriteria2, ..., getByCriteria50. Also, each time you need a different query you will have to add a new method to the Service.
In this case you can use the Specification Pattern. Here's an example:
public enum Continent { None, Europe, Africa, NorthAmerica, SouthAmerica, Asia }
public class CountrySpecification {
public DateRange CreatedInRange { get; set; }
public Continent Location { get; set; }
}
public class CountryService {
public IEnumerable<Country> Find(CountrySpecification spec) {
var url = "https://api.myapp.com/countries";
url = AddQueryParametersFromSpec(url, spec);
var results = SendGetRequest(url);
return CreateCountryFromApiResults(results);
}
}
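The snippet above is C#-flavored; since the question is about Java, here is a rough Java sketch of the same idea (the names and URL-building details are hypothetical):

import java.util.ArrayList;
import java.util.List;

enum Continent { NONE, EUROPE, AFRICA, NORTH_AMERICA, SOUTH_AMERICA, ASIA }

// The specification carries the query criteria as plain data.
class CountrySpecification {
    String createdAfter;   // ISO date, stands in for the DateRange above
    String createdBefore;
    Continent location;
}

class Country { String name; Continent continent; }

class CountryService {
    // One entry point serves many queries: callers describe what they want
    // instead of the service growing a new getByCriteriaN method each time.
    public List<Country> find(CountrySpecification spec) {
        StringBuilder url = new StringBuilder("https://api.myapp.com/countries?");
        if (spec.location != null) url.append("continent=").append(spec.location).append('&');
        if (spec.createdAfter != null) url.append("createdAfter=").append(spec.createdAfter).append('&');
        if (spec.createdBefore != null) url.append("createdBefore=").append(spec.createdBefore);
        // send the GET request and map the response to Country objects here,
        // as in the original example (omitted)
        return new ArrayList<Country>();
    }
}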

Jpa Specification to find subset of field's value

I am writing a webapp using Spring Data JPA in the persistence layer; more specifically, my DAOs extend the JpaSpecificationExecutor interface, so I am able to implement some kind of filter. Imagine a list of Items with several attributes (I omit annotations and other metadata for the sake of clarity):
data class Item(var tags: MutableList<String>)
On my service layer, my filter method looks like this:
fun findBy(tagsToFilterBy: List<String>): List<Item> {
    return dao.findAll { root, query, builder ->
        builder.//??
    }
}
What I want to achieve is to retrieve only Items that contain exactly that tagsToFilterBy, in other words, tagsToFilterBy should be a subset of Item.tags.
I know about the isMember(...) method, but I think its usage wouldn't be very pleasant with many tags, as it accepts only a single "entity" per call. Could you advise me something?
My other question is whether it is safe to use user input directly in, say, builder.like(someExpression, inputFromUser), or whether I have to put it in builder.parameter(...) and then query.setParameter(...).
Thank you for any ideas.
So I managed to write it myself. I'm not saying that it is pretty, but it is the prettiest one I could come up with:
dao.findAll { root, query, builder ->
    val predicates = mutableListOf<Predicate>()
    val tagsPath = root.get<List<Tag>>("tags")
    tagsToFilterBy.forEach {
        predicates.add(builder.isMember(it, tagsPath))
    }
    // "and": every requested tag must be a member, so tagsToFilterBy is a subset of Item.tags
    builder.and(*predicates.toTypedArray())
}
This basically goes through the given tags and checks that each one is a member of the item's tags.
One way is to use filter and test each element to see if your filter list contains it.
val result = dao.filter { tagsToFilterBy.contains(it.tag) }
To speed it up, you could sort your filter list and use binarySearch, but whether that improves performance depends on the size of the filter list. For example, assuming tagsToFilterBy is sorted:
val result2 = dao.filter { tagsToFilterBy.binarySearch(it.tag) >= 0 }
The Kotlin Collection page describes each of these extension methods.

Asynchronous multiple query from different datasources or databases

I'm having trouble finding an appropriate solution for this:
I have several databases with the same structure but with different data. When my web app executes a query, it must fan that query out to each database, execute it asynchronously, then aggregate the results from all databases and return them as a single result. Additionally, I want to be able to pass a list of databases the query should be executed against, and a maximum expiration time for query execution. The result must also contain meta information for each database, such as its execution time.
It would be great if it were possible to use other datasources as well, such as a remote web service with a specific API, rather than only relational databases.
I use Spring/Grails and need a Java solution, but I will be glad of any advice.
UPD: I want to find a ready-made solution, maybe a framework or something like that.
This is basic OO. You need to abstract what you are trying to achieve - loading data - from the mechanism you are using to achieve it - a database query or a web-service call.
Such a design would usually involve an interface that defines the contract of what can be done and then multiple implementing classes that make it happen according to their implementation.
For example, you'd end up with an interface that looked something like:
public interface DataLoader
{
    public Collection<Data> loadData() throws DataLoaderException;
}
You would then have implementations like JdbcDataLoader, WebServiceDataLoader, etc. In your case you would need another type of implementation that, given one or more instances of DataLoader, runs each simultaneously and aggregates the results. This implementation would look something like:
public class AggregatingDataLoader implements DataLoader
{
    private Collection<DataLoader> dataLoaders;
    private ExecutorService executorService;

    public AggregatingDataLoader(ExecutorService executorService, Collection<DataLoader> dataLoaders)
    {
        this.executorService = executorService;
        this.dataLoaders = dataLoaders;
    }

    public Collection<Data> loadData() throws DataLoaderException
    {
        Collection<DataLoaderCallable> dataLoaderCallables = new ArrayList<DataLoaderCallable>();

        for (DataLoader dataLoader : dataLoaders)
        {
            dataLoaderCallables.add(new DataLoaderCallable(dataLoader));
        }

        try
        {
            List<Future<Collection<Data>>> futures = executorService.invokeAll(dataLoaderCallables);

            Collection<Data> data = new ArrayList<Data>();
            for (Future<Collection<Data>> future : futures)
            {
                data.addAll(future.get());
            }
            return data;
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new DataLoaderException(e);
        }
    }

    private class DataLoaderCallable implements Callable<Collection<Data>>
    {
        private DataLoader dataLoader;

        public DataLoaderCallable(DataLoader dataLoader)
        {
            this.dataLoader = dataLoader;
        }

        public Collection<Data> call() throws DataLoaderException
        {
            return dataLoader.loadData();
        }
    }
}
You'll need to add some timeout and exception handling logic to this, but you get the gist.
The other important thing is that your calling code should only ever use the DataLoader interface, so that you can swap different implementations in and out, or use mocks during testing.
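For instance, wiring it up could look something like this (a sketch; JdbcDataLoader and WebServiceDataLoader are the hypothetical implementations mentioned above):

// One loader per database, plus a web-service source, all behind DataLoader.
ExecutorService executorService = Executors.newFixedThreadPool(4);
Collection<DataLoader> loaders = Arrays.asList(
        new JdbcDataLoader(dataSource1),
        new JdbcDataLoader(dataSource2),
        new WebServiceDataLoader(serviceUrl));

DataLoader loader = new AggregatingDataLoader(executorService, loaders);
Collection<Data> allData = loader.loadData(); // queries all sources concurrently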

Record method calls in one session for replaying in future test sessions?

I have a backend system that we access via a third-party Java API from our own applications. I can access the system as a normal user along with other users, but I do not have godly powers over it.
Hence to simplify testing I would like to run a real session and record the API calls, and persist them (preferably as editable code), so we can do dry test runs later with API calls just returning the corresponding response from the recording session - and this is the important part - without needing to talk to the above mentioned backend system.
So if my application contains a line of the form:
Object b = callBackend(a);
I would like the framework to first capture that callBackend() returned b given the argument a, and then when I do the dry run at any later time say "hey, given a this call should return b". The values of a and b will be the same (if not, we will rerun the recording step).
I can override the class providing the API so all the method calls to capture will go through my code (i.e. byte code instrumentation to alter behavior of classes outside my control is not necessary).
What framework should I look into to do this?
EDIT: Please note that bounty hunters should provide actual code demonstrating the behavior I look for.
Actually, you can build such a framework or template by using the proxy pattern. Here I explain how you can do it using the dynamic proxy pattern. The idea is to:
Write a proxy manager to get recorder and replayer proxies of API on demand!
Write a wrapper class to store your collected information, and implement the hashCode and equals methods of that wrapper class for efficient lookup in a Map-like data structure.
And finally use recorder proxy to record and replayer proxy for replaying purpose.
How recorder works:
invokes the real API
collects the invocation information
persists data in expected persistence context
How replayer works:
Collect the method information (method name, parameters)
If the collected information matches previously recorded information, then return the previously recorded return value.
If no recorded information matches, invoke the real method, then persist the newly collected information (as you wanted).
Now, let's look at the implementation. If your API is MyApi like below:
public interface MyApi {
    public String getMySpouse(String myName);
    public int getMyAge(String myName);
    ...
}
Now we will record and replay the invocation of public String getMySpouse(String myName). To do that, we can use a class to store the invocation information like below:
public class RecordedInformation {
    private String methodName;
    private Object[] args;
    private Object returnValue;

    public String getMethodName() {
        return methodName;
    }

    public void setMethodName(String methodName) {
        this.methodName = methodName;
    }

    public Object[] getArgs() {
        return args;
    }

    public void setArgs(Object[] args) {
        this.args = args;
    }

    public Object getReturnValue() {
        return returnValue;
    }

    public void setReturnValue(Object returnValue) {
        this.returnValue = returnValue;
    }

    @Override
    public int hashCode() {
        // change your implementation as you like!
        return 31 * methodName.hashCode() + java.util.Arrays.deepHashCode(args);
    }

    @Override
    public boolean equals(Object obj) {
        // change your implementation as you like!
        if (!(obj instanceof RecordedInformation)) {
            return false;
        }
        RecordedInformation other = (RecordedInformation) obj;
        return methodName.equals(other.methodName) && java.util.Arrays.deepEquals(args, other.args);
    }
}
Now here comes the main part: the RecordReplayManager. This RecordReplayManager gives you a proxy object of your API, depending on your need for recording or replaying.
public class RecordReplayManager implements java.lang.reflect.InvocationHandler {
    private Object objOfApi;
    private boolean isForRecording;

    public static Object newInstance(Object obj, boolean isForRecording) {
        return java.lang.reflect.Proxy.newProxyInstance(
                obj.getClass().getClassLoader(),
                obj.getClass().getInterfaces(),
                new RecordReplayManager(obj, isForRecording));
    }

    private RecordReplayManager(Object obj, boolean isForRecording) {
        this.objOfApi = obj;
        this.isForRecording = isForRecording;
    }
    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object result;
        if (isForRecording) {
            try {
                System.out.println("recording...");
                System.out.println("method name: " + method.getName());
                System.out.print("method arguments:");
                for (Object arg : args) {
                    System.out.print(" " + arg);
                }
                System.out.println();
                result = method.invoke(objOfApi, args);
                System.out.println("result: " + result);
                RecordedInformation recordedInformation = new RecordedInformation();
                recordedInformation.setMethodName(method.getName());
                recordedInformation.setArgs(args);
                recordedInformation.setReturnValue(result);
                // persist your information
            } catch (InvocationTargetException e) {
                throw e.getTargetException();
            } catch (Exception e) {
                throw new RuntimeException("unexpected invocation exception: " + e.getMessage());
            }
            return result;
        } else {
            try {
                System.out.println("replaying...");
                System.out.println("method name: " + method.getName());
                System.out.print("method arguments:");
                for (Object arg : args) {
                    System.out.print(" " + arg);
                }
                System.out.println();
                RecordedInformation recordedInformation = new RecordedInformation();
                recordedInformation.setMethodName(method.getName());
                recordedInformation.setArgs(args);
                // if this invocation information (this RecordedInformation) is found in the previously
                // collected map, return the returnValue from that RecordedInformation;
                // if no corresponding RecordedInformation exists, invoke the real method (as in the
                // recording step), wrap the collected information into a RecordedInformation and persist it
                result = null; // look up the recorded return value here
            } catch (InvocationTargetException e) {
                throw e.getTargetException();
            } catch (Exception e) {
                throw new RuntimeException("unexpected invocation exception: " + e.getMessage());
            }
            return result;
        }
    }
}
If you want to record the method invocation, all you need is to get an API proxy like below:
MyApi realApi = new RealApi(); // using new or whatever way you get your service implementation (API implementation)
MyApi myApiWithRecorder = (MyApi) RecordReplayManager.newInstance(realApi, true); // true for recording
myApiWithRecorder.getMySpouse("richard"); // to record getMySpouse
myApiWithRecorder.getMyAge("parker"); // to record getMyAge
...
And to replay, all you need is:
MyApi realApi = new RealApi(); // using new or whatever way you get your service implementation (API implementation)
MyApi myApiWithReplayer = (MyApi) RecordReplayManager.newInstance(realApi, false); // false for replaying
myApiWithReplayer.getMySpouse("richard"); // to replay getMySpouse
myApiWithReplayer.getMyAge("parker"); // to replay getMyAge
...
And you are done!
Edit:
The basic steps of the recorder and replayer can be done in the above-mentioned way. Now it's up to you how you want to use or perform those steps; you can do whatever you like in the recorder and replayer code blocks - just choose your implementation!
I should prefix this by saying I share some of the concerns in Yves Martin's answer: that such a system may prove frustrating to work with and ultimately less helpful than it would seem at first blush.
That said, from a technical standpoint, this is an interesting problem, and I couldn't not take a go at it. I put together a gist to log method calls in a fairly general way. The CallLoggingProxy class defined there allows usage such as the following.
Calendar original = CallLoggingProxy.create(Calendar.class, Calendar.getInstance());
original.getTimeInMillis(); // 1368311282470
CallLoggingProxy.ReplayInfo replayInfo = CallLoggingProxy.getReplayInfo(original);
// Persist the replay info to disk, serialize to a DB, whatever floats your boat.
// Come back and load it up later...
Calendar replay = CallLoggingProxy.replay(Calendar.class, replayInfo);
replay.getTimeInMillis(); // 1368311282470
You could imagine wrapping your API object with CallLoggingProxy.create prior to passing it into your testing methods, capturing the data afterwards, and persisting it using whatever your favorite serialization system happens to be. Later, when you want to run your tests, you can load the data back up, create a new instance based on the data with CallLoggingProxy.replay, and pass that into your methods instead.
The CallLoggingProxy is written using Javassist, as Java's native Proxy is limited to working against interfaces. This should cover the general use case, but there are a few limitations to keep in mind:
Classes declared final can't be proxied by this method. (Not easily fixable; this is a system limitation)
The gist assumes the same input to a method will always produce the same output. (More easily fixable; the ReplayInfo would need to keep track of sequences of calls for each input instead of single input/output pairs.)
The gist is not even remotely threadsafe (Fairly easily fixable; just requires a little thought and effort)
Obviously the gist is simply a proof of concept, so it's also not been very thoroughly tested, but I believe the general principle is sound. It's also possible there's a more fully baked framework out there to achieve this sort of goal, but if such a thing does exist, I'm not aware of it.
If you do decide to continue with the replay approach, then hopefully this will be enough to give you a possible direction to work in.
I had the same needs some months ago for non-regression testing when planning a heavy technical refactoring of a large application and... I found nothing available as a framework.
In fact, replaying may be particularly difficult and may only work in a specific context - no (or few) applications of standard complexity can really be considered stateless. It is a common problem when testing persistence code with a relational database. To be relevant, the complete initial state of the system must be restored, and each replay step must impact the global state in the same way. It becomes a challenge when the system state is distributed across pieces like databases, files, memory... Just guess what happens if a timestamp taken from the system clock is used somewhere!
So a more practical option is to only record... and then do a clever comparison on subsequent runs.
Depending on the number of runs you plan, a human-driven session on the application may be enough, or you may have to invest in an automated scenario with a robot driving your application's user interface.
First, to record: you can use a dynamic proxy interface or aspect programming to intercept method calls and capture state before and after invocation. That may mean dumping the concerned database tables, copying some files, or serializing Java objects in a text format like XML.
Then compare this reference capture with a new run. The comparison should be tuned to exclude any irrelevant elements from each piece of state, like row identifiers, timestamps, file names... so as to only compare data where your backend's added value shines.
Finally, nothing here is really standard; often a few specific scripts and bits of code are enough to achieve the aim: detect as many errors as possible and try to prevent unexpected side effects.
This can be done with AOP, aspect-oriented programming. It allows you to intercept method calls through byte code manipulation. Do a bit of searching for examples.
In one case this can do the recording, in the other the replaying.
Pointers: Wikipedia, AspectJ, Spring AOP.
Unfortunately this moves a bit outside plain Java syntax, and a simple example, with explanation, is better sought elsewhere.
Maybe combine it with unit tests / a mocking test framework for offline testing with recorded data.
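For a flavor of what that interception looks like, here is a minimal sketch using AspectJ's annotation style (the backend class and pointcut are hypothetical; persisting the recording is elided):

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class RecordingAspect {

    // Intercept every call to the (hypothetical) backend facade.
    @Around("execution(* com.example.BackendClient.*(..))")
    public Object recordCall(ProceedingJoinPoint joinPoint) throws Throwable {
        Object result = joinPoint.proceed(); // invoke the real backend
        // Persist method name, arguments and result for later replay.
        System.out.println(joinPoint.getSignature().getName()
                + " " + java.util.Arrays.toString(joinPoint.getArgs())
                + " -> " + result);
        return result;
    }
}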
You could look into Mockito.
Example:
//You can mock concrete classes, not only interfaces
LinkedList mockedList = mock(LinkedList.class);
//stubbing
when(mockedList.get(0)).thenReturn("first");
when(mockedList.get(1)).thenThrow(new RuntimeException());
//following prints "first"
System.out.println(mockedList.get(0));
//following throws runtime exception
System.out.println(mockedList.get(1));
//following prints "null" because get(999) was not stubbed
System.out.println(mockedList.get(999));
Afterwards you can replay each test as many times as you want, and it will return the data you put in.
// pseudocode
class LogMethod {
    List<String> parameters;
    String method;

    void addCallTo(String method, List<String> params) {
        this.method = method;
        this.parameters = params;
    }
}
Have a list of LogMethods and call new LogMethod().addCallTo() before every call in your test method.
The idea of playing back the API calls sounds like a use case for the event sourcing pattern. Martin Fowler has a good article on it here. This is a nice pattern that records events as a sequence of objects which are then stored; you can then replay the sequence of events as required.
There is an implementation of this pattern using Akka called Eventsourced, which may help you build the type of system you require.
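In miniature, the recording side of the pattern boils down to something like this (a sketch; the event type and store are hypothetical, not Eventsourced's API):

import java.util.ArrayList;
import java.util.List;

// Each backend call is captured as an immutable event object.
class ApiCallEvent {
    final String method;
    final Object[] args;
    final Object result;

    ApiCallEvent(String method, Object[] args, Object result) {
        this.method = method;
        this.args = args;
        this.result = result;
    }
}

class EventStore {
    private final List<ApiCallEvent> events = new ArrayList<ApiCallEvent>();

    void append(ApiCallEvent event) { events.add(event); } // record during the live session

    List<ApiCallEvent> replay() { return events; }          // iterate later to answer calls
}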
I had a similar problem some years ago. None of the above solutions would have worked for methods that are not pure functions (side-effect free). The major tasks are, in my opinion:
how to extract a snapshot of the recorded object(s) (not restricted to objects implementing Serializable)
how to generate test code from a serialized representation in a readable way (not restricted to beans, primitives and collections)
So I had to go my own way - with testrecorder.
For example, given:
ResultObject b = callBackend(a);
...
ResultObject callBackend(SourceObject source) {
...
}
you will only have to annotate the method like this:
@Recorded
ResultObject callBackend(SourceObject source) {
...
}
and start your application (the one that should be recorded) with the testrecorder agent. Testrecorder will manage all tasks for you, such as:
serializing arguments, results, state of this, exceptions (complete object graph!)
finding a readable representation for object construction and object matching
generating a test from the serialized data
you can extend recordings to global variables, input and output with annotations
An example for the test will look like this:
void testCallBackend() {
    //arrange
    SourceObject sourceObject1 = new SourceObject();
    sourceObject1.setState(...); // testrecorder can use setters but is not limited to them
    ... // setting up backend
    ... // setting up globals, mocking inputs

    //act
    ResultObject resultObject1 = backend.callBackend(sourceObject1);

    //assert
    assertThat(resultObject1, new GenericMatcher() {
        ... // property matchers
    }.matching(ResultObject.class));
    ... // assertions on backend and sourceObject1 for potential side effects
    ... // assertions on outputs and globals
}
If I understood your question correctly, you should try db4o.
You store the objects with db4o during the recording session and restore them later for mocks and JUnit tests.
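A minimal sketch of that record-then-restore flow using db4o's embedded API (the file name and the recorded type are placeholders):

import com.db4o.Db4oEmbedded;
import com.db4o.ObjectContainer;
import com.db4o.ObjectSet;

// Recording session: persist the responses you observed.
ObjectContainer db = Db4oEmbedded.openFile(Db4oEmbedded.newConfiguration(), "recorded.db4o");
try {
    db.store(recordedResponse); // any plain Java object
} finally {
    db.close();
}

// Later, in the test: restore the objects and use them as canned results.
ObjectContainer replayDb = Db4oEmbedded.openFile(Db4oEmbedded.newConfiguration(), "recorded.db4o");
try {
    ObjectSet<ResultObject> results = replayDb.query(ResultObject.class);
    // feed these into your mocks instead of calling the real backend
} finally {
    replayDb.close();
}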
