How to implement a caching system? - Java

I'm developing a web application where the backend, built with Spring Boot, consumes data from a public API that returns JSON.
The search is full-text over terms (like Google): the backend receives the user's query from the frontend, sends it to the public API, waits for the response, processes the data, and returns it to the frontend.
I want to implement a caching system in the Spring Boot backend.
Basically, before Spring Boot calls the public API and waits for the response, it should check a key/value store to see whether the same search has already been done; if so, it returns whatever is stored as the value for that key.
The caching system:
Key: the search terms; Value: the JSON response from the public API.
It has to persist data, not be volatile.
It has to support key/value lookups (a cache).
It has to be refreshable by a separate process, one that checks whether the data has changed at the source (the public API) and updates the cache accordingly.
Initially I thought of using a NoSQL database such as MongoDB, but after investigating further I came across Redis. Which do you think is better?
I would like some suggestions on how to implement this architecture. I'm not sure how to go about it, whether with Redis, MongoDB, or something else.
Thanks.

I am not sure a cache will help you much in this case, because the same search can be phrased in many different forms.
If you need to protect your backend from executing the same query multiple times, you can use Spring Cache.
It supports different providers, including Redis, and has an eviction mechanism.
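As a minimal sketch of that idea, assuming Spring Boot with caching enabled (the service class, cache name, and public API URL below are placeholders, and the cache provider, e.g. Redis, is chosen via configuration):

    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.cache.annotation.EnableCaching;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.stereotype.Service;
    import org.springframework.web.client.RestTemplate;

    @Configuration
    @EnableCaching
    class CacheConfig {
    }

    @Service
    class PublicApiSearchService {

        private final RestTemplate restTemplate = new RestTemplate();

        // The raw JSON response is cached under the search terms; the public
        // API is only called on a cache miss.
        @Cacheable(cacheNames = "searches", key = "#terms")
        public String search(String terms) {
            // Hypothetical endpoint, for illustration only.
            return restTemplate.getForObject(
                    "https://public-api.example.com/search?q={terms}", String.class, terms);
        }

        // A separate refresh job can evict stale entries so the next search
        // goes back to the public API.
        @CacheEvict(cacheNames = "searches", key = "#terms")
        public void evict(String terms) {
        }
    }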

You can use Memcached. It's a ready-made caching system.

Find out who's using Redis

We have one Redis instance for our company and multiple teams are using it. We are getting a surge of requests and nobody seems to know which application is causing it. There is only one password shared across the whole company, and our Redis is secured behind a VPN, so we know the traffic is not coming from outside.
Is there a way to know who's using Redis? Maybe we can pass some headers with the connection from every app to identify who makes the most requests, etc.
We use Spring Data Redis for our communication.
This question is quite broad, since different strategies can be used here:
Use the Redis MONITOR command. This is basically a built-in debugging tool that streams every command executed against Redis.
Use some kind of intermediate proxy. Instead of routing all commands directly to Redis, route everything through a proxy that does some processing, such as counting commands per calling host or per command type, depending on what you want to measure.
This is still a configuration-level solution, so you won't need any changes at the application level.
Since you have Spring Boot, you can use the Micrometer / metrics integration. This way you can create a counter / gauge that gets updated on each request to Redis. If you also stream the metrics to a tool like Prometheus, you can build a dashboard, say in Grafana, to see the whole picture. Micrometer integrates with other products as well; Prometheus/Grafana is only an example, and you can choose any other solution (maybe your organization already has something like that in place).
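A rough sketch of the Micrometer idea, assuming Spring Boot with a MeterRegistry on the classpath (the metric name, tag value, and component below are made up for illustration):

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.MeterRegistry;
    import org.springframework.stereotype.Component;

    @Component
    class RedisCallMetrics {

        private final Counter redisCalls;

        RedisCallMetrics(MeterRegistry registry) {
            // One counter per application; the tag identifies who is calling Redis.
            this.redisCalls = Counter.builder("redis.client.calls")
                    .tag("application", "orders-service")
                    .register(registry);
        }

        // Call this (or wrap your RedisTemplate) wherever a Redis command is issued.
        void recordCall() {
            redisCalls.increment();
        }
    }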

(Architecture) Grabbing data for an Angular 2 app: query MongoDB directly or go through my Java REST layer?

I have a quick architecture question as this is one of my first web applications.
On the frontend I have an Angular 2 / Node.js app; on the backend I have a Java server aggregating some data for me into MongoDB.
My question is simple: should I create REST controllers in my Java server to get data from the database, or call the database directly from the Angular app?
I am leaning towards the Java REST idea. It feels more secure, it's easier to do, and when I scale I can have processing done in Java when a REST call is made.
But I am worried this may slow things down too much. I could call the database directly and get the info to put on my Angular site. Does anyone know whether speed is a real concern here?
Keep in mind the data returned from the calls could be thousands of lines of JSON and hundreds of objects.
I think you can benefit from checking out this link:
https://www.mongodb.com/blog/post/building-your-first-application-mongodb-creating-rest-api-using-mean-stack-part-1
or
https://www.mongodb.com/blog/post/the-modern-application-stack-part-1-introducing-the-mean-stack?jmp=blog
As a side note (maybe it's just me), I prefer Elasticsearch over MongoDB, as it comes with a Java-based REST API out of the box and handles the complexities of scalability and load balancing among the nodes in the cluster.
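For reference, a minimal sketch of the "Java REST" option the question leans towards, assuming Spring with Spring Data MongoDB (the Report type, repository, and endpoint are hypothetical):

    import java.util.List;
    import org.springframework.data.mongodb.repository.MongoRepository;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    class Report {
        public String id;
        public String category;
        public String payload;
    }

    interface ReportRepository extends MongoRepository<Report, String> {
        List<Report> findByCategory(String category);
    }

    @RestController
    class ReportController {

        private final ReportRepository reports;

        ReportController(ReportRepository reports) {
            this.reports = reports;
        }

        // The Angular app calls this endpoint instead of talking to MongoDB
        // directly, so the database never has to be exposed to the browser.
        @GetMapping("/api/reports")
        List<Report> byCategory(@RequestParam String category) {
            return reports.findByCategory(category);
        }
    }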

How can I configure a simple web cache for a Java client application?

My application sends HTTP requests to a web service, but because the Terms of Service limit it to one query per second, it is very important for me not to send more queries than I need. I put the results of some queries into a database that I check before trying the query again, but some query results are not well suited to being stored in a database, so I would like some sort of dumb cache that intercepts my web service calls and, if a call is a duplicate, just returns the result of the previous call. I would expect to be able to configure the size of the cache and have it automatically remove the oldest entry when it fills up. It would also be great if the cache could be backed by a file rather than heap memory, because my application is already quite memory intensive.
For a simple caching solution, try the Google Guava libraries. CacheBuilder/CacheLoader can be configured to your requirements. Guava provides a caching solution that is more sophisticated than Java's own HashMap but lightweight compared to Ehcache and others. This cache can be used in a web service request interceptor that decides whether to initiate a web service call.
A good tutorial with an example of the Guava cache can be found here.
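A minimal sketch of that approach, assuming the Guava dependency is on the classpath (callWebService(...), the size limit, and the expiry below are placeholders to adapt):

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.TimeUnit;

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;

    class CachingWebServiceClient {

        private final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(1000)                    // least-recently-used entries are evicted once full
                .expireAfterWrite(1, TimeUnit.HOURS)  // optional staleness bound
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String query) throws Exception {
                        return callWebService(query); // only reached on a cache miss
                    }
                });

        String lookup(String query) throws ExecutionException {
            return cache.get(query);                  // duplicate queries are served from the cache
        }

        private String callWebService(String query) {
            // the real (rate-limited) HTTP call goes here
            return "...";
        }
    }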

Any benefit from using Hazelcast instead of MongoDB to store user sessions/keys?

We are running a MongoDB instance to store data in collections; no problems with it, and Mongo is our main data store.
Today we are going to add OAuth2 support to the product and have to store user sessions (security key, access token, etc.). The access token only has to be re-validated against the authentication server after a defined timeout, so that not every request waits for validation by the authentication server.
The first request for a secured resource (create) shall always be authenticated against the authentication server. Any subsequent request will be validated internally (against the cache) and checked against the internal timeout; only if it has expired will another request to the authentication server be issued.
To meet these requirements we have to introduce some kind of distributed cache to store the user sessions etc. with TTL support and expire them based on the TTL, as I wrote above.
Two options here:
store user sessions in Hazelcast and share them across all app servers - a nice choice, persisting all user sessions in an eviction-enabled map.
store user sessions in MongoDB - and do the same.
Do you see any benefits of using Hazelcast instead of storing the temporary data in Mongo? Any significant performance improvements you're aware of?
I'm new to Hazelcast, so I'm not aware of all its killer features.
Disclaimer: I am the founder of Hazelcast...
Hazelcast is much simpler and simplicity matters a lot.
You can embed Hazelcast into your application (if your application is written in Java). No need to deploy and maintain a remote NoSQL cluster.
Hazelcast works directly with your application objects. No JSON or any other format; write and read Java objects.
You can execute Java code on your in-memory data. No need to fetch and process data; send your code over to the data.
You can listen for updates on your data: "Notify me when this map or key is updated".
Hazelcast has a rich set of data structures like queue, topic, semaphores, locks, multimap, etc. Imagine sharing a queue across multiple nodes and being able to do blocking queue poll/take operations... this is really cool :)
Hazelcast is an in-memory data grid, so it should be significantly faster than MongoDB for that kind of usage. They also have pre-made session clustering code for Java servlets if you do not want to create that yourself.
Code for the session clustering is here on GitHub, or here for the Maven artifact.
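As a rough sketch of the embedded option for the token/session use case (the map name, TTL, and SessionInfo type are placeholders, and package names vary slightly between Hazelcast versions; this assumes the 3.x API):

    import java.io.Serializable;
    import java.util.concurrent.TimeUnit;

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    class SessionInfo implements Serializable {
        String userId;
        long validatedAt;
    }

    class SessionCache {

        private final IMap<String, SessionInfo> sessions;

        SessionCache() {
            // Each app server embeds a member; together they form the cluster
            // and share the same distributed map.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            this.sessions = hz.getMap("oauth-sessions");
        }

        void store(String accessToken, SessionInfo info) {
            // Per-entry TTL: the entry expires automatically, so the next request
            // with this token is re-validated against the authentication server.
            sessions.put(accessToken, info, 30, TimeUnit.MINUTES);
        }

        SessionInfo find(String accessToken) {
            return sessions.get(accessToken);
        }
    }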

Are there any design patterns that could work in this scenario?

We have a system (Java web application) that's been in active development / maintenance for a long time now (something like ten years).
What we're looking at doing is implementing a RESTful API on top of the web app. This API, built with Jersey, will be a separate project, with the intent that it should be able to run alongside the main application or be deployed in the cloud.
Because of the nature and age of our application, we've had to implement a (somewhat) comprehensive caching layer on top of the database (Postgres) to help keep load down. For the RESTful API, the idea is that GET requests will go to the cache first instead of the database, to keep load off the database.
The cache will be populated in a way to help ensure that most things registered API users will need should be in there.
If there is a cache miss, the needed data should be retrieved from the database (also being entered into the cache in the process).
Obviously, this should remain transparent from the RESTful endpoint methods in my code. We've come up with the idea of creating a 'Broker' to handle communications with the DB and the cache. The REST layer will simply pass across ids (if looking to retrieve) or populated Java objects (if looking to insert / update) and the broker will take care of retrieving / updating / invalidating, etc.
There is also the issue of extensibility. To begin with, the API will live alongside the rest of the servers, so access to the database won't be an issue; however, if we deploy to the cloud, we're going to need a different Broker implementation that communicates with the system (namely the database) in a different manner (potentially through an internal API).
I already have a rough idea of how I could implement this, but it struck me that this is probably a problem for which a suitable pattern already exists. If I could follow an established pattern instead of coming up with my own solution, that would probably be the better choice. Any ideas?
Ehcache has an implementation of just such a cache that it calls a SelfPopulatingCache.
Requests are made to the cache, not to the database. Then, if there is a cache miss, Ehcache will call the database (or whatever external data source you have) on your behalf.
You just need to implement a CacheEntryFactory which has a single method:
Object createEntry(Object key) throws Exception;
So as the name suggests, Ehcache implements this concept with a pretty standard factory pattern...
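A sketch of how that wiring might look with the Ehcache 2.x API (the cache name and loadFromDatabase(...) stand in for your own broker/database code):

    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Ehcache;
    import net.sf.ehcache.Element;
    import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
    import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

    class EntityBroker {

        private final SelfPopulatingCache cache;

        EntityBroker(CacheManager cacheManager) {
            Ehcache underlying = cacheManager.getEhcache("entities");
            CacheEntryFactory factory = new CacheEntryFactory() {
                @Override
                public Object createEntry(Object key) throws Exception {
                    // Only called on a cache miss; the result is stored in the cache.
                    return loadFromDatabase((Long) key);
                }
            };
            this.cache = new SelfPopulatingCache(underlying, factory);
        }

        Object findById(Long id) {
            // Hits the cache, or populates it via the factory on a miss.
            Element element = cache.get(id);
            return element == null ? null : element.getObjectValue();
        }

        private Object loadFromDatabase(Long id) {
            // real database access goes here
            return null;
        }
    }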
There's no pattern. Just hide the initial DB services behind interfaces, build tests around their intended behavior, then switch in an implementation that uses the caching layer. I guess dependency injection would be the best thing to help you do that?
Sounds like the decorator pattern will suit your needs: http://en.wikipedia.org/wiki/Decorator_pattern
You can create a DAO interface for data access, something like:
Value get(long id);
First create a direct DB implementation, then create a cache implementation that wraps an underlying DAO instance, in this case the DB implementation.
The cache implementation will try to get the value from its own managed cache, and fall back to the underlying DAO if that fails.
So both your old application and the REST layer will only see the DAO interface, without knowing any implementation details, and in the future you can change the implementation freely.
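A bare-bones sketch of that decorator, using a plain in-memory map where a real cache (Ehcache, Guava, ...) would go; the Value class and lookup logic are placeholders:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class Value {
    }

    interface ValueDao {
        Value get(long id);
    }

    class DbValueDao implements ValueDao {
        @Override
        public Value get(long id) {
            // direct database access goes here
            return null;
        }
    }

    class CachingValueDao implements ValueDao {

        private final ValueDao delegate;
        private final Map<Long, Value> cache = new ConcurrentHashMap<>();

        CachingValueDao(ValueDao delegate) {
            this.delegate = delegate;   // typically the DB implementation
        }

        @Override
        public Value get(long id) {
            // serve from the cache, fall back to the underlying DAO on a miss
            return cache.computeIfAbsent(id, delegate::get);
        }
    }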
The best design pattern for transparently caching HTTP requests is to use an HTTP cache.
