Add failover to Java microservice

I have a Java microservice, which will be deployed as multiple service instances. The microservice works statelessly in a simple way: fetch a request matching predefined criteria from the DB, process it, fetch the next one and process it, and so on.
I am now considering adding failover to the service. I might need to add extra information (like a processor_id) to a request once it is being processed by some instance, so that if the instance is determined to be unresponsive, the request can be taken over by other instances. I also need to add a heartbeat to the microservice. Maybe I can leverage Apache ZooKeeper or Curator to achieve this.
But I don't know how to make the different pieces work together. It would be better if there were examples in Java.
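For concreteness, here is a minimal sketch of the scheme described above (the table, column, and znode names are made up; this is only one way the pieces might fit): each instance registers an ephemeral znode as its heartbeat, and claims a request by atomically stamping its processor_id on the row. A watcher on /workers could then clear the processor_id of rows claimed by instances whose znode has disappeared.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class FailoverSketch {

    /** Register this instance; keep the returned client open for the process lifetime. */
    public static CuratorFramework registerHeartbeat(String zkConnect, String processorId)
            throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                zkConnect, new ExponentialBackoffRetry(1000, 3));
        client.start();
        // Ephemeral: ZooKeeper deletes this node when the instance's session dies,
        // which is the heartbeat signal other instances can watch for.
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath("/workers/" + processorId);
        return client;
    }

    /** Atomically claim one unclaimed request; returns true if we got it. */
    public static boolean claim(Connection db, long requestId, String processorId)
            throws Exception {
        try (PreparedStatement ps = db.prepareStatement(
                "UPDATE requests SET processor_id = ? " +
                "WHERE id = ? AND processor_id IS NULL")) {
            ps.setString(1, processorId);
            ps.setLong(2, requestId);
            return ps.executeUpdate() == 1; // 0 means another instance won the race
        }
    }
}
```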

What I have seen in my career is that there is a load balancer to distribute the requests across the microservices. The load balancer knows the availability of the microservices by invoking a status endpoint which returns an HTTP status. If there is no response, or the response is negative, the load balancer excludes the microservice from the group.
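Such a status endpoint is a one-liner in most stacks. A minimal Spring Boot sketch (the path and the self-check are illustrative assumptions):

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class StatusController {

    // The load balancer polls this; a non-200 response (or no answer at all)
    // takes the instance out of the pool.
    @GetMapping("/status")
    public ResponseEntity<String> status() {
        boolean healthy = true; // replace with a real self-check
        return healthy ? ResponseEntity.ok("UP")
                       : ResponseEntity.status(503).body("DOWN");
    }
}
```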

Related

Kubernetes liveness - Reserve threads/memory for a specific endpoint with Spring Boot

Do you know (if it is possible) how to reserve threads/memory for a specific endpoint in a Spring Boot microservice?
I have one microservice that accepts HTTP requests via Spring MVC, and those requests trigger HTTP calls to a 3rd-party system, which is sometimes partially degraded and responds very slowly. I can't reduce the timeout because some calls are slow by nature.
I have the spring-boot-actuator /health endpoint enabled and I use it as the container livenessProbe in a Kubernetes cluster. Sometimes, when the 3rd-party system is degraded, the microservice doesn't respond on the /health endpoint and Kubernetes restarts my service.
This is because I'm using a RestTemplate to make the HTTP calls, so I'm continuously creating new threads, and the JVM starts to have memory problems.
I have thought about some solutions:
Implement a high availability “/health” endpoint, reserve threads, or something like that.
Use an async http client.
Implement a Circuit Breaker.
Configure custom timeouts for each 3rd-party endpoint that I'm using.
Create another small service (Go) and deploy it in the same pod; this service would handle the liveness probe.
Migrate/refactor the services into smaller services, maybe with other frameworks/languages like Vert.x, Go, etc.
What do you think?
The actuator health endpoint is very convenient with Spring Boot - almost too convenient in this context, as it does deeper health checks than you necessarily want in a liveness probe. For readiness you want the deeper checks, but not for liveness. The idea is that if the Pod is overwhelmed for a bit and fails readiness, it will be withdrawn from the load balancing and get a breather. But if it fails liveness, it will be restarted. So you want only minimal checks in liveness (see Should Health Checks call other App Health Checks). By using actuator health for both, there is no way for your busy Pods to get a breather, as they get killed first. And Kubernetes is periodically calling the HTTP endpoint for both probes, which contributes further to your thread usage problem (do consider the periodSeconds on the probes).
For your case you could define a liveness command and not an http probe - https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/#define-a-liveness-command. The command could just check that the Java process is running (so kinda similar to your go-based probe suggestion).
For many cases using the actuator for liveness would be fine (think apps that hit a different constraint before threads, which would be your case if you went async/non-blocking with the reactive stack). Yours is one where it can cause problems - the actuator's probing of availability for dependencies like message brokers can be another where you get excessive restarts (in that case on first deploy).
I have a prototype just wrapping up for this same problem: Spring Boot permits 100% of the available threads to be filled up with public network requests, leaving the /health endpoint inaccessible to the AWS load balancer, which knocks the service offline thinking it's unhealthy. There's a difference between unhealthy and busy... and health is more than just a process running, a port listening, or some other superficial check - it needs to be a "deep ping" which checks that it and all its dependencies are operable, in order to give a confident health check response back.
My approach to solving the problem is to produce two new auto-wired components: the first configures Jetty with a fixed, configurable maximum number of threads (make sure your JVM is allocated enough memory to match), and the second keeps a counter of each request as it starts and completes, throwing an exception which maps to an HTTP 429 TOO MANY REQUESTS response if the count approaches a ceiling of maxThreads - reserveThreads. Then I can set reserveThreads to whatever I want, and the /health endpoint is not bound by the request counter, ensuring that it's always able to get in.
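For illustration, a minimal sketch of that second component as a servlet filter (the thread numbers, the /health bypass, and the class name are assumptions; Jetty's actual pool size comes from configuration, see the link below):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class RequestCeilingFilter extends OncePerRequestFilter {

    private static final int MAX_THREADS = 50;    // keep in sync with Jetty's pool size
    private static final int RESERVE_THREADS = 5; // headroom kept free for /health

    private final AtomicInteger inFlight = new AtomicInteger();

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        // The health probe bypasses the ceiling so it can always get in.
        if ("/health".equals(request.getRequestURI())) {
            chain.doFilter(request, response);
            return;
        }
        if (inFlight.incrementAndGet() > MAX_THREADS - RESERVE_THREADS) {
            inFlight.decrementAndGet();
            response.sendError(429, "Too Many Requests");
            return;
        }
        try {
            chain.doFilter(request, response);
        } finally {
            inFlight.decrementAndGet();
        }
    }
}
```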
I was just searching around to figure out how others are solving this problem and found your question with the same issue; so far I haven't seen anything else solid.
To configure Jetty thread settings via application properties file:
http://jdpgrailsdev.github.io/blog/2014/10/07/spring_boot_jetty_thread_pool.html
Sounds like your microservice should still respond to health checks on /health whilst returning results from that 3rd-party service it's calling.
I'd build an async HTTP server with Vert.x-Web and try a test before modifying your working code. Create two endpoints: the /health check and a /slow call that just sleeps for something like 5 minutes before replying with "hello". Deploy that in minikube or your cluster and see if it's able to respond to health checks while sleeping on the other HTTP request.
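A minimal sketch of that test (assuming Vert.x 3.8+, where the Router can be passed straight to requestHandler; on older versions use requestHandler(router::accept)). The delay uses a timer, so the event loop is never blocked:

```java
import io.vertx.core.Vertx;
import io.vertx.ext.web.Router;

public class SlowVsHealthTest {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        Router router = Router.router(vertx);

        // Always answers immediately, even while /slow requests are pending.
        router.get("/health").handler(ctx -> ctx.response().end("UP"));

        // "Sleeps" for 5 minutes without tying up a thread, then replies.
        router.get("/slow").handler(ctx ->
            vertx.setTimer(5 * 60 * 1000, id -> ctx.response().end("hello")));

        vertx.createHttpServer().requestHandler(router).listen(8080);
    }
}
```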

Light-4J Total Requests Processed

I'm using Light-4J as a microservice framework, sitting between my clients and a 3rd-party API. Everything is set up and working: the clients are able to POST requests and responses are sent in reply.
However, I want to know how many requests have been processed since the server started. Since I log each successful API call with Log4j, I thought I might be able to read the number of lines in the log file. This works but is not accurate, since I discovered that other processes are also writing to the file, so the total is skewed.
Is there another way to get the data I require without me having to ensure that my requests have exclusive access to a log file?
light-4j supports metrics that can be pushed to InfluxDB or pulled by Prometheus. You can enable it in your microservice's service.yml or handler.yml (if you are using release 1.5.18 or later):
https://www.networknt.com/concern/metrics/
https://www.networknt.com/concern/prometheus/
If you generate the project from light-codegen, then the InfluxDB metrics handler is wired in but disabled. You just need to install an InfluxDB instance and enable it in your microservice.
Also, if you only need to proxy to your backend service, light-proxy might be the way to go, unless you have some business logic in your microservice.
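If all you need is a simple in-process total rather than full metrics, a counter in a middleware handler is enough. A rough sketch (light-4j runs on Undertow; the class name and wiring are assumptions, not light-4j's actual metrics handler):

```java
import java.util.concurrent.atomic.AtomicLong;
import io.undertow.server.HttpHandler;
import io.undertow.server.HttpServerExchange;

// Wraps the real handler chain and counts every request since startup.
public class CountingHandler implements HttpHandler {

    private static final AtomicLong TOTAL = new AtomicLong();

    private final HttpHandler next;

    public CountingHandler(HttpHandler next) {
        this.next = next;
    }

    @Override
    public void handleRequest(HttpServerExchange exchange) throws Exception {
        TOTAL.incrementAndGet();
        next.handleRequest(exchange);
    }

    /** Total requests processed since the server started. */
    public static long total() {
        return TOTAL.get();
    }
}
```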

OData Olingo (API-) wrapper for different systems

Introduction
I want to set up an Olingo OData service (2.0, Java). The service has a fixed model definition in its own package. I also have a user management in Java. When a user sends a request to the service, the result metadata stays the same (model), but the data can come from different systems.
That means:
User "John" receives data from System1
User "Adam" receives data from System2
Problem
What is the best practice to achieve such an "API wrapper" system for different services? There can be a system (System1) that can also work with OData, so we only "forward" the request. For the other system (System2) there is a "special API" I must call by building raw GET parameters, handling filters, and so on.
Is this possible with Olingo? Is it possible to forward batch requests to System1, while System2 has its own implementation of batch requests?
More info: I am working with SAP HANA Cloud Platform and want to work with different backend systems.
I see no problem with this scenario. You will just have to send requests from your Java app to your different systems based on the logged-in user, then parse the data and send it back to the user.
It shouldn't be an issue to forward your OData requests to System1, and create a specific GET request for calls to System2, either.
You will have to do quite a bit of this manually (there's no magic method for forwarding an incoming request to another OData service), but it should be very much doable.
I am in fact doing something similar on HCP as well (my OData service has two datasources, one being a database, the other a remote system with which I communicate using web services).
Update: you will probably need to expose your OData service with JDBC for flexibility. JPA is very much bound to the database objects.
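For illustration only, a sketch of the per-user dispatch described above (every name here is hypothetical):

```java
// Hypothetical router choosing a backend per logged-in user.
public class BackendRouter {

    /** Minimal abstraction over the two backends. */
    interface Backend {
        String query(String odataQuery) throws Exception;
    }

    private final Backend system1; // speaks OData: forward the query almost as-is
    private final Backend system2; // "special API": translate to raw GET parameters

    public BackendRouter(Backend system1, Backend system2) {
        this.system1 = system1;
        this.system2 = system2;
    }

    public String fetch(String user, String odataQuery) throws Exception {
        // Look up the user's assigned system; hard-coded here for brevity.
        Backend target = "John".equals(user) ? system1 : system2;
        return target.query(odataQuery);
    }
}
```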

Centralized Logging - Correlating Messages Across Servers

We have a very distributed system. A user's request on the site may involve calls to several services. For example, when a user logs onto the site, calls are made to the ads service, personalization service, related news service, etc. to construct the data needed for display upon login. High-level design: a request to a URL is mapped to a Spring MVC controller, and that controller makes the calls (most of them using HttpClient) to the different services.
We are implementing a centralized logging solution using Logstash, ElasticSearch, Kibana, and Log4j/SLF4J. When an issue is reported on the site, we want to be able to change the log level to debug and see the log messages for a particular request across all services. We are populating a request id in the Log4j MDC, so we are able to identify log messages for that particular request on the webapp server. How do I go about correlating messages from the calls made to the other services?
Flow:
User logs in --> request mapped to Spring MVC controller, which logs messages by populating the request id in the Log4j MDC --> HTTP client calls to service1, service2, service3
How do I correlate messages from service1, service2, and service3 with the messages logged by the MVC controller? One solution is to pass the request id in the HTTP client calls, but there are a lot of applications that follow this paradigm, so changing code everywhere is not an ideal solution.
UPDATE1:
I don't know much about JVM agents, but I'm wondering if a custom agent can be developed to intercept network calls and add a parameter. The custom agent on the receiving side would detect the parameter and add it to a ThreadLocal variable. Dynatrace PurePath technology somehow correlates calls across JVMs - they require injecting their JVM agent, so I'm guessing they are intercepting calls in the agent. Check out this video.
You're going to have to bite the bullet and add the request ID to the HTTP client calls. If you don't want to pollute your APIs, add it as a custom HTTP header, then extract it using some kind of HTTP interceptor on the service side (depends on what web service stack you're using), and re-add it to the MDC.
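A minimal sketch of both halves (the header name and MDC key are assumptions; shown with Apache HttpClient and a servlet filter, since the question mentions HttpClient and Spring MVC):

```java
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.protocol.HttpContext;
import org.slf4j.MDC;

public class RequestIdPropagation {

    // Caller side: copy the request id from the MDC into an outgoing header.
    public static final HttpRequestInterceptor CLIENT_SIDE =
        (HttpRequest request, HttpContext context) -> {
            String requestId = MDC.get("requestId");
            if (requestId != null) {
                request.addHeader("X-Request-Id", requestId);
            }
        };

    // Service side: extract the header and re-add it to this JVM's MDC.
    public static class ServerSide implements Filter {
        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            String requestId = ((HttpServletRequest) req).getHeader("X-Request-Id");
            if (requestId != null) {
                MDC.put("requestId", requestId);
            }
            try {
                chain.doFilter(req, res);
            } finally {
                MDC.remove("requestId");
            }
        }

        @Override public void init(FilterConfig filterConfig) {}
        @Override public void destroy() {}
    }
}
```

The client interceptor would be registered once on the shared HttpClient (e.g. via HttpClientBuilder's addInterceptorFirst), so the individual call sites stay untouched.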

How to implement rate limiting based on a client token in Spring?

I am developing a simple REST API using Spring 3 + Spring MVC. Authentication will be done through OAuth 2.0 or basic auth with a client token using Spring Security - this is still under debate. All connections will be forced over SSL.
I have been looking for information on how to implement rate limiting, but it does not seem like there is a lot of information out there. The implementation needs to be distributed, in that it works across multiple web servers.
E.g. if there are three API servers A, B, C and clients are limited to 5 requests a second, then a client that makes 6 requests like so will find the request to C rejected with an error.
A recieves 3 requests \
B receives 2 requests | Executed in order, all requests from one client.
C receives 1 request /
It needs to work based on a token included in the request, as one client may be making requests on behalf of many users, and each user should be rate limited rather than the server IP address.
The setup will be multiple (2-5) web servers behind an HAProxy load balancer. There is a Cassandra backend, and memcached is used. The web servers will be running on Jetty.
One potential solution might be to write a custom Spring Security filter that extracts the token and checks how many requests have been made with it in the last X seconds. This would allow us to do some things like different rate limits for different clients.
Any suggestions on how it can be done? Is there an existing solution or will I have to write my own solution? I haven't done a lot of web site infrastructure before.
I think the project is a request/response HTTP(S) service, and you use HAProxy as the frontend.
Maybe HAProxy can load-balance on the token; you can check from here.
Then requests with the same token will reach the same web server, and the web server can just use an in-memory cache to implement the rate limiter.
I would avoid modifying application level code to meet this requirement if at all possible.
I had a look through the HAProxy LB documentation; nothing too obvious there, but the requirement may warrant a full investigation of ACLs.
Putting HAProxy to one side, a possible architecture is to put an Apache WebServer out front and use an Apache plugin to do the rate limiting. Over-the-limit requests are refused out front and the application servers in the tier behind Apache are then separated from rate limit concerns making them simpler. You could also consider serving static content from the Web Server.
See the answer to this question How can I implement rate limiting with Apache? (requests per second)
I hope this helps.
Rob
You could put rate limits at various points in the flow (generally the higher up the better), and the general approach you have makes a lot of sense. One option for the implementation is to use 3scale to do it (http://www.3scale.net) - it does rate limits, analytics, key management, etc. and works either with a code plugin (the Java plugin is here: https://github.com/3scale/3scale_ws_api_for_java) or by putting something like Varnish (http://www.varnish-cache.org) in the pipeline and having that apply the rate limits.
I was also thinking about similar solutions a couple of days ago. Basically, I prefer a "centrally-controlled" solution that saves the state of the client requests in the distributed environment.
In my application, I use a "session_id" to identify the requesting client. Then I create a servlet filter or a Spring HandlerInterceptorAdapter to filter the request and check the "session_id" against the centrally-controlled data repository, which could be memcached, Redis, Cassandra or ZooKeeper.
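A minimal sketch of that interceptor with Redis as the central store, using a fixed one-second window built from INCR/EXPIRE (the header name, key format, limit, and Jedis wiring are all assumptions):

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;
import redis.clients.jedis.Jedis;

public class RateLimitInterceptor extends HandlerInterceptorAdapter {

    private static final int LIMIT_PER_SECOND = 5;

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                             Object handler) throws Exception {
        String token = request.getHeader("X-Client-Token"); // hypothetical header
        try (Jedis jedis = new Jedis("localhost")) {
            // One counter per token per one-second window, expired by Redis itself.
            String key = "rl:" + token + ":" + (System.currentTimeMillis() / 1000);
            long count = jedis.incr(key);
            if (count == 1) {
                jedis.expire(key, 2); // a little slack past the window
            }
            if (count > LIMIT_PER_SECOND) {
                response.sendError(429, "Rate limit exceeded");
                return false;
            }
        }
        return true;
    }
}
```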
We use Redis as the leaky bucket backend:
add a controller as the entrance,
cache the token as a key with an expiry time (e.g. with Guava's cache, as sketched below),
then filter every request.
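A sketch of that local variant with a Guava cache (the window size and limit are assumptions; note that a per-node cache like this only works if the load balancer pins each token to one server, as suggested earlier):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class LocalTokenLimiter {

    private static final int LIMIT_PER_SECOND = 5;

    // One counter per token, dropped automatically one second after creation,
    // which gives a fixed one-second window per token.
    private final LoadingCache<String, AtomicInteger> counters =
        CacheBuilder.newBuilder()
            .expireAfterWrite(1, TimeUnit.SECONDS)
            .build(new CacheLoader<String, AtomicInteger>() {
                @Override
                public AtomicInteger load(String token) {
                    return new AtomicInteger();
                }
            });

    /** Returns true if the request identified by this token may proceed. */
    public boolean tryAcquire(String token) {
        return counters.getUnchecked(token).incrementAndGet() <= LIMIT_PER_SECOND;
    }
}
```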
It is best if you implement rate limiting using Redis. For more info, please look at this Rate limiting js Example.
