I have a Kubernetes cluster where one service (a Java application) connects to another service (Elasticsearch) to write data.
When Elasticsearch (Service & ReplicationController) is restarted/redeployed, the Java application loses its connection, which can only be recovered by restarting the Java application (RC). This is not the desired behaviour and should be solved.
Using curl from the application's Kubernetes pod to query Elasticsearch works fine after the restart, so it is probably something Java is doing.
It does work when only the ReplicationController for Elasticsearch is touched and the Service is left as it is. But why does curl work in that case? In any event, that should not have to be the solution.
Using the same configuration in a local Docker setup without Kubernetes does not lead to problems either.
Promising solutions that did not work:
Setting networkaddress.cache.ttl or networkaddress.cache.negative.ttl to zero (or other small positive values)
Hacking /etc/nsswitch.conf as described in https://stackoverflow.com/a/32550032/363281
I'm using Kubernetes 1.1.3 and OpenJDK 8u66; the service's Dockerfile is derived from java:8.
Try java.security.Security.setProperty("networkaddress.cache.ttl", "60");
This sets the cache to sixty seconds; adapt the value to your needs.
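As a minimal sketch of where that call would go (it has to run before the JVM does its first name lookup, because the JDK reads these security properties when it initialises its cache policy); the service name elasticsearch and the negative-TTL value are placeholders, not from the question:

import java.net.InetAddress;
import java.security.Security;

public class DnsCacheSetup {
    public static void main(String[] args) throws Exception {
        // Must run before the first name resolution: the JDK reads these
        // security properties when its internal cache policy is initialised.
        Security.setProperty("networkaddress.cache.ttl", "60");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");

        // "elasticsearch" is a placeholder for your Service's DNS name;
        // once the 60-second entry expires, the next lookup re-resolves it.
        InetAddress resolved = InetAddress.getByName("elasticsearch");
        System.out.println(resolved.getHostAddress());
    }
}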
One solution is not to restart your Service: a Service tracks its Pods by label selector and routes to their current IPs, so you don't need to restart the Service when you restart your Pods.
What is likely happening is that your application resolves the Service at startup and then caches the IP. When you restart the Service it likely gets a new IP, which breaks your application's behaviour. You need to check how you can reset this cache or trigger some sort of restart of that app when the Pods/Services change.
If you don't restart the Service, the IP won't change, but it will still proxy to the Pods that are restarted.
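To illustrate, here is a hedged sketch of re-resolving the Service name on every connection attempt instead of caching the address at startup; the name elasticsearch and port 9200 are placeholders, not from the question:

import java.net.InetAddress;
import java.net.Socket;

public class FreshLookupClient {

    // Placeholder Service name and port; substitute your own.
    private static final String SERVICE_HOST = "elasticsearch";
    private static final int SERVICE_PORT = 9200;

    // Resolve the Service name on every (re)connect instead of holding on to
    // an InetAddress resolved once at startup; with a low DNS cache TTL this
    // picks up a Service that got a new ClusterIP after being recreated.
    public static Socket openConnection() throws Exception {
        InetAddress current = InetAddress.getByName(SERVICE_HOST);
        return new Socket(current, SERVICE_PORT);
    }
}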
I'm trying to get JMX monitoring from jconsole for an application that is running inside a Kubernetes pod.
Currently, I'm following this method:
I expose a port, let's say 5000, in the YAML
I create a NodePort service that binds that pod port to a port on the worker node
I add the following 4 java properties:
JMX_REMOTE_AUTHENTICATE
JMX_REMOTE_PORT
JMX_REMOTE_RMI_PORT
JMX_REMOTE_SSL
Then I can go into a monitoring tool like jvisualvm, create a connection to the public IP of the worker node hosting that pod on port 5000, and monitor that pod, which works great.
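For reference, the same endpoint can also be reached programmatically with the standard javax.management.remote API; here is a minimal sketch (the worker node IP is a placeholder, and it assumes no authentication or SSL, matching the properties listed above):

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxProbe {
    public static void main(String[] args) throws Exception {
        // <worker-node-ip> is a placeholder for the worker node's public IP;
        // 5000 is the port exposed via the NodePort service described above.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://<worker-node-ip>:5000/jmxrmi");

        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            System.out.println("MBeans visible: " + connection.getMBeanCount());
        } finally {
            connector.close();
        }
    }
}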
Issue:
Now let's say my application scales up and a new pod comes up on another worker node. I can manually repeat all the above steps to monitor that pod.
But that isn't ideal. Ideally, I'd like every pod to be monitored automatically as it comes online. I can add the JMX properties in my StatefulSet YAML, but do I need a NodePort service for every single pod that comes online, each binding to a different port? If so, is there a way to do this through a script or a built-in feature?
If anyone has experience with this, any pointers would be very helpful.
We are trying to use application-level clustering with Akka Cluster for our distributed application, which runs in Docker containers across multiple nodes. We plan to run the containers with "host" mode networking.
When the dockerized application comes up for the first time, the Akka clustering does not seem to work and we do not see any gossip messages being exchanged between the cluster nodes. This gets resolved only when we remove the file "/var/lib/docker/network/files/local-kv.db" and restart the Docker service. This is not an acceptable solution for a production deployment, so we are trying to do an RCA and provide a proper solution.
Any help here would be really appreciated.
Removing the file "/var/lib/docker/network/files/local-kv.db" and restarting the Docker service worked, but this workaround is unacceptable for a production deployment.
Using bridge network mode for the dockerized container also helps, but our current requirement is to run the container in "host" mode.
application.conf currently has the following settings for the host and port:
hostname = "" port = 2551 bind-hostname = "0.0.0.0" bind-port = 2551
No gossip messages are exchanged between the Akka cluster nodes, whereas we do see those messages after applying the workaround mentioned above.
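For completeness, here is a minimal sketch of how we load these settings from Java via Typesafe Config; the akka.remote.netty.tcp key path is an assumption based on classic remoting and may differ for your Akka version (Artery uses akka.remote.artery.canonical):

import akka.actor.ActorSystem;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class ClusterBootstrap {
    public static void main(String[] args) {
        // Mirrors the application.conf values above; adjust the key path
        // to match the remoting transport your Akka version uses.
        Config overrides = ConfigFactory.parseString(
                "akka.remote.netty.tcp.hostname = \"\"\n"
              + "akka.remote.netty.tcp.port = 2551\n"
              + "akka.remote.netty.tcp.bind-hostname = \"0.0.0.0\"\n"
              + "akka.remote.netty.tcp.bind-port = 2551\n");

        ActorSystem system = ActorSystem.create("ClusterSystem",
                overrides.withFallback(ConfigFactory.load()));
        System.out.println("Actor system started: " + system.name());
    }
}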
There is a config for CircleCI.
On the local machine, when I run the CircleCI build, everything passes. On the server, however, there are a lot of errors, one of them being:
java.lang.IllegalStateException: Can not connect to Ryuk
Later on, the tests also fail to connect to the containers started earlier by Testcontainers; I think this is caused by the failure to connect to Ryuk. What confuses me is that everything works on the local machine while everything fails on the server.
The reason for the problem is here: https://gist.github.com/OlegGorj/52ca84624503a5e85624c6eb38df4590
where it says:
Separation of Environments: The job and remote docker run in separate environments. Therefore, Docker containers cannot directly communicate with the containers running in remote docker.
Accessing Services: It's impossible to start a service in remote docker and ping it directly from a primary container (and vice versa).
There appear to be three options:
Do your entire build in another remote docker container.
Use a dedicated VM for the build (https://www.testcontainers.org/supported_docker_environment/continuous_integration/circle_ci/)
If you can get away with creating the test container at the start then do that and don't use testcontainers within circleci (https://circleci.com/docs/2.0/executor-types/#using-multiple-docker-images). Just remember that each test case will be interacting with the same instance of the service.
More details on option 3
Basically, don't use testcontainers (one word) when using circleci.
In your .circleci/config.yml do something like this:
jobs:
  build:
    docker:
      - image: circleci/openjdk:14.0.1-jdk-buster
      - image: rabbitmq:3.8-alpine
        environment:
So circleci runs the rabbit container on the same host as your image.
You can then communicate with it on localhost on whatever ports it opens, and circleci will close these secondary containers when your build (which is always in the first container) finishes.
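For example, with the RabbitMQ Java client (an assumption here, along with the default AMQP port 5672), connecting to that secondary container from a test is just a localhost connection:

import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class RabbitSmokeTest {
    public static void main(String[] args) throws Exception {
        // The rabbitmq:3.8-alpine secondary container from the config above is
        // reachable on localhost from the primary (build) container.
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setPort(5672); // default AMQP port; adjust if your image uses another
        Connection connection = factory.newConnection();
        try {
            System.out.println("Connected to RabbitMQ: " + connection.isOpen());
        } finally {
            connection.close();
        }
    }
}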
There are a few downsides to this:
testcontainers lets you start and stop containers; this approach doesn't, so you fundamentally cannot test the restart of a container.
all of your tests will run against the same instance, so, in the RabbitMQ case, each test should use a unique exchange and queue.
if, like me, you need to build in CircleCI and on the desktop (and in Jenkins), then you need circleci-conditional logic in your tests (just check for System.getenv("CIRCLECI")) to determine which approach to take, as in the sketch below.
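A hedged sketch of that check; the helper at the bottom is hypothetical and only stands in for whatever Testcontainers setup you use outside CircleCI:

public final class TestEnvironment {

    /** CircleCI sets the CIRCLECI environment variable inside its builds. */
    public static boolean onCircleCi() {
        return System.getenv("CIRCLECI") != null;
    }

    /**
     * On CircleCI, talk to the secondary container from config.yml on
     * localhost; everywhere else, start a Testcontainers container from the
     * test itself and use whatever host/port it reports.
     */
    public static String brokerHost() {
        return onCircleCi() ? "localhost" : startLocalContainerAndGetHost();
    }

    // Hypothetical helper standing in for the usual Testcontainers setup,
    // omitted here because it is not used on CircleCI.
    private static String startLocalContainerAndGetHost() {
        throw new UnsupportedOperationException("start Testcontainers here");
    }
}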
I had the same error and fixed it by turning off Experimental Features in Docker.
You can find them in Preferences.
I deployed a container on Google Container Engine and it runs fine. Now, I want to expose it.
This application is a service that listens on 2 ports. Using kubectl expose deployment, I created 2 load balancers, one for each port.
I made 2 load balancers because the kubectl expose command doesn't seem to allow more than one port. While I defined them as type=LoadBalancer in kubectl, once these got created on GKE they showed up as forwarding rules associated with 2 target pools that were also created by kubectl. kubectl also automatically created firewall rules for each balancer.
The first one I made exposes the application as it should. I am able to communicate with the application and get a response.
The 2nd one does not connect at all. I keep getting either connection refused or connection timeout. To troubleshoot this, I stripped my firewall rules down to be as permissive as possible. Since ICMP is allowed by default, pinging the IP of this balancer gets replies.
Does Kubernetes only allow one load balancer to work, even if more than one can be configured? If it matters at all, the working balancer's external IP is in the pattern 35.xxx.xxx.xxx and the IP of the balancer that's not working is 107.xxx.xxx.xxx.
As a side question, is there a way to expose more than one port using kubectl expose --port without defining a range, i.e. when I just need 2 ports?
Lastly, I tried using the Google console, but I couldn't get the load balancer, or forwarding rules to work with what's on kubernetes, the way doing it on kubectl does.
Here is the command I used, modifying the port and service name on the 2nd use:
kubectl expose deployment myapp --name=my-app-balancer --type=LoadBalancer --port 62697 --selector="app=my-app"
My firewall rule is basically set to allow all incoming TCP connections over 0.0.0.0/0.
Edit:
External IP had nothing to do with it. I kept deleting & recreating the balancers until I was given an IP of xxx.xxx.xxx.xxx for the working balancer, and the balancer still worked fine.
I've also tried deleting the working balancer and re-creating the one that wasn't working, to see if it's a conflict between balancers. The 2nd balancer still didn't work, even if it was the only one running.
I'm currently investigating the code for the 2nd service of my app, though it's practically the same as the 1st service, a simple ServerSocket implementation that listens on a defined port.
After more thorough investigation (opening a console in the running pod, installing tcpdump, iptables, etc.), I found that the service (i.e. load balancer) was, in fact, reachable. What happened in this situation was that, although traffic reached the container's virtual network interface (eth0), the data wasn't routed to the listening services, even when these were listening on IP aliases of the interface (eth0:1, eth0:2).
The last step to getting this to work was to create the required routing with
iptables -t nat -A PREROUTING -p tcp -i eth0 --dport <listener-port> -j DNAT --to-destination <listener-ip>
Note, there are other ways to accomplish this, but this was the one I chose. I wish the Docker/Kubernetes documentation mentioned this.
I'm trying to write a simple Spark application, and when I run it locally it works with the master set as
.master("local[2]")
But after configuring a Spark cluster on AWS (EMR) I can't connect to the master URL:
.master("spark://<master url>:7077")
Is this the way to do it? Am I missing something here?
The cluster is up and running, and when I tried adding my application as a step JAR, so that it runs directly on the cluster, it worked. But I want to be able to run it from a remote machine.
I would appreciate some help here.
Thanks
To run from a remote machine, you will need to open the appropriate ports in the Security Group assigned to your EMR master node. You will need to add at least 7077.
If by "remote" you mean one that isn't in your AWS environment, you will also need to setup a way to route traffic to it from the outside.