I am new to YARN, and I am developing a framework to launch Java applications in YARN containers. To register my ApplicationMaster with the resource manager, the code executes registerApplicationMaster("", 0, ""), which works fine on a single-node cluster. But the same call hangs forever on a multi-node cluster. I am wondering whether not passing these parameters properly is causing this.
Even if it is not, I want to know what these are for.
public abstract RegisterApplicationMasterResponse registerApplicationMaster(String appHostName, int appHostPort, String appTrackingUrl)
appHostName - Name of the host on which master is running
appHostPort - Port master is listening on
appTrackingUrl - URL at which the master info can be seen
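For reference, here is a minimal sketch of registering with an explicit host name using the AMRMClient API; the class name is illustrative, and it assumes the multi-node cluster's yarn-site.xml is on the ApplicationMaster's classpath:

import java.net.InetAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class MyAppMaster {
    public static void main(String[] args) throws Exception {
        // Picks up yarn-site.xml from the classpath, including the RM address.
        AMRMClient<AMRMClient.ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new Configuration());
        rmClient.start();

        // appHostName: the host this AM is running on; appHostPort: the port the AM
        // listens on for its own clients (0 if none); appTrackingUrl: the URL of the
        // AM's status page ("" if there is no web UI).
        String host = InetAddress.getLocalHost().getHostName();
        RegisterApplicationMasterResponse response =
                rmClient.registerApplicationMaster(host, 0, "");

        // ... request containers, run the application, then unregister.
    }
}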
I'm trying to get JMX monitoring from jconsole for an application that is running inside a Kubernetes pod.
Currently, I'm following this method:
I expose a port, let's say 5000 in the YAML
I create a NodePort service that binds that pod port to a port on the worker node
I add the following 4 Java system properties:
com.sun.management.jmxremote.authenticate
com.sun.management.jmxremote.port
com.sun.management.jmxremote.rmi.port
com.sun.management.jmxremote.ssl
Then I can go into a monitoring tool like jvisualvm, create a connection to the public IP of the worker node hosting that pod at port 5000, and monitor that pod, which works great.
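For comparison, the same fixed-port, no-auth, no-SSL setup that the four flags give you can be sketched programmatically; this is only an illustration (the class name and port are assumptions), not something the setup above requires:

import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class JmxAgent {
    public static void start(int port) throws Exception {
        // The RMI registry and the JMX/RMI connector share one port, which is what
        // the jmxremote.port / jmxremote.rmi.port pair achieves, so the pod only
        // has to expose a single port (e.g. 5000).
        LocateRegistry.createRegistry(port);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi://0.0.0.0:" + port + "/jndi/rmi://0.0.0.0:" + port + "/jmxrmi");
        // null environment map = no authentication and no SSL, as with the flags above.
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();
    }
}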
Issue:
Now let's say my application scales up and a new pod comes up on another worker node. I can manually repeat all of the above steps to monitor that pod.
But that isn't ideal. Ideally, I'd like every pod to be monitored automatically as it comes online. I can add the JMX properties in my StatefulSet YAML, but do I need a NodePort service for every single pod that comes online, each binding to a different port? If so, is there a way to do this through a script or a built-in function?
If anyone has experience with this, any pointers would be very helpful.
We are trying to use application-level clustering with Akka Cluster for our distributed application, which runs within Docker containers across multiple nodes. We plan to run the Docker containers with "host" mode networking.
When the dockerized application comes up for the first time, Akka Cluster does not seem to work: we do not see any Gossip messages being exchanged between the cluster nodes. This gets resolved only when we remove the file "/var/lib/docker/network/files/local-kv.db" and restart the Docker service. This is not an acceptable solution for the production deployment, so we are trying to do an RCA and provide a proper solution.
Any help here would be really appreciated.
Tried removing the file "/var/lib/docker/network/files/local-kv.db" and restarting the Docker service, which worked. But this workaround is unacceptable in the production deployment.
Tried using bridge network mode for the dockerized container. That helps, but our current requirement is to run the container in "host" mode.
application.conf currently has the following settings for the host and port:
hostname = ""
port = 2551
bind-hostname = "0.0.0.0"
bind-port = 2551
No Gossip messages are exchanged between the Akka cluster nodes, whereas we do see those messages after applying the mentioned workaround.
There is a config for CircleCI.
On the local machine, when the CircleCI build runs, everything passes. On the server, however, there are a lot of errors, one of them being:
java.lang.IllegalStateException: Can not connect to Ryuk
Later on, there are also errors connecting to the containers that were started earlier by Testcontainers; I think this is due to the failure to connect to Ryuk. What confuses me is that everything works on the local machine, while everything fails on the server.
The reason for the problem is here: https://gist.github.com/OlegGorj/52ca84624503a5e85624c6eb38df4590
where it says:
Separation of Environments: The job and remote docker run in separate environments. Therefore, Docker containers cannot directly communicate with the containers running in remote docker.
Accessing Services: It's impossible to start a service in remote docker and ping it directly from a primary container (and vice versa).
There appear to be three options:
Do your entire build in another remote docker container.
Use a dedicated VM for the build (https://www.testcontainers.org/supported_docker_environment/continuous_integration/circle_ci/)
If you can get away with creating the test container at the start then do that and don't use testcontainers within circleci (https://circleci.com/docs/2.0/executor-types/#using-multiple-docker-images). Just remember that each test case will be interacting with the same instance of the service.
More details on option 3
Basically, don't use testcontainers (one word) when using circleci.
In your .circleci/config.yml, do something like this:
jobs:
  build:
    docker:
      - image: circleci/openjdk:14.0.1-jdk-buster
      - image: rabbitmq:3.8-alpine
        environment:
So CircleCI runs the RabbitMQ container on the same host as your image.
You can then communicate with it on localhost on whatever ports it opens, and circleci will close these secondary containers when your build (which is always in the first container) finishes.
There are a few downsides to this:
testcontainers lets you start and stop containers; this approach doesn't, so you fundamentally cannot test the restart of a container.
all of your tests will run against the same instance, so, in the RabbitMQ case, each test should use a unique exchange and queue.
if, like me, you need to build in CircleCI and on the desktop (and in Jenkins), then you need CircleCI-conditional logic in your tests (just check for System.getenv("CIRCLECI")) to determine which approach to take; a sketch follows below.
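A rough sketch of that conditional logic, assuming RabbitMQ on its default port 5672; the helper class is hypothetical:

import org.testcontainers.containers.GenericContainer;
import org.testcontainers.utility.DockerImageName;

final class RabbitConnectionInfo {
    final String host;
    final int port;

    private RabbitConnectionInfo(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // On CircleCI the secondary rabbitmq container listens on localhost:5672;
    // everywhere else, fall back to a Testcontainers-managed instance.
    static RabbitConnectionInfo resolve() {
        if (System.getenv("CIRCLECI") != null) {
            return new RabbitConnectionInfo("localhost", 5672);
        }
        GenericContainer<?> rabbit =
                new GenericContainer<>(DockerImageName.parse("rabbitmq:3.8-alpine"))
                        .withExposedPorts(5672);
        rabbit.start();
        return new RabbitConnectionInfo(rabbit.getHost(), rabbit.getMappedPort(5672));
    }
}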
I had the same error and fixed it by turning off Experimental Features in Docker.
You can find them in Preferences.
I'm trying to write a simple Spark application, and when I run it locally it works with the master set as
.master("local[2]")
But after configuring a Spark cluster on AWS (EMR), I can't connect to the master URL:
.master("spark://<master url>:7077")
Is this the way to do it? Am I missing something here?
The cluster is up and running, and when I tried adding my application as a step JAR so that it runs directly in the cluster, it worked. But I want to be able to run it from a remote machine.
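For context, a minimal sketch of how the driver sets the remote master (the <master url> placeholder stands in for the EMR master's address; whether a standalone master is actually listening on 7077 depends on the cluster setup):

import org.apache.spark.sql.SparkSession;

public class RemoteSubmitTest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("remote-submit-test")
                .master("spark://<master url>:7077")  // placeholder kept from the question
                .getOrCreate();
        // ... job logic here
        spark.stop();
    }
}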
I would appreciate some help here,
Thanks
To run from a remote machine, you will need to open the appropriate ports in the Security Group assigned to your EMR master node. You will need to add at least 7077.
If by "remote" you mean one that isn't in your AWS environment, you will also need to setup a way to route traffic to it from the outside.
I have a kubernetes cluster where one service (java application) connects to another service to write data (elasticsearch).
When elasticsearch (service & replicationcontroller) is restarted/redeployed, the Java application loses its connection, which can only be recovered by restarting the Java application (rc). This is not the desired behaviour and should be solved.
Using curl from the application's Kubernetes pod to query elasticsearch works fine after the restart, so it is probably something Java is doing.
It does work when only the replicationcontroller for elasticsearch is touched, leaving the service as it is. But why does curl work in that case? Either way, this should not be the solution.
Using the same configuration in a local Docker setup without Kubernetes does not lead to problems either.
Promising solutions that did not work:
Setting networkaddress.cache.ttl or networkaddress.cache.negative.ttl to zero (or other small positive values)
Hacking /etc/nsswitch.conf as described in https://stackoverflow.com/a/32550032/363281
I'm using Kubernetes 1.1.3 and OpenJDK 8u66; the service's Dockerfile is derived from java:8
Try java.security.Security.setProperty("networkaddress.cache.ttl" , "60");
This means sixty seconds; adapt it to your needs.
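A minimal sketch of where this could go, assuming it runs early in main before the first name lookup; the class name and the negative TTL value are illustrative:

import java.security.Security;

public class Main {
    public static void main(String[] args) {
        // Set before any InetAddress lookups happen, otherwise the default
        // caching policy is already in effect.
        Security.setProperty("networkaddress.cache.ttl", "60");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");

        // ... create the Elasticsearch client and start the application here.
    }
}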
One solution is not to restart your Service: a Service resolves the Pods by IPs and watches the Pods by selectors, so you don't need to restart the Service when you restart your Pods.
Now what is likely happening is that your application resolves the Service at startup and then caches the IP. When you restart the Service, it likely gets a new IP, which messes up your application's behaviour. You need to check how you can reset this cache or initiate some sort of restart of that app when the pods/services change.
If you don't restart the Service, the IP won't change, but it will still proxy to the Pods that are restarted.