I am using Hadoop 2.4.0 for testing purposes. I have to configure Hadoop on my machine so that I can run it in pseudo-distributed mode and test independently on my machine. I also want to make my machine part of a cluster.
As I see it, though, issues will arise when the datanodes and a few other services, which have default ports, try to run on the same ports. Can anybody guide me on how I can achieve this?
Thanks
Change the following settings in hdfs-site.xml; an example override is sketched after the list:
dfs.datanode.address (for example: 0.0.0.0:50010)
dfs.datanode.ipc.address
dfs.datanode.http.address
dfs.datanode.https.address
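For example, a second datanode instance on the same host could override the defaults along these lines (the port numbers below are only placeholders; pick any ports that are free on your machine):

    <!-- hdfs-site.xml for the second datanode; port numbers are only an example -->
    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:50110</value>
    </property>
    <property>
      <name>dfs.datanode.ipc.address</name>
      <value>0.0.0.0:50120</value>
    </property>
    <property>
      <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:50175</value>
    </property>
    <property>
      <name>dfs.datanode.https.address</name>
      <value>0.0.0.0:50575</value>
    </property>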
I have two development machines, both running Ignite in server mode on the same network. I started the server on the first machine and then started the second machine. When the other machine starts, it is automatically added to the first one's topology.
Note:
When starting, I removed the work folder on both machines.
In the config, I never mentioned the IPs of any other machines.
Can anyone tell me what's wrong with this? My intention is that each machine should have a separate topology.
As described in the discovery documentation, Apache Ignite uses multicast to find all nodes in the local network and form a cluster. This is the default mode of operation.
Please note that we don't really recommend using this mode for either development or production deployments; use static discovery instead (see the same documentation). A minimal example is sketched below.
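For instance, a static-discovery setup in Java might look roughly like this (the address list is a placeholder; list only the machines that should form this cluster):

    import java.util.Arrays;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public class StartServerNode {
        public static void main(String[] args) {
            // Static IP finder: only the listed addresses are probed, so other
            // machines on the same LAN will not join via multicast.
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
            ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47509"));

            TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
            discoverySpi.setIpFinder(ipFinder);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDiscoverySpi(discoverySpi);

            Ignite ignite = Ignition.start(cfg); // starts a server node by default
        }
    }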
I have three Java applications that will connect to the same Ignite node (running on a particular VM) to access the same cache store.
Is there a step-by-step procedure for running a node outside a Java application (from the command prompt, maybe) and connecting my Java apps to it?
Your Java applications should serve as client nodes in your cluster. More information about client/server mode can be found in the documentation. Server node(s) can be started from the command line, as described here; information about running with a custom configuration can be found there as well. You need to set up discovery on every node (including the client nodes) in order to make the entire thing work. I'd recommend using the static IP finder in the configuration, roughly as sketched below.
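A rough sketch of what one of your client applications could look like (the server address and cache name are placeholders, not something from your setup):

    import java.util.Arrays;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public class ClientApp {
        public static void main(String[] args) {
            // Point the static IP finder at the VM that runs the server node.
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
            ipFinder.setAddresses(Arrays.asList("<server-vm-ip>:47500..47509"));

            TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
            discoverySpi.setIpFinder(ipFinder);

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDiscoverySpi(discoverySpi);
            cfg.setClientMode(true); // join as a client, not another server

            Ignite ignite = Ignition.start(cfg);
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");
            cache.put(1, "hello");
        }
    }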
We are trying to use application-level clustering with Akka Cluster for our distributed application, which runs in Docker containers across multiple nodes. We plan to run the Docker containers with "host" mode networking.
When the dockerized application comes up for the first time, the Akka clustering does not seem to work and we do not see any gossip messages being exchanged between the cluster nodes. This gets resolved only when we remove the file "/var/lib/docker/network/files/local-kv.db" and restart the Docker service. This is not an acceptable solution for a production deployment, so we are trying to do an RCA and provide a proper solution.
Any help here would be really appreciated.
Removing the file "/var/lib/docker/network/files/local-kv.db" and restarting the Docker service worked, but this workaround is unacceptable for a production deployment.
We tried using bridge network mode for the dockerized container. That helps, but our current requirement is to run the container in "host" mode.
application.conf currently has the following settings for the host and port:
hostname = "" port = 2551 bind-hostname = "0.0.0.0" bind-port = 2551
No gossip messages are exchanged between the Akka cluster nodes, whereas we do see those messages after applying the workaround mentioned above.
We are now trying to use H2O to build a training cluster. It is easy to use by running java -jar ./h2o.jar, and we can set up the cluster with a simple flatfile.txt that contains multiple IPs and ports.
But we found that it is impossible to set up the H2O cluster within Docker containers. Although we can start multiple containers that run java -jar ./h2o.jar with the prepared flatfile.txt, the H2O process tries to bind the local (container eth0) IP, which is different from the one in flatfile.txt. We can run java -jar ./h2o.jar -ip $ip to set the IP that is listed in flatfile.txt, but the H2O instance is not able to run without actually having this "external" IP.
If you use "docker run --network=host ..." it will work; see the sketch below.
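Something along these lines, for example (the image name and jar path are placeholders for however you package H2O):

    # With host networking the container shares the host's network stack, so H2O
    # can bind the same IP that the other nodes list in flatfile.txt.
    docker run --rm --network=host \
        -v /opt/h2o/flatfile.txt:/flatfile.txt \
        <your-h2o-image> \
        java -jar /h2o.jar -flatfile /flatfile.txt -port 54321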
See my response to a similar issue here, where I describe how it is possible to start an H2O cluster using a flatfile and Docker Swarm. Basically, you have to run a script in each service before starting H2O to identify the correct IP addresses for the cluster, because Docker assigns two IPs to each service. The flatfile needs to use the $HOSTNAME IP of each cluster member, which is difficult to determine in advance; a rough sketch of such a script follows.
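A rough sketch of that kind of startup script, assuming a Swarm service named "h2o" (the service name, jar path, and port are placeholders, and which of the two assigned IPs you end up with may need adjusting for your overlay network):

    #!/bin/sh
    # Swarm's built-in DNS entry tasks.<service> resolves to every task's IP,
    # so each container can build its own copy of the flatfile at startup.
    getent hosts tasks.h2o | awk '{ print $1 ":54321" }' > /tmp/flatfile.txt

    # Bind H2O to this container's own routable IP so it matches its flatfile entry.
    MY_IP=$(hostname -i | awk '{ print $1 }')
    java -jar /h2o.jar -ip "$MY_IP" -port 54321 -flatfile /tmp/flatfile.txt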
I'm trying to write a simple Spark application, and when I run it locally it works with the master set as
.master("local[2]")
But after configuring a Spark cluster on AWS (EMR), I can't connect to the master URL:
.master("spark://<master url>:7077")
Is this the right way to do it? Am I missing something here?
The cluster is up and running, and when I tried adding my application as a step JAR, so it would run directly on the cluster, it worked. But I want to be able to run it from a remote machine.
I would appreciate some help here.
Thanks
To run from a remote machine, you will need to open the appropriate ports in the Security Group assigned to your EMR master node. You will need to add at least 7077.
If by "remote" you mean one that isn't in your AWS environment, you will also need to setup a way to route traffic to it from the outside.