spark worker cannot connect to master - java

My Spark worker cannot connect to the master. I've tried:
adding a spark-env.sh, as some answers here suggest, and setting SPARK_MASTER_HOST=<IP>
restarting both the master and the worker
Neither worked. Here's what I did:
Start the Spark master:
starting org.apache.spark.deploy.master.Master, logging to /<MY-PATH>/spark-3.1.2-bin-hadoop3.2/logs/spark-<MY-USERNAME>-org.apache.spark.deploy.master.Master-1-<SPARK_ID>.out
Start the Spark worker with the URL spark://<SPARK_ID>:7077:
starting org.apache.spark.deploy.worker.Worker, logging to /<MY-PATH>/spark-3.1.2-bin-hadoop3.2/logs/spark-<MY-USERNAME>-org.apache.spark.deploy.worker.Worker-1-<SPARK_ID>.out
The Spark master starts and I can get the master's URL. The worker also starts; however, it cannot connect to the master:
21/08/06 15:33:07 INFO Worker: Connecting to master <SPARK_ID>:7077...
21/08/06 15:33:10 WARN Worker: Failed to connect to master <SPARK_ID>:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.worker.Worker$$anon$1.run(Worker.scala:298)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to <SPARK_ID>/192.168.1.175:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:287)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: <SPARK_ID>/192.168.1.175:7077
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
21/08/06 15:33:12 INFO Worker: Retrying connection to master (attempt # 4)
21/08/06 15:33:12 INFO Worker: Connecting to master <SPARK_ID>:7077...
21/08/06 15:33:13 ERROR Worker: RECEIVED SIGNAL TERM

Check whether the master is running by looking at the log files under $SPARK_HOME/logs.
You should see entries like these:
21/08/07 02:56:04 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
21/08/07 02:56:04 INFO Master: Starting Spark master at spark://<your host>:7077
21/08/07 02:56:05 INFO Master: Running Spark version 3.1.2
21/08/07 02:56:05 INFO Utils: Successfully started service 'MasterUI' on port 8080.
21/08/07 02:56:05 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://<your IP>:8080
21/08/07 02:56:05 INFO Master: I have been elected leader! New state: ALIVE
When starting the worker, pass the master URL that you get from the log above, as in the sketch below.
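A minimal sketch, assuming a standard Spark 3.x layout with $SPARK_HOME pointing at the installation:

# start the master, then start the worker against the exact URL printed in the
# master log line "Starting Spark master at spark://<your host>:7077"
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://<your host>:7077   # start-slave.sh on pre-3.0 releases

If the worker still reports "Connection refused", verify on the master machine that something is actually listening on that port, e.g. with ss -tlnp | grep 7077.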

Related

Error while connecting spring to MongoDb (Docker) [duplicate]

I have a jar which runs fine on my host; specifically, when I run
java -jar myjar.jar
I get the expected output:
[2018-12-05 16:46:53.917] boot - 21252 INFO [main] --- Application: No active profile set, falling back to default profiles: default
[2018-12-05 16:47:00.855] boot - 21252 INFO [main] --- Application: Started Application in 8.176 seconds (JVM running for 9.106)
This is the Core Data Micro Service.
[2018-12-05 16:47:00.856] boot - 21252 INFO [main] --- Application: Registering to queue for events
[2018-12-05 16:47:00.857] boot - 21252 INFO [main] --- ZeroMQEventSubscriber: Getting subscriber, listening to tcp://localhost:5565
[2018-12-05 16:47:00.915] boot - 21252 INFO [main] --- ZeroMQEventSubscriber: Watching for new Event messages...
But then, I try to run the same jar inside a docker container. So I create the image like this:
FROM openjdk:8-jdk-alpine
COPY myjar.jar /opt/spring-cloud/lib/
ENTRYPOINT ["/usr/bin/java"]
CMD ["-jar", "/opt/spring-cloud/lib/myjar.jar"]
EXPOSE 48080
and run it:
sudo docker run [ID]
but this time, I get this exception from the container logs (this is only a part of the exception because it is too big, but I can show it all if needed):
[2018-12-07 08:30:31.447] boot - 1 INFO [main] --- Application: No active profile set, falling back to default profiles: default
[2018-12-07 08:32:35.423] boot - 1 ERROR [main] --- SpringApplication: Application startup failed
...
...
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'readingControllerImpl': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: org.edgexfoundry.dao.ValueDescriptorRepository org.edgexfoundry.controller.impl.ReadingControllerImpl.valDescRepos; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'valueDescriptorRepository': Invocation of init method failed; nested exception is org.springframework.dao.DataAccessResourceFailureException: Timed out after 120000 ms while waiting for a server that matches AnyServerSelector{}. Client view of cluster state is {type=Unknown, servers=[{address=localhost:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]; nested exception is com.mongodb.MongoTimeoutException: Timed out after 120000 ms while waiting for a server that matches AnyServerSelector{}. Client view of cluster state is {type=Unknown, servers=[{address=localhost:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:334)
...
...
Caused by: com.mongodb.MongoTimeoutException: Timed out after 120000 ms while waiting for a server that matches AnyServerSelector{}. Client view of cluster state is {type=Unknown, servers=[{address=localhost:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]
at com.mongodb.BaseCluster.getServer(BaseCluster.java:82)
at com.mongodb.DBTCPConnector.getServer(DBTCPConnector.java:664)
at com.mongodb.DBTCPConnector.access$500(DBTCPConnector.java:40)
at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:513)
at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:456)
at com.mongodb.DBTCPConnector.getPrimaryPort(DBTCPConnector.java:415)
at com.mongodb.DBCollectionImpl.createIndex(DBCollectionImpl.java:378)
at com.mongodb.DBCollection.createIndex(DBCollection.java:597)
at org.springframework.data.mongodb.core.index.MongoPersistentEntityIndexCreator.createIndex(MongoPersistentEntityIndexCreator.java:142)
... 57 more
Mongo has been started in another container through docker-compose (together with other services in other containers):
ps aux | grep mongo
root 16226 0.0 0.0 4340 768 ? Ss 10:27 0:00 /bin/sh -c /edgex/mongo/config/launch-edgex-mongo.sh
root 16292 0.0 0.0 4340 764 ? S 10:27 0:00 /bin/sh /edgex/mongo/config/launch-edgex-mongo.sh
root 16293 0.5 0.3 961168 61400 ? SLl 10:27 0:05 mongod --smallfiles
This is the docker-compose file:
version: '3'
services:
  volume:
    image: edgexfoundry/docker-edgex-volume:0.6.0
    container_name: edgex-files
    networks:
      - edgex-network
    volumes:
      - db-data:/data/db
      - log-data:/edgex/logs
      - consul-config:/consul/config
      - consul-data:/consul/data
  mongo:
    image: edgexfoundry/docker-edgex-mongo:0.6.0
    ports:
      - "27017:27017"
    container_name: edgex-mongo
    hostname: edgex-mongo
    networks:
      - edgex-network
    volumes:
      - db-data:/data/db
      - log-data:/edgex/logs
      - consul-config:/consul/config
      - consul-data:/consul/data
    depends_on:
      - volume
  .... more services...
networks:
  edgex-network:
    driver: "bridge"
And the MongoDB configuration properties:
spring.data.mongodb.username=core
spring.data.mongodb.password=password
spring.data.mongodb.database=coredata
#change to localhost when running locally during development
# (or set hosts to point edgex-mongo to the mongo host)
spring.data.mongodb.host=localhost
#spring.data.mongodb.host=edgex-mongo
spring.data.mongodb.port=27017
spring.data.mongodb.connectTimeout=120000
spring.data.mongodb.socketTimeout=60000
spring.data.mongodb.maxWaitTime=120000
spring.data.mongodb.socketKeepAlive=true
Any ideas what may be going wrong?
There are two things going wrong here. First, Spring tries to connect to your MongoDB on localhost; within Docker this does not work, because localhost refers to the current container, where of course no MongoDB is available. To fix this, comment out that line and uncomment the next one, which lists the host as edgex-mongo; that matches the hostname of your MongoDB container, so Spring knows to connect to that container.
However, once you do this you will run into the issue that the Spring container cannot resolve edgex-mongo, because it has no connection to that container: edgex-mongo is inside a bridged network, which requires you to add the Spring container to the same network using the following command:
docker run --network edgex-network [image]
I hope this helps you
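One caveat worth adding (an assumption on my part, since it depends on how the stack was started): when the network is created by docker-compose, its actual name usually carries the compose project name as a prefix, so check the exact name before running the command above. A sketch:

# list networks; the EdgeX one may show up as e.g. <project>_edgex-network
docker network ls
# attach the Spring container to it ("myjar-image" is a placeholder image name)
docker run --network <project>_edgex-network myjar-image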

Connecting to MongoDB from Maven exec fails but not from the jar

The command below resolves all required dependencies to execute my program:
mvn exec:java -Dexec.mainClass='com.cnrg.cdproc.App'
but I can't connect to MongoDB with it, even though the same code works when executed from the jar using this command: java -jar target/cdproc-1.0.0-SNAPSHOT-shaded.jar.
Here's the error I'm getting when starting the app via Maven:
[Thread-1] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:9032], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms'}
[cluster-ClusterId{value='5fc969c67d495439ce9a3796', description='null'}-localhost:9032] INFO org.mongodb.driver.cluster - Exception in monitor thread while connecting to server localhost:9032
com.mongodb.MongoInterruptedException: Interrupted acquiring a permit to retrieve an item from the pool
at com.mongodb.internal.connection.ConcurrentPool.acquirePermit(ConcurrentPool.java:203)
at com.mongodb.internal.connection.ConcurrentPool.get(ConcurrentPool.java:140)
at com.mongodb.internal.connection.ConcurrentPool.get(ConcurrentPool.java:123)
at com.mongodb.internal.connection.PowerOfTwoBufferPool.getByteBuffer(PowerOfTwoBufferPool.java:82)
at com.mongodb.internal.connection.PowerOfTwoBufferPool.getBuffer(PowerOfTwoBufferPool.java:77)
at com.mongodb.internal.connection.SocketStream.getBuffer(SocketStream.java:93)
at com.mongodb.internal.connection.InternalStreamConnection.getBuffer(InternalStreamConnection.java:684)
at com.mongodb.internal.connection.ByteBufferBsonOutput.getByteBufferAtIndex(ByteBufferBsonOutput.java:93)
at com.mongodb.internal.connection.ByteBufferBsonOutput.getCurrentByteBuffer(ByteBufferBsonOutput.java:82)
at com.mongodb.internal.connection.ByteBufferBsonOutput.writeByte(ByteBufferBsonOutput.java:77)
at org.bson.io.OutputBuffer.write(OutputBuffer.java:150)
at org.bson.io.OutputBuffer.writeInt32(OutputBuffer.java:56)
at com.mongodb.internal.connection.RequestMessage.writeMessagePrologue(RequestMessage.java:158)
at com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:137)
at com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:59)
at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:269)
at com.mongodb.internal.connection.CommandHelper.sendAndReceive(CommandHelper.java:83)
at com.mongodb.internal.connection.CommandHelper.executeCommand(CommandHelper.java:33)
at com.mongodb.internal.connection.InternalStreamConnectionInitializer.initializeConnectionDescription(InternalStreamConnectionInitializer.java:107)
at com.mongodb.internal.connection.InternalStreamConnectionInitializer.initialize(InternalStreamConnectionInitializer.java:62)
at com.mongodb.internal.connection.InternalStreamConnection.open(InternalStreamConnection.java:144)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.lookupServerDescription(DefaultServerMonitor.java:188)
at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:144)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1343)
at java.base/java.util.concurrent.Semaphore.acquire(Semaphore.java:318)
at com.mongodb.internal.connection.ConcurrentPool.acquirePermit(ConcurrentPool.java:199)
... 23 more
What's the difference between starting the program through the jar and starting it through Maven? How can I troubleshoot this issue, if possible, and run the program from Maven?
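One plausible culprit, not confirmed in this thread but consistent with the InterruptedException in the monitor thread: exec:java runs the application inside Maven's own JVM, and the exec-maven-plugin's cleanupDaemonThreads option (enabled by default) interrupts lingering daemon threads once main() returns, which can kill the Mongo driver's background monitor threads. A quick way to test that hypothesis:

# one-off run with the plugin's daemon-thread cleanup disabled
mvn exec:java -Dexec.mainClass='com.cnrg.cdproc.App' -Dexec.cleanupDaemonThreads=false

If the error disappears, the difference was the JVM lifecycle under Maven, not MongoDB connectivity.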

Wildfly intermittently suspending my web application

Wildfly is intermittently suspending my Java web application. This occurs roughly once a week.
I'm running my service with the following configuration:
Ubuntu 16, 8 GB RAM
Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
Web Application:
Wildfly 14
Java 11
JavaEE 8
Postgresql
2019-08-08 07:35:14,619 ERROR [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 1) WFLYDS0012: Scan of /opt/wildfly-14.0.1.Final/standalone/deployments threw Exception: java.lang.RuntimeException: WFLYDS0032: Failed to list files in directory /opt/wildfly-14.0.1.Final/standalone/deployments. Check that the contents of the directory are readable.
at org.jboss.as.deployment-scanner#6.0.2.Final//org.jboss.as.server.deployment.scanner.FileSystemDeploymentService.listDirectoryChildren(FileSystemDeploymentService.java:1365)
at org.jboss.as.deployment-scanner#6.0.2.Final//org.jboss.as.server.deployment.scanner.FileSystemDeploymentService.scanDirectory(FileSystemDeploymentService.java:846)
at org.jboss.as.deployment-scanner#6.0.2.Final//org.jboss.as.server.deployment.scanner.FileSystemDeploymentService.scan(FileSystemDeploymentService.java:598)
at org.jboss.as.deployment-scanner#6.0.2.Final//org.jboss.as.server.deployment.scanner.FileSystemDeploymentService.scan(FileSystemDeploymentService.java:493)
at org.jboss.as.deployment-scanner#6.0.2.Final//org.jboss.as.server.deployment.scanner.FileSystemDeploymentService$DeploymentScanRunnable.run(FileSystemDeploymentService.java:255)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at org.jboss.threads#2.3.2.Final//org.jboss.threads.JBossThread.run(JBossThread.java:485)
Caused by: java.nio.file.FileSystemException: /opt/wildfly-14.0.1.Final/standalone/deployments: Too many open files
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:100)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
at java.base/sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:428)
at java.base/java.nio.file.Files.newDirectoryStream(Files.java:603)
at org.jboss.as.deployment-scanner#6.0.2.Final//org.jboss.as.server.deployment.scanner.FileSystemDeploymentService.listDirectoryChildren(FileSystemDeploymentService.java:1358)
... 11 more
2019-08-08 07:55:11,261 INFO [org.jboss.as.server] (Thread-1) WFLYSRV0236: Suspending server with no timeout.
2019-08-08 07:55:11,272 INFO [org.jboss.as.ejb3] (Thread-1) WFLYEJB0493: EJB subsystem suspension complete
2019-08-08 07:55:11,285 INFO [org.jboss.as.server] (Thread-1) WFLYSRV0220: Server shutdown has been requested via an OS signal
2019-08-08 07:55:11,324 INFO [org.jboss.as.mail.extension] (MSC service thread 1-8) WFLYMAIL0002: Unbound mail session [java:jboss/mail/Default]
2019-08-08 07:55:11,327 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 72) WFLYUT0022: Unregistered web context: '/service' from server 'default-server'
It looks like some process, most likely your web server process, is keeping files open and reaching beyond the ulimit.
To find the root cause, you can use the lsof Linux command to see all open files and their respective process IDs.
1) If you find you can control the number of open files, good.
2) If it's unavoidable, try increasing the ulimit, as sketched below.
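A minimal sketch of that diagnosis with standard Linux tools (the PID lookup assumes WildFly runs as a single java process):

# find the WildFly process and count its open file descriptors
pgrep -f wildfly
lsof -p <pid> | wc -l
# see the limit the process is actually running with
grep 'open files' /proc/<pid>/limits
# raise it persistently in /etc/security/limits.conf, e.g.:
#   wildfly  soft  nofile  65536
#   wildfly  hard  nofile  65536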

Submitting Job to Remote Apache Spark Server

Apache Spark (v1.6.1) is started as a service on an Ubuntu machine (10.10.0.102) using ./start-all.sh.
I now need to submit jobs to this server remotely using the Java API.
The following Java client code runs from a different machine (10.10.0.95):
String mySqlConnectionUrl = "jdbc:mysql://localhost:3306/demo?user=sec&password=sec";
String jars[] = new String[] {
    "/home/.m2/repository/com/databricks/spark-csv_2.10/1.4.0/spark-csv_2.10-1.4.0.jar",
    "/home/.m2/repository/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar",
    "/home/.m2/repository/mysql/mysql-connector-java/6.0.2/mysql-connector-java-6.0.2.jar"};
SparkConf sparkConf = new SparkConf()
        .setAppName("sparkCSVWriter")
        .setMaster("spark://10.10.0.102:7077")
        .setJars(jars);
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(javaSparkContext);
Map<String, String> options = new HashMap<>();
options.put("driver", "com.mysql.jdbc.Driver");
options.put("url", mySqlConnectionUrl);
options.put("dbtable", "(select p.FIRST_NAME from person p) as firstName");
DataFrame dataFrame = sqlContext.read().format("jdbc").options(options).load();
dataFrame.write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("delimiter", "|")
    .option("quote", "\"")
    .option("quoteMode", QuoteMode.NON_NUMERIC.toString())
    .option("escape", "\\")
    .save("persons.csv");
Configuration hadoopConfiguration = javaSparkContext.hadoopConfiguration();
FileSystem hdfs = FileSystem.get(hadoopConfiguration);
FileUtil.copyMerge(hdfs, new Path("persons.csv"), hdfs, new Path("/home/persons1.csv"), true, hadoopConfiguration, new String());
As per the code, I need to convert RDBMS data to CSV/JSON using Spark. When I run this client application it connects to the remote Spark server, but the console continuously prints the following WARN message:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
And on the server side, in the Spark UI under running applications > executor summary > stderr log, I received the following error:
Exception in thread "main" java.io.IOException: Failed to connect to /192.168.56.1:53112
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.56.1:53112
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
But there is no IP address 192.168.56.1 configured anywhere, so is there some configuration missing?
Actually, my client machine (10.10.0.95) is a Windows machine. When I tried to submit the Spark job from another Ubuntu machine (10.10.0.155), the same Java client code ran successfully.
While debugging in the Windows client environment, when I submit the Spark job the following log is displayed:
INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#192.168.56.1:61552]
INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 61552.
INFO MemoryStore: MemoryStore started with capacity 2.4 GB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4044.
INFO SparkUI: Started SparkUI at http://192.168.56.1:4044
As per the second log line, it registers the client with 192.168.56.1.
Whereas on the Ubuntu client:
INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#10.10.0.155:42786]
INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 42786.
INFO MemoryStore: MemoryStore started with capacity 511.1 MB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4040.
INFO SparkUI: Started SparkUI at http://10.10.0.155:4040
As per the second log line, it registers the client with 10.10.0.155, the same as the actual IP address.
If anybody finds out what the problem is with the Windows client, please let the community know.
[UPDATE]
I am running this whole environment in VirtualBox. The Windows machine is the host and Ubuntu is the guest, and Spark is installed on the Ubuntu machine. VirtualBox installs an Ethernet adapter, "VirtualBox Host-Only Network", with IPv4 address 192.168.56.1, and Spark registers this IP as the client IP instead of the actual IP address 10.10.0.95.
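A workaround that often helps here (an assumption, not verified in this thread) is to pin the driver's advertised address to the routable NIC instead of letting Spark pick the host-only adapter, via either the spark.driver.host property or the SPARK_LOCAL_IP environment variable:

# hypothetical launch of the Java client above with the driver address pinned
# (the jar and class names are placeholders)
java -Dspark.driver.host=10.10.0.95 -cp my-client.jar com.example.SparkCsvClient
# or, before launching:
export SPARK_LOCAL_IP=10.10.0.95

The same property can also be set in code with sparkConf.set("spark.driver.host", "10.10.0.95"). Disabling the VirtualBox host-only adapter, so only one interface is left, is another option.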

spark standalone cluster - slave unable to connect to master

I have the binaries for spark-1.6.0-bin-hadoop2.6 and I'm having an issue trying to connect the slave to the master.
So far I have tried (on an Ubuntu 14.04 live USB):
apt-get purge and reinstall of openssh-client and openssh-server on both systems
stating the master's IP address explicitly in the Spark URL for the worker, spark://<master ip>:7077, and also changing SPARK_MASTER_IP in /conf/spark-env.sh; the worker executes, but its log has the following error
I'm assuming there must be some SSH setup involved, but I have tried ssh-keygen and ssh-copy-id, and that also gives no results.
16/02/22 07:49:16 INFO Worker: Connecting to master 192.168.0.208:7077...
16/02/22 07:49:16 WARN Worker: Failed to connect to master 192.168.0.208:7077
java.io.IOException: Failed to connect to /192.168.0.208:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.0.208:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:740)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/02/22 07:49:27 INFO Worker: Retrying connection to master (attempt # 2)
I am, however, able to open the master web UI by entering <master ip>:8080 in my browser, and I can also access the worker's web UI from the master. I'm almost at the point of giving up, so please help.
Make sure that each master and worker has a firewall exception to allow connections for all other workers and masters.
Here's a simplified example from one of our master machines (master0):
$iptables -L
...
ACCEPT all -- worker0.company.com master0.company.com
ACCEPT all -- worker1.company.com master0.company.com
ACCEPT all -- master1.company.com master0.company.com
...
Of course you can also use IPs instead of hostnames.
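A sketch of adding such an exception (adjust hosts and addresses to your network; firewall tooling varies by distro):

# raw iptables: accept all traffic from a given worker, mirroring the rules above
sudo iptables -A INPUT -s worker0.company.com -j ACCEPT
# or, with ufw on Ubuntu:
sudo ufw allow from 192.168.0.0/24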
