I have a 3-node Spark cluster:
node1, node2 and node3
I am running the command below on node1 to deploy the driver:
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.fst.firststep.aggregator.FirstStepMessageProcessor --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 --deploy-mode cluster --supervise file:///home/xyz/sparkstreaming-0.0.1-SNAPSHOT.jar /home/xyz/config.properties
The driver gets launched on node2 in the cluster, but I am getting an exception on node2 because it is trying to bind to node1's IP:
2015-02-26 08:47:32 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:32 INFO Slf4jLogger:80 - Slf4jLogger started
2015-02-26 08:47:33 ERROR NettyTransport:65 - failed to bind to ec2-xx.xx.xx.xx.compute-1.amazonaws.com/xx.xx.xx.xx:0, shutting down Netty transport
2015-02-26 08:47:33 WARN Utils:71 - Service 'Driver' could not bind on port 0. Attempting port 1.
2015-02-26 08:47:33 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:33 ERROR Remoting:65 - Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
at akka.remote.Remoting.start(Remoting.scala:201)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:33)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: ec2-xx-xx-xx.compute-1.amazonaws.com/xx.xx.xx.xx:0
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
Kindly suggest.
Thanks
After spending a lot more time on this, I got the answer. I made the changes below:
remove the entries for SPARK_LOCAL_IP and SPARK_MASTER_IP (typically in conf/spark-env.sh)
add the host name and private IP address of each of the other nodes to /etc/hosts on every node (see the sketch after this list)
use --deploy-mode cluster --supervise
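For illustration, a minimal sketch of the changes on one node (the hostnames and private IPs below are placeholders, not the original values):
# conf/spark-env.sh: these entries should be removed or commented out (assuming they were set here)
# export SPARK_LOCAL_IP=...
# export SPARK_MASTER_IP=...
# /etc/hosts on node1: add each of the other nodes' private IPs and hostnames
10.0.0.2 node2
10.0.0.3 node3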
That's all, and it works perfectly with fully HA components (Master, Slaves and Driver).
Thanks
Cluster deploy mode is not supported on EC2 instances in Spark 1.2, where it creates a standalone cluster. Hence you can try removing
--deploy-mode cluster --supervise
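That is, a client-mode submit with the same class, master and jar as the original command would look roughly like this:
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.fst.firststep.aggregator.FirstStepMessageProcessor --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 file:///home/xyz/sparkstreaming-0.0.1-SNAPSHOT.jar /home/xyz/config.properties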
Related
I have a jar which runs fine on my host; specifically, when I run
java -jar myjar.jar
I get the expected output:
[2018-12-05 16:46:53.917] boot - 21252 INFO [main] --- Application: No active profile set, falling back to default profiles: default
[2018-12-05 16:47:00.855] boot - 21252 INFO [main] --- Application: Started Application in 8.176 seconds (JVM running for 9.106)
This is the Core Data Micro Service.
[2018-12-05 16:47:00.856] boot - 21252 INFO [main] --- Application: Registering to queue for events
[2018-12-05 16:47:00.857] boot - 21252 INFO [main] --- ZeroMQEventSubscriber: Getting subscriber, listening to tcp://localhost:5565
[2018-12-05 16:47:00.915] boot - 21252 INFO [main] --- ZeroMQEventSubscriber: Watching for new Event messages...
But then, I try to run the same jar inside a docker container. So I create the image like this:
FROM openjdk:8-jdk-alpine
COPY myjar.jar /opt/spring-cloud/lib/
ENTRYPOINT ["/usr/bin/java"]
CMD ["-jar", "/opt/spring-cloud/lib/myjar.jar"]
EXPOSE 48080
and run it:
sudo docker run [ID]
but this time, I get this exception from the container logs (this is only a part of the exception because it is too big, but I can show it all if needed):
[2018-12-07 08:30:31.447] boot - 1 INFO [main] --- Application: No active profile set, falling back to default profiles: default
[2018-12-07 08:32:35.423] boot - 1 ERROR [main] --- SpringApplication: Application startup failed
...
...
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'readingControllerImpl': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: org.edgexfoundry.dao.ValueDescriptorRepository org.edgexfoundry.controller.impl.ReadingControllerImpl.valDescRepos; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'valueDescriptorRepository': Invocation of init method failed; nested exception is org.springframework.dao.DataAccessResourceFailureException: Timed out after 120000 ms while waiting for a server that matches AnyServerSelector{}. Client view of cluster state is {type=Unknown, servers=[{address=localhost:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]; nested exception is com.mongodb.MongoTimeoutException: Timed out after 120000 ms while waiting for a server that matches AnyServerSelector{}. Client view of cluster state is {type=Unknown, servers=[{address=localhost:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]
at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:334)
...
...
Caused by: com.mongodb.MongoTimeoutException: Timed out after 120000 ms while waiting for a server that matches AnyServerSelector{}. Client view of cluster state is {type=Unknown, servers=[{address=localhost:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused (Connection refused)}}]
at com.mongodb.BaseCluster.getServer(BaseCluster.java:82)
at com.mongodb.DBTCPConnector.getServer(DBTCPConnector.java:664)
at com.mongodb.DBTCPConnector.access$500(DBTCPConnector.java:40)
at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:513)
at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:456)
at com.mongodb.DBTCPConnector.getPrimaryPort(DBTCPConnector.java:415)
at com.mongodb.DBCollectionImpl.createIndex(DBCollectionImpl.java:378)
at com.mongodb.DBCollection.createIndex(DBCollection.java:597)
at org.springframework.data.mongodb.core.index.MongoPersistentEntityIndexCreator.createIndex(MongoPersistentEntityIndexCreator.java:142)
... 57 more
Mongo has been started in another container through docker-compose (together with other services in other containers):
ps aux | grep mongo
root 16226 0.0 0.0 4340 768 ? Ss 10:27 0:00 /bin/sh -c /edgex/mongo/config/launch-edgex-mongo.sh
root 16292 0.0 0.0 4340 764 ? S 10:27 0:00 /bin/sh /edgex/mongo/config/launch-edgex-mongo.sh
root 16293 0.5 0.3 961168 61400 ? SLl 10:27 0:05 mongod --smallfiles
This is the docker-compose file:
version: '3'
services:
  volume:
    image: edgexfoundry/docker-edgex-volume:0.6.0
    container_name: edgex-files
    networks:
      - edgex-network
    volumes:
      - db-data:/data/db
      - log-data:/edgex/logs
      - consul-config:/consul/config
      - consul-data:/consul/data
  mongo:
    image: edgexfoundry/docker-edgex-mongo:0.6.0
    ports:
      - "27017:27017"
    container_name: edgex-mongo
    hostname: edgex-mongo
    networks:
      - edgex-network
    volumes:
      - db-data:/data/db
      - log-data:/edgex/logs
      - consul-config:/consul/config
      - consul-data:/consul/data
    depends_on:
      - volume
  .... more services...
networks:
  edgex-network:
    driver: "bridge"
And the mongo db configuration properties:
spring.data.mongodb.username=core
spring.data.mongodb.password=password
spring.data.mongodb.database=coredata
# change to localhost when running locally during development
# (or set /etc/hosts to point edgex-mongo to the mongo host)
spring.data.mongodb.host=localhost
#spring.data.mongodb.host=edgex-mongo
spring.data.mongodb.port=27017
spring.data.mongodb.connectTimeout=120000
spring.data.mongodb.socketTimeout=60000
spring.data.mongodb.maxWaitTime=120000
spring.data.mongodb.socketKeepAlive=true
Any ideas what may be going wrong?
There are two things going wrong here. First of all, Spring tries to connect to your MongoDB on localhost. Within Docker this does not work, since localhost refers to the current container, where of course no MongoDB is available. To fix this, comment out that line and uncomment the next one, which lists the host as edgex-mongo; this corresponds to the hostname of your MongoDB container, so Spring knows to connect to that container.
However, if you did only that, you would run into the issue that edgex-mongo cannot be resolved, since your container has no connection to the MongoDB container. edgex-mongo is inside a bridged network, which requires you to add the Spring container to this network using the following command:
docker run --network edgex-network [image]
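Note that docker-compose usually prefixes network names with the project name, so the real network name may differ; a quick way to check (the prefixed name below is only an example):
# list networks and look for one ending in edgex-network
docker network ls
# attach the spring container to that network
docker run --network <project>_edgex-network [image]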
I hope this helps you
I am trying to set up a Zookeeper cluster for Pulsar. I am following the instructions here, but I keep failing.
In my setup, I have two nodes, that should be part of the cluster. Since I need to deploy bookie to the same nodes, I executed
$ PULSAR_EXTRA_OPTS="-Dstats_server_port=8001" bin/pulsar-daemon start zookeeper
to start zookeeper. Afterwards, I am trying to init the cluster using this command:
bin/pulsar initialize-cluster-metadata \
--cluster pulsar-cluster-1 \
--zookeeper 10.100.100.77:2181 \
--configuration-store 10.100.100.77:2181 \
--web-service-url http://10.100.100.77:8080 \
--broker-service-url pulsar://10.100.100.77:6650
But I keep getting this error:
17:12:24.146 [main-SendThread(10.100.100.77:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket error occurred: 10.100.100.77/10.100.100.77:2181: Verbindungsaufbau abgelehnt
17:12:25.251 [main-SendThread(10.100.100.77:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.100.100.77/10.100.100.77:2181. Will not attempt to authenticate using SASL (unknown error)
I read here that I need to have an odd number of nodes, so I added a virtual machine on one of the nodes. When I start Zookeeper on it, it doesn't print an error message, but shows:
$ PULSAR_EXTRA_OPTS="-Dstats_server_port=8001" bin/pulsar-daemon start zookeeper
doing start zookeeper ...
starting zookeeper, logging to /home/host1/apache-pulsar-2.4.0/logs/pulsar-zookeeper-host1-VirtualBox.log
OpenJDK 64-Bit Server VM warning: Option AggressiveOpts was deprecated in version 11.0 and will likely be removed in a future release.
[AppClassLoader#27c170f0] info AspectJ Weaver Version 1.9.2 built on Wednesday Oct 24, 2018 at 15:43:33 GMT
[AppClassLoader#27c170f0] info register classloader jdk.internal.loader.ClassLoaders$AppClassLoader#27c170f0
[AppClassLoader#27c170f0] info using configuration file:/home/host1/apache-pulsar-2.4.0/lib/org.apache.pulsar-pulsar-zookeeper-utils-2.4.0.jar!/META-INF/aop.xml
[AppClassLoader#27c170f0] info using configuration file:/home/host1/apache-pulsar-2.4.0/lib/org.apache.pulsar-pulsar-zookeeper-2.4.0.jar!/META-INF/aop.xml
[AppClassLoader#27c170f0] info register aspect org.apache.pulsar.zookeeper.SerializeUtilsAspect
[AppClassLoader#27c170f0] info register aspect org.apache.pulsar.broker.zookeeper.aspectj.ClientCnxnAspect
However, the Zookeeper service is not started, even though the setup is very similar to the host's, and I can't figure out why.
Any Ideas how I could proceed from here? Thanks in advance!
The first error you posted seems to indicate that the connection to 10.100.100.77:2181 is refused ("Verbindungsaufbau abgelehnt" is German for "connection refused"), and therefore the ZK server isn't listening on that server and port. You should first confirm that ZK is up and running and check the ZK log for any errors.
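For example, something like this can confirm whether ZooKeeper is actually listening (a sketch; it assumes netstat and nc are available on the node):
# is anything listening on the ZooKeeper port?
netstat -tlnp | grep 2181
# ZooKeeper four-letter health check; prints "imok" if the server is up and serving
echo ruok | nc 10.100.100.77 2181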
HTH
I found the solution. The original error was indeed caused by not having an odd number of nodes. The third (virtual) one wouldn't start because of a misplaced Zookeeper data directory. Once the third server started, the configuration also passed successfully.
Using Win7 64-bit, JDK 8, Spark 1.6.2.
I have Spark running, winutils, HADOOP_HOME, etc.
The documentation says: "Note: The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand." But it does not say how.
How do I launch the Spark master on Windows?
I tried running sh start-master.sh through Git Bash: failed to launch org.apache.spark.deploy.master.Master, even though it prints out Master --ip Sam-Toshiba --port 7077 --webui-port 8080, so I don't know what all this means.
But when I try spark-submit --class " " --master spark://Sam-Toshiba:7077 target/ .jar -
I get errors:
WARN AbstractLifeCycle: FAILED SelectChannelConnector#0.0.0.0:
4040: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 14:44:29 WARN AppClient$ClientEndpoint: Failed to connect to master Sam-Toshiba:7077
java.io.IOException: Failed to connect to Sam-Toshiba/192.168.137.1:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
Also tried spark://localhost:7077 - same errors
On Windows you can launch the Master using the command below. Open a command prompt, go to the Spark bin folder, and execute:
spark-class.cmd org.apache.spark.deploy.master.Master
The above command will print something like Master: Starting Spark master at spark://192.168.99.1:7077 in the console, with your machine's IP. You can check the UI at http://192.168.99.1:8080/
If you want to launch a worker once your master is up, use the command below. This will use all the available cores of your machine.
spark-class.cmd org.apache.spark.deploy.worker.Worker spark://192.168.99.1:7077
If you want to use only 2 of your machine's 4 cores, then use
spark-class.cmd org.apache.spark.deploy.worker.Worker -c 2 spark://192.168.99.1:7077
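With the master and a worker running, a submit against this master would then look roughly like this (a sketch; the class and jar are placeholders, and the IP is the example one from above):
spark-submit --class com.example.App --master spark://192.168.99.1:7077 target\app.jar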
I want to connect to HBase running in standalone in a docker, using Java and the HBase API
I use this code to connect :
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "163.172.142.199");
config.set("hbase.zookeeper.property.clientPort", "2181");
HBaseAdmin.checkHBaseAvailable(config);
Here is my /etc/hosts file
127.0.0.1 localhost
XXX.XXX.XXX.XXX hbase-srv
Here is the /etc/hosts file from my docker (named hbase-srv)
XXX.XXX.XXX.XXX hbase-srv
With this configuration, I get a connection refused error:
INFO | Initiating client connection, connectString=163.172.142.199:2181 sessionTimeout=90000 watcher=hconnection-0x6aba2b860x0, quorum=163.172.142.199:2181, baseZNode=/hbase
INFO | Opening socket connection to server 163.172.142.199/163.172.142.199:2181. Will not attempt to authenticate using SASL (unknown error)
INFO | Socket connection established to 163.172.142.199/163.172.142.199:2181, initiating session
INFO | Session establishment complete on server 163.172.142.199/163.172.142.199:2181, sessionid = 0x15602f8d8dc0002, negotiated timeout = 40000
INFO | Closing zookeeper sessionid=0x15602f8d8dc0002
INFO | Session: 0x15602f8d8dc0002 closed
INFO | EventThread shut down
org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1560)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1580)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1737)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isMasterRunning(ConnectionManager.java:948)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:3159)
at hbase.Benchmark.main(Benchmark.java:26)
However, if I remove the XXX.XXX.XXX.XXX hbase-srv lines from both /etc/hosts files, I get the error unknown host: hbase-srv.
I have also checked that I can successfully telnet to my HBase docker on the client port.
On the docker, all the ports used by HBase are open and bound to the same number (60000 to 60000, 2181 to 2181, etc.).
I also wanted to add that all was fine when I used this configuration on localhost.
If you can't give me an answer to my problem, could you at least give me a procedure to deploy a standalone HBase on Docker?
UPDATE : Here is my Docker file
FROM java:openjdk-8
ADD hbase-1.2.1 /hbase-1.2.1
WORKDIR /hbase-1.2.1
# ZooKeeper
EXPOSE 2181
# HMaster
EXPOSE 60000
# HMaster Web
EXPOSE 60010
# RegionServer
EXPOSE 60020
# RegionServer Web
EXPOSE 60030
EXPOSE 16010
RUN chmod 755 /hbase-1.2.1/bin/start-hbase.sh
CMD ["/hbase-1.2.1/bin/start-hbase.sh"]
My HBase shell is working. I also tried to open the ports using iptables for TCP and UDP, but I still have the same problem.
There are two problems with your Dockerfile:
use hbase master start instead of start-hbase.sh (see the sketch after this list)
regionserver is actually not running on 60020
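For the first point, the last line of the Dockerfile would change to something like this (a sketch using the same paths as in the question; hbase master start runs the master in the foreground, which keeps the container alive):
CMD ["/hbase-1.2.1/bin/hbase", "master", "start"]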
The 2nd problem is not so easy to solve. If you run HBase standalone with version >= 1.2.0 (not sure exactly; I'm running 1.2.0), HBase will use an ephemeral port instead of the default port or the port you provide in hbase-site.xml, which makes it very hard to provide the HBase service in Docker using the original version.
I added a property named hbase.localcluster.port.ephemeral and managed to build a standalone HBase in Docker, which you can reference here.
I've been trying to run examples from HBase: The Definitive Guide, and I keep encountering this error and am not able to get past it. I'm running in standalone mode, if that helps.
Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: �
17136#ubuntulocalhost,32992,1373877731444
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:615)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
at util.HBaseHelper.<init>(HBaseHelper.java:29)
at util.HBaseHelper.getHelper(HBaseHelper.java:33)
at client.PutExample.main(PutExample.java:22)
But my HMaster process is running:
hduser#ubuntu:/home/ubuntu/hbase-book/ch03$ jps
17602 Jps
8709 NameNode
8929 DataNode
9472 TaskTracker
9252 JobTracker
9172 SecondaryNameNode
17136 HMaster
This is my hbase-site.xml file:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///usr/local/hbase/hbase-data/</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/hbase/zookeeper-data/</value>
  </property>
</configuration>
This is my /etc/hosts file:
127.0.0.1 localhost
127.0.1.1 ubuntu
127.0.0.1 ubuntu.ubuntu-domain ubuntu
Specifically, I'm trying to run the chapter 3 examples, and I'm just not understanding why my setup is not running.
Any idea where I'm going wrong?
Edit: Here are the logs:
2013-07-15 03:56:32,663 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:60119
2013-07-15 03:56:32,672 WARN org.apache.zookeeper.server.ZooKeeperServer: Connection request from old client /127.0.0.1:60119; will be dropped if server is in r-o mode
2013-07-15 03:56:32,672 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:60119
2013-07-15 03:56:32,674 INFO org.apache.zookeeper.server.ZooKeeperServer: Established session 0x13fe17e7f1d0006 with negotiated timeout 40000 for client /127.0.0.1:60119
2013-07-15 03:57:11,653 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=1.17 MB, free=247.24 MB, max=248.41 MB, blocks=2, accesses=68, hits=55, hitRatio=80.88%, , cachingAccesses=61, cachingHits=53, cachingHitsRatio=86.88%, , evictions=0, evicted=6, evictedPerRun=Infinity
2013-07-15 03:57:14,333 WARN org.apache.zookeeper.server.NIOServerCnxn: caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x13fe17e7f1d0006, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:724)
2013-07-15 03:57:14,334 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:60119 which had sessionid 0x13fe17e7f1d0006
2013-07-15 03:57:24,551 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2013-07-15 03:57:24,568 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation#189ddf
2013-07-15 03:57:24,578 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s)
It has nothing to do with standalone or distributed mode. Make sure your setup is working fine. I can see that RegionServer and Zookeeper are not running. Comment out the line 127.0.1.1 ubuntu in your /etc/hosts file (see below) and restart HBase. You might have to kill it first.
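For reference, the edited /etc/hosts would then look like this (based on the file quoted in the question):
127.0.0.1 localhost
# 127.0.1.1 ubuntu
127.0.0.1 ubuntu.ubuntu-domain ubuntu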
P.S.: Since you already have Hadoop configured and it is running fine, you can run HBase in a pseudo-distributed setup.
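A pseudo-distributed setup would point hbase.rootdir at HDFS instead, roughly like this (a sketch; hdfs://localhost:9000 assumes the usual default NameNode address of a local Hadoop setup):
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>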