Apache Spark (v1.6.1) started as service on Ubuntu (10.10.0.102) machine, using ./start-all.sh.
Now need to submit job to this server remotely using Java API.
Following is Java client code running from different machine (10.10.0.95).
String mySqlConnectionUrl = "jdbc:mysql://localhost:3306/demo?user=sec&password=sec";
String jars[] = new String[] {"/home/.m2/repository/com/databricks/spark-csv_2.10/1.4.0/spark-csv_2.10-1.4.0.jar",
"/home/.m2/repository/org/apache/commons/commons-csv/1.1/commons-csv-1.1.jar",
"/home/.m2/repository/mysql/mysql-connector-java/6.0.2/mysql-connector-java-6.0.2.jar"};
SparkConf sparkConf = new SparkConf()
.setAppName("sparkCSVWriter")
.setMaster("spark://10.10.0.102:7077")
.setJars(jars);
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(javaSparkContext);
Map<String, String> options = new HashMap<>();
options.put("driver", "com.mysql.jdbc.Driver");
options.put("url", mySqlConnectionUrl);
options.put("dbtable", "(select p.FIRST_NAME from person p) as firstName");
DataFrame dataFrame = sqlContext.read().format("jdbc").options(options).load();
dataFrame.write()
.format("com.databricks.spark.csv")
.option("header", "true")
.option("delimiter", "|")
.option("quote", "\"")
.option("quoteMode", QuoteMode.NON_NUMERIC.toString())
.option("escape", "\\")
.save("persons.csv");
Configuration hadoopConfiguration = javaSparkContext.hadoopConfiguration();
FileSystem hdfs = FileSystem.get(hadoopConfiguration);
FileUtil.copyMerge(hdfs, new Path("persons.csv"), hdfs, new Path("\home\persons1.csv"), true, hadoopConfiguration, new String());
As per code need to convert RDBMS data to csv/json using Spark. But when I run this client application, able to connect to remote spark server but in console continuously receiving following WARN message
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
And at server side on Spark UI in running applications > executor summary > stderr log, received following error.
Exception in thread "main" java.io.IOException: Failed to connect to /192.168.56.1:53112
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.56.1:53112
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
But there no any IP address configured as 192.168.56.1. So is there any configuration missing.
Actually my client machine(10.10.0.95) is Windows machine. When I tried to submit Spark job using another Ubuntu machine(10.10.0.155), I am able to run same Java client code successfully.
As I debugged in Windows client environment, when I submit spark job following log displayed,
INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#192.168.56.1:61552]
INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 61552.
INFO MemoryStore: MemoryStore started with capacity 2.4 GB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4044.
INFO SparkUI: Started SparkUI at http://192.168.56.1:4044
As per log line number 2, its register client with 192.168.56.1.
Elsewhere, in Ubuntu client
INFO Remoting: Starting remoting
INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#10.10.0.155:42786]
INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 42786.
INFO MemoryStore: MemoryStore started with capacity 511.1 MB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO Utils: Successfully started service 'SparkUI' on port 4040.
INFO SparkUI: Started SparkUI at http://10.10.0.155:4040
As per log line number 2, its register client with 10.10.0.155 same as actual IP address.
If anybody find what is the problem with Windows client, let community know.
[UPDATE]
I am running this whole environment in Virtual Box. Windows machine is my host and Ubuntu is guest. And Spark installed in Ubuntu machine. In Virtual box environment Virtual box installing Ethernet adapter VirtualBox Host-Only Netwotk with IPv4 address : 192.168.56.1. And Spark registering this IP as client IP instead of actual IP address 10.10.0.95.
Related
My spark worker cannot connect to master. I've tried:
adding spark-env.sh like some answers I found, set SPARK_MASTER_HOST=IP
restarted both the master and worker
None of them worked.
Here's what I did:
start spark master
starting org.apache.spark.deploy.master.Master, logging to /<MY-PATH>/spark-3.1.2-bin-hadoop3.2/logs/spark-<MY-USERNAME>-org.apache.spark.deploy.master.Master-1-<SPARK_ID>.out
start spark worker: spark://<SPARK_ID>:7077
starting org.apache.spark.deploy.worker.Worker, logging to /<MY-PATH>/spark-3.1.2-bin-hadoop3.2/logs/spark-<MY-USERNAME>-org.apache.spark.deploy.worker.Worker-1-<SPARK_ID>.out
The spark master starts and I can get the master's URL. The worker also starts. However, it cannot connect to the master because:
21/08/06 15:33:07 INFO Worker: Connecting to master <SPARK_ID>:7077...
21/08/06 15:33:10 WARN Worker: Failed to connect to master <SPARK_ID>:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.worker.Worker$$anon$1.run(Worker.scala:298)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to <SPARK_ID>/192.168.1.175:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:287)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: <SPARK_ID>/192.168.1.175:7077
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:715)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
21/08/06 15:33:12 INFO Worker: Retrying connection to master (attempt # 4)
21/08/06 15:33:12 INFO Worker: Connecting to master <SPARK_ID>:7077...
21/08/06 15:33:13 ERROR Worker: RECEIVED SIGNAL TERM
Check if master is running by checking log files under $SPARK_HOME/logs.
You should see logs as shown below:
21/08/07 02:56:04 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
21/08/07 02:56:04 INFO Master: Starting Spark master at spark://<your host>:7077
21/08/07 02:56:05 INFO Master: Running Spark version 3.1.2
21/08/07 02:56:05 INFO Utils: Successfully started service 'MasterUI' on port 8080.
21/08/07 02:56:05 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://<your IP>:8080
21/08/07 02:56:05 INFO Master: I have been elected leader! New state: ALIVE
while starting worker pass master url which you get from above log.
I want to connect to HBase running in standalone in a docker, using Java and the HBase API
I use this code to connect :
Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "163.172.142.199");
config.set("hbase.zookeeper.property.clientPort","2181");
HBaseAdmin.checkHBaseAvailable(config);
Here is my /etc/hosts file
127.0.0.1 localhost
XXX.XXX.XXX.XXX hbase-srv
Here is the /etc/hosts file from my docker (named hbase-srv)
XXX.XXX.XXX.XXX hbase-srv
With this configuration, I get a connection refused error :
INFO | Initiating client connection, connectString=163.172.142.199:2181 sessionTimeout=90000 watcher=hconnection-0x6aba2b860x0, quorum=163.172.142.199:2181, baseZNode=/hbase
INFO | Opening socket connection to server 163.172.142.199/163.172.142.199:2181. Will not attempt to authenticate using SASL (unknown error)
INFO | Socket connection established to 163.172.142.199/163.172.142.199:2181, initiating session
INFO | Session establishment complete on server 163.172.142.199/163.172.142.199:2181, sessionid = 0x15602f8d8dc0002, negotiated timeout = 40000
INFO | Closing zookeeper sessionid=0x15602f8d8dc0002
INFO | Session: 0x15602f8d8dc0002 closed
INFO | EventThread shut down
org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1560)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1580)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1737)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isMasterRunning(ConnectionManager.java:948)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:3159)
at hbase.Benchmark.main(Benchmark.java:26)
However, if I remove the lines XXX.XXX.XXX.XXX hbase-srv from both /etc/hosts files I get the error unknown host : hbase-srv
I have also checked, I can successfully telnet to my hbase docker on the client port.
On the docker, all the ports used by HBase are opened and binded to the same number (60000 on 60000, 2181 on 2181, etc).
I also wanted to add that all was fine when I used this configuration on localhost.
If you can't give me an answer to my problem, could you at least give me a procedure to deploy a standalone hbase on a docker.
UPDATE : Here is my Docker file
FROM java:openjdk-8
ADD hbase-1.2.1 /hbase-1.2.1
WORKDIR /hbase-1.2.1
# ZooKeeper
EXPOSE 2181
# HMaster
EXPOSE 60000
# HMaster Web
EXPOSE 60010
# RegionServer
EXPOSE 60020
# RegionServer Web
EXPOSE 60030
EXPOSE 16010
RUN chmod 755 /hbase-1.2.1/bin/start-hbase.sh
CMD ["/hbase-1.2.1/bin/start-hbase.sh"]
My HBase shell is working, I also tried to open the port using iptables for tcp and udp but still the same problem
There are two problems with your Dockerfile:
use hbase master start instead of start-hbase.sh
regionserver is actually not running on 60020
The 2nd problem is not so easy to solve. If run hbase standalone with version >= 1.2.0 (not sure, I'm running 1.2.0), hbase will use ephemeral port instead of the default port or the port you provide in hbase-site.xml which makes it very hard to provide hbase service in docker using the original version.
I add a property named hbase.localcluster.port.ephemeral and managed to build a standalone hbase in docker, which you can reference here.
I have setup a cluster on 12 machine and the spark workers on slave machines could disassociate with the master everyday. That means they could looks work well for a period of time during a day, but then the slaves would all disassociate and then be shutdowned.
The log of the worker would look like below:
16/03/07 12:45:34.828 INFO Worker: Retrying connection to master (attempt # 1)
16/03/07 12:45:34.830 INFO Worker: Connecting to master host1:7077...
16/03/07 12:45:34.826 INFO Worker: Retrying connection to master (attempt # 2)
16/03/07 12:45:45.830 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-2,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask#1c5651e9 rejected from java.util.concurrent.ThreadPoolExecutor#671ba687[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 2]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
...
16/03/07 12:45:45.853 Info ExecutorRunner: Killing process!
16/03/07 12:45:45.855 INFO ShutdownHookManager: Shutdown hook called
The log of the master would look like below:
16/03/07 12:45:45.878 INFO Master: 10.126.217.11:51502 got disassociated, removing it.
16/03/07 12:45:45.878 INFO Master: Removing worker worker-20160303035822-10.126.217.11-51502 on 10.126.217.11:51502
Information for the machines:
40 cores and 256GB memory per machine
spark version: 1.5.1
java version: 1.8.0_45
The spark application runs on this cluster with configuration below:
spark.cores.max=360
spark.executor.memory=32g
Is it a memory issue on slave or on master machine?
Or is it a network issue between slaves and master machine?
Or any other problems?
Please advise.
thanks
Spark version : 1.4.1
I am running below driver program and try to create a Spark Context for a remote spark standalone cluster. At the time remote spark cluster is not available following errors are reporting by the spark. What would be the reason for exceptions (
ERROR OneForOneStrategy:
java.lang.NullPointerException
and
ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
) and how I could eliminate them ?
Note : when I am using running local or remote standalone cluster, my application is working fine.
Java Program
public class SparkContextTest {
/**
* #param args the command line arguments
*/
public static void main(String[] args) {
JavaSparkContext sc = null;
try {
SparkConf conf = new SparkConf();
conf.setAppName("testABC");
conf.set("spark.scheduler.mode", "FAIR");
conf.setMaster("spark://remote-server:7077")
.set("spark.driver.allowMultipleContexts", "false")
.set("spark.executor.memory", "1g")
.set("spark.driver.maxResultSize", "1g");
sc = new JavaSparkContext(conf);
} catch(Exception e){
e.printStackTrace();
} finally {
if(null != sc){
sc.clearCallSite();
sc.close();
}
}
}
}
Spark log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/downloads/spark-1.4.1-bin-hadoop2.6/lib/spark-assembly-1.4.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/downloads/spark-1.4.1-bin-hadoop2.6/lib/spark-examples-1.4.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/09 17:11:23 INFO SparkContext: Running Spark version 1.4.1
15/12/09 17:11:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/09 17:11:24 WARN Utils: Your hostname, pesamara-mobl-vm1 resolves to a loopback address: 127.0.0.1; using 10.30.9.107 instead (on interface eth0)
15/12/09 17:11:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/12/09 17:11:25 INFO SecurityManager: Changing view acls to: pes
15/12/09 17:11:25 INFO SecurityManager: Changing modify acls to: pes
15/12/09 17:11:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pes); users with modify permissions: Set(pes)
15/12/09 17:11:26 INFO Slf4jLogger: Slf4jLogger started
15/12/09 17:11:27 INFO Remoting: Starting remoting
15/12/09 17:11:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#10.30.9.107:55740]
15/12/09 17:11:27 INFO Utils: Successfully started service 'sparkDriver' on port 55740.
15/12/09 17:11:27 INFO SparkEnv: Registering MapOutputTracker
15/12/09 17:11:27 INFO SparkEnv: Registering BlockManagerMaster
15/12/09 17:11:27 INFO DiskBlockManager: Created local directory at /tmp/spark-30d61b03-0b1c-4250-b68e-c2404c7884a8/blockmgr-3226ed7e-f8e5-40a2-bfb1-ffabb51cd0e0
15/12/09 17:11:28 INFO MemoryStore: MemoryStore started with capacity 491.5 MB
15/12/09 17:11:28 INFO HttpFileServer: HTTP File server directory is /tmp/spark-30d61b03-0b1c-4250-b68e-c2404c7884a8/httpd-7f2572c2-5677-446e-a80a-6f9d05ee2891
15/12/09 17:11:28 INFO HttpServer: Starting HTTP Server
15/12/09 17:11:28 INFO Utils: Successfully started service 'HTTP file server' on port 45047.
15/12/09 17:11:28 INFO SparkEnv: Registering OutputCommitCoordinator
15/12/09 17:11:28 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/12/09 17:11:28 INFO SparkUI: Started SparkUI at http://10.30.9.107:4040
15/12/09 17:11:29 INFO FairSchedulableBuilder: Created default pool default, schedulingMode: FIFO, minShare: 0, weight: 1
15/12/09 17:11:29 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#localhost2:7077/user/Master...
15/12/09 17:11:29 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster#localhost2:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster#localhost2:7077
15/12/09 17:11:29 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster#localhost2:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: localhost2: unknown error
15/12/09 17:11:49 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#localhost2:7077/user/Master...
15/12/09 17:11:49 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster#localhost2:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster#localhost2:7077
15/12/09 17:11:49 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster#localhost2:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: localhost2: unknown error
15/12/09 17:12:09 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#localhost2:7077/user/Master...
15/12/09 17:12:09 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster#localhost2:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster#localhost2:7077
15/12/09 17:12:09 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster#localhost2:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: localhost2: unknown error
15/12/09 17:12:29 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/12/09 17:12:29 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
15/12/09 17:12:29 INFO SparkUI: Stopped Spark web UI at http://10.30.9.107:4040
15/12/09 17:12:29 INFO DAGScheduler: Stopping DAGScheduler
15/12/09 17:12:29 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/12/09 17:12:29 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/12/09 17:12:29 ERROR OneForOneStrategy:
java.lang.NullPointerException
at org.apache.spark.deploy.client.AppClient$ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(AppClient.scala:160)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.spark.deploy.client.AppClient$ClientActor.aroundReceive(AppClient.scala:61)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/12/09 17:12:29 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#localhost2:7077/user/Master...
15/12/09 17:12:29 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster#localhost2:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster#localhost2:7077
15/12/09 17:12:29 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster#localhost2:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: localhost2: unknown error
15/12/09 17:12:29 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54184.
15/12/09 17:12:29 INFO NettyBlockTransferService: Server created on 54184
15/12/09 17:12:29 INFO BlockManagerMaster: Trying to register BlockManager
15/12/09 17:12:29 INFO BlockManagerMasterEndpoint: Registering block manager 10.30.9.107:54184 with 491.5 MB RAM, BlockManagerId(driver, 10.30.9.107, 54184)
15/12/09 17:12:29 INFO BlockManagerMaster: Registered BlockManager
15/12/09 17:12:30 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1503)
at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2007)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:543)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at org.apache.spark.examples.sql.SparkContextTest.main(SparkContextTest.java:32)
15/12/09 17:12:30 INFO SparkContext: SparkContext already stopped.
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1503)
at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2007)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:543)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at org.apache.spark.examples.sql.SparkContextTest.main(SparkContextTest.java:32)
15/12/09 17:12:30 INFO DiskBlockManager: Shutdown hook called
15/12/09 17:12:30 INFO Utils: path = /tmp/spark-30d61b03-0b1c-4250-b68e-c2404c7884a8/blockmgr-3226ed7e-f8e5-40a2-bfb1-ffabb51cd0e0, already present as root for deletion.
15/12/09 17:12:30 INFO Utils: Shutdown hook called
15/12/09 17:12:30 INFO Utils: Deleting directory /tmp/spark-30d61b03-0b1c-4250-b68e-c2404c7884a8/httpd-7f2572c2-5677-446e-a80a-6f9d05ee2891
15/12/09 17:12:30 INFO Utils: Deleting directory /tmp/spark-30d61b03-0b1c-4250-b68e-c2404c7884a8
I'm running a single node application with Spark on a machine with 32 GB RAM.
More than 12GB of the memory is available at the time I'm running the applicaton.
But From the spark UI and logs, I see that it using 3.8GB of RAM (which is gradually decreased as the jobs run).
At this time this is logged, 5GB more memory is avilable. Where as Spark is using 3.8GB
UPDATE
I set these parameters in conf/spark-env.sh but still each time I run the application It is using exactly 3.8 GB
export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
Log
2015-11-19 13:05:41,701 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering MapOutputTracker
2015-11-19 13:05:41,716 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering BlockManagerMaster
2015-11-19 13:05:41,735 INFO org.apache.spark.storage.DiskBlockManager.logInfo:59 - Created local directory at /usr/local/TC_SPARCDC_COM/temp/blockmgr-8513cd3b-ac03-4c0a-b291-65aba4cbc395
2015-11-19 13:05:41,746 INFO org.apache.spark.storage.MemoryStore.logInfo:59 - MemoryStore started with capacity 3.8 GB
2015-11-19 13:05:41,777 INFO org.apache.spark.HttpFileServer.logInfo:59 - HTTP File server directory is /usr/local/TC_SPARCDC_COM/temp/spark-b86380c2-4cbd-43d6-a3b7-aa03d9a05a84/httpd-ceaffbd0-eac4-447e-9d3f-c452627a28cb
2015-11-19 13:05:41,781 INFO org.apache.spark.HttpServer.logInfo:59 - Starting HTTP Server
2015-11-19 13:05:41,842 INFO org.spark-project.jetty.server.Server.doStart:272 - jetty-8.y.z-SNAPSHOT
2015-11-19 13:05:41,854 INFO org.spark-project.jetty.server.AbstractConnector.doStart:338 - Started SocketConnector#0.0.0.0:5279
2015-11-19 13:05:41,855 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'HTTP file server' on port 5279.
2015-11-19 13:05:41,867 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering OutputCommitCoordinator
2015-11-19 13:05:42,013 INFO org.spark-project.jetty.server.Server.doStart:272 - jetty-8.y.z-SNAPSHOT
2015-11-19 13:05:42,039 INFO org.spark-project.jetty.server.AbstractConnector.doStart:338 - Started SelectChannelConnector#0.0.0.0:4040
2015-11-19 13:05:42,039 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'SparkUI' on port 4040.
2015-11-19 13:05:42,041 INFO org.apache.spark.ui.SparkUI.logInfo:59 - Started SparkUI at http://103.252.184.181:4040
2015-11-19 13:05:42,114 WARN org.apache.spark.metrics.MetricsSystem.logWarning:71 - Using default name DAGScheduler for source because spark.app.id is not set.
2015-11-19 13:05:42,117 INFO org.apache.spark.executor.Executor.logInfo:59 - Starting executor ID driver on host localhost
2015-11-19 13:05:42,307 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 31334.
2015-11-19 13:05:42,308 INFO org.apache.spark.network.netty.NettyBlockTransferService.logInfo:59 - Server created on 31334
2015-11-19 13:05:42,309 INFO org.apache.spark.storage.BlockManagerMaster.logInfo:59 - Trying to register BlockManager
2015-11-19 13:05:42,312 INFO org.apache.spark.storage.BlockManagerMasterEndpoint.logInfo:59 - Registering block manager localhost:31334 with 3.8 GB RAM, BlockManagerId(driver, localhost, 31334)
2015-11-19 13:05:42,313 INFO org.apache.spark.storage.BlockManagerMaster.logInfo:59 - Registered BlockManager
If you are using SparkSubmit you can use the --executor-memory and --driver-memory flags. Otherwise, change these configurations spark.executor.memory and spark.driver.memory either directly in your program or in spark-defaults.
Note that you should not set memory too high. As a rule of thumb, aim for ~75% of available memory. That will leave enough memory for other processes (like your OS) running on your machines.
It is correctly stated by #Glennie Helles Sindholt but setting driver flags while submitting jobs on a standalone machine won't affect the usage as the JVM has been already been initialized. Checkout this link of discussion:
How to set Apache Spark Executor memory
If you are using Spark submit command to submit a job following is an example for how to set parameters while submitting the job:
spark-submit --master spark://127.0.0.1:7077 \
--num-executors 2 \
--executor-cores 8 \
--executor-memory 3g \
--class <Class name> \
$JAR_FILE_NAME or path \
/path-to-input \
/path-to-output \
By varying the number of parameters in this you can see and understand how the usage of RAM is changing. Also, there is a utility named htop on Linux. It is useful to instantaneous usage of memory, CPU cores and Swap space to have an understanding of what is happening. To install htop, use the following:
sudo apt-get install htop
It will look something like this:
htop utility
For more information you can check out the following links:
https://spark.apache.org/docs/latest/configuration.html