I'm trying to submit a teragen job to YARN like this:
yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-3.3.1.jar teragen 1000 /teragen
It all goes well until it errors out:
2021-11-04 23:45:20,540 INFO mapreduce.Job: Running job: job_1636069364859_0003
2021-11-04 23:45:25,629 INFO mapreduce.Job: Job job_1636069364859_0003 running in uber mode : false
2021-11-04 23:45:25,630 INFO mapreduce.Job: map 0% reduce 0%
2021-11-04 23:45:27,658 INFO mapreduce.Job: Task Id : attempt_1636069364859_0003_m_000000_0, Status : FAILED
[2021-11-04 23:45:26.200]Exception from container-launch.
Container id: container_1636069364859_0003_01_000002
Exit code: 127
[2021-11-04 23:45:26.201]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
/bin/bash: line 1: m: command not found
I have no clue what the problem is. I've tried looking into the logs, especially the prelaunch.err file but it is empty. The stderr file has:
/bin/bash: line 1: m: command not found
Checking the node manager logs, I found this:
2021-11-04 23:44:05,765 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: s3a-file-system metrics system started
2021-11-04 23:44:06,423 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1636069364859_0001_01_000002 transitioned from LOCALIZING to SCHEDULED
2021-11-04 23:44:06,423 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Starting container [container_1636069364859_0001_01_000002]
2021-11-04 23:44:06,453 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1636069364859_0001_01_000002 transitioned from SCHEDULED to RUNNING
2021-11-04 23:44:06,453 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1636069364859_0001_01_000002
2021-11-04 23:44:06,457 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /opt/yarn/local/usercache/vagrant/appcache/application_1636069364859_0001/container_1636069364859_0001_01_000002/default_container_executor.sh]
2021-11-04 23:44:06,477 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1636069364859_0001_01_000002 is : 127
2021-11-04 23:44:06,478 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1636069364859_0001_01_000002 and exit code: 127
ExitCodeException exitCode=127:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
at org.apache.hadoop.util.Shell.run(Shell.java:901)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:309)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:585)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:373)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-11-04 23:44:06,479 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2021-11-04 23:44:06,479 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1636069364859_0001_01_000002
2021-11-04 23:44:06,479 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 127
2021-11-04 23:44:06,479 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container launch failed : Container exited with a non-zero exit code 127.
2021-11-04 23:44:06,501 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1636069364859_0001_01_000002 transitioned from RUNNING to EXITED_WITH_FAILURE
2021-11-04 23:44:06,503 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: Cleaning up container container_1636069364859_0001_01_000002
2021-11-04 23:44:06,515 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
2021-11-04 23:44:06,515 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
2021-11-04 23:44:06,515 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
2021-11-04 23:44:06,525 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /opt/yarn/local/usercache/vagrant/appcache/application_1636069364859_0001/container_1636069364859_0001_01_000002
2021-11-04 23:44:06,526 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1636069364859_0001_01_000002 transitioned from EXITED_WITH_FAILURE to DONE
2021-11-04 23:44:06,526 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1636069364859_0001_01_000002 from application application_1636069364859_0001
2021-11-04 23:44:06,526 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1636069364859_0001_01_000002
I've read other answers that attribute this to Java being missing or JAVA_HOME not being set. That's not my case: my JAVA_HOME is set to /usr/lib/jvm/java-8-openjdk-amd64.
Any idea what could be going on here? Thanks :)
The problem was with the memory allocated for each container. Apparently some containers did not live long enough to log the actual error.
After several attempts, however, I finally got an error that looked like this:
Error occurred during initialization of VM
Too small initial heap
For some reason, the memory configuration I was using for YARN and MapReduce jobs was not correct. I ended up using Ambari's HDP yarn-utils.py script to get appropriate values for my setup.
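To give a rough idea of what such a calculation involves, here is a minimal Python sketch. It is not the script mentioned above: the function name is hypothetical, and the 0.8 heap-to-container ratio is only a commonly used rule of thumb, so treat the numbers as assumptions to adapt. The property names are the standard MapReduce ones.

```python
def suggest_mapreduce_memory(container_mb, heap_fraction=0.8):
    """Return MapReduce memory settings for a given container size in MB.

    heap_fraction is an assumed rule of thumb: the JVM heap (-Xmx) is kept
    below the container size so that non-heap JVM memory still fits.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    heap_mb = int(container_mb * heap_fraction)
    return {
        "mapreduce.map.memory.mb": str(container_mb),
        "mapreduce.map.java.opts": f"-Xmx{heap_mb}m",
        "mapreduce.reduce.memory.mb": str(container_mb),
        "mapreduce.reduce.java.opts": f"-Xmx{heap_mb}m",
    }


# Example: settings for 1024 MB containers.
for name, value in suggest_mapreduce_memory(1024).items():
    print(name, "=", value)
```

The point of the sketch is only that the heap passed in java.opts has to be consistent with the container size in memory.mb; a mismatch between the two is one way to end up with JVM start-up errors like the one above.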
I am trying out Hadoop MapReduce on Linux (an Ubuntu virtual machine).
I ran the wordcount example on a sample file, and the process gets killed unexpectedly. How can I debug this?
Initially I was getting an insufficient-memory error on a large data set:
15/11/28 19:24:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/11/28 19:24:27 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/eg2/a.txt:0+1538
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e6093000, 104861696, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 104861696 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /usr/local/hadoop/hs_err_pid7516.log
So I reduced the size of my files and tried again, which resulted in an unexpected termination:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/hduser/eg2/ /user/hduser/eg2/eg2-output2
......
......
15/11/28 18:55:44 INFO mapred.LocalJobRunner: Waiting for map tasks
15/11/28 18:55:44 INFO mapred.LocalJobRunner: Starting task: attempt_local1996683170_0001_m_000000_0
15/11/28 18:55:44 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/11/28 18:55:44 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/eg2/a.txt:0+1538
15/11/28 18:55:45 INFO mapreduce.Job: Job job_local1996683170_0001 running in uber mode : false
15/11/28 18:55:45 INFO mapreduce.Job: map 0% reduce 0%
Killed
Why is the process getting terminated?
Try listing the running jobs:
hadoop job -list
Kill all of them and rerun yours:
hadoop job -kill <JobID>
Then check the JobTracker logs for errors:
http://localhost:50070/ – web UI of the NameNode daemon
http://localhost:50030/ – web UI of the JobTracker daemon
http://localhost:50060/ – web UI of the TaskTracker daemon
The size of the data set didn't matter: Hadoop didn't have enough memory to start. Increasing the memory of my VM fixed the issue.
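A quick way to sanity-check this before launching a job is to compare what the JVM asked for with what the VM reports as available. The sketch below is only an illustration: the function names are hypothetical, it assumes a Linux /proc/meminfo that has a MemAvailable line, and MemAvailable is itself just the kernel's estimate. The 104861696 bytes is the allocation that failed in the log above.

```python
def mem_available_bytes(meminfo_text):
    """Parse the contents of /proc/meminfo and return MemAvailable in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Lines look like "MemAvailable:   512000 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found in meminfo text")


def can_commit(meminfo_text, requested_bytes):
    """Return True if the estimated available memory covers the request."""
    return mem_available_bytes(meminfo_text) >= requested_bytes


# Hypothetical sample input; on a real machine, read /proc/meminfo instead.
sample = "MemTotal:  1000000 kB\nMemAvailable:  51200 kB\n"
print(can_commit(sample, 104861696))
```

If this reports False on the VM, giving the VM more memory (as I did) or lowering the JVM heap settings are the obvious things to try.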
I installed a Cloudera cluster with a Vagrant box.
I get an error when I launch the following example:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
I went to check the logs in /var/log/hadoop-yarn.
There are several log files. In yarn-yarn-nodemanager-cdh-master.log, there is the following stack trace:
2015-06-17 11:42:42,398 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1434535025160_0001_000001 (auth:SIMPLE)
2015-06-17 11:42:42,597 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1434535025160_0001_01_000001 by user vagrant
2015-06-17 11:42:42,762 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1434535025160_0001
2015-06-17 11:42:42,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1434535025160_0001 transitioned from NEW to INITING
2015-06-17 11:42:42,778 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=vagrant IP=10.10.50.5 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1434535025160_0001 CONTAINERID=container_1434535025160_0001_01_000001
2015-06-17 11:42:43,997 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Wrong FS: hdfs://var/log/hadoop-yarn, expected: hdfs://cdh-master:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:192)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)
2015-06-17 11:42:44,000 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1434535025160_0001_01_000001 to application application_1434535025160_0001
2015-06-17 11:42:44,001 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2015-06-17 11:42:44,034 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:8042
2015-06-17 11:42:44,035 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Applications still running : [application_14345350
I've seen this error:
java.lang.IllegalArgumentException: Wrong FS: hdfs://var/log/hadoop-yarn, expected: hdfs://cdh-master:8020
in the following post: "Failed to start Jobtracker and Tasktracker in CDH pseudo cluster", but it did not help me much.
Does anyone have an idea?
Thanks
Change the property yarn.nodemanager.remote-app-log-dir in the yarn-site.xml config file to either:
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://cdh-master:8020/var/log/hadoop-yarn/apps</value>
</property>
or
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
The second option will use the default filesystem that should be set to HDFS anyway.
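To see why the original value triggers "Wrong FS", it helps to look at how a URI of that shape is split into parts. The sketch below uses Python's urllib.parse purely as an illustration of generic URI syntax; it is not Hadoop's own path handling, and the describe helper is a hypothetical name.

```python
from urllib.parse import urlparse


def describe(uri):
    """Split a URI into the parts that matter for the filesystem check."""
    parts = urlparse(uri)
    return {"scheme": parts.scheme, "authority": parts.netloc, "path": parts.path}


# The value from the error message: "var" lands in the authority (host) slot.
print(describe("hdfs://var/log/hadoop-yarn"))
# The fully qualified form names the NameNode explicitly.
print(describe("hdfs://cdh-master:8020/var/log/hadoop-yarn/apps"))
# A bare path has no scheme or authority, so the default filesystem applies.
print(describe("/var/log/hadoop-yarn/apps"))
```

In hdfs://var/log/hadoop-yarn the first path segment is read as the host name, which does not match the expected hdfs://cdh-master:8020. Both suggested values avoid that.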
I successfully built Tez 0.6.0 against Hadoop 2.5.2.
Then I configured Tez 0.6.0 as described at http://tez.apache.org/install.html
I moved the Tez lib package to an HDFS location and updated my tez-site.xml:
<property>
<name>tez.lib.uris</name>
<value>${fs.default.name}/apps/Tez/,${fs.default.name}/apps/Tez/lib/</value>
</property>
After that I tried the sample test for Tez:
hadoop jar tez-examples-0.6.0.jar orderedwordcount <input> <output>
But I get the following error when running this command:
Running OrderedWordCount
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/04/15 10:47:57 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.6.0, revision=${buildNumber}, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2015-04-15T01:13:02Z ]
15/04/15 10:48:00 INFO client.TezClient: Submitting DAG application with id: application_1429073725727_0005
15/04/15 10:48:00 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
15/04/15 10:48:00 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://HA-Cluster/apps/Tez/,hdfs://HA-Cluster/apps/Tez/lib/
15/04/15 10:48:01 INFO client.TezClient: Stage directory /tmp/app/tez/staging doesn't exist and is created
15/04/15 10:48:01 INFO client.TezClient: Tez system stage directory hdfs://HA-cluster/tmp/app/tez/staging/.tez/application_1429073725727_0005 doesn't exist and is created
15/04/15 10:48:02 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1429073725727_0005, dagName=OrderedWordCount
15/04/15 10:48:03 INFO impl.YarnClientImpl: Submitted application application_1429073725727_0005
15/04/15 10:48:03 INFO client.TezClient: The url to track the Tez AM: http://syncserver34:8088/proxy/application_1429073725727_0005/
15/04/15 10:48:03 INFO client.DAGClientImpl: Waiting for DAG to start running
15/04/15 10:48:09 INFO client.DAGClientImpl: DAG completed. FinalState=FAILED
OrderedWordCount failed with diagnostics: [Application application_1429073725727_0005 failed 2 times due to AM Container for appattempt_1429073725727_0005_000002 exited with exitCode: -1073741515 due to: Exception from container-launch: ExitCodeException exitCode=-1073741515:
ExitCodeException exitCode=-1073741515:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
1 file(s) moved.
Container exited with a non-zero exit code -1073741515
.Failing this attempt.. Failing the application.]
Looking at the ResourceManager log, I see:
15/04/15 12:56:15 ERROR scheduler.SchedulerApplicationAttempt: Error trying to assign container token and NM token to an allocated container container_1429082271173_0001_01_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: MasterNode
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:199)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:425)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:248)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:736)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:816)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:809)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:649)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:104)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:761)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:742)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.UnknownHostException: MasterNode
... 19 more
The problem might be that the NodeManager is unable to handshake with the ResourceManager when connecting.
On a single-node Hadoop cluster, the same job works correctly.
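To keep the container's working directory around for inspection after a failure, try adding this property to yarn-site.xml:
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>1200</value>
</property>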
One more thing: running the "launchcontainer.cmd" script located under the Hadoop \tmp..\appcache directory revealed a problem loading a DLL needed to run MapReduce on Windows. MSVCR100.dll, which is required to run the Tez job, was missing, as shown below:
"The program can't start because MSVCR100.dll is missing from your
computer. Try reinstalling the program to fix this issue"
Grant full permissions on the hadoop-tmp directory, and copy msvcr100.dll into C:\Windows\System32 on the Windows machine so that the MapReduce program for the Tez job can run.
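The exit code in the diagnostics already points in this direction. Windows reports process failures as unsigned 32-bit NTSTATUS values, and Hadoop prints them as signed integers. Here is a small Python sketch of the conversion; the helper name and the one-entry lookup table are my own, added for illustration.

```python
# Only the status code relevant here; this table is illustrative, not complete.
KNOWN_STATUS = {0xC0000135: "STATUS_DLL_NOT_FOUND"}


def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned NTSTATUS value."""
    return exit_code & 0xFFFFFFFF


status = to_ntstatus(-1073741515)
print(hex(status), KNOWN_STATUS.get(status, "unknown"))
```

So exitCode=-1073741515 is 0xC0000135, which Windows uses when a required DLL cannot be found. That matches the missing MSVCR100.dll message.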
I have a Spark cluster set up with one master and 3 workers.
I use Vagrant and Docker to start the cluster.
I'm trying to submit a Spark job from Eclipse on my local machine, which should connect to the master and let me execute it. Here is the Spark configuration:
SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("spark://scale1.docker:7077");
When I run my application from Eclipse, I can see one running application on the master's UI. All the workers are ALIVE, have 4 / 4 cores used, and have allocated 512 MB to the application.
The Eclipse console just keeps printing the same warning:
15/03/04 15:39:27 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/04 15:39:27 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/03/04 15:39:27 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[2] at mapToPair at CountLines.java:35)
15/03/04 15:39:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/03/04 15:39:42 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/03/04 15:39:57 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/1 is now EXITED (Command exited with code 1)
15/03/04 15:40:04 INFO SparkDeploySchedulerBackend: Executor app-20150304143926-0001/1 removed: Command exited with code 1
15/03/04 15:40:04 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor added: app-20150304143926-0001/2 on worker-20150304140319-scale3.docker-55425 (scale3.docker:55425) with 4 cores
15/03/04 15:40:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150304143926-0001/2 on hostPort scale3.docker:55425 with 4 cores, 512.0 MB RAM
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/2 is now RUNNING
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/2 is now LOADING
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/0 is now EXITED (Command exited with code 1)
15/03/04 15:40:04 INFO SparkDeploySchedulerBackend: Executor app-20150304143926-0001/0 removed: Command exited with code 1
15/03/04 15:40:04 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor added: app-20150304143926-0001/3 on worker-20150304140317-scale2.docker-60646 (scale2.docker:60646) with 4 cores
15/03/04 15:40:04 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150304143926-0001/3 on hostPort scale2.docker:60646 with 4 cores, 512.0 MB RAM
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/3 is now RUNNING
15/03/04 15:40:04 INFO AppClient$ClientActor: Executor updated: app-20150304143926-0001/3 is now LOADING
Reading the Spark documentation, I found this:
Because the driver schedules tasks on the cluster, it should be run
close to the worker nodes, preferably on the same local area network.
If you’d like to send requests to the cluster remotely, it’s better to
open an RPC to the driver and have it submit operations from nearby
than to run a driver far away from the worker nodes.
I think the problem is that the driver runs locally on my machine.
I am using Spark 1.2.0.
Is it possible to run an application in Eclipse and submit it to a remote cluster using a local driver? If so, what can I do?
Remote debugging is quite possible, and it works fine with the option below when spark-submit is run on an edge node:
--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
See "Debugging Spark Applications".
You do not need to set the master or anything else in the code. Here is a sample command:
spark-submit --master yarn-client --class org.hkt.spark.jstest.javascalawordcount.JavaWordCount --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 javascalawordcount-0.0.1-SNAPSHOT.jar
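If you launch this from a script, the command can be assembled programmatically. The following Python sketch only builds the argument list shown above; the function name and its parameters are hypothetical, while the spark-submit flags and the JDWP agent string are the ones from the sample command.

```python
import shlex


def build_debug_submit(jar, main_class, master="yarn-client", port=5005, suspend=True):
    """Return a spark-submit argument list with a JDWP agent on the driver JVM."""
    agent = (
        "-agentlib:jdwp=transport=dt_socket,server=y,"
        f"suspend={'y' if suspend else 'n'},address={port}"
    )
    return [
        "spark-submit",
        "--master", master,
        "--class", main_class,
        "--driver-java-options", agent,
        jar,
    ]


cmd = build_debug_submit(
    "javascalawordcount-0.0.1-SNAPSHOT.jar",
    "org.hkt.spark.jstest.javascalawordcount.JavaWordCount",
)
print(shlex.join(cmd))
```

With suspend=y the driver JVM waits until a debugger attaches to the given port, so you can then connect from Eclipse with a remote Java application debug configuration pointing at the edge node and port 5005.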
I'm using the Cloudera CDH 5.2 QuickStart VM.
Cloudera Manager shows all nodes with state = GREEN.
In Eclipse, I've packaged an MR job into a jar, including all relevant Cloudera jars in the Build Path:
avro-1.7.6-cdh5.2.0.jar,
avro-mapred-1.7.6-cdh5.2.0-hadoop2.jar,
hadoop-common-2.5.0-cdh5.2.0.jar,
hadoop-mapreduce-client-core-2.5.0-cdh5.2.0.jar
I've run the following job
hadoop jar jproject1.jar avro00.AvroUserPrefCount -libjars ${LIBJARS} avro/00/in avro/00/out
I get the following error. Is it a Java heap problem? Any comments? Thank you in advance.
14/11/14 01:02:40 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
14/11/14 01:02:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/14 01:02:43 INFO mapreduce.JobSubmitter: number of splits:1
14/11/14 01:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415950730849_0001
14/11/14 01:02:45 INFO impl.YarnClientImpl: Submitted application application_1415950730849_0001
14/11/14 01:02:45 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1415950730849_0001/
14/11/14 01:02:45 INFO mapreduce.Job: Running job: job_1415950730849_0001
14/11/14 01:03:04 INFO mapreduce.Job: Job job_1415950730849_0001 running in uber mode : false
14/11/14 01:03:04 INFO mapreduce.Job: map 0% reduce 0%
14/11/14 01:03:11 INFO mapreduce.Job: Task Id : attempt_1415950730849_0001_m_000000_0, Status : FAILED
Error: java.io.IOException: Unable to initialize any output collector
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
...
...
Checking the full task log of the failed attempt, attempt_1415950730849_0001_m_000000_0, will help explain why you ran into this exception.
The most common cause of this error is a misconfigured value of io.sort.mb in your job. Its value must never be close to (or higher than) the configured map task heap size, and it currently must not exceed ~2000 MB (the Java array maximum size).
An upstream improvement that makes the error message clearer about the true failure was also filed and resolved recently, via MAPREDUCE-6194.
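The two constraints on io.sort.mb can be checked before submitting a job. This is a minimal Python sketch with hypothetical function names: the 2000 MB cap is the approximate figure quoted above, and "close to the heap" is arbitrarily taken to mean more than half of it, which is my assumption rather than a documented threshold.

```python
import re


def parse_xmx_mb(java_opts):
    """Extract the -Xmx value from a JVM options string, in MB (None if absent)."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    if unit == "k":
        return value // 1024
    if unit == "g":
        return value * 1024
    return value


def check_sort_buffer(sort_mb, map_java_opts, max_sort_mb=2000, max_heap_fraction=0.5):
    """Return a list of problems with the sort buffer size (empty if none found)."""
    problems = []
    if sort_mb > max_sort_mb:
        problems.append(f"sort buffer {sort_mb} MB exceeds the ~{max_sort_mb} MB limit")
    heap_mb = parse_xmx_mb(map_java_opts)
    # max_heap_fraction is an assumed safety margin, not a Hadoop setting.
    if heap_mb is not None and sort_mb > heap_mb * max_heap_fraction:
        problems.append(f"sort buffer {sort_mb} MB is too close to the {heap_mb} MB map heap")
    return problems


print(check_sort_buffer(900, "-Xmx1024m"))
```

Here sort_mb is the io.sort.mb value and map_java_opts is whatever the job passes as the map task JVM options. An empty list only means these two checks passed; the task log is still the place to look for the real cause.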
I encountered the same issue yesterday. The syslog for the failing map task showed that another exception in that task was triggering this error. In my case it was a parsing error; once I corrected it, this error went away.
A closer look at the log of the failed task should give you the root cause of the issue.