This is my first time posting to Stack Overflow, so I apologize if I did something wrong.
I recently set up a new Hadoop cluster, and this is my first time trying to use Hadoop 2 and YARN. I currently get the following error when I submit my job.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
Here are my configuration files:
mapred-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/temp1/nn,/temp2/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/temp1/dn,/temp2/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/temp1/snn</value>
</property>
<property>
<name>dfs.permissions.supergroup</name>
<value>hrdbms</value>
</property>
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>172.31.20.99</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/temp1/y1,/temp2/y1</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/temp1/y2,/temp2/y2</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Here is my Java code:
Configuration conf = new Configuration();
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
conf.setBoolean("mapred.compress.map.output",true);
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/core-site.xml"));
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/hdfs-site.xml"));
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/yarn-site.xml"));
conf.set("mapreduce.framework.name", "yarn");
conf.setClass("mapred.map.output.compression.codec", org.apache.hadoop.io.compress.SnappyCodec.class, CompressionCodec.class);
Job job = new Job(conf);
job.setJarByClass(LoadMapper.class);
job.setJobName("Load " + schema + "." + table);
job.setMapperClass(LoadMapper.class);
job.setReducerClass(LoadReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(ALOWritable.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(ALOWritable.class);
job.setNumReduceTasks(workerNodes.size());
job.setOutputFormatClass(LoadOutputFormat.class);
job.setReduceSpeculativeExecution(false);
job.setMapSpeculativeExecution(false);
String glob2 = glob.substring(6);
FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(glob2));
HRDBMSWorker.logger.debug("Submitting MR job");
boolean allOK = job.waitForCompletion(true);
Here are all of the environment variables that are in place when I start the JVM
HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS
HOSTNAME=ip-172-31-20-103
HADOOP_IDENT_STRING=hrdbms
SHELL=/bin/bash
TERM=xterm
HADOOP_HOME=/usr/local/hadoop-2.5.1
HISTSIZE=1000
HADOOP_PID_DIR=
YARN_HOME=/usr/local/hadoop-2.5.1
USER=hrdbms
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
HADOOP_SECURE_DN_PID_DIR=
HADOOP_SECURE_DN_LOG_DIR=/
MAIL=/var/spool/mail/hrdbms
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hrdbms/bin
HADOOP_HDFS_HOME=/usr/local/hadoop-2.5.1
HADOOP_CLIENT_OPTS=-Xmx512m
HADOOP_COMMON_HOME=/usr/local/hadoop-2.5.1
PWD=/home/hrdbms
JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre
HADOOP_CLASSPATH=/home/hrdbms/HRDBMS.jar:/contrib/capacity-scheduler/*.jar
HADOOP_CONF_DIR=/etc/hadoop
LANG=en_US.UTF-8
HADOOP_PORTMAP_OPTS=-Xmx512m
HADOOP_OPTS= -Djava.net.preferIPv4Stack=true
HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender
HISTCONTROL=ignoredups
SHLVL=1
HOME=/home/hrdbms
YARN_CONF_DIR=/etc/hadoop
HADOOP_SECURE_DN_USER=
HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender
HADOOP_MAPRED_HOME=/usr/local/hadoop-2.5.1
LOGNAME=hrdbms
HADOOP_NFS3_OPTS=
LESSOPEN=|/usr/bin/lesspipe.sh %s
HADOOP_YARN_USER=hrdbms
G_BROKEN_FILENAMES=1
_=/bin/env
Here is a list of all jars in the client classpath
activation-1.1.jar
antlr-4.2.1-complete.jar
aopalliance-1.0.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.3.jar
commons-codec-1.4.jar
commons-collections-3.2.1.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-daemon-1.0.13.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
guava-11.0.2.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.5.1.jar
hadoop-archives-2.5.1.jar
hadoop-auth-2.5.1.jar
hadoop-common-2.5.1-tests.jar
hadoop-common-2.5.1.jar
hadoop-datajoin-2.5.1.jar
hadoop-distcp-2.5.1.jar
hadoop-extras-2.5.1.jar
hadoop-gridmix-2.5.1.jar
hadoop-hdfs-2.5.1-tests.jar
hadoop-hdfs-2.5.1.jar
hadoop-hdfs-nfs-2.5.1.jar
hadoop-mapreduce-client-app-2.5.1.jar
hadoop-mapreduce-client-common-2.5.1.jar
hadoop-mapreduce-client-core-2.5.1.jar
hadoop-mapreduce-client-hs-2.5.1.jar
hadoop-mapreduce-client-hs-plugins-2.5.1.jar
hadoop-mapreduce-client-jobclient-2.5.1-tests.jar
hadoop-mapreduce-client-jobclient-2.5.1.jar
hadoop-mapreduce-client-shuffle-2.5.1.jar
hadoop-mapreduce-examples-2.5.1.jar
hadoop-nfs-2.5.1.jar
hadoop-openstack-2.5.1.jar
hadoop-rumen-2.5.1.jar
hadoop-sls-2.5.1.jar
hadoop-streaming-2.5.1.jar
hadoop-yarn-api-2.5.1.jar
hadoop-yarn-applications-distributedshell-2.5.1.jar
hadoop-yarn-applications-unmanaged-am-launcher-2.5.1.jar
hadoop-yarn-client-2.5.1.jar
hadoop-yarn-common-2.5.1.jar
hadoop-yarn-server-applicationhistoryservice-2.5.1.jar
hadoop-yarn-server-common-2.5.1.jar
hadoop-yarn-server-nodemanager-2.5.1.jar
hadoop-yarn-server-resourcemanager-2.5.1.jar
hadoop-yarn-server-tests-2.5.1.jar
hadoop-yarn-server-web-proxy-2.5.1.jar
hamcrest-core-1.3.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-core-asl-1.9.13.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
jasper-compiler-5.5.23.jar
jasper-runtime-5.5.23.jar
java-xmlbuilder-0.4.jar
javax.inject-1.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-client-1.9.jar
jersey-core-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.jar
jetty-util-6.1.26.jar
jline-0.9.94.jar
jsch-0.1.50.jar
jsp-api-2.1.jar
jsr305-1.3.9.jar
junit-4.11.jar
leveldbjni-all-1.8.jar
log4j-1.2.17.jar
metrics-core-3.0.0.jar
mockito-all-1.8.5.jar
netty-3.6.2.Final.jar
paranamer-2.3.jar
preflight-app-1.8.7.jar
protobuf-java-2.5.0.jar
servlet-api-2.5.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
zookeeper-3.4.6.jar
Please help! Thanks!
EDIT: I just found these debug log messages.
2014-10-27 19:31:21,789 DEBUG Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
2014-10-27 19:31:21,789 DEBUG Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
I ran into a similar issue today. In my case I was building an über jar, where some dependency (I have not found the culprit yet) was bringing in a META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider with the contents:
org.apache.hadoop.mapred.LocalClientProtocolProvider
I provided my own in the project (i.e. put it on the classpath) with the following:
org.apache.hadoop.mapred.YarnClientProtocolProvider
and the correct one is picked up. I suspect you are seeing something similar. To fix it, create the file described above and put it on the classpath. If I find the culprit jar, I will update this answer.
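A quick way to check which providers are actually visible is to iterate the same ServiceLoader mechanism that Hadoop's Cluster class relies on. This is only a diagnostic sketch (the interface name comes from the service file path above); if it prints only LocalClientProtocolProvider, the YARN entry was lost when the über jar was built.
import java.util.ServiceLoader;
import org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider;

public class ProviderCheck {
    public static void main(String[] args) {
        // List every ClientProtocolProvider registered via META-INF/services on the classpath.
        for (ClientProtocolProvider provider : ServiceLoader.load(ClientProtocolProvider.class)) {
            System.out.println(provider.getClass().getName());
        }
        // org.apache.hadoop.mapred.YarnClientProtocolProvider should appear here
        // for mapreduce.framework.name=yarn to be usable.
    }
}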
Turn off security in your cluster (if that is acceptable in your environment), i.e. disable this HDFS setting:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
It is on by default.
In Cloudera Manager it is accessible from the Configuration panel.
I ran into the same issue trying to run an MR job from Eclipse with the latest CDH 5.5 Hadoop 2.6 distribution.
I am not sure what specifically the issue was; it might very well be the classloading issue that #timrobertson100 mentioned. In my case, though, I was able to overcome it by adding all jars from the paths below to the Eclipse project's classpath:
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/common/hadoop-common-2.6.0-cdh5.5.1.jar
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/common/lib/*
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/mapreduce2/*
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/yarn/*
Marina
Related
I am using Hadoop 2.9.1 and HBase 2.1.0 in stand-alone local mode.
When I tried starting HBase 2.1.0 using sudo start-hbase.sh in the bin folder, I got the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/htrace/core/HTraceConfiguration
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:153)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2983)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.core.HTraceConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
This is my hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>/home/niyazmohamed/bigdata/upgraded_versions/hbase-2.1.0/hbasedir</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/niyazmohamed/bigdata/upgraded_versions/hbase-2.1.0/zookeeper</value>
</property>
</configuration>
When I tried to start HBase version 1.2.0, it started successfully; the hbase shell was accessible and CRUD operations worked.
The Hadoop and HBase paths are set, and with just that I was able to run HBase 1.2.0.
This problem occurs only with HBase 2.1.0.
Any help appreciated! Thanks in advance!
Related:
Starting HBASE, java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
htrace-core-*-incubating.jar was missing from some early versions of HBase 2.x.
If the htrace-core jar is present in $HBASE_HOME/lib/client-facing-thirdparty, copy it to $HBASE_HOME/lib; otherwise, download the jar from Maven here and place it into $HBASE_HOME/lib.
You can see in the HBase pom.xml for version 2.1 that htrace 4.2.0 is the correct version of the dependency:
https://github.com/apache/hbase/blob/rel/2.1.0/pom.xml#L1364
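If you want to confirm the class is visible before restarting, a tiny hand-written check like the one below (not part of HBase; the class name is taken from the stack trace above) can be compiled and run with the output of hbase classpath on the classpath:
public class HTraceCheck {
    public static void main(String[] args) {
        try {
            // Class name taken from the NoClassDefFoundError above.
            Class.forName("org.apache.htrace.core.HTraceConfiguration");
            System.out.println("htrace-core4 is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("htrace-core4 is missing; copy htrace-core-4.2.0-incubating.jar into $HBASE_HOME/lib");
        }
    }
}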
Good luck.
I created a MapReduce job and I'm testing in a multi-cluster environment, but I'm getting the following error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.company.hbase.mapreduce.message.maestro.threadIndex.fakecolum.MockTestThreadIndexData.run(MockTestThreadIndexData.java:47)
at com.company.hbase.mapreduce.MaestroUpdateJob.main(MaestroUpdateJob.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I see that hadoop-common-2.6.0.jar is missing from hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common.
The jar file exists on /opt/hadoop/share/hadoop/common, but my job is looking for it inside HDFS.
If I copy all the jars (there are a lot of them) to HDFS, it works. But I want to understand: is that really necessary? Can someone explain WHY?
If I want to run this in production, do I need to do the same? Is that correct?
Also, I saw the answer Why do I need to keep hbase/lib folder in hdfs?, and yes, if I change the MapReduce framework to YARN it works as well. But I don't want to use YARN; I just want to understand why I have to move all the Hadoop libs to HDFS to run a MapReduce job.
Updated
Here is how I instantiate the job:
Job job = Job.getInstance(config, "MyJob");
Scan scan = createScan();
Filter filter = createMyFilter();
FilterList filters = new FilterList(filter); // wrap the single filter in a FilterList for the scan
scan.setFilter(filters);
TableMapReduceUtil.initTableMapperJob(
MY_TABLE,
scan,
MyMapper.class,
null,
null,
job
);
TableMapReduceUtil.initTableReducerJob(
MY_TABLE,
null,
job
);
job.setNumReduceTasks(0);
Here is my mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>myhost:9001</value>
</property>
<property>
<name>hadoop.ssl.enabled</name>
<value>true</value>
</property>
<property>
<name>hadoop.ssl.require.client.cert</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.hostname.verifier</name>
<value>DEFAULT</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.keystores.factory.class</name>
<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
</configuration>
How I run this:
HADOOP_CLASSPATH=`/opt/hbase/bin/hbase classpath` /opt/hadoop/bin/hadoop jar /tmp/mymapred-1.0-SNAPSHOT-jar-with-dependencies.jar
Solution
Finally, I got the answer from this comment: https://stackoverflow.com/a/31950822/13305602
Inside core-site.xml, there are two properties that configure the default file system in Hadoop:
<property>
<name>fs.defaultFS</name>
<value>hdfs://myhost.mycompany.com:9000</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://myhost.mycompany.com:9000</value>
</property>
The default value of these two properties is file:///; see https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
You can change this property in core-site.xml or, if you are in an environment where you don't have access to that file, you can set it only in the job context on the job's Configuration:
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "file:///");
configuration.set("fs.default.name", "file:///");
Job job = Job.getInstance(configuration, "MyJob");
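For illustration only (host and path taken from the stack trace above), this sketch shows how an unqualified path is resolved against fs.defaultFS, which is why the client went looking for the local jar inside HDFS:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathResolutionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://bigcluster:9000");
        FileSystem fs = FileSystem.get(conf);
        // An unqualified path is resolved against the default file system...
        Path jar = new Path("/opt/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar");
        // ...so this prints hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
        System.out.println(fs.makeQualified(jar));
        // With fs.defaultFS set to file:/// (as in the snippet above), the same path stays on the local disk.
    }
}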
I am trying to launch Hadoop on my computer, but when I execute any related command in CMD, such as hadoop version or hdfs namenode -format, I get an error (exactly as follows):
Error: Could not find or load main class Name
The OS is Windows 10.
Hadoop version 2.7.1.
JDK 1.8.0.131.
I have the following user variables:
HADOOP_HOME = C:\hadoop-2.7.1\bin
JAVA_HOME = C:\Progra~2\Java\jdk1.8.0_131
And within the system variable PATH there are two locations set:
%JAVA_HOME%\bin;C:\hadoop-2.7.1\bin
In hadoop-env.cmd this variable is set:
JAVA_HOME = %JAVA_HOME%
Among core-site.xml, mapred-site.xml, hdfs-site.xml and yarn-site.xml, paths to directories are set only in hdfs-site.xml. The full configuration block in that file is as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/c:/hadoop-2.7.1/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/c:/hadoop-2.7.1/data/datanode</value>
</property>
</configuration>
I am trying to run a MapReduce job using Hadoop 2.4.0.
My code has some dependencies on third-party jars, so I created a fat jar using Eclipse's Export -> Runnable JAR option.
Now when I run the fat jar using
hadoop jar ~/Documents/job.jar
I get the following exception:
java.lang.reflect.InvocationTargetException
The above exception is caused by this:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at imgProc.MasterClass.main(MasterClass.java:84)
... 10 more
hadoop classpath
hduser#livingstream:/usr/local/hadoop$ hadoop classpath
/usr/local/hadoop-2.4.0/etc/hadoop:/usr/local/hadoop-2.4.0/share/hadoop/common/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/common/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
My configuration files
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
</configuration>
I am not really sure what is going on now. Is it because of the jars or my config files?
Any ideas are appreciated! :)
The error message clearly states:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
You have to configure mapred-site.xml and core-site.xml correctly, and there are several other configuration steps to complete as well.
For a step-by-step guide, refer to this link: Hadoop V2 setup.
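If the XML files are not being picked up on the fat jar's client side, the same settings can also be applied programmatically. This is only a sketch, with values copied from the question's own core-site.xml and yarn-site.xml above:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DriverSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:54310");          // from core-site.xml
        conf.set("mapreduce.framework.name", "yarn");                // from mapred-site.xml
        conf.set("yarn.resourcemanager.address", "localhost:8050");  // from yarn-site.xml
        Job job = Job.getInstance(conf, "my job");
        // ...configure mapper, reducer and input/output paths as in the original driver, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}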
Hope this helps you.
This question is linked to my previous question. All the daemons are running; jps shows:
6663 JobHistoryServer
7213 ResourceManager
9235 Jps
6289 DataNode
6200 NameNode
7420 NodeManager
but the wordcount example keeps on failing with the following exception:
ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1238)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1234)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1233)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1262)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at WordCount.main(WordCount.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Since it says the problem is in the configuration, I am posting the configuration files here. The intention is to create a single-node cluster.
yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/yarn/yarn_data/hdfs/datanode</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>Yarn</value>
</property>
</configuration>
Please tell me what is missing or what I am doing wrong.
I was having a similar issue, but YARN was not the problem.
After adding the following jars to my classpath, the issue was resolved:
hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76
hadoop-mapreduce-client-common-2.2.0.2.0.6.0-76
hadoop-mapreduce-client-shuffle-2.2.0.2.0.6.0-76
You have capitalized Yarn, which is probably why it cannot be resolved. Try the lowercase value suggested in the official documentation:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Looks like I had a lucky day and went through 'all' of those causes of this exception. Summary:
wrong mapreduce.framework.name (see above)
missing mapreduce job-client jars (see above)
wrong version (see Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses-submiting job2remoteClustr )
my configured 'yarn.ipc.client.factory.class' wasn't on the classpath of the YARN servers (it was only on the client)
In my case I was trying to use Sqoop and ran into this error.
It turned out that I was pointing to the latest version of Hadoop 2.0 available from the CDH repo, for which Sqoop was not supported.
The Cloudera version was 2.0.0-cdh4.4.0, which had YARN support built in.
When I used 2.0.0-cdh4.4.0 under hadoop-0.20, the problem went away.
Hope this helps.
Changing mapreduce_shuffle to mapreduce.shuffle made it work in my case.