Why do I need Hadoop lib jars in HDFS? - java

I created a MapReduce job and I'm testing it in a multi-node cluster environment, but I'm getting the following error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.company.hbase.mapreduce.message.maestro.threadIndex.fakecolum.MockTestThreadIndexData.run(MockTestThreadIndexData.java:47)
at com.company.hbase.mapreduce.MaestroUpdateJob.main(MaestroUpdateJob.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I see that the hadoop-common-2.6.0.jar jar is missing at hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common.
The jar file exists in /opt/hadoop/share/hadoop/common on the local filesystem, but my job is looking for it inside HDFS.
If I copy all the jars (there are a lot of them) to HDFS, it works. But I want to understand: is this really necessary? Can someone explain to me WHY?
If I want to run this in production, do I need to do the same thing? Is that correct?
Also, I saw the answer to Why do I need to keep hbase/lib folder in hdfs?, and yes, if I change the MapReduce framework to YARN, it works as well. But I don't want to use YARN; I just want to understand why I have to move all the Hadoop libs to HDFS to run a MapReduce job.
Update
Here is how I instantiate the job:
Job job = Job.getInstance(config, "MyJob");
Scan scan = createScan();
Filter filter = createMyFilter();
FilterList filters = new FilterList(filter); // createMyFilter() returns a single Filter, so wrap it in a FilterList
scan.setFilter(filters);
TableMapReduceUtil.initTableMapperJob(
MY_TABLE,
scan,
MyMapper.class,
null,
null,
job
);
TableMapReduceUtil.initTableReducerJob(
MY_TABLE,
null,
job
);
job.setNumReduceTasks(0);
Here is my mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>myhost:9001</value>
</property>
<property>
<name>hadoop.ssl.enabled</name>
<value>true</value>
</property>
<property>
<name>hadoop.ssl.require.client.cert</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.hostname.verifier</name>
<value>DEFAULT</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.keystores.factory.class</name>
<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
</configuration>
How I run this:
HADOOP_CLASSPATH=`/opt/hbase/bin/hbase classpath` /opt/hadoop/bin/hadoop jar /tmp/mymapred-1.0-SNAPSHOT-jar-with-dependencies.jar
Solution
Finally, I got the answer from this comment: https://stackoverflow.com/a/31950822/13305602
Inside core-site.xml, there are two properties that configure the default file system in Hadoop:
<property>
<name>fs.defaultFS</name>
<value>hdfs://myhost.mycompany.com:9000</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://myhost.mycompany.com:9000</value>
</property>
The default value of both properties is file:/// (see https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml); fs.default.name is just the deprecated name for fs.defaultFS. During job submission the client qualifies the classpath jar paths against this default filesystem, which is why, with it pointing at hdfs://, the job looked for the jars inside HDFS.
You can change this property in core-site.xml, or, if you are in an environment where you don't have access to that file, you can set it only in the job context, on the job configuration:
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "file:///");
configuration.set("fs.default.name", "file:///");
Job job = Job.getInstance(configuration, "MyJob");
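If the driver went through ToolRunner/GenericOptionsParser (an assumption; the snippet above sets the Configuration directly), the same override could instead be passed per invocation on the command line:
HADOOP_CLASSPATH=`/opt/hbase/bin/hbase classpath` /opt/hadoop/bin/hadoop jar /tmp/mymapred-1.0-SNAPSHOT-jar-with-dependencies.jar -D fs.defaultFS=file:///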

Related

java.lang.NoClassDefFoundError: org/apache/htrace/core/HTraceConfiguration

I am using Hadoop 2.9.1 and HBase 2.1.0 in stand-alone (local) mode.
When I tried starting HBase 2.1.0 using sudo start-hbase.sh in the bin folder, I got the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/htrace/core/HTraceConfiguration
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:153)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2983)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.core.HTraceConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
This is my hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>/home/niyazmohamed/bigdata/upgraded_versions/hbase-2.1.0/hbasedir</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/niyazmohamed/bigdata/upgraded_versions/hbase-2.1.0/zookeeper</value>
</property>
</configuration>
When I tried to start HBase 1.2.0, it started successfully; the hbase shell was also accessible and CRUD operations were successful.
The Hadoop and HBase paths are set, and with just that I was able to run HBase 1.2.0.
This problem occurs only with HBase 2.1.0.
Any help is appreciated! Thanks in advance!
Related:
Starting HBASE, java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
htrace-core-*-incubating.jar was missing from some early versions of HBase 2.x.
If the htrace-core jar is in $HBASE_HOME/lib/client-facing-thirdparty, copy the jar to $HBASE_HOME/lib; otherwise, download the jar from Maven here and place it into $HBASE_HOME/lib.
You can see in the HBase pom.xml for version 2.1 that htrace 4.2.0 is the correct version of the dependency:
https://github.com/apache/hbase/blob/rel/2.1.0/pom.xml#L1364
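For example, a minimal sketch of that copy, assuming the bundled jar is named for the 4.2.0-incubating version (check the actual filename in your distribution):
cp $HBASE_HOME/lib/client-facing-thirdparty/htrace-core-4.2.0-incubating.jar $HBASE_HOME/lib/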
Good luck.

FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space

When I try to run a Hadoop MapReduce program in Python, the MapReduce executes fine when the input file is small. If I run the same MapReduce on input files larger than ~50 MB, I get the following error in the log files:
FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at org.apache.hadoop.io.Text.setCapacity(Text.java:266)
at org.apache.hadoop.io.Text.append(Text.java:236)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:243)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:206)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:244)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
With reference to this Stack Overflow question, I tried the following configuration changes in mapred-site.xml:
<property>
<name>mapreduce.map.memory.mb</name>
<value>1974</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>7896</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1580m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6320m</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value> -Xmx524288000</value>
</property>
<property>
<name>mapred.job.shuffle.input.buffer.percent</name>
<value>0.20</value>
</property>
But after the configuration changes, only the map phase executes; the reducer does not start.
So, what should I modify so that MapReduce runs without problems on large input files? See the sketch below for a per-job alternative.
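As a sketch (not a drop-in answer), the same memory settings can be passed per job on the Hadoop Streaming command line instead of cluster-wide in mapred-site.xml; the jar path, input/output paths, and script names below are placeholders:
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -D mapreduce.map.memory.mb=1974 \
  -D mapreduce.map.java.opts=-Xmx1580m \
  -D mapreduce.reduce.memory.mb=7896 \
  -D mapreduce.reduce.java.opts=-Xmx6320m \
  -input /user/me/input -output /user/me/output \
  -mapper mapper.py -reducer reducer.py \
  -file mapper.py -file reducer.py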

IOException: Cannot initialize Cluster | hadoop 2.4.0

I am trying to run a MapReduce job using Hadoop 2.4.0.
My code has some dependencies on third-party jars, so I created a fat jar using Eclipse's Export -> Runnable JAR option.
Now when I run the fat jar using
hadoop jar ~/Documents/job.jar
I get the exception
java.lang.reflect.InvocationTargetException
The above exception is caused by this:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at imgProc.MasterClass.main(MasterClass.java:84)
... 10 more
hadoop classpath
hduser#livingstream:/usr/local/hadoop$ hadoop classpath
/usr/local/hadoop-2.4.0/etc/hadoop:/usr/local/hadoop-2.4.0/share/hadoop/common/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/common/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
My configuration files
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
</configuration>
I am not really sure what is going on now. Is it because of the JARs or my config files?
Does anybody have any idea? Anything is appreciated! :)
The error message clearly states:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
You have to configure mapred-site.xml, core-site.xml, and several other files correctly.
For step-by-step instructions, refer to this link: Hadoop V2 setup.
Hope this helps you.

java.io.IOException: Cannot initialize Cluster in Hadoop2 with YARN

This is my first time posting to stackoverflow, so I apologize if I did something wrong.
I recently set up a new hadoop cluster, and this is my first time trying to use Hadoop 2 and YARN. I currently get the following error when I submit my job.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
Here are my configuration files:
mapred-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/temp1/nn,/temp2/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/temp1/dn,/temp2/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/temp1/snn</value>
</property>
<property>
<name>dfs.permissions.supergroup</name>
<value>hrdbms</value>
</property>
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>172.31.20.99</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/temp1/y1,/temp2/y1</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/temp1/y2,/temp2/y2</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Here is my Java code:
Configuration conf = new Configuration();
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
conf.setBoolean("mapred.compress.map.output",true);
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/core-site.xml"));
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/hdfs-site.xml"));
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/yarn-site.xml"));
conf.set("mapreduce.framework.name", "yarn");
conf.setClass("mapred.map.output.compression.codec", org.apache.hadoop.io.compress.SnappyCodec.class, CompressionCodec.class);
Job job = new Job(conf);
job.setJarByClass(LoadMapper.class);
job.setJobName("Load " + schema + "." + table);
job.setMapperClass(LoadMapper.class);
job.setReducerClass(LoadReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(ALOWritable.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(ALOWritable.class);
job.setNumReduceTasks(workerNodes.size());
job.setOutputFormatClass(LoadOutputFormat.class);
job.setReduceSpeculativeExecution(false);
job.setMapSpeculativeExecution(false);
String glob2 = glob.substring(6);
FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(glob2));
HRDBMSWorker.logger.debug("Submitting MR job");
boolean allOK = job.waitForCompletion(true);
Here are all of the environment variables that are in place when I start the JVM:
HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS
HOSTNAME=ip-172-31-20-103
HADOOP_IDENT_STRING=hrdbms
SHELL=/bin/bash
TERM=xterm
HADOOP_HOME=/usr/local/hadoop-2.5.1
HISTSIZE=1000
HADOOP_PID_DIR=
YARN_HOME=/usr/local/hadoop-2.5.1
USER=hrdbms
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
HADOOP_SECURE_DN_PID_DIR=
HADOOP_SECURE_DN_LOG_DIR=/
MAIL=/var/spool/mail/hrdbms
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hrdbms/bin
HADOOP_HDFS_HOME=/usr/local/hadoop-2.5.1
HADOOP_CLIENT_OPTS=-Xmx512m
HADOOP_COMMON_HOME=/usr/local/hadoop-2.5.1
PWD=/home/hrdbms
JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre
HADOOP_CLASSPATH=/home/hrdbms/HRDBMS.jar:/contrib/capacity-scheduler/*.jar
HADOOP_CONF_DIR=/etc/hadoop
LANG=en_US.UTF-8
HADOOP_PORTMAP_OPTS=-Xmx512m
HADOOP_OPTS= -Djava.net.preferIPv4Stack=true
HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender
HISTCONTROL=ignoredups
SHLVL=1
HOME=/home/hrdbms
YARN_CONF_DIR=/etc/hadoop
HADOOP_SECURE_DN_USER=
HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender
HADOOP_MAPRED_HOME=/usr/local/hadoop-2.5.1
LOGNAME=hrdbms
HADOOP_NFS3_OPTS=
LESSOPEN=|/usr/bin/lesspipe.sh %s
HADOOP_YARN_USER=hrdbms
G_BROKEN_FILENAMES=1
_=/bin/env
Here is a list of all jars in the client classpath:
activation-1.1.jar
antlr-4.2.1-complete.jar
aopalliance-1.0.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.3.jar
commons-codec-1.4.jar
commons-collections-3.2.1.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-daemon-1.0.13.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
guava-11.0.2.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.5.1.jar
hadoop-archives-2.5.1.jar
hadoop-auth-2.5.1.jar
hadoop-common-2.5.1-tests.jar
hadoop-common-2.5.1.jar
hadoop-datajoin-2.5.1.jar
hadoop-distcp-2.5.1.jar
hadoop-extras-2.5.1.jar
hadoop-gridmix-2.5.1.jar
hadoop-hdfs-2.5.1-tests.jar
hadoop-hdfs-2.5.1.jar
hadoop-hdfs-nfs-2.5.1.jar
hadoop-mapreduce-client-app-2.5.1.jar
hadoop-mapreduce-client-common-2.5.1.jar
hadoop-mapreduce-client-core-2.5.1.jar
hadoop-mapreduce-client-hs-2.5.1.jar
hadoop-mapreduce-client-hs-plugins-2.5.1.jar
hadoop-mapreduce-client-jobclient-2.5.1-tests.jar
hadoop-mapreduce-client-jobclient-2.5.1.jar
hadoop-mapreduce-client-shuffle-2.5.1.jar
hadoop-mapreduce-examples-2.5.1.jar
hadoop-nfs-2.5.1.jar
hadoop-openstack-2.5.1.jar
hadoop-rumen-2.5.1.jar
hadoop-sls-2.5.1.jar
hadoop-streaming-2.5.1.jar
hadoop-yarn-api-2.5.1.jar
hadoop-yarn-applications-distributedshell-2.5.1.jar
hadoop-yarn-applications-unmanaged-am-launcher-2.5.1.jar
hadoop-yarn-client-2.5.1.jar
hadoop-yarn-common-2.5.1.jar
hadoop-yarn-server-applicationhistoryservice-2.5.1.jar
hadoop-yarn-server-common-2.5.1.jar
hadoop-yarn-server-nodemanager-2.5.1.jar
hadoop-yarn-server-resourcemanager-2.5.1.jar
hadoop-yarn-server-tests-2.5.1.jar
hadoop-yarn-server-web-proxy-2.5.1.jar
hamcrest-core-1.3.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-core-asl-1.9.13.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
jasper-compiler-5.5.23.jar
jasper-runtime-5.5.23.jar
java-xmlbuilder-0.4.jar
javax.inject-1.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-client-1.9.jar
jersey-core-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.jar
jetty-util-6.1.26.jar
jline-0.9.94.jar
jsch-0.1.50.jar
jsp-api-2.1.jar
jsr305-1.3.9.jar
junit-4.11.jar
leveldbjni-all-1.8.jar
log4j-1.2.17.jar
metrics-core-3.0.0.jar
mockito-all-1.8.5.jar
netty-3.6.2.Final.jar
paranamer-2.3.jar
preflight-app-1.8.7.jar
protobuf-java-2.5.0.jar
servlet-api-2.5.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
zookeeper-3.4.6.jar
Please help! Thanks!
EDIT: I just found these debug log messages.
2014-10-27 19:31:21,789 DEBUG Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
2014-10-27 19:31:21,789 DEBUG Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
I ran into a similar issue today. In my case I was building an über jar, where some dependency (I have not found the culprit yet) was bringing in a META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider file with the contents:
org.apache.hadoop.mapred.LocalClientProtocolProvider
I provided my own in the project (i.e., put it on the classpath) with the following:
org.apache.hadoop.mapred.YarnClientProtocolProvider
and the correct one is picked up. I suspect you are seeing something similar. To fix it, create the file described above and put it on the classpath. If I find the culprit jar, I will update the answer.
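If the über jar happens to be built with the Maven Shade plugin (an assumption; Eclipse's runnable-jar export can cause the same overwrite), a ServicesResourceTransformer merges the META-INF/services files from all dependencies instead of letting one provider entry overwrite another:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- concatenate META-INF/services entries so both the Local and
               Yarn ClientProtocolProvider registrations survive -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>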
Turn off security in your cluster (if that is OK in your environment, obviously).
That is, turn off this HDFS setting:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
It is on by default.
In Cloudera Manager it is accessible from the Configuration panel.
I ran into the same issue trying to run an MR job from Eclipse with the latest CDH 5.5 (Hadoop 2.6) distribution.
I am not sure what specifically the issue was; it might very well be the classloading issue that @timrobertson100 mentioned... but in my case, I was able to overcome it by adding all jars from the paths below to the Eclipse project's classpath:
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/common/hadoop-common-2.6.0-cdh5.5.1.jar
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/common/lib/*
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/mapreduce2/*
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/yarn/*
Marina

Cannot initialize cluster exception while running job on Hadoop 2

This question is linked to my previous question. All the daemons are running; jps shows:
6663 JobHistoryServer
7213 ResourceManager
9235 Jps
6289 DataNode
6200 NameNode
7420 NodeManager
but the wordcount example keeps on failing with the following exception:
ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1238)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1234)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1233)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1262)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at WordCount.main(WordCount.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Since it says the problem is in the configuration, I am posting the configuration files here. The intention is to create a single-node cluster.
yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/yarn/yarn_data/hdfs/datanode</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>Yarn</value>
</property>
</configuration>
Please tell me what is missing or what I am doing wrong.
I was having a similar issue, but YARN was not the problem.
After adding the following jars to my classpath, the issue got resolved:
hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76
hadoop-mapreduce-client-common-2.2.0.2.0.6.0-76
hadoop-mapreduce-client-shuffle-2.2.0.2.0.6.0-76
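As a quick sanity check (a sketch assuming the standard tarball layout; adjust $HADOOP_HOME and the path for your distribution), you can confirm the MapReduce client jars are actually present:
ls $HADOOP_HOME/share/hadoop/mapreduce/ | grep mapreduce-client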
You have uppercased Yarn, which is probably why it cannot be resolved. Try the lowercase version suggested in the official documentation:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
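To double-check which value the client configuration actually resolves after the edit, one option is the stock getconf utility:
hdfs getconf -confKey mapreduce.framework.name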
It looks like I had a lucky day and went through 'all' of those causes with this exception. Summary:
wrong mapreduce.framework.name (see above)
missing mapreduce job-client jars (see above)
wrong version (see Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses-submiting job2remoteClustr)
my configured 'yarn.ipc.client.factory.class' wasn't on the classpath of the YARN server (it was just on the client)
In my case I was trying to use Sqoop and ran into this error.
It turns out that I was pointing to the latest version of Hadoop 2.0 available from the CDH repo, for which Sqoop was not supported.
The Cloudera version was 2.0.0-cdh4.4.0, which had YARN support built in.
When I used 2.0.0-cdh4.4.0 under hadoop-0.20, the problem went away.
Hope this helps.
Changing mapreduce_shuffle to mapreduce.shuffle made it work in my case.
