Cannot initialize cluster exception while running job on Hadoop 2 - java

This question is linked to my previous question. All the daemons are running; jps shows:
6663 JobHistoryServer
7213 ResourceManager
9235 Jps
6289 DataNode
6200 NameNode
7420 NodeManager
but the wordcount example keeps on failing with the following exception:
ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1238)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1234)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1233)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1262)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at WordCount.main(WordCount.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Since it says the problem is in the configuration, I am posting the configuration files here. The intention is to create a single-node cluster.
yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/yarn/yarn_data/hdfs/datanode</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>Yarn</value>
</property>
</configuration>
Please tell me what is missing or what I am doing wrong.

I was having a similar issue, but YARN was not the problem.
The issue was resolved after adding the following jars to my classpath:
hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76
hadoop-mapreduce-client-common-2.2.0.2.0.6.0-76
hadoop-mapreduce-client-shuffle-2.2.0.2.0.6.0-76
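These jars matter because the MapReduce client discovers cluster providers through Java's ServiceLoader mechanism, and the YARN provider ships in the jobclient jar. A small diagnostic sketch (the class name ListProtocolProviders is just illustrative) that prints the providers visible on your classpath:
import java.util.ServiceLoader;
import org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider;

public class ListProtocolProviders {
    public static void main(String[] args) {
        // Cluster.initialize() iterates these providers and fails with
        // "Cannot initialize Cluster" when none of them accepts the configuration.
        for (ClientProtocolProvider p : ServiceLoader.load(ClientProtocolProvider.class)) {
            System.out.println(p.getClass().getName());
        }
        // If only org.apache.hadoop.mapred.LocalClientProtocolProvider is printed,
        // hadoop-mapreduce-client-jobclient (which supplies the YARN provider)
        // is missing from the classpath.
    }
}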

You have capitalized Yarn, which is probably why it cannot be resolved. Try the lowercase value suggested in the official documentation:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
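To confirm the value the client actually reads, a quick sketch (JobConf is used because it is the class that pulls mapred-site.xml off the classpath):
import org.apache.hadoop.mapred.JobConf;

public class CheckFrameworkName {
    public static void main(String[] args) {
        // JobConf loads mapred-default.xml and mapred-site.xml from the classpath;
        // the match against the YARN provider is case-sensitive, so this should
        // print the lowercase value "yarn" once the file is fixed.
        System.out.println(new JobConf().get("mapreduce.framework.name"));
    }
}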

Looks like I had a lucky day and went through 'all' of the causes of this exception. A summary:
wrong mapreduce.framework.name (see above)
missing mapreduce job-client jars (see above)
wrong version (see the question 'Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses - submiting job2remoteClustr')
the class configured in 'yarn.ipc.client.factory.class' wasn't on the YARN server's classpath (only on the client's)

In my case I was trying to use Sqoop and ran into this error.
It turns out I was pointing to the latest version of Hadoop 2.0 available from the CDH repo, for which Sqoop was not supported.
The Cloudera version was 2.0.0-cdh4.4.0, which had YARN support built in.
When I used the 2.0.0-cdh4.4.0 build under hadoop-0.20, the problem went away.
Hope this helps.

Changing mapreduce_shuffle to mapreduce.shuffle made it work in my case. (Which spelling is expected depends on the Hadoop version: early 2.x releases used the dotted form, while 2.2+ documents mapreduce_shuffle.)

Related

java.lang.NoClassDefFoundError: org/apache/htrace/core/HTraceConfiguration

I am using Hadoop 2.9.1 and HBase 2.1.0 in stand-alone local mode.
When I tried starting HBase 2.1.0 using sudo start-hbase.sh in the bin folder, I got the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/htrace/core/HTraceConfiguration
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:153)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2983)
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.core.HTraceConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
This is my hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>/home/niyazmohamed/bigdata/upgraded_versions/hbase-2.1.0/hbasedir</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/niyazmohamed/bigdata/upgraded_versions/hbase-2.1.0/zookeeper</value>
</property>
</configuration>
When I tried to start HBase 1.2.0, it started successfully; the hbase shell was accessible and CRUD operations worked.
The Hadoop and HBase paths are set, and with just that I was able to run HBase 1.2.0.
Only with HBase 2.1.0 does this problem occur.
Any help appreciated! Thanks in advance!
Related: Starting HBASE, java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
htrace-core-*-incubating.jar was missing from some early versions of HBase 2.x.
If the htrace-core jar is in $HBASE_HOME/lib/client-facing-thirdparty, copy it to $HBASE_HOME/lib; otherwise, download the jar from Maven and place it into $HBASE_HOME/lib.
You can see in the HBase pom.xml for version 2.1 that htrace 4.2.0 is the correct version of the dependency:
https://github.com/apache/hbase/blob/rel/2.1.0/pom.xml#L1364
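To verify whether the missing class is visible on the classpath HBase starts with, a small sketch (the class name is taken from the stack trace above; run it with the output of hbase classpath on the Java classpath):
public class HTraceCheck {
    public static void main(String[] args) {
        try {
            // Class name taken from the NoClassDefFoundError above.
            Class.forName("org.apache.htrace.core.HTraceConfiguration");
            System.out.println("htrace-core4 is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("htrace-core4 is missing; copy the jar into $HBASE_HOME/lib");
        }
    }
}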
Good luck.

Why do I need hadoop lib jars into HDFS?

I created a MapReduce job and I'm testing in a multi-cluster environment, but I'm getting the following error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at com.company.hbase.mapreduce.message.maestro.threadIndex.fakecolum.MockTestThreadIndexData.run(MockTestThreadIndexData.java:47)
at com.company.hbase.mapreduce.MaestroUpdateJob.main(MaestroUpdateJob.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I see that the hadoop-common-2.6.0.jar jar is missing from hdfs://bigcluster:9000/opt/hadoop/share/hadoop/common.
The jar file exists at /opt/hadoop/share/hadoop/common, but my job is looking for it inside HDFS.
If I copy all the jars (there are a lot of them) to HDFS, it works. But I want to understand: is that really necessary? Can someone explain WHY?
If I want to run this in production, do I need to do the same? Is that correct?
I also saw the answer to 'Why do I need to keep hbase/lib folder in hdfs?', and yes, if I change the MapReduce framework to YARN it works too. But I don't want to use YARN; I just want to understand why I have to move all the Hadoop libs to HDFS to run a MapReduce job.
Updated
Here is how I instantiate the job:
Job job = Job.getInstance(config, "MyJob");
Scan scan = createScan();
Filter filter = createMyFilter();
FilterList filters = new FilterList(filter); // combine the filter(s) for the scan
scan.setFilter(filters);
TableMapReduceUtil.initTableMapperJob(
        MY_TABLE,
        scan,
        MyMapper.class,
        null,
        null,
        job
);
TableMapReduceUtil.initTableReducerJob(
        MY_TABLE,
        null,
        job
);
job.setNumReduceTasks(0);
Here is my mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>myhost:9001</value>
</property>
<property>
<name>hadoop.ssl.enabled</name>
<value>true</value>
</property>
<property>
<name>hadoop.ssl.require.client.cert</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.hostname.verifier</name>
<value>DEFAULT</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.keystores.factory.class</name>
<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
</configuration>
How I run this:
HADOOP_CLASSPATH=`/opt/hbase/bin/hbase classpath` /opt/hadoop/bin/hadoop jar /tmp/mymapred-1.0-SNAPSHOT-jar-with-dependencies.jar
Solution
Finally, I got the answer from this comment: https://stackoverflow.com/a/31950822/13305602
Inside core-site.xml there are two properties (fs.defaultFS, and its deprecated older name fs.default.name) that configure the default file system in Hadoop.
<property>
<name>fs.defaultFS</name>
<value>hdfs://myhost.mycompany.com:9000</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://myhost.mycompany.com:9000</value>
</property>
The default value of both properties is file:///; see https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
You can change this in core-site.xml or, if you are in an environment where you don't have access to that file, you can set it just for the job on its Configuration:
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "file:///");
configuration.set("fs.default.name", "file:///");
Job job = Job.getInstance(configuration, "MyJob");
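To double-check which filesystem unqualified paths resolve against after the override, a small sketch using the standard FileSystem API:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DefaultFsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "file:///");
        // Prints the filesystem used for unqualified paths, e.g. file:///
        // here instead of hdfs://bigcluster:9000.
        System.out.println(FileSystem.get(conf).getUri());
    }
}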

Hadoop: There are 0 datanodes running and no nodes & cannot connect to namenode

I'm having trouble setting up Hadoop. My setup consists of a NameNode VM and two separate physical DataNodes connected to the same network.
IP configuration:
192.168.118.212 namenode-1
192.168.118.217 datanode-1
192.168.118.216 datanode-2
I keep getting the error that there are 0 datanodes running, but when I run jps on my datanode-1 or datanode-2 machine, the DataNode shows up as running.
My nameNode log shows this:
File /user/hadoop/.bashrc_COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s)
are excluded in this operation.
The logs on my dataNode-1 machine tell me that it has trouble connecting to the nameNode.
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: namenode-1/192.168.118.212:9000
The weird part is that the DataNode starts fine, yet it can't connect. I can also SSH between all of the machines with no problems.
So my best guess is that I've configured one of the config files incorrectly, though I checked against other questions on here and they seem correct.
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode-1:9000/</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop_data/hdfs/datanode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop_data/hdfs/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.tracker</name>
<value>namenode-1:9001</value>
</property>
</configuration>
The problem could be fs.default.name. Try the IP address as the value of fs.default.name, and check that your /etc/hosts configuration points to the correct IP address. Most likely this part is fine, since your datanode resolved the IP address.
The problem could also be the port number: try 8020 (the default NameNode RPC port in many distributions) or 50070 instead of 9000 and see what happens.
The problem was the firewall.
You can stop it by running systemctl stop firewalld.service
I found the answer here:
https://stackoverflow.com/a/37994066/8789361
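Before leaving the firewall off permanently, a quick reachability test from a DataNode host can confirm the diagnosis. This is a plain socket check, not a Hadoop API; the host and port are taken from the log line above:
import java.net.InetSocketAddress;
import java.net.Socket;

public class NameNodePortCheck {
    public static void main(String[] args) {
        // Same host:port the DataNode log complains about.
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress("namenode-1", 9000), 5000);
            System.out.println("namenode-1:9000 is reachable");
        } catch (Exception e) {
            System.out.println("blocked or unreachable: " + e.getMessage());
        }
    }
}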

IOException: Cannot initialize Cluster | hadoop 2.4.0

I am trying to run a MapReduce job using Hadoop 2.4.0.
My code depends on some third-party jars, so I created a fat jar using Eclipse's Export -> Runnable JAR option.
Now when I run the fat jar using
hadoop jar ~/Documents/job.jar
I get the exception
java.lang.reflect.InvocationTargetException
The above exception is caused by this:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at imgProc.MasterClass.main(MasterClass.java:84)
... 10 more
hadoop classpath
hduser@livingstream:/usr/local/hadoop$ hadoop classpath
/usr/local/hadoop-2.4.0/etc/hadoop:/usr/local/hadoop-2.4.0/share/hadoop/common/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/common/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
My configuration files
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
</configuration>
I am not really sure what is going on now. Is it because of the jars or my config files?
Anybody have any idea? Anything is appreciated! :)
The error message clearly states the cause:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
You have to configure mapred-site.xml, core-site.xml, and several other files.
For step-by-step instructions, refer to this link: Hadoop V2 setup.
Hope this helps.

java.io.IOException: Cannot initialize Cluster in Hadoop2 with YARN

This is my first time posting to Stack Overflow, so I apologize if I did something wrong.
I recently set up a new Hadoop cluster, and this is my first time trying to use Hadoop 2 and YARN. I currently get the following error when I submit my job:
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
Here are my configuration files:
mapred-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/temp1/nn,/temp2/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/temp1/dn,/temp2/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/temp1/snn</value>
</property>
<property>
<name>dfs.permissions.supergroup</name>
<value>hrdbms</value>
</property>
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>172.31.20.99</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/temp1/y1,/temp2/y1</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/temp1/y2,/temp2/y2</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Here is my java code:
Configuration conf = new Configuration();
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
conf.setBoolean("mapred.compress.map.output",true);
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/core-site.xml"));
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/hdfs-site.xml"));
conf.addResource(new org.apache.hadoop.fs.Path("/usr/local/hadoop-2.5.1/etc/hadoop/yarn-site.xml"));
conf.set("mapreduce.framework.name", "yarn");
conf.setClass("mapred.map.output.compression.codec", org.apache.hadoop.io.compress.SnappyCodec.class, CompressionCodec.class);
Job job = new Job(conf);
job.setJarByClass(LoadMapper.class);
job.setJobName("Load " + schema + "." + table);
job.setMapperClass(LoadMapper.class);
job.setReducerClass(LoadReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(ALOWritable.class);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(ALOWritable.class);
job.setNumReduceTasks(workerNodes.size());
job.setOutputFormatClass(LoadOutputFormat.class);
job.setReduceSpeculativeExecution(false);
job.setMapSpeculativeExecution(false);
String glob2 = glob.substring(6);
FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(glob2));
HRDBMSWorker.logger.debug("Submitting MR job");
boolean allOK = job.waitForCompletion(true);
Here are all of the environment variables that are in place when I start the JVM
HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS
HOSTNAME=ip-172-31-20-103
HADOOP_IDENT_STRING=hrdbms
SHELL=/bin/bash
TERM=xterm
HADOOP_HOME=/usr/local/hadoop-2.5.1
HISTSIZE=1000
HADOOP_PID_DIR=
YARN_HOME=/usr/local/hadoop-2.5.1
USER=hrdbms
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
HADOOP_SECURE_DN_PID_DIR=
HADOOP_SECURE_DN_LOG_DIR=/
MAIL=/var/spool/mail/hrdbms
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hrdbms/bin
HADOOP_HDFS_HOME=/usr/local/hadoop-2.5.1
HADOOP_CLIENT_OPTS=-Xmx512m
HADOOP_COMMON_HOME=/usr/local/hadoop-2.5.1
PWD=/home/hrdbms
JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.55.x86_64/jre
HADOOP_CLASSPATH=/home/hrdbms/HRDBMS.jar:/contrib/capacity-scheduler/*.jar
HADOOP_CONF_DIR=/etc/hadoop
LANG=en_US.UTF-8
HADOOP_PORTMAP_OPTS=-Xmx512m
HADOOP_OPTS= -Djava.net.preferIPv4Stack=true
HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender
HISTCONTROL=ignoredups
SHLVL=1
HOME=/home/hrdbms
YARN_CONF_DIR=/etc/hadoop
HADOOP_SECURE_DN_USER=
HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender
HADOOP_MAPRED_HOME=/usr/local/hadoop-2.5.1
LOGNAME=hrdbms
HADOOP_NFS3_OPTS=
LESSOPEN=|/usr/bin/lesspipe.sh %s
HADOOP_YARN_USER=hrdbms
G_BROKEN_FILENAMES=1
_=/bin/env
Here is a list of all jars in the client classpath
activation-1.1.jar
antlr-4.2.1-complete.jar
aopalliance-1.0.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.3.jar
commons-codec-1.4.jar
commons-collections-3.2.1.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-daemon-1.0.13.jar
commons-digester-1.8.jar
commons-el-1.0.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
guava-11.0.2.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.5.1.jar
hadoop-archives-2.5.1.jar
hadoop-auth-2.5.1.jar
hadoop-common-2.5.1-tests.jar
hadoop-common-2.5.1.jar
hadoop-datajoin-2.5.1.jar
hadoop-distcp-2.5.1.jar
hadoop-extras-2.5.1.jar
hadoop-gridmix-2.5.1.jar
hadoop-hdfs-2.5.1-tests.jar
hadoop-hdfs-2.5.1.jar
hadoop-hdfs-nfs-2.5.1.jar
hadoop-mapreduce-client-app-2.5.1.jar
hadoop-mapreduce-client-common-2.5.1.jar
hadoop-mapreduce-client-core-2.5.1.jar
hadoop-mapreduce-client-hs-2.5.1.jar
hadoop-mapreduce-client-hs-plugins-2.5.1.jar
hadoop-mapreduce-client-jobclient-2.5.1-tests.jar
hadoop-mapreduce-client-jobclient-2.5.1.jar
hadoop-mapreduce-client-shuffle-2.5.1.jar
hadoop-mapreduce-examples-2.5.1.jar
hadoop-nfs-2.5.1.jar
hadoop-openstack-2.5.1.jar
hadoop-rumen-2.5.1.jar
hadoop-sls-2.5.1.jar
hadoop-streaming-2.5.1.jar
hadoop-yarn-api-2.5.1.jar
hadoop-yarn-applications-distributedshell-2.5.1.jar
hadoop-yarn-applications-unmanaged-am-launcher-2.5.1.jar
hadoop-yarn-client-2.5.1.jar
hadoop-yarn-common-2.5.1.jar
hadoop-yarn-server-applicationhistoryservice-2.5.1.jar
hadoop-yarn-server-common-2.5.1.jar
hadoop-yarn-server-nodemanager-2.5.1.jar
hadoop-yarn-server-resourcemanager-2.5.1.jar
hadoop-yarn-server-tests-2.5.1.jar
hadoop-yarn-server-web-proxy-2.5.1.jar
hamcrest-core-1.3.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-core-asl-1.9.13.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
jasper-compiler-5.5.23.jar
jasper-runtime-5.5.23.jar
java-xmlbuilder-0.4.jar
javax.inject-1.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-client-1.9.jar
jersey-core-1.9.jar
jersey-guice-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.jar
jetty-util-6.1.26.jar
jline-0.9.94.jar
jsch-0.1.50.jar
jsp-api-2.1.jar
jsr305-1.3.9.jar
junit-4.11.jar
leveldbjni-all-1.8.jar
log4j-1.2.17.jar
metrics-core-3.0.0.jar
mockito-all-1.8.5.jar
netty-3.6.2.Final.jar
paranamer-2.3.jar
preflight-app-1.8.7.jar
protobuf-java-2.5.0.jar
servlet-api-2.5.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
zookeeper-3.4.6.jar
Please help! Thanks!
EDIT: I just found these debug log messages.
2014-10-27 19:31:21,789 DEBUG Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
2014-10-27 19:31:21,789 DEBUG Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
I have run into similar issues today. In my case I was building an über jar, where some dependency (I have not found the culprit yet) was bringing in a META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider file with the contents:
org.apache.hadoop.mapred.LocalClientProtocolProvider
I provided my own in the project (i.e. put it on the classpath) with the following:
org.apache.hadoop.mapred.YarnClientProtocolProvider
and the correct one is picked up. I suspect you are seeing something similar. To fix it, create the file described above and put it on the classpath. (If you build the über jar with the Maven Shade plugin, its ServicesResourceTransformer can merge these service files instead of letting one dependency's copy win.) If I find the culprit jar, I will update the answer.
Turn off security in your cluster (if that is acceptable in your environment, obviously), i.e. disable this HDFS setting:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
It is on by default.
In Cloudera Manager it is accessible from the Configuration panel.
I ran into the same issue trying to run an MR job from Eclipse with the latest CDH 5.5 (Hadoop 2.6) distribution.
I am not sure what specifically the issue was; it might very well be the classloading issue that #timrobertson100 mentioned. But in my case, I was able to overcome it by adding all jars from the paths below to the Eclipse project's classpath:
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/common/hadoop-common-2.6.0-cdh5.5.1.jar
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/common/lib/*
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/mapreduce2/*
.../hadoop-2.6.0-cdh5.5.1/share/hadoop/yarn/*
Marina
