Eclipse Hadoop connection refused

I'm running the WordCount MapReduce example on a local Hadoop installation.
When I run it from the command line:
hadoop jar MRTest.jar example.WordCount /gutenberg /out333
it works.
In this command:
MRTest.jar is the jar exported from my Eclipse project.
/gutenberg is the input path on my hdfs.
/out333 is the output path on my hdfs.
But when I run it in Eclipse with the arguments:
hdfs://localhost:9000/gutenberg hdfs://localhost:9000/output6
it throws the following exception:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.net.ConnectException: Call From SWD-LIUQIN00-D1/172.26.35.141 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy9.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at $Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at example.WordCount.main(WordCount.java:81)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
at org.apache.hadoop.ipc.Client.call(Client.java:1318)
... 28 more
Here is some of my local configuration:
slaves:
hadoop-slave141
/etc/hosts:
127.0.0.1 hadoop-slave141 localhost
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000/</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/dfs/hadoop2/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/hadoop/dfs/hadoop2/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-master:19888</value>
</property>
<property>
<name>mapred.job.tracker</name> //The JobTracker's host (or IP) and port.
<value>hdfs://hadoop-master:9001</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-master:8031</value>
<description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager. </description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-master:8030</value>
<description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager. </description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>In case you do not want to use the default scheduler</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>0.0.0.0:8032</value>
<description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager. </description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nodemanager/local</value>
<description>the local directories used by the nodemanager</description>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:8034</value>
<description>the nodemanagers bind to this port</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>${hadoop.tmp.dir}/nodemanager/remote</value>
<description>directory on hdfs where the application logs are moved to </description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>${hadoop.tmp.dir}/nodemanager/logs</value>
<description>the directories used by Nodemanagers as log directories</description>
</property>
<!-- Use mapreduce_shuffle instead of mapreduce.shuffle (YARN-1229)-->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
hadoop fs -ls / :
14/04/22 10:24:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 9 items
-rw-r--r-- 3 hadoop supergroup 1195054 2014-04-18 08:27 /gutenberg
drwxr-xr-x - hadoop supergroup 0 2014-04-21 20:38 /out222
drwxr-xr-x - hadoop supergroup 0 2014-04-22 09:55 /out333
drwxr-xr-x - hadoop supergroup 0 2014-04-18 08:45 /output
drwxr-xr-x - hadoop supergroup 0 2014-04-18 11:23 /output2
drwxr-xr-x - hadoop supergroup 0 2014-04-18 11:28 /output3
drwxr-xr-x - hadoop supergroup 0 2014-04-18 13:38 /output3jps
drwxr-xr-x - hadoop supergroup 0 2014-04-21 17:48 /outt
drwx------ - hadoop supergroup 0 2014-04-18 08:44 /tmp
jps:
11149 Jps
7762 ResourceManager
8074 NodeManager
6901 NameNode
7204 DataNode
7565 SecondaryNameNode
I tested with both Eclipse Helios and Kepler.
The Hadoop version is 2.2.0.
I am also wondering what the difference is between running from the command line and running in Eclipse.
I use the hadoop user, which owns my Hadoop installation, from start to end.
Eclipse is also launched by the hadoop user.
Somebody please help me; I've been stuck on this for 3 days.

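Actually, you can use the HDFS path as an argument.
Your core-site.xml sets fs.defaultFS to hdfs://hadoop-master:9000, while the Eclipse arguments point at hdfs://localhost:9000, so the two presumably do not refer to the same address.
Try changing the value of fs.defaultFS in "core-site.xml" to localhost: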
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
Then restart Hadoop.
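To see why the two addresses disagree, it can help to list what the file actually sets. The question's core-site.xml defines both fs.default.name (the deprecated spelling in Hadoop 2) and fs.defaultFS, with different hosts. Below is a minimal sketch that only parses the XML text with the JDK; it does not use Hadoop's own Configuration class, and the class and method names (SiteXmlCheck, readProperties, hasConflictingDefaultFs) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: reads a *-site.xml document as plain XML.
final class SiteXmlCheck {

    /** Collects every <property> into a name -> value map; later entries overwrite earlier ones. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                String value = childText(property, "value");
                if (name != null && value != null) {
                    props.put(name, value);
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable site XML document", e);
        }
    }

    /** True when fs.default.name and fs.defaultFS are both set but name different host:port pairs. */
    static boolean hasConflictingDefaultFs(Map<String, String> props) {
        String oldKey = props.get("fs.default.name");
        String newKey = props.get("fs.defaultFS");
        if (oldKey == null || newKey == null) {
            return false;
        }
        return !URI.create(oldKey).getAuthority().equals(URI.create(newKey).getAuthority());
    }

    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }
}
```

Calling SiteXmlCheck.hasConflictingDefaultFs(SiteXmlCheck.readProperties(xml)) on the core-site.xml from the question reports a conflict, because one key names localhost:9000 and the other hadoop-master:9000.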

Related

YARN job failed when running from Windows

I'm trying to run the Spring Boot YARN sample on Windows.
A single-node Hadoop 2.7.1 runs on my VM.
When I run the app from Windows using java -jar ..., Spring YARN deploys all jars successfully; I can browse them in the Hadoop file system.
In the cluster UI (host:8088/cluster) I can see that the app is submitted and a container is started, but then the app fails with the following exception in the logs:
Application application_1496328851344_0001 failed 2 times due to AM Container for appattempt_1496328851344_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://host:8088/cluster/app/application_1496328851344_0001 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1496328851344_0001_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: (null): no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: (null): no job control
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
But when I start the app on the VM itself, everything works.
Here are my Hadoop config files:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.0.106:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop-2.7.1/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop-2.7.1/data/datanode</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>5</value>
</property>
</configuration>
Update: setting the mapreduce.app-submission.cross-platform property to true doesn't help.

Spring Boot YARN doesn't run on Hadoop 2.8.0 client cannot access DataNode

I'm trying to run the Spring Boot YARN sample (https://spring.io/guides/gs/yarn-basic/) on Windows. In application.yml I changed fsUri and resourceManagerHost to point to my VM's host 192.168....
But when I try to run the application, this exception appears:
DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1508)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1284)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1237)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
[2017-05-27 19:59:49.570] boot - 7728 INFO [Thread-5] --- DFSClient: Abandoning BP-646365587-10.0.2.15-1495898351938:blk_1073741830_1006
[2017-05-27 19:59:49.602] boot - 7728 INFO [Thread-5] --- DFSClient: Excluding datanode DatanodeInfoWithStorage[10.0.2.15:50010,DS-f909ec7a-8374-4cdd-9cfc-0e778810d98c,DISK]
[2017-05-27 19:59:49.647] boot - 7728 WARN [Thread-5] --- DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /app/gs-yarn-basic/gs-yarn-basic-container-0.1.0.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
This means that the DataNode isn't accessible from my host machine. For that reason I added the following to hdfs-site.xml:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
<description>Whether clients should use datanode hostnames when
connecting to datanodes.
</description>
</property>
But it still throws that exception.
I've got Hadoop 2.8.0 running on my VM. Here are the configuration files:
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hadoop-2.8.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hadoop-2.8.0/data/datanode</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
<description>Whether clients should use datanode hostnames when
connecting to datanodes.
</description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>99</value>
</property>
</configuration>
Your core-site.xml should point to the NameNode's address, but it currently points to 0.0.0.0, which means all addresses on the local machine. This gives an ambiguous result, because each machine that reads the value would treat itself as the NameNode.
There should be only one NameNode in a Hadoop cluster.
Replacing 0.0.0.0 with the NameNode's IP or hostname should resolve the issue you are facing.
Spring connected to YARN after I changed 0.0.0.0:9000 to [VM's IP]:9000 in core-site.xml. Thanks to @RameshMaharjan.
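A quick way to catch this before starting anything is to look at the host part of the fs.defaultFS value. The following is only a sketch using java.net.URI; the class name DefaultFsCheck and the replacement host "namenode-host" are placeholders I made up, so substitute your NameNode's real hostname or IP.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop.
final class DefaultFsCheck {

    /** True when the URI's host is the IPv4 wildcard address 0.0.0.0. */
    static boolean usesWildcardHost(String defaultFs) {
        return "0.0.0.0".equals(URI.create(defaultFs).getHost());
    }

    /** Returns the same URI with its host swapped for a concrete NameNode host. */
    static String withHost(String defaultFs, String nameNodeHost) {
        URI original = URI.create(defaultFs);
        try {
            return new URI(original.getScheme(), null, nameNodeHost,
                    original.getPort(), original.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot rebuild URI for host " + nameNodeHost, e);
        }
    }
}
```

For the value in the question, DefaultFsCheck.usesWildcardHost("hdfs://0.0.0.0:9000") is true, and withHost can produce the corrected value to paste back into core-site.xml.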

Hadoop example not working after installed

Hi, I recently installed Hadoop 2.7.2 in distributed mode, with the namenode on host hadoop and the datanodes on hadoop1 and hadoop2. When I run yarn jar /usr/local/hadoop/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 1000 in bash, it gives me an error message like:
Number of Maps = 2
Samples per Map = 1000
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "benji/192.168.1.4"; destination host is: "hadoop":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:278)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:2207)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:2165)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:2295)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:2290)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:3167)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1085)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)
And if I run hadoop jar /usr/local/hadoop/hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 1000, it gives an error message like:
Number of Maps = 2
Samples per Map = 1000
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "hadoop/192.168.1.4"; destination host is: "hadoop":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
... blabla ...
Notice that the only difference between the two error messages is the local host name (one is benji/192.168.1.4 and the other is hadoop/192.168.1.4). I ran start-dfs.sh and start-yarn.sh before the yarn jar ... command, and everything looked fine.
I would very much appreciate it if anyone could help figure out the problem. Here are the contents of some configuration files:
/etc/hosts file (benji is the non-hadoop account on the master computer):
192.168.1.4 hadoop benji
192.168.1.5 hadoop1
192.168.1.9 hadoop2
/etc/hostname file:
hadoop
~/.ssh/config file:
# hadoop1
Host hadoop1
HostName 192.168.1.5
User hadoop1
IdentityFile ~/.ssh/hadoopid
# hadoop2
Host hadoop2
HostName 192.168.1.9
User hadoop2
IdentityFile ~/.ssh/hadoopid
# hadoop localhost
Host localhost
HostName localhost
User hadoop
IdentityFile ~/.ssh/hadoopid
# hadoop
Host hadoop
HostName hadoop
User hadoop
IdentityFile ~/.ssh/hadoopid
core-site.xml file:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///usr/local/hadoop/hadoopdata/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
hdfs-site.xml file:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hadoopdata/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.du.reserved</name>
<value>21474836480</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hadoopdata/dfs/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property> </configuration>
Could anyone help on this issue? Thanks!
UPDATE 1
I figured out part of the problem. I ran jps and found that the datanode and namenode were not running. Using netstat -an | grep 9000 and lsof -i :9000, I found that another process was listening on port 9000. The namenode was able to start after I changed fs.defaultFS from hdfs://hadoop:9000 to hdfs://hadoop:9001 in core-site.xml, and dfs.namenode.secondary.http-address from hadoop:9001 to hadoop:9002 in hdfs-site.xml. The protocol-buffer error message disappeared after this change, but according to jps the datanodes were still not running.
The datanode log file shows something weird happening:
... blabla ...
2016-05-19 12:27:12,157 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop/192.168.1.4:9000. Already tried 44 time(s); maxRetries=45
2016-05-19 12:27:32,158 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop/192.168.1.4:9000
... blabla ...
2016-05-19 13:41:55,382 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2016-05-19 13:41:55,387 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
... blabla ...
I do not understand why the datanode tries to connect to the namenode on port 9000.
You should have the same configured Hadoop package installed on all your slaves. If you change fs.defaultFS to hdfs://hadoop:9001 only on the namenode and not on the datanodes, the datanodes will still try to connect to hdfs://hadoop:9000, as stated in their own core-site.xml.
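The mismatch is easy to spot once the fs.defaultFS values from each node are side by side. Here is a small sketch of that comparison. It assumes you have already collected the value from each node's core-site.xml yourself (for example by copying the files over); the class name DefaultFsConsistency and the way the values are gathered are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
final class DefaultFsConsistency {

    /**
     * Returns the nodes whose fs.defaultFS differs from the value the NameNode uses,
     * in the iteration order of the supplied map.
     */
    static List<String> nodesOutOfSync(String expected, Map<String, String> defaultFsByNode) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, String> entry : defaultFsByNode.entrySet()) {
            if (!expected.equals(entry.getValue())) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

With the namenode on hdfs://hadoop:9001 and hadoop1 and hadoop2 still on hdfs://hadoop:9000, the method returns both datanodes, which matches the retries against port 9000 in the datanode log.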

IOException: Cannot initialize Cluster | hadoop 2.4.0

I am trying to run a MapReduce job using Hadoop 2.4.0.
My code depends on some third-party jars, so I created a fat jar using Eclipse's Export -> Runnable JAR option.
Now when I run the fat jar using
hadoop jar ~/Documents/job.jar
I get the exception
java.lang.reflect.InvocationTargetException
The above exception is caused by this:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at imgProc.MasterClass.main(MasterClass.java:84)
... 10 more
hadoop classpath
hduser@livingstream:/usr/local/hadoop$ hadoop classpath
/usr/local/hadoop-2.4.0/etc/hadoop:/usr/local/hadoop-2.4.0/share/hadoop/common/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/common/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar
My configuration files
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
</configuration>
I am not really sure what is going on now. Is it because of the JARs or my config files?
Does anybody have any idea? Anything is appreciated! :)
The error message clearly states:
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
You have to configure mapred-site.xml and core-site.xml, along with the other configuration files.
For a step-by-step guide, refer to this link: Hadoop V2 setup
Hope this helps you.

Cannot initialize cluster exception while running job on Hadoop 2

This question is linked to my previous question. All the daemons are running; jps shows:
6663 JobHistoryServer
7213 ResourceManager
9235 Jps
6289 DataNode
6200 NameNode
7420 NodeManager
but the wordcount example keeps on failing with the following exception:
ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1238)
at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1234)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:1233)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1262)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at WordCount.main(WordCount.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Since it says the problem is in configuration, I am posting the configuration files here. The intention is to create a single node cluster.
yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/yarn/yarn_data/hdfs/datanode</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>Yarn</value>
</property>
</configuration>
Please tell me what is missing or what I am doing wrong.
I was having a similar issue, but YARN was not the cause.
After adding the following jars to my classpath, the issue was resolved:
hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-76
hadoop-mapreduce-client-common-2.2.0.2.0.6.0-76
hadoop-mapreduce-client-shuffle-2.2.0.2.0.6.0-76
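If you want to confirm from inside the failing program whether those jars are visible, you can ask the class loader for a class that lives in them. This is a plain JDK sketch; the helper name ClasspathCheck is made up. As far as I know the YARN provider class is org.apache.hadoop.mapred.YarnClientProtocolProvider in the jobclient jar, but treat that class name as an assumption and check it against your Hadoop version.

```java
// Hypothetical helper, not part of Hadoop.
final class ClasspathCheck {

    /** True when the named class can be located by this class's loader (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

For example, printing ClasspathCheck.isOnClasspath("org.apache.hadoop.mapred.YarnClientProtocolProvider") at the top of main shows whether the job-client classes are reachable before the job is submitted.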
You have uppercased Yarn, which is probably why it cannot be resolved. Try the lowercase version suggested in the official documentation.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
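The value appears to be matched exactly, so "Yarn" and "yarn" are different strings to Hadoop. A tiny sketch of that kind of check is below. The list of names (local, classic, yarn) reflects my understanding of what Hadoop 2 recognizes and is an assumption, as is the class name FrameworkNameCheck; this is not Hadoop's own code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Hadoop.
final class FrameworkNameCheck {

    // Assumed set of values accepted for mapreduce.framework.name.
    static final List<String> KNOWN = Arrays.asList("local", "classic", "yarn");

    /** Case-sensitive check: only an exact match counts. */
    static boolean isRecognized(String value) {
        return value != null && KNOWN.contains(value);
    }

    /** Returns the lowercase form if only the casing or whitespace is wrong, otherwise null. */
    static String suggest(String value) {
        if (value == null || KNOWN.contains(value)) {
            return null;
        }
        String lower = value.trim().toLowerCase(Locale.ROOT);
        return KNOWN.contains(lower) ? lower : null;
    }
}
```

Here isRecognized("Yarn") is false and suggest("Yarn") returns "yarn", which is the fix described above.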
Looks like I had a lucky day and ran into this exception through 'all' of those causes. Summary:
wrong mapreduce.framework.name (see above)
missing mapreduce job-client jars (see above)
wrong version (see Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses-submiting job2remoteClustr )
my configured 'yarn.ipc.client.factory.class' wasn't on the classpath of the YARN servers (only on the client)
In my case I was trying to use Sqoop and ran into this error.
It turned out that I was pointing to the latest version of Hadoop 2.0 available from the CDH repo, which Sqoop did not support.
The Cloudera version was 2.0.0-cdh4.4.0, which had YARN support built in.
When I used 2.0.0-cdh4.4.0 under hadoop-0.20, the problem went away.
Hope this helps.
Changing mapreduce_shuffle to mapreduce.shuffle made it work in my case.
