Cannot replicate to Datanode in Hadoop while importing from MySQL database - java

I am trying to import data from a MySQL table into HDFS. I am using the Sqoop import command below:
sqoop import --connect jdbc:mysql://localhost:3306/employee --username root --password *** --table Emp --m 1
I am getting the error below:
16/05/07 20:01:18 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/lib/sqoop/lib/parquet-format-2.0.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I have parquet-format-2.0.0.jar in the /usr/lib/sqoop folder, but it still shows this error.
I tried to copy all of the Sqoop lib jars to HDFS, but when I do that it throws the error below:
16/05/07 18:40:11 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /usr/lib/sqoop/lib/xz-1.0.jar.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
What can I do now? I can't copy the jar files to HDFS, and I also can't import the data from MySQL into HDFS.
I tried this solution
sqoop import eror - File does not exist:
but I can't proceed past the second step. I also cleared the cache and restarted the Hadoop file system.
Thanks

For the DataNode replication error, the following may be the reasons:
The DataNodes don't have enough disk space to store the blocks.
The NameNode cannot reach the DataNode(s), or the DataNode(s) are down/unavailable.
Make sure there is connectivity between the NameNode and the DataNode(s), and that the DataNode(s) have sufficient space to store the new blocks.
If there is enough disk space and the problem persists, you may need to reformat your NameNode (a diagnostic sketch follows).
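A minimal diagnostic sketch (standard Hadoop commands; adjust script paths to your install, and note that reformatting erases all HDFS data):
jps                      # a DataNode process should be listed here
hdfs dfsadmin -report    # shows live DataNodes and their remaining capacity
df -h                    # check free disk space on the DataNode host
# only if no DataNode can be brought up and the data is disposable:
$HADOOP_HOME/sbin/stop-dfs.sh
hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh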

Related

Getting hadoop namenode -format

I am trying to put a jar file into Hadoop but keep getting an error, even after running hadoop namenode -format.
As the screenshot showed, the jar file is already created:
[cloudera#quickstart ~]$ hdfs dfs -put AverageWordCount.jar /user/cloudera
21/10/14 01:27:08 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/cloudera/AverageWordCount.jar._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3286)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:667)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1544)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:600)
put: File /user/cloudera/AverageWordCount.jar._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
First check whether the NameNode is in safe mode; the steps to check and change it are below:
To check the status of safe mode, use the command:
hadoop dfsadmin -safemode get
To enter safe mode, use the command:
hadoop dfsadmin -safemode enter
To leave safe mode, use the command:
hadoop dfsadmin -safemode leave
After that, restart the HDFS services and try to put the .jar file into HDFS again (a condensed sketch of the full sequence follows).
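A condensed sketch of that sequence, assuming a plain Apache Hadoop 2.x layout under $HADOOP_HOME (the Cloudera quickstart VM manages the daemons as services instead):
hdfs dfsadmin -safemode leave    # leave safe mode if the NameNode is stuck in it
$HADOOP_HOME/sbin/stop-dfs.sh    # restart the HDFS daemons
$HADOOP_HOME/sbin/start-dfs.sh
jps                              # confirm a DataNode process is actually running
hdfs dfs -put AverageWordCount.jar /user/cloudera    # retry the upload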

Run co-occurrence algorithm in hadoop

I found the following project on GitHub, https://github.com/fbukevin/hadoop-cooccurrence, which uses a co-occurrence algorithm in Hadoop.
I'm using a virtualized Ubuntu 14.04 and managed to install Hadoop as a single-node cluster following these instructions: http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php. I'm new to Hadoop and these are my first attempts to run a program with YARN.
I can execute the yarn command on the command line, but I don't know how to run the co-occurrence algorithm with it. The description says that the program can be used with the following command:
$ yarn jar <hadoop>.jar [pairs | stripes] <input_file>
So I tried this:
$ yarn jar /home/vmiller/Downloads/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar pairs pg100.txt
Exception in thread "main" java.lang.ClassNotFoundException: pairs
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
This is definitely not correct, but I don't know how to run the command properly. Somehow I have to tell yarn to use the Cooccurrence.java located in hadoop-cooccurrence/src/main/java/cooc/Cooccurrence.java, because this file seems to be the one that executes the co-occurrence algorithm. But how do I tell yarn to use this file with the pairs and stripes arguments on the input file?
You should give yarn jar the path to the jar that includes the Cooccurrence class.
The jar is in the target folder (cooc-1.0-SNAPSHOT.jar).
You don't need to indicate the class name, as it is set in the manifest file.
I actually managed to run the program. My approach wasn't that wrong; as tokiloutok mentioned, I had to include the right jar file.
Before I could execute the command I had to import the pg100.txt into HDFS.
So I had to deactivate the safe mode of the name node with
hdfs dfsadmin -safemode leave
and import the file with
hdfs dfs -put /home/vmiller/workspace/hadoop-cooccurrence/pg100.txt /user/hadoop/
so that I could finally run
yarn jar target/cooc-1.0-SNAPSHOT.jar pairs pg100.txt
without getting any errors.

GridGain Error : java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/JobContext

I have configured gridgain-hadoop-os-6.6.2.zip and followed the steps mentioned in docs/hadoop_readme.pdf. I started GridGain using the bin/ggstart.sh command, and I am now running a simple word count job in GridGain with hadoop-2.2.0, using the command
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/*-mapreduce-examples-*.jar wordcount /input /output
Steps I tried:
Step 1:
Extracted hadoop-2.2.0 and the gridgain-hadoop-os-6.6.2.zip file into the /usr/local folder and renamed the GridGain folder to "gridgain".
Step 2:
Set export GRIDGAIN_HOME=/usr/local/gridgain, and set the paths for hadoop-2.2.0 together with JAVA_HOME as:
# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop-2.2.0
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export HADOOP_MAPRED_HOME=/usr/local/hadoop-2.2.0
export HADOOP_COMMON_HOME=/usr/local/hadoop-2.2.0
export HADOOP_HDFS_HOME=/usr/local/hadoop-2.2.0
export YARN_HOME=/usr/local/hadoop-2.2.0
export HADOOP_CONF_DIR=/usr/local/hadoop-2.2.0/etc/hadoop
export GRIDGAIN_HADOOP_CLASSPATH='/usr/local/hadoop-2.2.0/lib/*:/usr/local/hadoop-2.2.0/lib/*:/usr/local/hadoop-2.2.0/lib/*'
Step 3:
Now I ran bin/setup-hadoop.sh and answered Y to every prompt.
Step 4:
Started GridGain using the command:
bin/ggstart.sh
Step 5:
Now I created a directory and uploaded a file using:
hadoop fs -mkdir /input
hadoop fs -copyFromLocal $HADOOP_HOME/README.txt /input/WORD_COUNT_ME.txt
Step 6:
Running this command:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/*-mapreduce-examples-*.jar wordcount /input /output
gives me the following error:
15/02/22 12:49:13 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/02/22 12:49:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_091ebfbd-2993-475f-a506-28280dbbf891_0002
15/02/22 12:49:13 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hduser/.staging/job_091ebfbd-2993-475f-a506-28280dbbf891_0002
java.lang.NullPointerException
at org.gridgain.client.hadoop.GridHadoopClientProtocol.processStatus(GridHadoopClientProtocol.java:329)
at org.gridgain.client.hadoop.GridHadoopClientProtocol.submitJob(GridHadoopClientProtocol.java:115)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
and the GridGain console error is:
sLdrId=a0b8610bb41-091ebfbd-2993-475f-a506-28280dbbf891, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=o.g.g.kernal.processors.hadoop.proto.GridHadoopProtocolSubmitJobTask, sesId=e129610bb41-091ebfbd-2993-475f-a506-28280dbbf891, startTime=1424589553332, endTime=9223372036854775807, taskNodeId=091ebfbd-2993-475f-a506-28280dbbf891, clsLdr=sun.misc.Launcher$AppClassLoader#1bdcbb2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=091ebfbd-2993-475f-a506-28280dbbf891], jobId=f129610bb41-091ebfbd-2993-475f-a506-28280dbbf891]]
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/JobContext
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2585)
at java.lang.Class.getConstructor0(Class.java:2885)
at java.lang.Class.getConstructor(Class.java:1723)
at org.gridgain.grid.hadoop.GridHadoopDefaultJobInfo.createJob(GridHadoopDefaultJobInfo.java:107)
at org.gridgain.grid.kernal.processors.hadoop.jobtracker.GridHadoopJobTracker.job(GridHadoopJobTracker.java:959)
at org.gridgain.grid.kernal.processors.hadoop.jobtracker.GridHadoopJobTracker.submit(GridHadoopJobTracker.java:222)
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopProcessor.submit(GridHadoopProcessor.java:188)
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopImpl.submit(GridHadoopImpl.java:73)
at org.gridgain.grid.kernal.processors.hadoop.proto.GridHadoopProtocolSubmitJobTask.run(GridHadoopProtocolSubmitJobTask.java:54)
at org.gridgain.grid.kernal.processors.hadoop.proto.GridHadoopProtocolSubmitJobTask.run(GridHadoopProtocolSubmitJobTask.java:37)
at org.gridgain.grid.kernal.processors.hadoop.proto.GridHadoopProtocolTaskAdapter$Job.execute(GridHadoopProtocolTaskAdapter.java:95)
at org.gridgain.grid.kernal.processors.job.GridJobWorker$2.call(GridJobWorker.java:484)
at org.gridgain.grid.util.GridUtils.wrapThreadLoader(GridUtils.java:6136)
at org.gridgain.grid.kernal.processors.job.GridJobWorker.execute0(GridJobWorker.java:478)
at org.gridgain.grid.kernal.processors.job.GridJobWorker.body(GridJobWorker.java:429)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Failed to load class: org.apache.hadoop.mapreduce.JobContext
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopClassLoader.loadClass(GridHadoopClassLoader.java:125)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 20 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.JobContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopClassLoader.loadClassExplicitly(GridHadoopClassLoader.java:196)
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopClassLoader.loadClass(GridHadoopClassLoader.java:106)
... 21 more
Help here....
Edit:
raj@ubuntu:~$ hadoop classpath
/usr/local/hadoop-2.2.0/etc/hadoop:/usr/local/hadoop-2.2.0/share/hadoop/common/lib/*:/usr/local/hadoop-2.2.0/share/hadoop/common/*:/usr/local/hadoop-2.2.0/share/hadoop/hdfs:/usr/local/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.2.0/share/hadoop/hdfs/*:/usr/local/hadoop-2.2.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.2.0/share/hadoop/yarn/*:/usr/local/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.2.0/share/hadoop/mapreduce/*:/usr/local/hadoop-2.2.0/contrib/capacity-scheduler/*.jar
raj@ubuntu:~$ jps
3529 GridCommandLineStartup
3646 Jps
raj@ubuntu:~$ echo $GRIDGAIN_HOME
/usr/local/gridgain
raj@ubuntu:~$ echo $HADOOP_HOME
/usr/local/hadoop-2.2.0
raj@ubuntu:~$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
raj@ubuntu:~$ cd /usr/local/hadoop-2.2.0/share/hadoop/mapreduce
raj@ubuntu:/usr/local/hadoop-2.2.0/share/hadoop/mapreduce$ ls
hadoop-mapreduce-client-app-2.2.0.jar hadoop-mapreduce-client-hs-2.2.0.jar hadoop-mapreduce-client-jobclient-2.2.0-tests.jar lib
hadoop-mapreduce-client-common-2.2.0.jar hadoop-mapreduce-client-hs-plugins-2.2.0.jar hadoop-mapreduce-client-shuffle-2.2.0.jar lib-examples
hadoop-mapreduce-client-core-2.2.0.jar hadoop-mapreduce-client-jobclient-2.2.0.jar hadoop-mapreduce-examples-2.2.0.jar sources
raj@ubuntu:/usr/local/hadoop-2.2.0/share/hadoop/mapreduce$
I configured exactly the versions you mentioned (gridgain-hadoop-os-6.6.2.zip + hadoop-2.2.0) -- the "wordcount" sample works fine.
[Update after analyzing the question author's logs:]
Raju, thanks for the detailed logs.
The cause of the problem is the incorrectly set environment variables:
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
You explicitly set all of these variables to the ${HADOOP_HOME} value, which is wrong. This causes GridGain to compose an incorrect Hadoop classpath, as seen in the GridGain node log below:
+++ HADOOP_PREFIX=/usr/local/hadoop-2.2.0
+++ [[ -z /usr/local/hadoop-2.2.0 ]]
+++ '[' -z /usr/local/hadoop-2.2.0 ']'
+++ HADOOP_COMMON_HOME=/usr/local/hadoop-2.2.0
+++ HADOOP_HDFS_HOME=/usr/local/hadoop-2.2.0
+++ HADOOP_MAPRED_HOME=/usr/local/hadoop-2.2.0
+++ GRIDGAIN_HADOOP_CLASSPATH='/usr/local/hadoop-2.2.0/lib/*:/usr/local/hadoop-2.2.0/lib/*:/usr/local/hadoop-2.2.0/lib/*'
So, to fix the issue, please don't set unnecessary environment variables. JAVA_HOME and HADOOP_HOME are quite enough; nothing else is needed.
Many thanks to Ivan for the help and support; the solution he gave got me out of the problem.
The fix was to not set the other Hadoop-related environment variables. It is enough to set
JAVA_HOME, HADOOP_HOME and GRIDGAIN_HOME (a minimal sketch follows).
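A minimal sketch of that environment setup, assuming the paths from the question (the JAVA_HOME value is illustrative; point it at your own JDK):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64    # illustrative JDK path
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export GRIDGAIN_HOME=/usr/local/gridgain
# leave HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME and
# GRIDGAIN_HADOOP_CLASSPATH unset so GridGain can derive the classpath itself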

I am getting org.apache.hadoop.ipc.RemoteException: java.io.IOException in Mapreduce job?

I have installed Hadoop using the following steps:
Installed Hadoop from the tar file.
Created an hdfs user and group and assigned them ownership of the Hadoop folder.
Then created HDFS directories for the NameNode and DataNode in the /opt folder.
The configuration files are also set.
But when I try to run hadoop jar hadoop-examples-1.0.0.jar pi 4 100, I get this error:
2014-11-05 12:12:02,978 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
2014-11-05 12:12:02,978 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-hdfs/mapred/system/jobtracker.info" - Aborting...
2014-11-05 12:12:02,979 WARN org.apache.hadoop.mapred.JobTracker: Writing to file hdfs://hostname:9000/tmp/hadoop-hdfs/mapred/system/jobtracker.info failed!
2014-11-05 12:12:02,979 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet!
2014-11-05 12:12:02,982 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager.
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-hdfs/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1556)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy5.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at com.sun.proxy.$Proxy5.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)
One thing I want to mention here is that I have set the HDFS paths to a /mnt directory, but HDFS is still pointing to /tmp/hadoop-hdfs.
Please give me some suggestions.
Check all the paths of the action node you are trying to run; this usually occurs due to wrong input/output paths being provided.
Also, if you are rerunning a workflow job, make sure all the events and properties provided in coordinator.xml (or job.xml) are present in job.properties, because rerunning a workflow job doesn't refer to job.xml, unlike a normal (scheduled) coordinator job run.
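Regarding the note in the question that HDFS still points to /tmp/hadoop-hdfs: a minimal sketch, assuming a tar-based Hadoop 1.x layout under $HADOOP_HOME, for verifying that the daemons actually read the edited configuration:
grep -A1 hadoop.tmp.dir $HADOOP_HOME/conf/core-site.xml                 # should show the directory you configured
grep -A1 'dfs.name.dir\|dfs.data.dir' $HADOOP_HOME/conf/hdfs-site.xml   # NameNode/DataNode storage dirs
$HADOOP_HOME/bin/stop-all.sh && $HADOOP_HOME/bin/start-all.sh           # restart so the new paths take effect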

Can access hadoop fs through shell, but not through java main

I would like the following code to make a directory under /tmp via HDFS.
I can, for instance, run
hadoop fs -mkdir hdfs://localhost:9000/tmp/newdir
and succeed.
jps shows that the NameNode and DataNode are running.
Hadoop version 0.20.1+169.89.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public static void main(String[] args) throws IOException {
    // point the client at the local NameNode and create the directory
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://localhost:9000");
    FileSystem fs = FileSystem.get(conf);
    fs.mkdirs(new Path("hdfs://localhost:9000/tmp/alex"));
}
I get the following error:
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "<my-machine-name>/192.168.2.6"; destination host is: "localhost":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:467)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2394)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2365)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:817)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:813)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:813)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:806)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1933)
at com.twitter.amplify.core.dao.AccessHdfs.main(AccessHdfs.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
You have a version mismatch: your question notes a NameNode running version 0.20.1+169.89 (which I think is from the Cloudera CDH2 distro - http://archive.cloudera.com/cdh/2/), but in IntelliJ you are using Apache Hadoop version 2.2.0.
Update your IntelliJ classpath to use the jars compatible with your cluster version - namely:
hadoop-0.20.1+169.89-core.jar
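As a quick, hedged sanity check (the project path below is illustrative), compare the version the cluster is running with the Hadoop jars your IDE project uses:
hadoop version                                 # version of the Hadoop install the daemons were started from
ls ~/IdeaProjects/myproject/lib | grep hadoop  # illustrative path: Hadoop jars on the IntelliJ project classpath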
I had the same version of Hadoop (hadoop-2.2.0) installed on my master and slave nodes, but I was still getting the same exception. To get rid of it I followed the steps below (a condensed sketch follows the list):
1. From $HADOOP_HOME, execute sbin/stop-all.sh to stop the cluster.
2. Delete the data directory from every problematic node. If you don't know where the data directory is, open core-site.xml, find the value of hadoop.tmp.dir, go to that directory, then cd dfs; there you will find a directory named data. Delete that data directory on every problematic DataNode.
3. Format the NameNode on the master node.
4. From $HADOOP_HOME, execute sbin/start-all.sh to start the cluster.
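A condensed sketch of those steps, assuming a Hadoop 2.2.0 tar install; the rm path is illustrative (substitute the hadoop.tmp.dir value from your core-site.xml), and reformatting wipes HDFS metadata:
$HADOOP_HOME/sbin/stop-all.sh              # 1. stop the cluster
rm -rf /your/hadoop.tmp.dir/dfs/data       # 2. illustrative: the data dir on each problematic DataNode
hdfs namenode -format                      # 3. reformat the NameNode on the master
$HADOOP_HOME/sbin/start-all.sh             # 4. start the cluster again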
