MapR distcp: FileNotFoundException

I am trying to copy files between two MapR clusters:
hadoop distcp maprfs:///path/to/source/hive/table maprfs:///path/to/destination/hive/table
The source table has some data, but the destination table is empty; I have only run the CREATE TABLE command on the destination.
When I run the distcp command above, I get:
java.io.FileNotFoundException: Requested file maprfs:/var/mapr/cluster/yarn/rm/staging/some_user/.staging/job_1528383939830_608130 does not exist.
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:1262)
at com.mapr.fs.MapRFileSystem.getFileStatus(MapRFileSystem.java:927)
at com.mapr.fs.MFS.getFileStatus(MFS.java:151)
at org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:467)
at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:370)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:243)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:183)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
What is this staging file, and how do I get it? Is my syntax correct?
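One thing worth checking in the command above, offered as a sketch rather than a diagnosis: both URIs use the maprfs:/// form, which carries no authority (no host or cluster name) between the second and third slash. The snippet below uses only java.net.URI from the JDK to show how such a URI breaks down. The class name UriParts is made up for illustration, and how MapR resolves an authority-less URI is an assumption here (a Hadoop-compatible client normally falls back to its configured default filesystem).

```java
import java.net.URI;

public class UriParts {
    /** Returns "scheme | authority | path" for a filesystem URI string. */
    static String describe(String s) {
        URI u = URI.create(s);
        // getAuthority() is null when nothing sits between "//" and the next "/".
        return u.getScheme() + " | " + u.getAuthority() + " | " + u.getPath();
    }

    public static void main(String[] args) {
        System.out.println(describe("maprfs:///path/to/source/hive/table"));
        System.out.println(describe("maprfs:///path/to/destination/hive/table"));
    }
}
```

Both lines report a null authority, so nothing in the two URIs themselves distinguishes one cluster from the other.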

Related

KiteSDK MapReduce: EOF exception during parquet file load

I have a Hadoop MapReduce job that uses the Kite SDK DatasetKeyInputFormat. It is configured to read a Parquet file.
Every time I run the job, I get the following exception:
Error: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.hadoop.ParquetInputSplit.readArray(ParquetInputSplit.java:304)
at parquet.hadoop.ParquetInputSplit.readFields(ParquetInputSplit.java:263)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:372)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:754)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
The same file can be read successfully by the MapReduce jobs that Hive creates, i.e. I can query it without problems.
To isolate the issue, I created a MapReduce job based on the Kite SDK MapReduce example, but I still get the same exception.
Note: the Avro and CSV formats work well.

Hadoop job cannot see a file that hadoop fs -cat prints just fine

I have some code that accesses a file on HDFS in a function that is called from main() before launching the job.
This is a single-node cluster (pseudo-distributed mode) on a Mac.
I am not sure whether the problem comes from calling the function before I launch the job, but I get the following error:
java.io.FileNotFoundException: hdfs:/localhost/usr/local/tmp/hadoop/hadoop-${user}/data/input/file.csv (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.mahout.common.iterator.FileLineIterator.getFileInputStream(FileLineIterator.java:116)
at org.apache.mahout.common.iterator.FileLineIterable.<init>(FileLineIterable.java:53)
at org.apache.mahout.common.iterator.FileLineIterable.<init>(FileLineIterable.java:48)
at profile2sparse.MRDriver.createDictionaryChunks(MRDriver.java:98)
at profile2sparse.MRDriver.main(MRDriver.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I can see the contents of this file with hadoop fs -cat
Thanks and Regards,
Atul.
Are you able to cat the file with the command below?
hadoop fs -cat hdfs:/localhost/usr/local/tmp/hadoop/hadoop-${USER}/data/input/file.csv
If so, it looks like ${user} is not being resolved in your program. Use the uppercase form, ${USER}, instead of ${user}.
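A side observation on the trace in the question: the failing frame is java.io.FileInputStream.open, and java.io only reads the local filesystem, so a string that starts with hdfs: is treated as an ordinary local path. The sketch below demonstrates that with JDK classes only. LocalOnly and the path are illustrative, and the code does not touch Hadoop at all. Reading the file through Hadoop's own FileSystem API would be the usual alternative, though whether that fits the Mahout call in the question is an assumption.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class LocalOnly {
    /** Tries to open the string with java.io and reports what happened. */
    static String tryOpen(String path) {
        try (FileInputStream in = new FileInputStream(new File(path))) {
            return "opened";
        } catch (FileNotFoundException e) {
            return "FileNotFoundException";
        } catch (IOException e) {
            return "IOException";
        }
    }

    public static void main(String[] args) {
        // Hypothetical path; java.io sees "hdfs:" as a local directory name.
        System.out.println(tryOpen("hdfs:/localhost/no/such/dir/file.csv"));
    }
}
```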

Getting org.apache.hadoop.ipc.RemoteException: java.io.IOException in a MapReduce job

I installed Hadoop with the following steps:
1. Installed Hadoop from the tar file.
2. Created the hdfs user and group and assigned them to the hadoop folder.
3. Created the HDFS directories for the NameNode and DataNode in the /opt folder.
4. Set up the configuration files.
But when I tried to run hadoop jar hadoop-examples-1.0.0.jar pi 4 100, I got this error:
2014-11-05 12:12:02,978 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
2014-11-05 12:12:02,978 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-hdfs/mapred/system/jobtracker.info" - Aborting...
2014-11-05 12:12:02,979 WARN org.apache.hadoop.mapred.JobTracker: Writing to file hdfs://hostname:9000/tmp/hadoop-hdfs/mapred/system/jobtracker.info failed!
2014-11-05 12:12:02,979 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet!
2014-11-05 12:12:02,982 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager.
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-hdfs/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1556)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy5.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at com.sun.proxy.$Proxy5.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)
One thing I want to mention is that I have set the HDFS paths to the /mnt directory, but HDFS is still pointing to /tmp/hadoop-hdfs.
Please give some suggestions.
Check all the paths of the action node you are trying to run; this usually occurs when wrong input/output paths are provided.
Also, if you are rerunning a workflow job, make sure all the events and properties provided in coordinator.xml (or job.xml) are present in job.properties, because rerunning a workflow job does not refer to job.xml, unlike a normal (scheduled) coordinator job run.
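On the /tmp/hadoop-hdfs path the question mentions: stock Hadoop ships with hadoop.tmp.dir defaulting to /tmp/hadoop-${user.name}, which for a user named hdfs expands to exactly that path, so the logs are consistent with the /mnt override not being picked up. The sketch below only illustrates the substitution using the JDK. TmpDirDefault and expand are hypothetical names, not Hadoop's implementation.

```java
public class TmpDirDefault {
    /** Replaces the ${user.name} placeholder with the given user name. */
    static String expand(String template, String userName) {
        return template.replace("${user.name}", userName);
    }

    public static void main(String[] args) {
        String template = "/tmp/hadoop-${user.name}";
        // For the hdfs user from the question.
        System.out.println(expand(template, "hdfs"));
        // For whoever runs this JVM.
        System.out.println(expand(template, System.getProperty("user.name")));
    }
}
```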

Can access hadoop fs through shell, but not through java main

I would like to see the following code make a directory in my "/tmp" via hdfs.
I can, for instance, run
hadoop fs -mkdir hdfs://localhost:9000/tmp/newdir
and succeed.
jps shows that the NameNode and DataNode are running.
Hadoop version 0.20.1+169.89.
public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Point the client at the local NameNode.
    conf.set("fs.default.name", "hdfs://localhost:9000");
    FileSystem fs = FileSystem.get(conf);
    fs.mkdirs(new Path("hdfs://localhost:9000/tmp/alex"));
}
I get the following error:
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "<my-machine-name>/192.168.2.6"; destination host is: "localhost":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:467)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2394)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2365)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:817)
at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:813)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:813)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:806)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1933)
at com.twitter.amplify.core.dao.AccessHdfs.main(AccessHdfs.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
You have a version mismatch: your question notes a NameNode running version 0.20.1+169.89 (which I think is from the Cloudera distribution CDH2, http://archive.cloudera.com/cdh/2/), while in IntelliJ you are using Apache Hadoop version 2.2.0.
Update your IntelliJ classpath to use the jars compatible with your cluster version, namely:
hadoop-0.20.1+169.89-core.jar
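To see which jar a client is actually loading, one option is to ask the JVM where a class came from. This is a minimal sketch using only JDK reflection. ClassOrigin is a made-up name, and org.apache.hadoop.util.VersionInfo is used as the probe class on the assumption that the Hadoop jars on the classpath contain it.

```java
import java.security.CodeSource;

public class ClassOrigin {
    /** Returns the jar or directory a class was loaded from, if it can be found. */
    static String locate(String className) {
        try {
            // Load without running static initializers; only the origin matters here.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            if (src == null || src.getLocation() == null) {
                return "bootstrap (JDK)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(locate("org.apache.hadoop.util.VersionInfo"));
    }
}
```

Run from the same IntelliJ run configuration as the failing program, the printed location should name the Hadoop jar whose version can then be compared with the cluster's.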
I had the same version of Hadoop (hadoop-2.2.0) installed on my master and slave nodes, but I was still getting the same exception. To get rid of it, I followed the steps below:
1. From $HADOOP_HOME, run sbin/stop-all.sh to stop the cluster.
2. Delete the data directory on every problematic node. If you don't know where the data directory is, open core-site.xml, find the value of hadoop.tmp.dir, go to that directory, and cd into dfs; there you will find a directory named data. Delete that data directory on all problematic DataNodes.
3. Format the master node.
4. From $HADOOP_HOME, run sbin/start-all.sh to start the cluster.

Running wordcount.jar on hadoop in windows using command line

I am trying to run a simple wordcount program on Hadoop, but I get the error below.
Exception in thread "main" java.io.IOException: Error opening job jar: /user/asiapac/bmohanty6/wordcount/wordcount.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: \user\asiapac\bmohanty6\wordcount\wordcount.jar (The system cannot find the path specified)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(Unknown Source)
at java.util.zip.ZipFile.<init>(Unknown Source)
at java.util.jar.JarFile.<init>(Unknown Source)
at java.util.jar.JarFile.<init>(Unknown Source)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
I am using the command below.
$ bin/hadoop jar /user/asiapac/bmohanty6/wordcount/wordcount.jar WordCount /user/asiapac/bmohanty6/wordcount/input /user/asiapac/bmohanty6/wordcount/output
I am using Cygwin and hadoop-0.20.2 with a pseudo-distributed setup. I have also uploaded wordcount.jar to my DFS.
I am able to run the same wordcount program in Eclipse successfully. I created the wordcount.jar file in Eclipse by following a tutorial. I searched the web a lot but could not work out how to solve this. Please help me.
You need to add / before user:
bin/hadoop jar /user/asiapac/bmohanty6/wordcount/wordcount.jar WordCount /user/asiapac/bmohanty6/wordcount/input /user/asiapac/bmohanty6/wordcount/output
This makes them fully qualified paths. If you omit the / before user, Hadoop will search relative to the current directory.
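The difference between the two forms can be sketched with java.nio.file, which follows the same rule on a POSIX-style system: a path with a leading / is absolute, and one without is resolved against the current working directory. PathKinds is an illustrative name. This shows local path resolution only and makes no claim about how Hadoop resolves DFS paths.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathKinds {
    /** Resolves a path string the way a local process would. */
    static Path resolve(String s) {
        // toAbsolutePath() prepends the working directory to relative paths.
        return Paths.get(s).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        System.out.println(Paths.get("/user/asiapac/bmohanty6/wordcount").isAbsolute());
        System.out.println(Paths.get("user/asiapac/bmohanty6/wordcount").isAbsolute());
        System.out.println(resolve("user/asiapac/bmohanty6/wordcount"));
    }
}
```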
