I'm trying to use the Hadoop-LZO package (built using the steps here). Everything seems to have worked: I was able to index my LZO files with the following command (it returns big_file.lzo.index as expected):
hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer big_file.lzo
Then I go to use these files in my mapreduce jobs (with big_file.lzo.index as the input):
import com.hadoop.mapreduce.LzoTextInputFormat;
....
Job jobConverter = new Job(conf, "conversion");
jobConverter.setJar("JsonConverter.jar");
jobConverter.setInputFormatClass(LzoTextInputFormat.class);
....
and I get the following error:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:62)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:389)
at com.hadoop.mapreduce.LzoTextInputFormat.getSplits(LzoTextInputFormat.java:101)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:304)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:199)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.wwbp.JsonConverter.run(JsonConverter.java:116)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.wwbp.JsonConverter.main(JsonConverter.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I've seen other questions addressing this error, and they say to rebuild against Hadoop v2. So I re-downloaded everything from GitHub and ran:
% hadoop version
Hadoop 2.7.0-mapr-1607
Compiled by root on 2016-07-18T07:56Z
Compiled with protoc 2.5.0
This command was run using /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/hadoop-common-2.7.0-mapr-1607.jar
% ant clean compile-native tar -Dhadoopversion=27
....
tar:
[tar] Building tar: ../jars/hadoop-lzo/build/hadoop-lzo-0.4.15.tar.gz
BUILD SUCCESSFUL
Total time: 15 seconds
When building, my paths are as follows:
C_INCLUDE_PATH=../jars/lzo-2.09/include
LIBRARY_PATH=../jars/lzo-2.09/lib
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
I'm really not sure what I am doing wrong. How do I get ant to see Hadoop v2?
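From what I understand, this error means the hadoop-lzo classes were compiled against the old API, where org.apache.hadoop.mapreduce.JobContext was a class, whereas Hadoop 2 ships it as an interface. A tiny check like the following (the class name is just something I made up for testing) prints which flavour is on the runtime classpath:

import org.apache.hadoop.mapreduce.JobContext;

public class ApiCheck {
    public static void main(String[] args) {
        // Prints true on the Hadoop 2.x jars; against the 1.x API it prints false,
        // which is exactly the mismatch the IncompatibleClassChangeError complains about.
        System.out.println("JobContext is an interface: " + JobContext.class.isInterface());
    }
}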
Edit 1: Possibly of note: when I run both my mapreduce job (which uses LzoTextInputFormat.class) and the LZO indexer (on big_file.lzo), my classpath is as follows:
CLASS_PATH=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/lib/kvstore*.jar:/opt/mapr/lib/libprotodefs*.jar:/opt/mapr/lib/baseutils*.jar:/opt/mapr/lib/maprutil*.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar:/jars/hadoop-lzo-0.4.15/hadoop-lzo-0.4.15.jar
Edit 2: If I index the lzo file as follows (i.e. via a mapreduce job using DistributedLzoIndexer instead of LzoIndexer), I get a similar error:
> hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer big_file.lzo
16/12/09 13:06:24 INFO mapreduce.Job: map 0% reduce 0%
16/12/09 13:06:29 INFO mapreduce.Job: Task Id : attempt_1472572940387_0370_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
I have no idea why the above isn't working, so I started from scratch using this repo:
https://github.com/twitter/hadoop-lzo
instead of the one linked above, and built with Maven instead of Ant (using all of the same settings as above).
Related
I have a Hadoop job that requires several third-party jars. I have put them on the classpath via conf/hadoop-env.sh:
export HADOOP_CLASSPATH=hdfs://name.node.private.ip:9000/home/ec2-user/hadoop-gremlin-libs/
When I run $ bin/hadoop classpath, this path is included, as you can see here. However, when I go to run a job, it throws an error during initialization:
Error: java.lang.ClassNotFoundException: com.google.common.collect.Lists
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.giraph.conf.AllOptions.<clinit>(AllOptions.java:37)
at org.apache.giraph.conf.ClassConfOption.<init>(ClassConfOption.java:47)
at org.apache.giraph.conf.ClassConfOption.create(ClassConfOption.java:60)
at org.apache.giraph.conf.GiraphConstants.<clinit>(GiraphConstants.java:62)
at org.apache.giraph.conf.GiraphClasses.readFromConf(GiraphClasses.java:152)
at org.apache.giraph.conf.GiraphClasses.<init>(GiraphClasses.java:142)
at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.<init>(ImmutableClassesGiraphConfiguration.java:93)
at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:56)
at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
This particular class should be packaged in guava, which is included on the classpath:
[ec2-user]$ bin/hadoop dfs -ls /home/ec2-user/hadoop-gremlin-libs | grep guava
-rw-r--r-- 3 ec2-user supergroup 0 2017-04-20 17:57 /home/ec2-user/hadoop-gremlin-libs/guava-18.0.jar
I am submitting the job from gremlin as follows:
graph = GraphFactory.open('conf/hadoop.properties')
result = graph.compute().program(MyVertexProgram.build().create()).submit().get()
I have also tried putting the jars on the local filesystem, and I receive the same error. Does anyone know how to solve this issue?
I can't tell exactly what kind of job you are running, but from those classes it looks like a MapReduce2 map task is being set up when you hit that exception.
I think you are probably updating the wrong classpath value: you are updating the Hadoop classpath, not the MapReduce classpath.
More than likely you need to update the cluster's YARN/MapReduce2 application classpath values, either in your cluster manager application or in the site XML files the cluster uses. You should have a mapred-site.xml file with a property named mapreduce.application.classpath that points to the jars MapReduce needs to execute its jobs; add your path to the value of mapreduce.application.classpath instead.
The same goes for YARN: update the yarn.application.classpath property if YARN itself needs any other jars, since that classpath points to the jars that help YARN run. You can update this easily in a cluster manager application if you have one, or edit yarn-site.xml manually to add the path.
The only other option is if your client program has its own dedicated mapred-site.xml file that it reads mapreduce.application.classpath from. If so, you may be able to modify mapreduce.application.classpath on the client side, if your software supports it. Some client programs have their own classpaths, or read the cluster's site XML files to connect to the cluster.
From what the exception shows, I am fairly sure you need this jar on the mapreduce.application.classpath, not the Hadoop classpath.
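For example, in mapred-site.xml the property usually looks something like this (the default portion varies by distribution; the appended directory here is just illustrative and must exist locally on every node):
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,/home/ec2-user/hadoop-gremlin-libs/*</value>
</property>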
I found this example of how to write a SequenceFile to the local file system, but it throws this exception:
Exception in thread "main" java.io.IOException: (null) entry in command string: null chmod 0644 C:\temp\test.seq
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:770)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1168)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
I'm running this on a Windows 10 box. I even tried the msys Git Bash shell, thinking it might help the JVM simulate a chmod operation, but it didn't change anything. Any suggestions on how to do this on Windows?
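For context, the writing code is roughly of this shape (a sketch with an illustrative path and key/value types, not the exact example I followed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LocalSeqFileWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("file:///C:/temp/test.seq");
        // The failing chmod happens inside RawLocalFileSystem when this output stream is created.
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class));
        writer.append(new IntWritable(1), new Text("hello"));
        writer.close();
    }
}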
I faced this error too, and it was resolved after the following steps. (Note: I am using Spark 2.0.2 and Hadoop 2.7.)
Verify whether you are getting "java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries." You can check this by running the spark-shell command.
I got the above-mentioned error because I hadn't added HADOOP_HOME as an environment variable. After adding HADOOP_HOME (in my case the same as SPARK_HOME), the issue was resolved.
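If setting a system-wide environment variable is inconvenient (for example when launching from an IDE), the same effect can usually be achieved from code, before the first Hadoop call (the path here is illustrative, and winutils.exe must be present under that directory's bin folder):

// Hadoop's Shell class falls back to the hadoop.home.dir system property
// when the HADOOP_HOME environment variable is not set.
System.setProperty("hadoop.home.dir", "C:\\hadoop-2.7.3");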
Running a Hadoop program using only jars on Windows requires a few steps beyond just referencing the jars.
Credit to Professor Lu at University of Helsinki for posting a Hadoop on Windows guide for his students.
Here is a rundown of steps I had to take using Windows 10 and Hadoop 2.7.3:
Download and extract Hadoop binaries to somewhere like C:\hadoop-2.7.3.
Download patch files from https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip and extract them to your %HADOOP_HOME%\bin directory.
Set a HADOOP_HOME environment variable. For example, C:\hadoop-2.7.3.
Download the Hadoop source code, copy hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java to your project, and modify line 609 from
return access0(path, desiredAccess.accessRight());
to
return true;
One of the solutions is as follows:
In Project Structure (IntelliJ), under SDKs, ensure no other version of Hadoop is referenced. In my case I had been running Spark earlier, and it was referencing Hadoop jars, which caused the access issues. Once I removed them and ran the MR job, it ran fine.
1) I have added the serde jar file using "ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;"
2) Created the table
3) The table is created successfully
4) But when I execute any select query it throws a FileNotFoundException
hive> select count(*) from tab_tweets;
Query ID = hduser_20150604145353_51b4def4-11fb-4638-acac-77301c1c1806
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
java.io.FileNotFoundException: File does not exist: hdfs://node1:9000/home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:428)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://node1:9000/home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
METHOD 1: Copy hive-serdes-1.0-SNAPSHOT.jar file from local filesystem to HDFS.
hadoop fs -mkdir /home/hduser/softwares/hive/
hadoop fs -put /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar /home/hduser/softwares/hive/
Note: Use hdfs dfs instead of hadoop fs if you are using a recent Hadoop version.
METHOD 2: Change the value for hive.aux.jars.path in hive-site.xml as:
<property>
<name>hive.aux.jars.path</name>
<value>file:///home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar</value>
</property>
METHOD 3: Add hive-serdes-1.0-SNAPSHOT.jar to the Hadoop classpath, i.e., add this line to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar
NOTE: I have mentioned the paths assuming you have installed Hive in /home/hduser/softwares/hive. If you have Hive installed elsewhere, please change /home/hduser/softwares/hive to point to your Hive installation folder.
Check that the jar exists at /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar.
Note: there is no need to copy hive-serdes-1.0-SNAPSHOT.jar into HDFS; keep it on the local FS itself.
At query execution time, Hive will take care of making it available on all nodes via the Distributed Cache.
Refer to this link for details: official link
FYI - refer Hive Resources
Once a resource is added to a session, Hive queries can refer to it by its name (in map/reduce/transform clauses), and the resource is available locally at execution time on the entire Hadoop cluster.
Hive uses Hadoop's Distributed Cache to distribute the added resources to all the machines in the cluster at query execution time
You can add additional jars in multiple ways:
For the current Hive session:
hive> add jar /local/fs/path/to/your/file.jar
hive> list jars    -- to check
On the node from which you run Hive, add it to .hiverc (similar to .bashrc):
cd $HOME
create a file .hiverc
cat $HOME/.hiverc
add jar /local/fs/path/to/your/file.jar    -- add this line
Adding the jar file to hive-site.xml:
<property>
<name>hive.aux.jars.path</name>
<value>file:///home/user/path/to/your/hive-serdes-1.0-SNAPSHOT.jar</value>
</property>
I have a map-only job configured to run in distributed mode. When I run it through the CLI, the job runs successfully. The launch string looks like:
hadoop jar FileHandy.jar com.company.MainRun arg1 arg2
But if I run it via the IDE (IntelliJ IDEA), it fails with an error (it could not find the Mapper class):
14/07/30 01:07:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/07/30 01:07:34 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/07/30 01:07:35 INFO input.FileInputFormat: Total input paths to process : 1
14/07/30 01:07:36 INFO mapred.JobClient: Running job: job_201407300013_0001
14/07/30 01:07:37 INFO mapred.JobClient: map 0% reduce 0%
14/07/30 01:07:55 INFO mapred.JobClient: Task Id : attempt_201407300013_0001_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.expedia.eww.FileMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1617)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class com.expedia.eww.FileMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1523)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1615)
... 8 more
I've set up the IDE and use a Maven pom.xml with dependencies only (I'm using the jar file generated by IDEA's build process instead of the Maven jar, but the results are the same with the Maven jar). My IDE run configuration is as follows:
Main class: org.apache.hadoop.util.RunJar
Programs args: /path/to/jar/FileHandy.jar com.company.FileRun arg1 arg2
Work dir set
Code snippet:
Job job = new Job(conf, "File2Hdfs");
job.setJarByClass(FileRun.class);
job.setMapperClass(FileMapper.class);
job.setInputFormatClass(NLineInputFormat.class);
job.setNumReduceTasks(0);
//FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost/user/cloudera/out111"));
FileOutputFormat.setOutputPath(job, new Path(arg0[1]));
FileInputFormat.addInputPath(job, new Path(fileForMapper));
return job.waitForCompletion(true) ? 0 : 1;
FileRun.class (with main) and FileMapper.class (the mapper) are in the com.company package.
IDEA launches the following when I run the project:
/usr/java/jdk1.6.0_32/bin/java -Didea.launcher.port=7547 -Didea.launcher.bin.path=/home/cloudera/Downloads/idea-IC-135.909/bin -Dfile.encoding=UTF-8 -classpath /usr/java/jdk1.6.0_32/jre/lib/rt.jar:/usr/java/jdk1.6.0_32/jre/lib/deploy.jar:/usr/java/jdk1.6.0_32/jre/lib/resources.jar:/usr/java/jdk1.6.0_32/jre/lib/jsse.jar:/usr/java/jdk1.6.0_32/jre/lib/management-agent.jar:/usr/java/jdk1.6.0_32/jre/lib/jce.jar:/usr/java/jdk1.6.0_32/jre/lib/plugin.jar:/usr/java/jdk1.6.0_32/jre/lib/charsets.jar:/usr/java/jdk1.6.0_32/jre/lib/javaws.jar:/usr/java/jdk1.6.0_32/jre/lib/ext/sunpkcs11.jar:/usr/java/jdk1.6.0_32/jre/lib/ext/dnsns.jar:/usr/java/jdk1.6.0_32/jre/lib/ext/localedata.jar:/usr/java/jdk1.6.0_32/jre/lib/ext/sunjce_provider.jar:/home/cloudera/IdeaProjects/MavenFileHandy/target/classes:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-client/2.0.0-mr1-cdh4.4.0/hadoop-client-2.0.0-mr1-cdh4.4.0.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-common/2.0.0-cdh4.4.0/hadoop-common-2.0.0-cdh4.4.0.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-annotations/2.0.0-cdh4.4.0/hadoop-annotations-2.0.0-cdh4.4.0.jar:/usr/java/jdk1.6.0_32/lib/tools.jar:/home/cloudera/.m2/repository/com/google/guava/guava/11.0.2/guava-11.0.2.jar:/home/cloudera/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/home/cloudera/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:/home/cloudera/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/home/cloudera/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/home/cloudera/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar:/home/cloudera/.m2/repository/commons-net/commons-net/3.1/commons-net-3.1.jar:/home/cloudera/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:/home/cloudera/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:/home/cloudera/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/cloudera/.m2/repository/junit/junit/4.8.2/junit-4.8.2.jar:/home/cloudera/.m2/repository/commons-lang/commons-lang/2.5/commons-lang-2.5.jar:/home/cloudera/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/home/cloudera/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/home/cloudera/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/home/cloudera/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/home/cloudera/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/home/cloudera/.m2/repository/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar:/home/cloudera/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar:/home/cloudera/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.jar:/home/cloudera/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.8/jackson-mapper-asl-1.8.8.jar:/home/cloudera/.m2/repository/org/mockito/mockito-all/1.8.5/mockito-all-1.8.5.jar:/home/cloudera/.m2/repository/org/apache/avro/avro/1.7.4/avro-1.7.4.jar:/home/cloudera/.m2/repository/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar:/home/cloudera/.m2/repository/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.jar:/home/cloudera/.m2/repository/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar:/home/cloudera/.m2/repository/org/tukaani/xz/1.0/xz-1.0.jar:/home/cloudera/.m2/repository/com/google/protobuf/protobuf-java/2.4.0a/prot
obuf-java-2.4.0a.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-auth/2.0.0-cdh4.4.0/hadoop-auth-2.0.0-cdh4.4.0.jar:/home/cloudera/.m2/repository/com/jcraft/jsch/0.1.42/jsch-0.1.42.jar:/home/cloudera/.m2/repository/org/apache/zookeeper/zookeeper/3.4.5-cdh4.4.0/zookeeper-3.4.5-cdh4.4.0.jar:/home/cloudera/.m2/repository/jline/jline/0.9.94/jline-0.9.94.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.0.0-cdh4.4.0/hadoop-hdfs-2.0.0-cdh4.4.0.jar:/home/cloudera/.m2/repository/com/sun/jersey/jersey-core/1.8/jersey-core-1.8.jar:/home/cloudera/.m2/repository/com/sun/jersey/jersey-server/1.8/jersey-server-1.8.jar:/home/cloudera/.m2/repository/asm/asm/3.1/asm-3.1.jar:/home/cloudera/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/home/cloudera/.m2/repository/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.4.0/hadoop-core-2.0.0-mr1-cdh4.4.0.jar:/home/cloudera/.m2/repository/hsqldb/hsqldb/1.8.0.10/hsqldb-1.8.0.10.jar:/home/cloudera/Downloads/idea-IC-135.909/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain org.apache.hadoop.util.RunJar /home/cloudera/IdeaProjects/MavenFileHandy/target/FileHandy.jar com.company.FileRun arg1 arg2
Why does the job throw an exception and fail to find the Mapper class when run via the IDE, yet complete successfully when run via the hadoop jar ... command?
Thanks
I've found the reason. The TaskTrackers can't run the job's map task because the jar file is not in the Distributed Cache. To solve the problem it's necessary to add the jar file to the project classpath. The steps are:
File -> Project Structure -> Libraries, click '+' in the bottom pane and add the jar file.
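An alternative that has worked in similar setups is to name the jar explicitly in the driver, so the client uploads it to the cluster even when the job is launched from the IDE (the path is whatever your build produces; here it is the jar already mentioned above):

// Instead of relying on setJarByClass() finding a jar on the IDE classpath,
// point the job at the built jar so it is shipped with the job submission.
Job job = new Job(conf, "File2Hdfs");
job.setJar("/home/cloudera/IdeaProjects/MavenFileHandy/target/FileHandy.jar");
job.setMapperClass(FileMapper.class);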
SOLVED (the solution is in the comments)
I'm using Hadoop 2.2.0 (in pseudo-distributed mode) on Ubuntu 13.10 and Eclipse Kepler v4.3 to develop my Hadoop program and a Dynamic Web Project (without Maven).
My Hadoop jar project, called "WorkTest.jar", works correctly when I run the job from the command line with "hadoop jar WorkTest.jar", and I can see the job progress in the terminal.
Hadoop project contains four elements:
DriverJob.java (class that configures and starts the job)
Mapper.java
Combiner.java
Reducer.java
Now I have written a new Dynamic Web Project with a ServletTest.java into which I copied the DriverJob class code; the other classes (Mapper.java, Combiner.java, Reducer.java) are placed in the same package as the servlet (the main package). The WebContent/lib folder contains all the necessary Hadoop jar dependencies.
I have successfully deployed my application on a WildFly 8 server with Eclipse, but when I try to run the mapreduce job (the job configuration runs successfully, and I managed to delete and write a folder on HDFS), it keeps failing with the following exception, visible in the Hadoop job log file:
FATAL [IPC Server handler 5 on 46834] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1396015900746_0023_m_000002_0 - exited : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class Mapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class Mapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 8 more
and from the WildFly log file:
WARN [org.apache.hadoop.mapreduce.JobSubmitter] Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN [org.apache.hadoop.mapreduce.JobSubmitter] No job jar file set. User classes may not be found. See Job or Job#setJar(String).
But the WEB-INF/classes/ deploy folder on WildFly does contain Mapper.class, Combiner.class and Reducer.class.
I also tried to put the Mapper, Combiner and Reducer class code inside the servlet, but it fails with the same error...
What am I doing wrong?
I believe you need to have your .class files in an archive (jar) that can be distributed to the nodes in the cluster.
WARN [org.apache.hadoop.mapreduce.JobSubmitter] No job jar file set. User classes may not be found. See Job or Job#setJar(String).
This warning is the key. Generally you would use job.setJarByClass(DriverJob.class) to tell the MapReduce client which jar file contains the Mapper/Reducer classes. You don't have a jar, so that mechanism for distributing the proper classes falls apart.
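A rough sketch of what that can look like from a servlet (the jar name and location are illustrative; the jar must be produced by your build and contain DriverJob, Mapper, Combiner and Reducer):

// Tell the MapReduce client exactly which jar to ship to the task JVMs,
// instead of relying on setJarByClass() finding one on the web app classpath.
String jarPath = getServletContext().getRealPath("/WEB-INF/lib/worktest-mr.jar");
job.setJar(jarPath);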