I'm running Hadoop on a school cluster, and I get an exception in thread "main" with a ClassNotFoundException:
Exception in thread "main" java.lang.ClassNotFoundException: movielens.MovieLensDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
But I'm aware that I have to use the fully qualified class name in the command, and I have done so. This is the command I used:
hadoop jar movielens.jar movielens.MovieLensDriver input output
Following is the code for my driver class.
package movielens;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class MovieLensDriver {

    public static class JobRunner implements Runnable {
        private JobControl control;

        public JobRunner(JobControl _control) {
            this.control = _control;
        }

        public void run() {
            this.control.run();
        }
    }

    public static void handleRun(JobControl control) throws InterruptedException {
        JobRunner runner = new JobRunner(control);
        Thread t = new Thread(runner);
        t.start();
        while (!control.allFinished()) {
            System.out.println("Still running...");
            Thread.sleep(5000);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        System.out.println("Program started");
        if (args.length != 2) {
            System.err.println("Usage: MovieLensDriver <input path> <output path>");
            System.exit(-1);
        }

        JobConf conf1 = new JobConf(movielens.MovieLensDriver.class);
        conf1.setMapperClass(MoviePairsMapper.class);
        conf1.setReducerClass(MoviePairsReducer.class);
        conf1.setJarByClass(MovieLensDriver.class);
        FileInputFormat.addInputPath(conf1, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf1, new Path("temp"));
        conf1.setMapOutputKeyClass(Text.class);
        conf1.setMapOutputValueClass(Text.class);
        conf1.setOutputKeyClass(Text.class);
        conf1.setOutputValueClass(IntWritable.class);

        JobConf conf2 = new JobConf(MovieLensDriver.class);
        conf2.setMapperClass(MoviePairsCoOccurMapper.class);
        conf2.setReducerClass(MoviePairsCoOccurReducer.class);
        conf2.setJarByClass(MovieLensDriver.class);
        FileInputFormat.addInputPath(conf2, new Path("temp"));
        FileOutputFormat.setOutputPath(conf2, new Path(args[1]));
        conf2.setInputFormat(KeyValueTextInputFormat.class);
        conf2.setMapOutputKeyClass(Text.class);
        conf2.setMapOutputValueClass(IntWritable.class);
        conf2.setOutputKeyClass(Text.class);
        conf2.setOutputValueClass(IntWritable.class);

        Job job1 = new Job(conf1);
        Job job2 = new Job(conf2);
        JobControl jobControl = new JobControl("jobControl");
        jobControl.addJob(job1);
        jobControl.addJob(job2);
        job2.addDependingJob(job1);
        handleRun(jobControl);
        System.out.println("Program complete.");
        System.exit(0);
    }
}
It has been a frustrating three-hour search for this bug, and any help is appreciated.
You can try the -libjars option, which will take the jar and place it in the distributed cache, making it available to all of the job's task attempts. Note that the -libjars argument takes a comma-separated list, not a colon- or semicolon-separated list.
export LIBJARS=/path/jars1,/path/jars2,/path/movielens.jar
hadoop jar movielens.jar movielens.MovieLensDriver -libjars ${LIBJARS} input output
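Note that -libjars is only honored when the driver parses Hadoop's generic options, for example by going through ToolRunner. A minimal sketch of what that restructuring could look like for the driver above (this is an assumption about how you would rearrange the code, not the original; the job setup from main would move into run):

package movielens;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MovieLensDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // By this point GenericOptionsParser has already consumed -libjars,
        // so args holds only the leftover arguments (input and output paths).
        // ... build conf1/conf2 and run the JobControl exactly as in main() above ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner applies the generic options (-libjars, -D, -files, ...)
        System.exit(ToolRunner.run(new Configuration(), new MovieLensDriver(), args));
    }
}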
Related
I'm working on a simple program in Hadoop. I followed the steps in this tutorial:
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Creating_Java_Wordcount_Project_with_Eclipse_MapReduce2.php
Even though I tried it on two different machines, it keeps throwing this exception:
Exception in thread "main" java.lang.ClassNotFoundException: test.java
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
package pa2;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class test extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("plz give proper arguments");
            return -1;
        }
        // creating a JobConf object and assigning a job name for identification purposes
        JobConf conf = new JobConf(test.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(mapper.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // this main function will call the run method defined above
        int exitcode = ToolRunner.run(new test(), args);
        System.exit(exitcode);
    }
}
Can you please tell me what is wrong here?
Update: here is the mapper class:
package pa2;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class mapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable Key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter r)
            throws IOException {
        int i = 0;
        String[] array = new String[50];
        String name;
        String year;
        String s = value.toString();
        for (String word : s.split(",")) {
            word = s.substring(0, s.indexOf(",") + 1);
            year = word.substring(0, s.indexOf(",") + 1);
            name = word.substring(s.indexOf(",") + 1);
            int theyear = Integer.parseInt(year);
            if (theyear < 2000) {
                array[i] = name;
                output.collect(new Text(word), new IntWritable(1));
                i++;
            }
        }
    }
}
I haven't written the reducer class yet. I exported the project as a jar file, and I made a text file called movies to be the input of the program. Then I wrote this in the terminal:
[cloudera@quickstart ~]$ cd workspace
[cloudera@quickstart workspace]$ ls
pa2  pa2.jar  training
[cloudera@quickstart workspace]$ hadoop jar pa2.jar test movies.txt output.txt
Exception in thread "main" java.lang.ClassNotFoundException: test
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
No guarantees this is the solution to the immediate problem, but
package pa2;
This is prepended to the class name. In other words, the fully qualified class name is pa2.test.
So, try
hadoop jar ~/workspace/pa2.jar pa2.test input output
If you used the default package like that tutorial showed, you wouldn't need to specify the package on the command line.
Also, the actual name of your mapper class should be provided here:
conf.setMapperClass(mapper.class);
If you are trying to use the default map class, then write "Mapper.class".
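Either way, you can verify what actually made it into the jar by listing its contents; the entry paths are the fully qualified names you have to pass on the command line:

jar tf pa2.jar
# should list pa2/test.class and pa2/mapper.class,
# i.e. the classes pa2.test and pa2.mapper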
I am trying to use Apache Hadoop for the Windows platform through this tutorial: http://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform?fid=1858035 (the Eclipse part). Everything went fine until the last step. When running the program, I got:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:631)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:421)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:277)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:348)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
    at Recipe.main(Recipe.java:82)
The code is:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import com.google.gson.Gson;

public class Recipe {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        Gson gson = new Gson();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
            Roo roo = gson.fromJson(value.toString(), Roo.class);
            if (roo.cookTime != null) {
                word.set(roo.cookTime);
            } else {
                word.set("none");
            }
            context.write(word, one);
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "Recipe");
        job.setJarByClass(Recipe.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        //FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        FileInputFormat.addInputPath(job, new Path("hdfs://127.0.0.1:9000/in"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://127.0.0.1:9000/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        // job.submit();
    }
}

class Id {
    public String oid;
}

class Ts {
    public long date;
}

class Roo {
    public Id _id;
    public String name;
    public String ingredients;
    public String url;
    public String image;
    public Ts ts;
    public String cookTime;
    public String source;
    public String recipeYield;
    public String datePublished;
    public String prepTime;
    public String description;
}
This happens only when I try to run it through Eclipse. Through CMD it went fine:
javac -classpath C:\hadoop-2.3\share\hadoop\common\hadoop-common-2.3.0.jar;C:\hadoop-2.3\share\hadoop\common\lib\gson-2.2.4.jar;C:\hadoop-2.3\share\hadoop\common\lib\commons-cli-1.2.jar;C:\hadoop-2.3\share\hadoop\mapreduce\hadoop-mapreduce-client-core-2.3.0.jar Recipe.java
jar -cvf Recipe.jar *.class
hadoop jar c:\Hwork\Recipe.jar Recipe /in /out
Any idea how I can solve this?
I had the same problem, and the workaround from http://qnalist.com/questions/4994960/run-spark-unit-test-on-windows-7 fixed it.
The workaround is:
Download the compiled winutils.exe from https://codeload.github.com/srccodes/hadoop-common-2.2.0-bin/zip/master
Extract the archive directly into d:\winutil, so that d:\winutil\bin is created
Add System.setProperty("hadoop.home.dir", "d:\\winutil\\"); to your code before instantiating the Job (for example, as the first line of your main method; see the sketch below)
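For example, the start of the question's main method would then look like this (a sketch based on the code above, assuming the d:\winutil location from the previous step):

public static void main(String[] args) throws Exception {
    // must run before any Hadoop code, because job submission
    // shells out to %HADOOP_HOME%\bin\winutils.exe on Windows
    System.setProperty("hadoop.home.dir", "d:\\winutil\\");
    Configuration conf = new Configuration();
    Job job = new Job(conf, "Recipe");
    // ... rest of the job setup as in the question ...
}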
I am currently trying to create a single-node instance of Hadoop, so I am following this tutorial. I ran the following command in the terminal:
hduser#ubuntu:/usr/local/hadoop$ bin/hadoop jar WordCount.jar geekyomega.WordCount /user/hduser/gutenberg /user/hduser/gutenberg-output
Things were going great until I ran into this error:
Exception in thread "main" java.lang.ClassNotFoundException: WordCount
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I am attempting to run this example using the following code, which I got from here. Here is my version of the code:
package geekyomega;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
I thought my issue was the job instantiation, so I changed:
Job job = new Job(conf, "wordcount");
To the following, capitalized version:
Job job = new Job(conf, "WordCount");
But that hasn't helped. Does anyone know what could help me here?
Thanks,
Geeky
PS: I don't want to run the tutorial version of WordCount. What I did was create the project in Eclipse, add the Hadoop jar to it, and export it as a jar file.
Your class name is geekyomega.WordCount, but you are not including the package name. On the command line, just after the jar file name, give the fully qualified name of your job class.
Along with adding the package, add the following line in the job configuration part of your program:
job.setJarByClass(WordCount.class);
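In the question's main method, that would look like this (sketch):

Configuration conf = new Configuration();
Job job = new Job(conf, "WordCount");
job.setJarByClass(WordCount.class); // tells Hadoop which jar to ship with the job
job.setOutputKeyClass(Text.class);
// ... rest of the setup as before ...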
I'm running a Hadoop wordcount program, but it is giving me an error like "NoClassDefFoundError".
Command for running:
hadoop -jar /home/user/Pradeep/sample.jar hdp_java.WordCount /user/hduser/ana.txt /user/hduser/prout
Exception in thread "main" java.lang.NoClassDefFoundError: WordCount
Caused by: java.lang.ClassNotFoundException: WordCount
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: WordCount. Program will exit.
I've created the program in Eclipse and then exported it as a jar file.
Eclipse code:
package hdp_java;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
Can anyone tell me where I am wrong?
You need to tell the Hadoop job which jar to use, like so:
job.setJarByClass(WordCount.class);
Also be sure to add any dependencies to both the HADOOP_CLASSPATH and -libjars when submitting a job, as in the following examples:
Use the following to add all the jar dependencies from (for example) the current and lib directories:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`echo *.jar`:`echo lib/*.jar | sed 's/ /:/g'`
Bear in mind that when starting a job through hadoop jar, you'll also need to pass it the jars of any dependencies via -libjars. I like to use:
hadoop jar <jar> <class> -libjars `echo ./lib/*.jar | sed 's/ /,/g'` [args...]
NOTE: The sed commands use a different delimiter character in each case; HADOOP_CLASSPATH is :-separated, while the -libjars list must be ,-separated.
Add this line to your code:
job.setJarByClass(WordCount.class);
If it still doesn't work, export this job as a jar and add it to itself as an external jar, and see if it works.
What should I change to fix the following error?
I'm trying to start a job on Elastic MapReduce, and it crashes every time with this message:
java.lang.RuntimeException: java.lang.ClassNotFoundException: iataho.mapreduce.NewMaxTemperatureMapper
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:831)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:577)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:310)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: iataho.mapreduce.NewMaxTemperatureMapper
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:778)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:829)
    ... 4 more
The class NewMaxTemperatureMapper is declared, and I've checked that it is included in the jar, which is then uploaded to S3.
Here's the code for all app classes:
NewMaxTemperature.java:
package iataho.mapreduce;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewMaxTemperature {

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            if (args.length != 2) {
                System.err.println("Usage: NewMaxTemperature <input path> <output path>");
                System.exit(123);
            }
            Job job = new Job();
            job.setJarByClass(NewMaxTemperature.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.setMapperClass(NewMaxTemperatureMapper.class);
            job.setCombinerClass(NewMaxTemperatureReducer.class);
            job.setReducerClass(NewMaxTemperatureReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
NewMaxTemperatureReducer.java:
package iataho.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NewMaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
NewMaxTemperatureMapper.java:
package iataho.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NewMaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
I've made the jar file with which I'm getting this crash available here: download jar
Check the jars you are including when executing the application. Please add more info about this to the question.
===
OK, the problem was that I had used the Eclipse option "Package required libraries into generated JAR". I changed it to "Extract required libraries into generated JAR", and now it works fine.
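The difference shows up when you list the jar contents (the jar name here is hypothetical): "Package required libraries" keeps dependencies as nested .jar entries, which the classloader used by hadoop jar does not search, while "Extract required libraries" unpacks them into plain .class entries that load normally:

jar tf maxtemperature.jar
# fine:    iataho/mapreduce/NewMaxTemperatureMapper.class and other plain .class entries
# problem: nested entries such as lib/some-dependency.jar, which are not searched for classes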