Hadoop class not found exception - java

I'm working on a simple Hadoop program, and I followed the steps in this tutorial:
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Creating_Java_Wordcount_Project_with_Eclipse_MapReduce2.php
Even though I tried it on two different machines, it keeps throwing this exception:
Exception in thread "main" java.lang.ClassNotFoundException: test.java
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
package pa2;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class test extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("plz give proper arguments");
            return -1;
        }
        // creating a JobConf object and assigning a job name for identification purposes
        JobConf conf = new JobConf(test.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(mapper.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // this main function will call the run method defined above
        int exitcode = ToolRunner.run(new test(), args);
        System.exit(exitcode);
    }
}
Can you please tell me what is wrong here?
Update:
Here is the mapper class:
package pa2;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class mapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter r)
            throws IOException {
        int i = 0;
        String[] array = new String[50];
        String name;
        String year;
        String s = value.toString();
        for (String word : s.split(",")) {
            word = s.substring(0, s.indexOf(",") + 1);
            year = word.substring(0, s.indexOf(",") + 1);
            name = word.substring(s.indexOf(",") + 1);
            int theyear = Integer.parseInt(year);
            if (theyear < 2000) {
                array[i] = name;
                output.collect(new Text(word), new IntWritable(1));
                i++;
            }
        }
    }
}
I haven't written the reducer class yet. I exported the project as a jar file, and I made a text file called movies to be the program's input. Then I wrote this in the terminal:
[cloudera@quickstart ~]$ cd workspace
[cloudera@quickstart workspace]$ ls
pa2 pa2.jar training
[cloudera@quickstart workspace]$ hadoop jar pa2.jar test movies.txt output.txt
Exception in thread "main" java.lang.ClassNotFoundException: test
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

No guarantees this is the solution to the immediate problem, but
package pa2;
This is prepended to the class name. In other words, the fully-qualified class name is pa2.test.
So, try
hadoop jar ~/workspace/pa2.jar pa2.test input output
If you used the default package like that tutorial showed, you wouldn't need to specify the package on the command line.
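If you want to see why the launcher fails, you can reproduce the lookup the stack trace shows RunJar performing: it calls Class.forName with exactly the name you typed, against a class loader pointed at the jar. A minimal sketch (CheckClass is a made-up name; the jar and class names come from the question; run it with the Hadoop jars on the classpath so pa2.test's superclasses resolve):

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class CheckClass {
    public static void main(String[] args) throws Exception {
        // Load pa2.jar the same way "hadoop jar" does, then resolve the driver class by name.
        URLClassLoader loader = new URLClassLoader(new URL[] { new File("pa2.jar").toURI().toURL() });
        System.out.println(Class.forName("pa2.test", false, loader)); // resolves
        System.out.println(Class.forName("test", false, loader));     // throws ClassNotFoundException
    }
}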

The actual name of your map class should be provided here:
conf.setMapperClass(mapper.class);
If you are trying to use the default map class, then write "Mapper.class".
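One caveat, in case a pass-through mapper really is the goal: in the old mapred API the question uses, the identity behaviour lives in a concrete library class (and it is what JobConf defaults to when setMapperClass is never called), so something like the following is the safer choice. A one-line sketch, not tested against the question's job:

conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);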

Related

Error in running MapReduce job in eclipse from windows

I have a pseudo-distributed Hadoop setup on a Linux machine. I have run a few examples in Eclipse (also installed on that Linux machine) and they worked fine. Now I want to run MapReduce jobs through Eclipse installed on a Windows machine and access the HDFS that is already present on my Linux machine. I have written the following driver code:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Windows_Driver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int exitcode = ToolRunner.run(new Windows_Driver(), args);
        System.exit(exitcode);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        JobConf conf = new JobConf(Windows_Driver.class);
        conf.set("fs.defaultFS", "hdfs://<Ip address>:50070");
        FileInputFormat.setInputPaths(conf, new Path("sample"));
        FileOutputFormat.setOutputPath(conf, new Path("sam"));
        conf.setMapperClass(Win_Mapper.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        JobClient.runJob(conf);
        return 0;
    }
}
And the Mapper code:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class Win_Mapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> o, Reporter arg3) throws IOException {
        ...
        o.collect(... , ...);
    }
}
When I run this, I get the following error:
SEVERE: PriviledgedActionException as:miracle cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-miracle\mapred\staging\miracle1262421749\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-miracle\mapred\staging\miracle1262421749\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
at Windows_Driver.run(Windows_Driver.java:41)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at Windows_Driver.main(Windows_Driver.java:16)
How can I rectify the error? And how can I access my HDFS remotely from Windows?
The submit() method on the Job creates an internal JobSubmitter instance, which performs all the data validations: input path and output path availability, file/directory creation permissions, and other things. During the different phases of the MR job it creates temporary directories under which it puts temp files. The temp directory is taken from core-site.xml, via the property hadoop.tmp.dir. The issue on your system seems to be that the temp directory is /tmp/ and the user running the MR job doesn't have permission to change its rwx status to 700. Provide the appropriate permissions and rerun the job.
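If fixing the permissions directly isn't an option, one workaround is to point the temp directory somewhere the submitting user can write, from the driver itself; that has the same effect as setting hadoop.tmp.dir in core-site.xml. A sketch against the question's driver, where the path is a placeholder:

JobConf conf = new JobConf(Windows_Driver.class);
conf.set("fs.defaultFS", "hdfs://<Ip address>:50070");
conf.set("hadoop.tmp.dir", "C:/hadoop-tmp"); // placeholder: any local directory the submitting user can write to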

Hadoop MapReduce error while parsing CSV

I'm getting the following error in the map function while parsing a CSV file.
14/07/15 19:40:05 INFO mapreduce.Job: Task Id : attempt_1403602091361_0018_m_000001_2, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 4
at com.test.mapreduce.RetailCustomerAnalysis_2$MapClass.map(RetailCustomerAnalysis_2.java:55)
at com.test.mapreduce.RetailCustomerAnalysis_2$MapClass.map(RetailCustomerAnalysis_2.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
The map function is given below:
package com.test.mapreduce;

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RetailCustomerAnalysis_2 extends Configured implements Tool {

    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        private Text key1 = new Text();
        private Text value1 = new Text();

        public void map(Text key, Text value,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            String line = value.toString();
            String[] split = line.split(",");
            key1.set(split[0].trim());
            /* line no 55, where the error is occurring */
            value1.set(split[4].trim());
            output.collect(key1, value1);
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, RetailCustomerAnalysis_2.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("RetailCustomerAnalysis_2");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new RetailCustomerAnalysis_2(), args);
        System.exit(res);
    }
}
The sample input used to run this code is as follows:
PRAVEEN,4002012,Kids,02GK,7/4/2010
PRAVEEN,400201,TOY,020383,14/04/2014
I'm running the application using the following command and inputs:
yarn jar RetailCustomerAnalysis_2.jar com.test.mapreduce.RetailCustomerAnalysis_2 /hduser/input5 /hduser/output5
Add a check to see whether the input line has all of its fields defined, and skip the record in the map function if it doesn't. The code would be something like this in the new API:
if (split.length != noOfFields) {
    return;
}
Additionally, if you are interested, you can set up a Hadoop counter to record how many rows in the CSV file did not contain all the required fields:
if (split.length != noOfFields) {
    context.getCounter(MTJOB.DISCARDED_ROWS_DUE_MISSING_FIELDS)
           .increment(1);
    return;
}
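The snippet above uses the new (mapreduce) API's Context. Since the mapper in the question is written against the old mapred API, the equivalent guard there goes through Reporter.incrCounter. A sketch, where GuardedMapClass and the counter enum are illustrative names:

package com.test.mapreduce;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class GuardedMapClass extends MapReduceBase
        implements Mapper<Text, Text, Text, Text> {

    // Counter enum mirroring the example above; the name is illustrative only.
    public enum MTJOB { DISCARDED_ROWS_DUE_MISSING_FIELDS }

    public void map(Text key, Text value,
            OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException {
        String[] split = value.toString().split(",");
        if (split.length < 5) { // split[4] is read below, so five fields are required
            reporter.incrCounter(MTJOB.DISCARDED_ROWS_DUE_MISSING_FIELDS, 1);
            return; // skip malformed records instead of failing the task
        }
        output.collect(new Text(split[0].trim()), new Text(split[4].trim()));
    }
}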
split[] has elements split[0], split[1], split[2], and split[3] only.
In the case of KeyValueTextInputFormat, the first string before the separator is considered the key and the rest of the line the value; a byte separator (a comma, whitespace, etc.) is used to split every record into key and value.
In your code, the first string before the first comma is taken as the key and the rest of the line as the value. When you split that value, there are only four strings in it, so the string array runs from split[0] to split[3] only; split[4] does not exist.
Any suggestions or corrections are welcome.
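For completeness: the commented-out line in the question's run() method points at the other knob here. Setting the key/value separator to the comma makes the first field the key and leaves the remaining four fields in the value, so the length guard above is still needed before touching split[4]:

job.set("key.value.separator.in.input.line", ",");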

Exception in Thread Main : ClassNotFoundException

I'm running Hadoop on a school cluster, and I get a ClassNotFoundException in the main thread:
Exception in thread "main" java.lang.ClassNotFoundException: movielens.MovieLensDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I'm aware that I have to use the fully-qualified class name in the command, and I have done so. This is the command I used:
hadoop jar movielens.jar movielens.MovieLensDriver input output
This is the code for my driver class:
package movielens;

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class MovieLensDriver {

    public static class JobRunner implements Runnable {
        private JobControl control;

        public JobRunner(JobControl _control) {
            this.control = _control;
        }

        public void run() {
            this.control.run();
        }
    }

    public static void handleRun(JobControl control)
            throws InterruptedException {
        JobRunner runner = new JobRunner(control);
        Thread t = new Thread(runner);
        t.start();
        while (!control.allFinished()) {
            System.out.println("Still running...");
            Thread.sleep(5000);
        }
    }

    public static void main(String args[]) throws IOException,
            InterruptedException {
        System.out.println("Program started");
        if (args.length != 2) {
            System.err.println("Usage: MovieLensDriver <input path> <output path>");
            System.exit(-1);
        }

        JobConf conf1 = new JobConf(movielens.MovieLensDriver.class);
        conf1.setMapperClass(MoviePairsMapper.class);
        conf1.setReducerClass(MoviePairsReducer.class);
        conf1.setJarByClass(MovieLensDriver.class);
        FileInputFormat.addInputPath(conf1, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf1, new Path("temp"));
        conf1.setMapOutputKeyClass(Text.class);
        conf1.setMapOutputValueClass(Text.class);
        conf1.setOutputKeyClass(Text.class);
        conf1.setOutputValueClass(IntWritable.class);

        JobConf conf2 = new JobConf(MovieLensDriver.class);
        conf2.setMapperClass(MoviePairsCoOccurMapper.class);
        conf2.setReducerClass(MoviePairsCoOccurReducer.class);
        conf2.setJarByClass(MovieLensDriver.class);
        FileInputFormat.addInputPath(conf2, new Path("temp"));
        FileOutputFormat.setOutputPath(conf2, new Path(args[1]));
        conf2.setInputFormat(KeyValueTextInputFormat.class);
        conf2.setMapOutputKeyClass(Text.class);
        conf2.setMapOutputValueClass(IntWritable.class);
        conf2.setOutputKeyClass(Text.class);
        conf2.setOutputValueClass(IntWritable.class);

        Job job1 = new Job(conf1);
        Job job2 = new Job(conf2);
        JobControl jobControl = new JobControl("jobControl");
        jobControl.addJob(job1);
        jobControl.addJob(job2);
        job2.addDependingJob(job1);
        handleRun(jobControl);
        System.out.println("Program complete.");
        System.exit(0);
    }
}
It has been a frustrating three-hour search for the bug, and any help is appreciated.
You can try the -libjars option, which takes the jar and places it in the distributed cache, making it available to all of the job's task attempts. Note that the -libjars argument takes a comma-separated list, not a colon- or semicolon-separated list:
export LIBJARS=/path/jars1,/path/jars2,/path/movielens.jar
hadoop jar movielens.jar movielens.MovieLensDriver -libjars ${LIBJARS} input output
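One thing to watch, and a possible reason the flag would still not take effect here: -libjars is one of Hadoop's generic options, and generic options are only parsed when the driver goes through GenericOptionsParser, which ToolRunner does for you. MovieLensDriver does everything in main without ToolRunner. A minimal sketch (GenericOptionsDemo is a made-up name) showing where the flag ends up once ToolRunner is in the loop:

package movielens;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Demo: ToolRunner strips generic options such as -libjars before run()
// is called, folding them into the Configuration returned by getConf().
public class GenericOptionsDemo extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        for (String arg : args) {
            System.out.println("remaining arg: " + arg); // -libjars has been consumed
        }
        // In classic Hadoop the libjars list lands in the "tmpjars" property.
        System.out.println("tmpjars = " + getConf().get("tmpjars"));
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new GenericOptionsDemo(), args));
    }
}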

Exception in thread "main" java.lang.ClassNotFoundException: WordCount

I currently want to create a single-node instance of Hadoop, so I am following this tutorial. I ran the following command in a terminal:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar WordCount.jar geekyomega.WordCount /user/hduser/gutenberg /user/hduser/gutenberg-output
Things were going great until I ran into this error:
Exception in thread "main" java.lang.ClassNotFoundException: WordCount
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I am attempting to run this example using the following code, which I got from here. Here is my version of the code:
package geekyomega;

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
I thought my issue was the job instantiation, so I changed:
Job job = new Job(conf, "wordcount");
To the following, capitalized version:
Job job = new Job(conf, "WordCount");
But that hasn't helped. Does anyone know what could help me here?
Thanks,
Geeky
PS: I don't want to run the tutorial version of WordCount. What I did was create the project in Eclipse, add the Hadoop jar to it, and export it as a jar file.
Your class name is geekyomega.WordCount, but you are not appending the package name. On the command line, just after the jar file name, give the fully-qualified name of your job class.
Along with adding the package, add the following line as well in the job configuration part of your program:
job.setJarByClass(WordCount.class);
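Putting both pieces together, the job setup in main would look something like this, a sketch based on the question's own code with the new line added right after the Job is created:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "WordCount");
    job.setJarByClass(WordCount.class); // tells Hadoop which jar holds the job's classes
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
}

With that in place, the existing invocation with the fully-qualified class name should resolve the class.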

java.lang.RuntimeException: java.lang.ClassNotFoundException when trying to run Jar job on Elastic MapReduce

What should I change to fix the following error?
I'm trying to start a job on Elastic MapReduce, and it crashes every time with the message:
java.lang.RuntimeException: java.lang.ClassNotFoundException: iataho.mapreduce.NewMaxTemperatureMapper
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:831)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:577)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:310)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: iataho.mapreduce.NewMaxTemperatureMapper
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:778)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:829)
... 4 more
The class NewMaxTemperatureMapper is declared and, I've checked, it is included in the jar, which is then uploaded to S3.
Here's the code for all app classes:
NewMaxTemperature.java:
package iataho.mapreduce;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewMaxTemperature {

    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            if (args.length != 2) {
                System.err.println("Usage: NewMaxTemperature <input path> <output path>");
                System.exit(123);
            }
            Job job = new Job();
            job.setJarByClass(NewMaxTemperature.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.setMapperClass(NewMaxTemperatureMapper.class);
            job.setCombinerClass(NewMaxTemperatureReducer.class);
            job.setReducerClass(NewMaxTemperatureReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
NewMaxTemperatureReducer.java:
package iataho.mapreduce;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NewMaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
NewMaxTemperatureMapper.java:
package iataho.mapreduce;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NewMaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
I've made the jar file with which I'm getting this crash available here: download jar
Check the jars you are including when executing the application. Please add more info about this to the question.
===
OK. The problem was that I had used the Eclipse option "Package libraries into generated JAR". I changed it to "Extract generated libraries into generated JAR", and now it works fine.
