Hadoop MapReduce Output for Maximum

Hadoop MapReduce Output for Maximum - java

I am currently using Eclipse and Hadoop to create a mapper and reducer to find Maximum Total Cost of an Airline Data Set.
So the Total Cost is Decimal Value and Airline Carrier is Text.
The dataset I used can be found in the following weblink:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/236265/dft-flights-data-2011.csv
When I export the jar file in Hadoop,
I am getting the following message: ls: "output" : No such file or directory.
Can anyone help me correct the code please?
My code is below.
Mapper:
package org.myorg;
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTotalCostMapper extends Mapper<LongWritable, Text, Text, DoubleWritable>
{
private final static DoubleWritable totalcostWritable = new DoubleWritable(0);
private Text AirCarrier = new Text();
#Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException
{
String[] line = value.toString().split(",");
AirCarrier.set(line[8]);
double totalcost = Double.parseDouble(line[2].trim());
totalcostWritable.set(totalcost);
context.write(AirCarrier, totalcostWritable);
}
}
Reducer:
package org.myorg;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTotalCostReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
ArrayList<Double> totalcostList = new ArrayList<Double>();
#Override
public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException
{
double maxValue=0.0;
for (DoubleWritable value : values)
{
maxValue = Math.max(maxValue, value.get());
}
context.write(key, new DoubleWritable(maxValue));
}
}
Main:
package org.myorg;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MaxTotalCost
{
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
if (args.length != 2)
{
System.err.println("Usage: MaxTotalCost<input path><output path>");
System.exit(-1);
}
Job job;
job=Job.getInstance(conf, "Max Total Cost");
job.setJarByClass(MaxTotalCost.class);
FileInputFormat.addInputPath(job, new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setMapperClass(MaxTotalCostMapper.class);
job.setReducerClass(MaxTotalCostReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

ls: "output" : No such file or directory
You have no HDFS user directory. Your code isn't making it into the Mapper or Reducer. That error typically arises at the Job
FileOutputFormat.setOutputPath(job, new Path(args[2]));
Run an hdfs dfs -ls, see if you get any errors. If so, make a directory under /user that matches your current user.
Otherwise, change your output directory to something like /tmp/max

Related

"How to fix 'uses or overrides a deprecated API and Recompile with -Xlint:deprecation for details' in Java"

I want to take advantage of using linux to practice examples of Hadoop-MapReduce.
I have written a code for my project and I am getting some warning message when I compile. I could not run it. Having tried many possible ways like ignoring warning and etc, I am still unable to run it. Below you will find the code.
import java.io.IOException;
import java.util.StringTokenizer;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
importorg.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.StringUtils;
public class Average{
public static class Map extends Mapper<Object, Text, Text, IntWritable> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer script = new StringTokenizer(line, "\n"); while (script.hasMoreTokens()) {
StringTokenizer scriptLine = new StringTokenizer(script.nextToken());
Text Name = new Text(scriptLine.nextToken());
int Score = Integer.parseInt(scriptLine.nextToken());
context.write(Name, new IntWritable(Score));
}
}
}
public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable> {
public void reduce(Text key, Iterable<IntWritable> value, Context context) throws IOException, InterruptedException{
int numerator = 0;
int denominator = 0;
int avg = 0;
for (IntWritable score : value) {
numerator += score.get();
denominator++;
}
avg = numerator/denominator;
context.write(key, new IntWritable(avg));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
Path dst_path = new Path (otherArgs[1]);
FileSystem hdfs = dst_path.getFileSystem(conf);
if (hdfs.exists(dst_path)){
hdfs.delete(dst_path, true);
};
Job job = new Job(conf, "Average");
job.setJarByClass(Average.class);
job.setMapperClass(Map.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Running Implementing Custom Output Format in Hadoop Tutorial

I want to run the code described in this tutorial in order to customize the output format in Hadoop. More precisely, the tutorial shows two java files:
WordCount: is the word count java application (similar to the WordCount v1.0 of the MapReduce Tutorial in this link)
XMLOutputFormat: java class that extends FileOutputFormat and implements the method to customize the output.
Well, what I did was to take the WordCount v1.0 of the MapReduce Tutorial (instead of using the WordCount showed in the tutorial) and add in the driver job.setOutputFormatClass(XMLOutputFormat.class); and execute the hadoop app in this way:
/usr/local/hadoop/bin/hadoop com.sun.tools.javac.Main WordCount.java && jar cf wc.jar WordCount*.class && /usr/local/hadoop/bin/hadoop jar wc.jar WordCount /home/luis/Desktop/mytest/input/ ./output_folder
note: /home/luis/Desktop/mytest/input/ and ./output_folder are the input and output folders, respectively.
Unfortunately, the terminal shows me the following error:
WordCount.java:57: error: cannot find symbol
job.setOutputFormatClass(XMLOutputFormat.class);
^
symbol: class XMLOutputFormat
location: class WordCount
1 error
Why? WordCount.java and XMLOutputFormat.java are stored in the same folder.
The following is my code.
WordCount code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setOutputFormatClass(XMLOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
XMLOutputFormat code:
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class XMLOutputFormat extends FileOutputFormat<Text, IntWritable> {
protected static class XMLRecordWriter extends RecordWriter<Text, IntWritable> {
private DataOutputStream out;
public XMLRecordWriter(DataOutputStream out) throws IOException{
this.out = out;
out.writeBytes("<Output>\n");
}
private void writeStyle(String xml_tag,String tag_value) throws IOException {
out.writeBytes("<"+xml_tag+">"+tag_value+"</"+xml_tag+">\n");
}
public synchronized void write(Text key, IntWritable value) throws IOException {
out.writeBytes("<record>\n");
this.writeStyle("key", key.toString());
this.writeStyle("value", value.toString());
out.writeBytes("</record>\n");
}
public synchronized void close(TaskAttemptContext job) throws IOException {
try {
out.writeBytes("</Output>\n");
} finally {
out.close();
}
}
}
public RecordWriter<Text, IntWritable> getRecordWriter(TaskAttemptContext job) throws IOException {
String file_extension = ".xml";
Path file = getDefaultWorkFile(job, file_extension);
FileSystem fs = file.getFileSystem(job.getConfiguration());
FSDataOutputStream fileOut = fs.create(file, false);
return new XMLRecordWriter(fileOut);
}
}

You need to either add package testpackage; at the beginning of your WordCount class
or
import testpackage.XMLOutputFormat; in your WordCount class.
Because they are in the same directory, it doesn't imply they are in the same package.

We will need to add the XMLOutputFormat.jar file to the HADOOP_CLASSPATH first for the driver code to find it. And pass it in -libjars option to be added to classpath of the map and reduce jvms.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/abc/xyz/XMLOutputFormat.jar
yarn jar wordcount.jar com.sample.test.Wordcount
-libjars /path/to/XMLOutputFormat.jar
/lab/mr/input /lab/output/output

Running MR program with separate mapper, reducer and driver classes

maxtempmapper.java class:
package com.hadoop.gskCodeBase.maxTemp;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTempMapper extends Mapper<LongWritable,Text,Text,IntWritable> {
private static final int MISSING=9999;
#Override
public void map(LongWritable kay,Text value,Context context) throws IOException,InterruptedException {
String line = value.toString();
String year = line.substring(15,19);
int airTemperature;
if(line.charAt(87)== '+'){
airTemperature=Integer.parseInt(line.substring(88, 92));
}else{
airTemperature=Integer.parseInt(line.substring(87, 92));
}
String quality=line.substring(92,93);
if(airTemperature !=MISSING && quality.matches("[01459]")){
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}
maxtempreducer.java class:
package com.hadoop.gskCodeBase.maxTemp;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTempReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
#Override
public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException,InterruptedException {
int maxValue = Integer.MIN_VALUE;
for(IntWritable value : values){
maxValue=Math.max(maxValue, value.get());
}
context.write(key, new IntWritable(maxValue));
}
}
maxtempdriver.java class:
package com.hadoop.gskCodeBase.maxTemp;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class MaxTempDriver extends Configured implements Tool{
public int run(String[] args) throws Exception{
if(args.length !=2){
System.err.println("UsageTemperatureDriver <input path> <outputpath>");
System.exit(-1);
}
Job job = Job.getInstance();
job.setJarByClass(MaxTempDriver.class);
job.setJobName("Max Temperature");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
job.setMapperClass(MaxTempMapper.class);
job.setReducerClass(MaxTempReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0:1);
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
MaxTempDriver driver = new MaxTempDriver();
int exitCode = ToolRunner.run(driver, args);
System.exit(exitCode);
}
}
I have to execute the above three classes on single node hadoop cluster on windows using command prompt
can someone please help me in how to execute these three classes on command prompt(windows)?

Archive all the java files into a single .jar file. Then just run it as you normally do. In Windows, it's easier to run Hadoop via Cygwin terminal. You can execute the job by the following command:
hadoop jar <path to .jar> <path to input folder in hdfs> <path to output folder in hdfs>
Eg:
hadoop jar wordcount.jar /input /output
-UPDATE-
You should assign you driver class to the job.setJarByClass(). In this case, it would be your MaxTempDriver.class
In eclipse, you can create a jar file by right clicking on your source folder > Export > JAR file. From there you can follow the steps. You can set your Main Class during the process as well.
Hope this answers your question.

MapReduce: How to get mapper to process multiple lines?

Goal:
I want to be able to specify the number of mappers used on an input file
Equivalently, I want to specify the number of line of a file each mapper will take
Simple example:
For an input file of 10 lines (of unequal length; example below), I want there to be 2 mappers -- each mapper will thus process 5 lines.
This is
an arbitrary example file
of 10 lines.
Each line does
not have to be
of
the same
length or contain
the same
number of words
This is what I have:
(I have it so that each mapper produces one "<map,1>" key-value pair ... so that it will then be summed in the reducer)
package org.myorg;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.InputFormat;
public class Test {
// prduce one "<map,1>" pair per mapper
public static class Map extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
context.write(new Text("map"), one);
}
}
// reduce by taking a sum
public static class Red extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job1 = Job.getInstance(conf, "pass01");
job1.setJarByClass(Test.class);
job1.setMapperClass(Map.class);
job1.setCombinerClass(Red.class);
job1.setReducerClass(Red.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path(args[1]));
// // Attempt#1
// conf.setInt("mapreduce.input.lineinputformat.linespermap", 5);
// job1.setInputFormatClass(NLineInputFormat.class);
// // Attempt#2
// NLineInputFormat.setNumLinesPerSplit(job1, 5);
// job1.setInputFormatClass(NLineInputFormat.class);
// // Attempt#3
// conf.setInt(NLineInputFormat.LINES_PER_MAP, 5);
// job1.setInputFormatClass(NLineInputFormat.class);
// // Attempt#4
// conf.setInt("mapreduce.input.fileinputformat.split.minsize", 234);
// conf.setInt("mapreduce.input.fileinputformat.split.maxsize", 234);
System.exit(job1.waitForCompletion(true) ? 0 : 1);
}
}
The above code, using the above example data, will produce
map 10
I want the output to be
map 2
where the first mapper will do something will the first 5 lines, and the second mapper will do something with the second 5 lines.

You could use NLineInputFormat.
With NLineInputFormat functionality, you can specify exactly how many lines should go to a mapper.
E.g. If your file has 500 lines, and you set number of lines per mapper to 10, you have 50 mappers
(instead of one - assuming the file is smaller than a HDFS block size).
EDIT:
Here is an example for using NLineInputFormat:
Mapper Class:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MapperNLine extends Mapper<LongWritable, Text, LongWritable, Text> {
#Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
context.write(key, value);
}
}
Driver class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class Driver extends Configured implements Tool {
#Override
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.out
.printf("Two parameters are required for DriverNLineInputFormat- <input dir> <output dir>\n");
return -1;
}
Job job = new Job(getConf());
job.setJobName("NLineInputFormat example");
job.setJarByClass(Driver.class);
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.addInputPath(job, new Path(args[0]));
job.getConfiguration().setInt("mapreduce.input.lineinputformat.linespermap", 5);
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(MapperNLine.class);
job.setNumReduceTasks(0);
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new Configuration(), new Driver(), args);
System.exit(exitCode);
}
}
With the input you provided the output from the above sample Mapper would be written to two files as 2 Mappers get initialized :
part-m-00001
0 This is
8 an arbitrary example file
34 of 10 lines.
47 Each line does
62 not have to be
part-m-00002
77 of
80 the same
89 length or contain
107 the same
116 number of words

getCredentials method error when trying to run "Word Count" program in eclipse by importing all JAR files

Error : Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.getCredentials()Lorg/apache/hadoop/security/Credentials;
at org.apache.hadoop.mapreduce.Job.(Job.java:135)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:176)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:195)
at WordCount.main(WordCount.java:20)
Hadoop version 2.2.0
WordCount.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration(), "word count");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(WordCount.class);
job.setJobName("WordCount");
job.submit();
}
}
WordMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
private Text word = new Text();
private final static IntWritable one = new IntWritable(1);
#Override
public void map(Object key, Text value,
Context contex) throws IOException, InterruptedException {
// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
while (wordList.hasMoreTokens()) {
word.set(wordList.nextToken());
contex.write(word, one);
}
}
}
SumReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable totalWordCount = new IntWritable();
#Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int wordCount = 0;
Iterator<IntWritable> it=values.iterator();
while (it.hasNext()) {
wordCount += it.next().get();
}
totalWordCount.set(wordCount);
context.write(key, totalWordCount);
}
}
Please let me know what can be done ?Latest mapreduce API is used for the program. All the jars that came with hadoop 2.2.0 are also imported into eclipse.
Thanks :)

Are you using an Eclipse plugin for Hadoop? If not that is the problem. With out the plugin, Eclipse if just running the WordCount class and Hadoop can't find the necessary jars. Bundle all the jars including WordCount and run it in Cluster.
If you want to run it from Eclipse you need Eclipse plugin. If you don't have one, you can build the plugin by following this instructions

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Hadoop MapReduce Output for Maximum - java

Related

"How to fix 'uses or overrides a deprecated API and Recompile with -Xlint:deprecation for details' in Java"

Running Implementing Custom Output Format in Hadoop Tutorial

Running MR program with separate mapper, reducer and driver classes

MapReduce: How to get mapper to process multiple lines?

getCredentials method error when trying to run "Word Count" program in eclipse by importing all JAR files

Categories

Resources