Does this use Log4j Hadoop in the right way? - java

I keep getting the following error:
OpcodeCount.java:24: error: <identifier> expected
LOG.warn("something :)");
^
OpcodeCount.java:24: error: illegal start of type
Is it not allowed to call Log4j in the following way?
public class OpcodeCount {
// debugging output
private static final Logger LOG = org.apache.log4j.Logger.getLogger(this.getClass());
LOG.warn("something :)");
Here's the rest of my code:
import org.apache.log4j.Logger;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class OpcodeCount {
// debugging output
private static final Logger LOG = org.apache.log4j.Logger.getLogger(this.getClass());
LOG.warn("something :)");
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
// debugging output
private static final Logger LOG = org.apache.log4j.Logger.getLogger(this.getClass());
LOG.warn("something :)");
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
// debugging output
private static final Logger LOG = org.apache.log4j.Logger.getLogger(this.getClass());
LOG.warn("something :)");
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "opcode count");
job.setJarByClass(OpcodeCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

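Log4j isn't the problem; the error comes from the Java compiler.
A statement such as LOG.warn(...) cannot appear directly in a class body. It has to be inside a method, a constructor, or an initializer block.
Move the LOG.warn() call into a method such as map(). Also note that this.getClass() is not available in a static context, so a static logger should be initialized with a class literal such as OpcodeCount.class.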
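As a rough illustration of where such calls are allowed, here is a minimal sketch. It uses java.util.logging instead of Log4j so that it runs without extra jars (that substitution is an assumption of this example, as is the class name LoggerPlacementDemo); with Log4j 1.x the corresponding calls would be Logger.getLogger(OpcodeCount.class) and LOG.warn(...).

```java
import java.util.logging.Logger;

class LoggerPlacementDemo {
    // A static field initializer may call a static factory method.
    // A class literal is used because `this` does not exist in a static context.
    private static final Logger LOG =
        Logger.getLogger(LoggerPlacementDemo.class.getName());

    // Counts how many times the static initializer block below has run.
    static int staticInitRuns = 0;

    // Statements are allowed inside a static initializer block.
    static {
        LOG.warning("class loaded");
        staticInitRuns++;
    }

    // Statements are also allowed inside a method body.
    static String describe(String token) {
        LOG.warning("processing " + token);
        return "processed:" + token;
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));
    }
}
```

Placing the same LOG.warning(...) line directly between the field and the method, outside any block, would reproduce the "<identifier> expected" compiler error from the question.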

Related

Analyzing multiple input files and output only one file containing one final result

I do not have a great understanding of MapReduce. What I need is a one-line result from the analysis of several input files. Currently, my result contains one line per input file: with 3 input files, I get one output file containing 3 lines, one result per input. Since the results are sorted, I only need to write the first one to the HDFS file. My code is below:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordLength {
public static class Map extends Mapper<Object, Text, LongWritable, Text> {
// private final static IntWritable one = new IntWritable(1);
int max = Integer.MIN_VALUE;
private Text word = new Text();
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString(); // take one line (sentence) from the file
StringTokenizer tokenizer = new StringTokenizer(line); // split the sentence into words
while (tokenizer.hasMoreTokens()) {
String s= tokenizer.nextToken();
int val = s.length();
if(val>max) {
max=val;
word.set(s);
}
}
}
public void cleanup(Context context) throws IOException, InterruptedException {
context.write(new LongWritable(max), word);
}
}
public static class IntSumReducer
extends Reducer<LongWritable,Text,Text,LongWritable> {
private IntWritable result = new IntWritable();
int max=-100;
public void reduce(LongWritable key, Iterable<Text> values,
Context context
) throws IOException, InterruptedException {
context.write(new Text("longest"), key);
//context.write(new Text("longest"),key);
System.err.println(key);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordLength.class);
job.setMapperClass(Map.class);
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
//job.setCombinerClass(IntSumReducer.class);
job.setNumReduceTasks(1);
job.setReducerClass(IntSumReducer.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
It finds the longest word length for each input and prints it. But I need to find the longest length across all input files and print only one line.
So the output is:
longest 11
longest 10
longest 8
I want it to contain only:
longest 11
Thanks
I changed my code for finding the longest word length. Now it prints only "longest 11". If you have a better way, please feel free to correct my solution, as I am eager to learn the best options.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
public static class Map extends Mapper<Object, Text, Text, LongWritable> {
// private final static IntWritable one = new IntWritable(1);
int max = Integer.MIN_VALUE;
private Text word = new Text();
public void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString(); // take one line (sentence) from the file
StringTokenizer tokenizer = new StringTokenizer(line); // split the sentence into words
while (tokenizer.hasMoreTokens()) {
String s= tokenizer.nextToken();
int val = s.length();
if(val>max) {
max=val;
word.set(s);
context.write(word,new LongWritable(val));
}
}
}
}
public static class IntSumReducer
extends Reducer<Text,LongWritable,Text,LongWritable> {
private LongWritable result = new LongWritable();
long max=-100;
public void reduce(Text key, Iterable<LongWritable> values,
Context context
) throws IOException, InterruptedException {
// int sum = -1;
for (LongWritable val : values) {
if(val.get()>max) {
max=val.get();
}
}
result.set(max);
}
public void cleanup(Context context) throws IOException, InterruptedException {
context.write(new Text("longest"),result );
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
// job.setCombinerClass(IntSumReducer.class);
job.setNumReduceTasks(1);
job.setReducerClass(IntSumReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
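The idea behind the code above (each map task keeps only its own maximum, and a single reducer takes the maximum of those) can be sketched in plain Java without Hadoop. The class and method names here are hypothetical and only illustrate the data flow; this is not the poster's job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class LongestWordSketch {
    // "Map" side: scan one input's lines and keep only the local maximum length.
    static int longestInInput(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                max = Math.max(max, tokenizer.nextToken().length());
            }
        }
        return max;
    }

    // "Reduce" side: a single reducer sees every local maximum and keeps the largest.
    static int longestOverall(List<Integer> localMaxima) {
        int max = 0;
        for (int value : localMaxima) {
            max = Math.max(max, value);
        }
        return max;
    }

    public static void main(String[] args) {
        List<Integer> localMaxima = Arrays.asList(
            longestInInput(Arrays.asList("a tiny file", "with short words")),
            longestInInput(Arrays.asList("extraordinary words live here")));
        System.out.println("longest " + longestOverall(localMaxima));
    }
}
```

In Hadoop terms, one way to get the same effect would be for each mapper to emit a single pair under one constant key from cleanup(), so that the single reducer receives all local maxima in one reduce() call.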

Hadoop MapReduce Word Count Global Sorting Output Empty

I want to write an optimized word count, so I modified the example from the official website.
I added a combiner class to reduce the amount of data transferred, and I want to use global total-order sorting.
However, when I add TotalOrderPartitioner the output is empty, and I don't know why. If I comment out the total-order part, the job outputs the result.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler.RandomSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int count = 0;
for (IntWritable val : values) {
count += val.get();
}
result.set(count);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setNumReduceTasks(100);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// total order part
Path partitionFile = new Path("partitionFile");
RandomSampler<Text, Text> sampler = new RandomSampler<Text, Text>(0.1, 10000, 10);
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
job.setInputFormatClass(KeyValueTextInputFormat.class);
job.setPartitionerClass(TotalOrderPartitioner.class);
InputSampler.writePartitionFile(job, sampler);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
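One possible cause worth checking (an assumption, not something confirmed in the original post): KeyValueTextInputFormat splits each line at the first separator, a tab by default, and when a line contains no separator the whole line becomes the key and the value is empty. The mapper above tokenizes only the value, so for tab-free input it would emit nothing. The sketch below mimics that splitting in plain Java; splitKeyValue is a hypothetical helper, not a Hadoop API.

```java
class KeyValueSplitSketch {
    // Hypothetical helper, not a Hadoop API: splits a line at the first separator.
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the whole line is the key and the value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] plain = splitKeyValue("the quick brown fox", '\t');
        System.out.println("key=[" + plain[0] + "] value=[" + plain[1] + "]");

        String[] tabbed = splitKeyValue("doc1\tthe quick brown fox", '\t');
        System.out.println("key=[" + tabbed[0] + "] value=[" + tabbed[1] + "]");
    }
}
```

If this is what happens, the words arrive in the key rather than the value, which would explain why the job only produces output once the KeyValueTextInputFormat line is commented out along with the rest of the total-order part.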

How to submit WordCount.jar to hadoop via servlet

I have a WordCount.jar stored in the local Linux file system and a file containing a set of words stored in HDFS.
How can I run this WordCount.jar through a servlet and specify the input and output paths in the servlet?
package org.apache.hadoop.examples;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length < 2) {
System.err.println("Usage: wordcount <in> [<in>...] <out>");
System.exit(2);
}
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
for (int i = 0; i < otherArgs.length - 1; ++i) {
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job,
new Path(otherArgs[otherArgs.length - 1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Generally, follow these steps:
1. Put your WordCount jar file on your server's classpath, along with the necessary Hadoop client jar files.
2. Specify the input and output directories as HTTP request parameters.
3. Parse the directories from the HTTP request in your servlet's doGet() method.
4. Use JobClient to submit your job.
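The request-parsing part of these steps might look roughly like the sketch below. It is plain Java and takes a map shaped like the one returned by HttpServletRequest.getParameterMap(), so it runs without a servlet container. The parameter names "input" and "output", the example paths, and the helper names are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class JobArgsFromRequest {
    // Hypothetical helper: builds the {input, output} argument array for the job
    // driver from a map shaped like HttpServletRequest.getParameterMap().
    static String[] toJobArgs(Map<String, String[]> params) {
        String input = firstValue(params, "input");   // parameter name is an assumption
        String output = firstValue(params, "output"); // parameter name is an assumption
        return new String[] { input, output };
    }

    // Returns the first non-blank value of a parameter, or fails with a clear message.
    private static String firstValue(Map<String, String[]> params, String name) {
        String[] values = params.get(name);
        if (values == null || values.length == 0 || values[0].trim().isEmpty()) {
            throw new IllegalArgumentException("Missing request parameter: " + name);
        }
        return values[0].trim();
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new HashMap<>();
        params.put("input", new String[] { "/user/demo/words.txt" });      // example path
        params.put("output", new String[] { "/user/demo/wordcount-out" }); // example path
        String[] jobArgs = toJobArgs(params);
        System.out.println(jobArgs[0] + " -> " + jobArgs[1]);
    }
}
```

Inside doGet(), the resulting array could then be used to set the job's input and output paths before the job is submitted.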

Word Merge in hadoop

I would like to merge (concatenate) strings using Hadoop: the mapper groups the words by key, and the reducer concatenates the values that share a common key.
Below is my code for the MapReduce job.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class mr2 {
// mapper class
public static class TokenizerMapper extends Mapper<Text, Text, Text, Text>{
private Text word = new Text(); // key
private Text value_of_key = new Text(); // value
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String IndexAndCategory = "";
String value_of_the_key = "";
StringTokenizer itr = new StringTokenizer(line);
// key creation
IndexAndCategory += itr.nextToken() + " ";
IndexAndCategory += itr.nextToken() + " ";
// value creation
value_of_the_key += itr.nextToken() + ":";
value_of_the_key += itr.nextToken() + " ";
// key and value
word.set(IndexAndCategory);
value_of_key.set(value_of_the_key);
// write key-value pair
context.write(word, (Text)value_of_key);
}
}
// reducer class
public static class IntSumReducer extends Reducer<Text,Text,Text,Text> {
//private IntWritable result = new IntWritable();
private Text values_of_key = new Text();
@Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String values_ = "";
for (Text val : values) {
values_ += val.toString();
}
values_of_key.set(values_);
context.write(key, values_of_key);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "mr2");
job.setJarByClass(mr2.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(1);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
The input to mapper is in the below format.
1 A this 2
1 A the 1
3 B is 1
The mapper processes this into the format below and passes it to the reducer.
1 A this:2
1 A the:1
3 B is:1
The reducer then combines the values for each key into the format below.
1 A this:2 the:1
3 B is:1
I used word count as a basic template and modified it to process Text (String) values, but when I execute the code above I get the error below.
Error: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
at mr2$TokenizerMapper.map(mr2.java:17)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
It seems to expect a LongWritable. Any help in solving this issue is appreciated.
If you're reading a text file, the mapper must be defined as
public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, Text>{
So the map method should look like this
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
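The reason this shows up as a runtime ClassCastException rather than a compile error is that generic type parameters are erased: the framework calls map() through the raw base type and passes whatever key the input format supplies (for a plain text file, the byte offset as a LongWritable). A minimal plain-Java sketch of the same effect, with no Hadoop involved, follows. MiniMapper and the other names are hypothetical stand-ins, and Long and String take the place of LongWritable and Text.

```java
class KeyTypeMismatchDemo {
    // Stand-in for a framework base class with generic key and value types.
    abstract static class MiniMapper<K, V> {
        abstract String map(K key, V value);
    }

    // Declares a String key, like Mapper<Text, Text, ...> in the question.
    static class StringKeyMapper extends MiniMapper<String, String> {
        @Override
        String map(String key, String value) {
            return key + "=" + value;
        }
    }

    // Declares a Long key, matching what the "input format" supplies.
    static class LongKeyMapper extends MiniMapper<Long, String> {
        @Override
        String map(Long key, String value) {
            return key + "=" + value;
        }
    }

    // The "framework" only knows the raw type, so the compiler cannot check the key type.
    @SuppressWarnings({ "rawtypes", "unchecked" })
    static String runWithOffsetKey(MiniMapper mapper, String line) {
        Object key = Long.valueOf(0L); // like the byte offset supplied for a text file
        try {
            return mapper.map(key, line);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithOffsetKey(new StringKeyMapper(), "1 A this 2"));
        System.out.println(runWithOffsetKey(new LongKeyMapper(), "1 A this 2"));
    }
}
```

Declaring the key as Object, as in the word count template, also avoids the failed cast because any key type is accepted.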
The problem was in the main function: I had not specified the mapper's output types, so the framework fell back to the job's output key and value classes as the types passed to the reducer. For more details, refer to this post.
I also changed the mapper's input key type from Text to Object.
public static class TokenizerMapper extends Mapper<Object, Text, Text, Text>{
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
Adding the following lines solved the issue.
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
The following is the complete working code.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.LongWritable;
public class mr2 {
// mapper class
public static class TokenizerMapper extends Mapper<Object, Text, Text, Text>{
private Text word = new Text(); // key
private Text value_of_key = new Text(); // value
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String IndexAndCategory = "";
String value_of_the_key = "";
StringTokenizer itr = new StringTokenizer(line);
// key creation
IndexAndCategory += itr.nextToken() + " ";
IndexAndCategory += itr.nextToken() + " ";
// value creation
value_of_the_key += itr.nextToken() + ":";
value_of_the_key += itr.nextToken() + " ";
// key and value
word.set(IndexAndCategory);
value_of_key.set(value_of_the_key);
// write key-value pair
context.write(word, value_of_key);
}
}
// reducer class
public static class IntSumReducer extends Reducer<Text,Text,Text,Text> {
//private IntWritable result = new IntWritable();
private Text values_of_key = new Text();
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String values_ = "";
for (Text val : values) {
values_ += val.toString();
}
values_of_key.set(values_);
context.write(key, values_of_key);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "mr2");
job.setJarByClass(mr2.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setNumReduceTasks(1);
// The mapper emits Text/Text, so declare the map output types explicitly
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Error occurs when using multiple inputs

I am getting this error:
The method:
addInputPath(Job, Path, Class<? extends InputFormat>, Class<? extends Mapper>)
in the type MultipleInputs is
not applicable for the arguments (JobConf, Path, Class<TextInputFormat>, Class<App.MapClass>)
for following code:
MultipleInputs.addInputPath(job, in, TextInputFormat.class, MapClass.class);
/* ------------------------ */
package hadoop.mi4;
/**
* Hello world!
*
*/
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class App {
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context ) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class TokenizerMapper1 extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 3) {
System.err.println("Usage: App <in1> <in2> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(App.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
MultipleInputs.addInputPath(job, new Path(otherArgs[0]), TextInputFormat.class, TokenizerMapper.class);
MultipleInputs.addInputPath(job, new Path(otherArgs[1]), TextInputFormat.class, TokenizerMapper1.class);
FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
It doesn't work because you've mixed classes from the old org.apache.hadoop.mapred API with the new org.apache.hadoop.mapreduce API. Replace the following import
import org.apache.hadoop.mapred.TextInputFormat;
with
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
It looks like your job variable is of type JobConf whereas it should be a Job. You should try
MultipleInputs.addInputPath(new Job(job), in, TextInputFormat.class, MapClass.class);
You should also check that MapClass extends org.apache.hadoop.mapreduce.Mapper.
