I got the following exception while trying to execute a Hadoop MapReduce program.
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
at com.vasa.books.BookDriver.main(BookDriver.java:37)
BookDriver.java
package com.vasa.books;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class BookDriver {
public static void main(String args[]) {
// TODO Auto-generated method stub
JobClient client=new JobClient();
JobConf conf=new JobConf(com.vasa.books.BookDriver.class);
conf.setJobName("booknamefind");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(com.vasa.books.BookMapper.class);
conf.setReducerClass(com.vasa.books.BookReducer.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf,new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
client.setConf(conf);
try{
JobClient.runJob(conf);
}catch(Exception e){
e.printStackTrace();
}
}
}
BookMapper.java
package com.vasa.books;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class BookMapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable> {
private final static IntWritable one=new IntWritable(1);
public void map(LongWritable _key, Text value,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException {
// TODO Auto-generated method stub
String Tempstring=value.toString();
String[] singlebookdata=Tempstring.split("\",\"");
output.collect(new Text(singlebookdata[3]), one);
}
}
Why does that exception occur?
According to the JobClient source, JobClient.runJob() calls JobClient.monitorAndPrintJob(), which returns a boolean. If that boolean is false (meaning the job failed), runJob() throws the unhelpful "Job failed!" exception you are seeing.
To solve this you have two options:
1 - (Faster) Check the logs. The RunningJob failure info should be getting printed to the logs.
2 - If you don't know where the logs are, don't have logging enabled, or don't want to have to dig through logs, you could rewrite a bit of your code. Instead of using JobClient.runJob(), I would do the equivalent of what runJob() is doing in your code, so that when it fails you get a useful error message.
public static RunningJob myCustomRunJob(JobConf job) throws Exception {
JobClient jc = new JobClient(job);
RunningJob rj = jc.submitJob(job);
if (!jc.monitorAndPrintJob(job, rj)) {
throw new IOException("Job failed with info: " + rj.getFailureInfo());
}
return rj;
}
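With that helper in place, the try block in BookDriver could call it instead of JobClient.runJob(conf); a minimal usage sketch:
try {
    // Throws an IOException that includes rj.getFailureInfo() instead of the bare "Job failed!"
    myCustomRunJob(conf);
} catch (Exception e) {
    e.printStackTrace();
}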
My guess is that the underlying problem is that either args[0] or args[1] (your input or output paths) is not being found.
Related
I am currently using Eclipse and Hadoop to write a mapper and reducer that find the maximum Total Cost in an airline data set.
The Total Cost is a decimal value and the Airline Carrier is text.
The dataset I used can be found at the following link:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/236265/dft-flights-data-2011.csv
When I export the jar file and run it in Hadoop, I get the following message: ls: "output": No such file or directory.
Can anyone help me correct the code, please?
My code is below.
Mapper:
package org.myorg;
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MaxTotalCostMapper extends Mapper<LongWritable, Text, Text, DoubleWritable>
{
private final static DoubleWritable totalcostWritable = new DoubleWritable(0);
private Text AirCarrier = new Text();
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException
{
String[] line = value.toString().split(",");
AirCarrier.set(line[8]);
double totalcost = Double.parseDouble(line[2].trim());
totalcostWritable.set(totalcost);
context.write(AirCarrier, totalcostWritable);
}
}
Reducer:
package org.myorg;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MaxTotalCostReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
ArrayList<Double> totalcostList = new ArrayList<Double>();
@Override
public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
throws IOException, InterruptedException
{
double maxValue=0.0;
for (DoubleWritable value : values)
{
maxValue = Math.max(maxValue, value.get());
}
context.write(key, new DoubleWritable(maxValue));
}
}
Main:
package org.myorg;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MaxTotalCost
{
public static void main(String[] args) throws Exception
{
Configuration conf = new Configuration();
if (args.length != 2)
{
System.err.println("Usage: MaxTotalCost<input path><output path>");
System.exit(-1);
}
Job job;
job=Job.getInstance(conf, "Max Total Cost");
job.setJarByClass(MaxTotalCost.class);
FileInputFormat.addInputPath(job, new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setMapperClass(MaxTotalCostMapper.class);
job.setReducerClass(MaxTotalCostReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
ls: "output" : No such file or directory
You have no HDFS user directory. Your code isn't even making it into the Mapper or Reducer; the error typically arises when the Job resolves the output path:
FileOutputFormat.setOutputPath(job, new Path(args[2]));
Run hdfs dfs -ls and see whether you get any errors. If so, create a directory under /user that matches your current user (for example, hdfs dfs -mkdir -p /user/<your username>).
Otherwise, change your output directory to something like /tmp/max.
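If you prefer to check this from Java rather than the shell, here is a small sketch (my own, not part of the answer) that prints the HDFS home directory a relative path such as "output" resolves against, and whether a given output path already exists:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputPathCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Relative paths such as "output" resolve against this directory; if it does not
        // exist, "ls: `output': No such file or directory" is exactly what you see.
        System.out.println("HDFS home directory: " + fs.getHomeDirectory());
        // Pre-check the output directory passed to the driver.
        Path out = new Path(args[0]);
        System.out.println(out + " exists: " + fs.exists(out));
    }
}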
I am doing MapReduce with the following code. When I run this job, everything works fine, but the output shows 0 0. I suspect this may be due to the TryParseInt() method, which I quick-fixed because it was previously undefined. Initially there was no TryParseInt() method, so I created one. Can anyone check whether the code is correct, especially the TryParseInt() method, and suggest how to run this program successfully?
input looks like :
Thanks in Advance
import java.io.IOException;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.io.LongWritable;
public class MaxPubYear {
public static class MaxPubYearMapper extends Mapper<LongWritable , Text, IntWritable,Text>
{
public void map(LongWritable key, Text value , Context context)
throws IOException, InterruptedException
{
String delim = "\t";
Text valtosend = new Text();
String tokens[] = value.toString().split(delim);
if (tokens.length == 2)
{
valtosend.set(tokens[0] + ";"+ tokens[1]);
context.write(new IntWritable(1), valtosend);
}
}
}
public static class MaxPubYearReducer extends Reducer<IntWritable ,Text, Text, IntWritable>
{
public void reduce(IntWritable key, Iterable<Text> values , Context context) throws IOException, InterruptedException
{
int maxiValue = Integer.MIN_VALUE;
String maxiYear = "";
for(Text value:values) {
String token[] = value.toString().split(";");
if(token.length == 2 && TryParseInt(token[1]).intValue()> maxiValue)
{
maxiValue = TryParseInt(token[1]);
maxiYear = token[0];
}
}
context.write(new Text(maxiYear), new IntWritable(maxiValue));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job Job = new Job(conf, "Maximum Publication year");
Job.setJarByClass(MaxPubYear.class);
Job.setOutputKeyClass(Text.class);
Job.setOutputValueClass(IntWritable.class);
Job.setMapOutputKeyClass(IntWritable.class);
Job.setMapOutputValueClass(Text.class);
Job.setMapperClass(MaxPubYearMapper.class);
Job.setReducerClass(MaxPubYearReducer.class);
FileInputFormat.addInputPath(Job,new Path(args[0]));
FileOutputFormat.setOutputPath(Job,new Path(args[1]));
System.exit(Job.waitForCompletion(true)?0:1);
}
public static Integer TryParseInt(String string) {
// TODO Auto-generated method stub
return(0);
}
}
The errors mean exactly what they say: for the three 'could not be resolved to a type' errors, you probably forgot to import the right classes. Error 2 simply means there is no method TryParseInt(String) in the class MaxPubYear.MaxPubYearReducer; you have to create one there.
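As for TryParseInt itself, one possible implementation is sketched below; the Integer.MIN_VALUE fallback is my own choice (not from the question), picked so that malformed year tokens never beat a real maximum in the reducer:
public static Integer TryParseInt(String string) {
    try {
        // Parse the publication year; trim in case the token carries stray whitespace.
        return Integer.parseInt(string.trim());
    } catch (NumberFormatException e) {
        // Sentinel for unparseable values; it can never exceed maxiValue in the reducer.
        return Integer.MIN_VALUE;
    }
}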
I'm getting the following error in the map function while parsing a CSV file.
14/07/15 19:40:05 INFO mapreduce.Job: Task Id : attempt_1403602091361_0018_m_000001_2, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 4
at com.test.mapreduce.RetailCustomerAnalysis_2$MapClass.map(RetailCustomerAnalysis_2.java:55)
at com.test.mapreduce.RetailCustomerAnalysis_2$MapClass.map(RetailCustomerAnalysis_2.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
The map function is given below
package com.test.mapreduce;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class RetailCustomerAnalysis_2 extends Configured implements Tool {
public static class MapClass extends MapReduceBase
implements Mapper<Text, Text, Text, Text> {
private Text key1 = new Text();
private Text value1 = new Text();
public void map(Text key, Text value,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
String line = value.toString();
String[] split = line.split(",");
key1.set(split[0].trim());
/* line no 55 where the error is occurring */
value1.set(split[4].trim());
output.collect(key1, value1);
}
}
public int run(String[] args) throws Exception {
Configuration conf = getConf();
JobConf job = new JobConf(conf, RetailCustomerAnalysis_2.class);
Path in = new Path(args[0]);
Path out = new Path(args[1]);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJobName("RetailCustomerAnalysis_2");
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);
job.setInputFormat(KeyValueTextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// job.set("key.value.separator.in.input.line", ",");
JobClient.runJob(job);
return 0;
}
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new Configuration(), new RetailCustomerAnalysis_2(), args);
System.exit(res);
}
}
Sample Input used to run this code is as follows
PRAVEEN,4002012,Kids,02GK,7/4/2010
PRAVEEN,400201,TOY,020383,14/04/2014
I'm running the application using the following command and Inputs.
yarn jar RetailCustomerAnalysis_2.jar com.test.mapreduce.RetailCustomerAnalysis_2 /hduser/input5 /hduser/output5
Add a check to see whether the input line has all of the fields defined, and skip the record in the map function if it does not. In the new API the code would be something like this:
if(split.length!=noOfFields){
return;
}
Additionally, if you are interested, you can set up a Hadoop counter to track how many rows in the CSV file did not contain all of the required fields:
if(split.length!=noOfFields){
context.getCounter(MTJOB.DISCARDED_ROWS_DUE_MISSING_FIELDS)
.increment(1);
return;
}
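Put together, the guard and the counter could sit in a new-API mapper like the sketch below; the MTJOB enum name comes from the snippet above, while the mapper class name and the NO_OF_FIELDS value are assumptions for illustration:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RetailCustomerMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Counter for rows that do not carry the expected number of fields.
    public enum MTJOB { DISCARDED_ROWS_DUE_MISSING_FIELDS }

    private static final int NO_OF_FIELDS = 5; // expected columns per CSV record

    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] split = value.toString().split(",");

        // Skip and count malformed rows instead of letting split[4] throw.
        if (split.length != NO_OF_FIELDS) {
            context.getCounter(MTJOB.DISCARDED_ROWS_DUE_MISSING_FIELDS).increment(1);
            return;
        }

        outKey.set(split[0].trim());
        outValue.set(split[4].trim());
        context.write(outKey, outValue);
    }
}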
split[] has elements split[0], split[1], split[2] and split[3] only
In the case of KeyValueTextInputFormat, the first string before the separator is treated as the key and the rest of the line as the value. A configurable single-byte separator is used to split each record into key and value.
In your code the first string before the first comma is taken as the key and the rest of the line as the value. And as you split that value on commas, there are only four strings in it, so the string array can only go from split[0] to split[3]; there is no split[4].
Any suggestions or corrections are welcome.
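To see the indexing concretely, here is a tiny self-contained check (my own illustration, assuming the key/value separator were set to "," as the answer describes) using the first sample record:
public class SplitCheck {
    public static void main(String[] args) {
        // With "," as the separator, the mapper's value is everything after the first comma
        // of "PRAVEEN,4002012,Kids,02GK,7/4/2010".
        String value = "4002012,Kids,02GK,7/4/2010";
        String[] split = value.split(",");
        System.out.println(split.length); // prints 4, so valid indices are split[0]..split[3]
    }
}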
I am facing a problem while writing a MapReduce program: my input file is being read twice by the program. I have already gone through the answer to "why is my sequence file being read twice in my hadoop mapper class?", but unfortunately it did not help.
My Mapper class is:
package com.siddu.mapreduce.csv;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class SidduCSVMapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>
{
IntWritable one = new IntWritable(1);
@Override
public void map(LongWritable key, Text line,
OutputCollector<Text, IntWritable> output, Reporter report)
throws IOException
{
String lineCSV= line.toString();
String[] tokens = lineCSV.split(";");
output.collect(new Text(tokens[2]), one);
}
}
And My Reducer class is:
package com.siddu.mapreduce.csv;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class SidduCSVReducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable>
{
@Override
public void reduce(Text key, Iterator<IntWritable> inputFrmMapper,
OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException
{
System.out.println("In reducer the key is:"+key.toString());
int relationOccurance=0;
while(inputFrmMapper.hasNext())
{
IntWritable intWriteOb = inputFrmMapper.next();
int val = intWriteOb.get();
relationOccurance += val;
}
output.collect(key, new IntWritable(relationOccurance));
}
}
And finally My Driver class is:
package com.siddu.mapreduce.csv;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class SidduCSVMapReduceDriver
{
public static void main(String[] args)
{
JobClient client = new JobClient();
JobConf conf = new JobConf(com.siddu.mapreduce.csv.SidduCSVMapReduceDriver.class);
conf.setJobName("Siddu CSV Reader 1.0");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(com.siddu.mapreduce.csv.SidduCSVMapper.class);
conf.setReducerClass(com.siddu.mapreduce.csv.SidduCSVReducer.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
client.setConf(conf);
try
{
JobClient.runJob(conf);
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
You should be aware that Hadoop spawns multiple attempts of a task, usually two for each mapper. If you see the log file output twice, that is probably the reason.
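One common source of extra attempts is speculative execution (my note, not part of the answer above); if you want to rule it out, it can be switched off on the JobConf built in SidduCSVMapReduceDriver. A sketch against the old mapred API used in the question:
// Run only one attempt per map/reduce task unless an attempt actually fails.
conf.setMapSpeculativeExecution(false);
conf.setReduceSpeculativeExecution(false);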
Error : Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.getCredentials()Lorg/apache/hadoop/security/Credentials;
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:135)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:176)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:195)
at WordCount.main(WordCount.java:20)
Hadoop version 2.2.0
WordCount.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("usage: [input] [output]");
System.exit(-1);
}
Job job = Job.getInstance(new Configuration(), "word count");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(WordCount.class);
job.setJobName("WordCount");
job.submit();
}
}
WordMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
private Text word = new Text();
private final static IntWritable one = new IntWritable(1);
@Override
public void map(Object key, Text value,
Context contex) throws IOException, InterruptedException {
// Break line into words for processing
StringTokenizer wordList = new StringTokenizer(value.toString());
while (wordList.hasMoreTokens()) {
word.set(wordList.nextToken());
contex.write(word, one);
}
}
}
SumReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable totalWordCount = new IntWritable();
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int wordCount = 0;
Iterator<IntWritable> it=values.iterator();
while (it.hasNext()) {
wordCount += it.next().get();
}
totalWordCount.set(wordCount);
context.write(key, totalWordCount);
}
}
Please let me know what can be done. The latest MapReduce API is used for the program, and all of the jars that came with Hadoop 2.2.0 are imported into Eclipse.
Thanks :)
Are you using an Eclipse plugin for Hadoop? If not, that is the problem. Without the plugin, Eclipse is just running the WordCount class, and Hadoop can't find the necessary jars. Bundle all the jars, including WordCount, and run it on the cluster.
If you want to run it from Eclipse, you need the Eclipse plugin. If you don't have one, you can build the plugin by following these instructions.