We are trying to design a simple program whose goal is to read patent data from a file and check whether other countries have cited that patent. The exercise is from the textbook 'Hadoop in Action' by Chuck Lam, which we are using to learn advanced MapReduce programming.
The Hadoop distribution we have set up runs in local (single-node) mode, and we are executing the program on Windows using Cygwin.
We downloaded the files apat63_99.txt and cite75_99.txt from http://www.nber.org/patents/.
We are using 'apat63_99.txt' as the distributed cache file, and 'cite75_99.txt' is in the input folder, which we pass as a command-line parameter.
The problem is that the program is not generating output: the output files we see have no data in them. We have tried inspecting both the mapper-phase and the reducer-phase output, and both are blank.
Here is the code we have developed for this task:
package com.sample.patent;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Hashtable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class country_cite {

    private static Hashtable<String, String> joinData
            = new Hashtable<String, String>();

    public static class Country_Citation_Class extends
            Mapper<Text, Text, Text, Text> {

        Path[] cacheFiles;

        public void configure(JobConf conf) {
            try {
                cacheFiles = DistributedCache.getLocalCacheArchives(conf);
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }

        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            if (cacheFiles != null && cacheFiles.length > 0) {
                String line;
                String[] tokens;
                BufferedReader joinReader = new BufferedReader(new FileReader(
                        cacheFiles[0].toString()));
                try {
                    while ((line = joinReader.readLine()) != null) {
                        tokens = line.split(",");
                        joinData.put(tokens[0], tokens[4]);
                    }
                } finally {
                    joinReader.close();
                }
            }
            if (joinData.get(key) != null)
                context.write(key, new Text(joinData.get(key)));
        }
    }

    public static class MyReduceClass extends Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String patent_country = joinData.get(key);
            if (patent_country != null) {
                for (Text val : values) {
                    String cited_country = joinData.get(val);
                    if (cited_country != null
                            && !cited_country.equals(patent_country)) {
                        context.write(key, new Text(cited_country));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf);
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: country_cite <cacheFile> <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "country_cite");
        job.setJarByClass(country_cite.class);
        job.setMapperClass(Country_Citation_Class.class);
        job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat.class);
        // job.setReducerClass(MyReduceClass.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The IDE is Eclipse, and the Hadoop version we are using is 1.2.1.
These are the command-line parameters used to run the job:
/cygdrive/c/cygwin64/usr/local/hadoop
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output
This is the trace generated while the program executes:
/cygdrive/c/cygwin64/usr/local/hadoop
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output
Patch for HADOOP-7682: Instantiating workaround file system
14/06/22 12:39:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging to 0700
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001 to 0700
14/06/22 12:39:21 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/06/22 12:39:21 INFO input.FileInputFormat: Total input paths to process : 1
14/06/22 12:39:21 WARN snappy.LoadSnappy: Snappy native library not loaded
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.split": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.split to 0644
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.splitmetainfo": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.splitmetainfo to 0644
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.xml": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.xml to 0644
14/06/22 12:39:23 INFO filecache.TrackerDistributedCacheManager: Creating fileapat63_99.txt in /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806 with rwxr-xr-x
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "/tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\local\archive\7067728792316735217_-679065598_1881640498-work-5016028422992714806 to 0755
14/06/22 12:40:06 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt
14/06/22 12:40:08 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt
14/06/22 12:40:09 INFO mapred.JobClient: Running job: job_local1277400315_0001
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Waiting for map tasks
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000000_0
14/06/22 12:40:10 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:10 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:0+33554432
14/06/22 12:40:10 INFO mapred.JobClient: map 0% reduce 0%
14/06/22 12:40:15 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000000_0 is done. And is in the process of commiting
14/06/22 12:40:15 INFO mapred.LocalJobRunner:
14/06/22 12:40:15 INFO mapred.Task: Task attempt_local1277400315_0001_m_000000_0 is allowed to commit now
14/06/22 12:40:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000000_0' to output
14/06/22 12:40:15 INFO mapred.LocalJobRunner:
14/06/22 12:40:15 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000000_0' done.
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000000_0
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000001_0
14/06/22 12:40:15 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:15 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:33554432+33554432
14/06/22 12:40:16 INFO mapred.JobClient: map 12% reduce 0%
14/06/22 12:40:21 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000001_0 is done. And is in the process of commiting
14/06/22 12:40:21 INFO mapred.LocalJobRunner:
14/06/22 12:40:21 INFO mapred.Task: Task attempt_local1277400315_0001_m_000001_0 is allowed to commit now
14/06/22 12:40:21 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000001_0' to output
14/06/22 12:40:21 INFO mapred.LocalJobRunner:
14/06/22 12:40:21 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000001_0' done.
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000001_0
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000002_0
14/06/22 12:40:21 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:21 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:67108864+33554432
14/06/22 12:40:21 INFO mapred.JobClient: map 25% reduce 0%
14/06/22 12:40:26 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000002_0 is done. And is in the process of commiting
14/06/22 12:40:26 INFO mapred.LocalJobRunner:
14/06/22 12:40:26 INFO mapred.Task: Task attempt_local1277400315_0001_m_000002_0 is allowed to commit now
14/06/22 12:40:26 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000002_0' to output
14/06/22 12:40:26 INFO mapred.LocalJobRunner:
14/06/22 12:40:26 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000002_0' done.
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000002_0
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000003_0
14/06/22 12:40:26 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:26 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:100663296+33554432
14/06/22 12:40:26 INFO mapred.JobClient: map 37% reduce 0%
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.JobClient: map 42% reduce 0%
14/06/22 12:40:29 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000003_0 is done. And is in the process of commiting
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.Task: Task attempt_local1277400315_0001_m_000003_0 is allowed to commit now
14/06/22 12:40:29 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000003_0' to output
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000003_0' done.
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000003_0
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000004_0
14/06/22 12:40:29 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:29 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:134217728+33554432
14/06/22 12:40:30 INFO mapred.JobClient: map 50% reduce 0%
14/06/22 12:40:30 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000004_0 is done. And is in the process of commiting
14/06/22 12:40:30 INFO mapred.LocalJobRunner:
14/06/22 12:40:30 INFO mapred.Task: Task attempt_local1277400315_0001_m_000004_0 is allowed to commit now
14/06/22 12:40:30 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000004_0' to output
14/06/22 12:40:30 INFO mapred.LocalJobRunner:
14/06/22 12:40:30 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000004_0' done.
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000004_0
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000005_0
14/06/22 12:40:30 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:30 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:167772160+33554432
14/06/22 12:40:31 INFO mapred.JobClient: map 62% reduce 0%
14/06/22 12:40:31 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000005_0 is done. And is in the process of commiting
14/06/22 12:40:31 INFO mapred.LocalJobRunner:
14/06/22 12:40:31 INFO mapred.Task: Task attempt_local1277400315_0001_m_000005_0 is allowed to commit now
14/06/22 12:40:31 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000005_0' to output
14/06/22 12:40:31 INFO mapred.LocalJobRunner:
14/06/22 12:40:31 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000005_0' done.
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000005_0
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000006_0
14/06/22 12:40:31 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:31 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:201326592+33554432
14/06/22 12:40:32 INFO mapred.JobClient: map 75% reduce 0%
14/06/22 12:40:32 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000006_0 is done. And is in the process of commiting
14/06/22 12:40:32 INFO mapred.LocalJobRunner:
14/06/22 12:40:32 INFO mapred.Task: Task attempt_local1277400315_0001_m_000006_0 is allowed to commit now
14/06/22 12:40:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000006_0' to output
14/06/22 12:40:32 INFO mapred.LocalJobRunner:
14/06/22 12:40:32 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000006_0' done.
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000006_0
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000007_0
14/06/22 12:40:32 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:33 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:234881024+29194407
14/06/22 12:40:33 INFO mapred.JobClient: map 87% reduce 0%
14/06/22 12:40:35 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000007_0 is done. And is in the process of commiting
14/06/22 12:40:35 INFO mapred.LocalJobRunner:
14/06/22 12:40:35 INFO mapred.Task: Task attempt_local1277400315_0001_m_000007_0 is allowed to commit now
14/06/22 12:40:35 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000007_0' to output
14/06/22 12:40:35 INFO mapred.LocalJobRunner:
14/06/22 12:40:35 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000007_0' done.
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000007_0
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Map task executor complete.
14/06/22 12:40:35 INFO mapred.JobClient: map 100% reduce 0%
14/06/22 12:40:35 INFO mapred.JobClient: Job complete: job_local1277400315_0001
14/06/22 12:40:35 INFO mapred.JobClient: Counters: 9
14/06/22 12:40:35 INFO mapred.JobClient: File Output Format Counters
14/06/22 12:40:35 INFO mapred.JobClient: Bytes Written=64
14/06/22 12:40:35 INFO mapred.JobClient: FileSystemCounters
14/06/22 12:40:35 INFO mapred.JobClient: FILE_BYTES_READ=5009033659
14/06/22 12:40:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3820489832
14/06/22 12:40:35 INFO mapred.JobClient: File Input Format Counters
14/06/22 12:40:35 INFO mapred.JobClient: Bytes Read=264104103
14/06/22 12:40:35 INFO mapred.JobClient: Map-Reduce Framework
14/06/22 12:40:35 INFO mapred.JobClient: Map input records=16522439
14/06/22 12:40:35 INFO mapred.JobClient: Spilled Records=0
14/06/22 12:40:35 INFO mapred.JobClient: Total committed heap usage (bytes)=708313088
14/06/22 12:40:35 INFO mapred.JobClient: Map output records=0
14/06/22 12:40:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=952
Kindly let us know where we are going wrong; in case I have missed any vital information, let me know.
Thanks and Regards
I think the error is in the line if (joinData.get(key) != null). joinData uses String as its key type, but you pass a Text as the argument to get, so get returns null every time. Try replacing this line with if (joinData.get(key.toString()) != null).
Another mistake is that each Mapper and each Reducer runs in its own JVM, so Mappers and Reducers can't communicate through static objects, and joinData is empty for every Reducer.
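For reference, here is a minimal, untested sketch of the mapper with both fixes applied, staying with Hadoop 1.x and the new org.apache.hadoop.mapreduce API the question already uses. Note that the new-API Mapper never calls configure(JobConf); its hook is setup(Context), so cacheFiles also stays null in the posted code. The sketch loads the cache file once per task instead of once per map() call, and since the file was added with addCacheFile, it reads it back with getLocalCacheFiles rather than getLocalCacheArchives. The field layout (tokens[0], tokens[4]) is kept from the original:

public static class Country_Citation_Class extends Mapper<Text, Text, Text, Text> {

    // Per-task copy of the join data; Mappers and Reducers each run in
    // their own JVM, so a static field cannot be shared between them.
    private Hashtable<String, String> joinData = new Hashtable<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
        // setup(Context) is the new-API replacement for configure(JobConf).
        Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cacheFiles != null && cacheFiles.length > 0) {
            BufferedReader joinReader = new BufferedReader(
                    new FileReader(cacheFiles[0].toString()));
            try {
                String line;
                while ((line = joinReader.readLine()) != null) {
                    String[] tokens = line.split(",");
                    joinData.put(tokens[0], tokens[4]);
                }
            } finally {
                joinReader.close();
            }
        }
    }

    @Override
    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // Look up with a String, not a Text, otherwise get() always returns null.
        String country = joinData.get(key.toString());
        if (country != null) {
            context.write(key, new Text(country));
        }
    }
}

If the reducer is re-enabled later, it has to load its own copy of the cache file the same way, for example in its own setup() method.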
The input file contains an adjacency list, with multiple lines in the following format:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Friends {

    public class FriendsMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        private Text friendsAB;
        private Text friendsBA;
        private IntWritable one = new IntWritable(1);
        private IntWritable oneLess = new IntWritable(-999999999);

        //@SuppressWarnings("null")
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String friendsOfA[] = null; // This will be all of the friends of the user in this row
            String oneRow[] = value.toString().split("\t,"); // Break the row up into user IDs
            String userA = oneRow[0]; // This is the main user for this row
            for (int i = 1; i < oneRow.length; i++) { // Create an array of the rest of the users in this row
                friendsOfA[i - 1] = oneRow[i];
            }
            for (int i = 0; i < oneRow.length; i++) { // Output the main user in pairs with all friends plus a large negative #
                friendsAB.set(userA + " " + friendsOfA[i]);
                context.write(friendsAB, oneLess);
                System.out.println(friendsAB + " " + oneLess);
            }
            for (int i = 0; i < friendsOfA.length; i++) { // Output each friend pair plus the number 1
                for (int j = i + 1; j < friendsOfA.length; j++) {
                    friendsAB.set(friendsOfA[i] + " " + friendsOfA[j]);
                    friendsBA.set(friendsOfA[j] + " " + friendsOfA[i]);
                    context.write(friendsAB, one);
                    context.write(friendsBA, one);
                    System.out.println(friendsAB + " " + one);
                    System.out.println(friendsBA + " " + one);
                }
            }
        }
    }

    class FriendReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            if (sum > 1) {
                result.set(sum);
                context.write(key, result);
            }
            // At this point I have all pairs of users with recommended friends and a count
            // of how many times each friend has been recommended to a user.
            // I need to sort by user and then by number of recommendations.
            // Then print the user <tab> all recommendations with commas between them.
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Friends");
        job.setJarByClass(Friends.class);
        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));
        job.setMapperClass(FriendsMap.class);
        job.setCombinerClass(FriendReducer.class);
        job.setReducerClass(FriendReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
These are the errors I am getting in the console.
17/11/15 16:05:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/15 16:06:54 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/15 16:06:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/11/15 16:06:54 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/11/15 16:06:55 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/11/15 16:06:55 INFO input.FileInputFormat: Total input paths to process : 2
17/11/15 16:07:05 INFO mapred.JobClient: Running job: job_local426825952_0001
17/11/15 16:07:05 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/11/15 16:07:05 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/11/15 16:07:05 INFO mapred.LocalJobRunner: Waiting for map tasks
17/11/15 16:07:05 INFO mapred.LocalJobRunner: Starting task: attempt_local426825952_0001_m_000000_0
17/11/15 16:07:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/15 16:07:05 INFO util.ProcessTree: setsid exited with exit code 0
17/11/15 16:07:05 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@670217f0
17/11/15 16:07:05 INFO mapred.LocalJobRunner: Starting task: attempt_local426825952_0001_m_000001_0
17/11/15 16:07:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/15 16:07:05 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1688e9ff
17/11/15 16:07:05 INFO mapred.LocalJobRunner: Map task executor complete.
17/11/15 16:07:05 WARN mapred.LocalJobRunner: job_local426825952_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.NoSuchMethodException: Friends$FriendsMap.<init>()
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: Friends$FriendsMap.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: Friends$FriendsMap.<init>()
    at java.lang.Class.getConstructor0(Class.java:2849)
    at java.lang.Class.getDeclaredConstructor(Class.java:2053)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
    ... 8 more
17/11/15 16:07:06 INFO mapred.JobClient: map 0% reduce 0%
17/11/15 16:07:06 INFO mapred.JobClient: Job complete: job_local426825952_0001
17/11/15 16:07:06 INFO mapred.JobClient: Counters: 0
After changing the classes to static, these are the new errors.
17/11/16 04:28:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/16 04:28:52 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/16 04:28:52 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/11/16 04:28:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/11/16 04:28:52 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/11/16 04:28:53 INFO input.FileInputFormat: Total input paths to process : 2
17/11/16 04:28:54 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/11/16 04:28:54 INFO mapred.JobClient: Running job: job_local1593958162_0001
17/11/16 04:28:54 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/11/16 04:28:54 INFO mapred.LocalJobRunner: Waiting for map tasks
17/11/16 04:28:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1593958162_0001_m_000000_0
17/11/16 04:28:54 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/16 04:28:54 INFO util.ProcessTree: setsid exited with exit code 0
17/11/16 04:28:54 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@57d51956
17/11/16 04:28:54 INFO mapred.MapTask: Processing split: file:/home/cloudera/workspace/Assignment4/input/Sample4.txt:0+4106187
17/11/16 04:28:54 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/16 04:28:54 INFO mapred.MapTask: io.sort.mb = 100
17/11/16 04:28:55 INFO mapred.MapTask: data buffer = 79691776/99614720
17/11/16 04:28:55 INFO mapred.MapTask: record buffer = 262144/327680
17/11/16 04:28:55 INFO mapred.LocalJobRunner: Starting task: attempt_local1593958162_0001_m_000001_0
17/11/16 04:28:55 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/16 04:28:55 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@774140b3
17/11/16 04:28:55 INFO mapred.MapTask: Processing split: file:/home/cloudera/workspace/Assignment4/input/Sample4.txt~:0+0
17/11/16 04:28:55 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/16 04:28:55 INFO mapred.MapTask: io.sort.mb = 100
17/11/16 04:28:55 INFO mapred.JobClient: map 0% reduce 0%
17/11/16 04:28:55 INFO mapred.MapTask: data buffer = 79691776/99614720
17/11/16 04:28:55 INFO mapred.MapTask: record buffer = 262144/327680
17/11/16 04:28:55 INFO mapred.LocalJobRunner:
17/11/16 04:28:55 INFO mapred.MapTask: Starting flush of map output
17/11/16 04:28:55 INFO mapred.Task: Task:attempt_local1593958162_0001_m_000001_0 is done. And is in the process of commiting
17/11/16 04:28:55 INFO mapred.LocalJobRunner:
17/11/16 04:28:55 INFO mapred.Task: Task 'attempt_local1593958162_0001_m_000001_0' done.
17/11/16 04:28:55 INFO mapred.LocalJobRunner: Finishing task: attempt_local1593958162_0001_m_000001_0
17/11/16 04:28:55 INFO mapred.LocalJobRunner: Map task executor complete.
17/11/16 04:28:55 WARN mapred.LocalJobRunner: job_local1593958162_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.NullPointerException
    at Friends$FriendsMap.map(Friends.java:36)
    at Friends$FriendsMap.map(Friends.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
17/11/16 04:28:56 INFO mapred.JobClient: Job complete: job_local1593958162_0001
17/11/16 04:28:56 INFO mapred.JobClient: Counters: 16
17/11/16 04:28:56 INFO mapred.JobClient:   File System Counters
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of bytes read=4674
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of bytes written=139416
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of read operations=0
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of write operations=0
17/11/16 04:28:56 INFO mapred.JobClient:   Map-Reduce Framework
17/11/16 04:28:56 INFO mapred.JobClient:     Map input records=0
17/11/16 04:28:56 INFO mapred.JobClient:     Map output records=0
17/11/16 04:28:56 INFO mapred.JobClient:     Map output bytes=0
17/11/16 04:28:56 INFO mapred.JobClient:     Input split bytes=125
17/11/16 04:28:56 INFO mapred.JobClient:     Combine input records=0
17/11/16 04:28:56 INFO mapred.JobClient:     Combine output records=0
17/11/16 04:28:56 INFO mapred.JobClient:     Spilled Records=0
17/11/16 04:28:56 INFO mapred.JobClient:     CPU time spent (ms)=0
17/11/16 04:28:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
17/11/16 04:28:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
17/11/16 04:28:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=363696128
I think this is the most troublesome part.
[Screenshot of some errors]
This is the updated code.
public static class FriendsMap extends Mapper<LongWritable, Text, Text, IntWritable> {

    //@SuppressWarnings("null")
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String friendsOfA[]; // This will be all of the friends of the user in this row
        friendsOfA = new String[] {};
        String friendsAB = "1"; // This will be used to create pairs of users
        String friendsBA = "2"; // This will be used to create pairs of users
        Text pairA;
        Text pairB;
        IntWritable one = new IntWritable(1); // 1 if they are not an existing pair here
        IntWritable oneLess = new IntWritable(-999999999); // if they are an existing pair
        String oneRow[] = value.toString().split("\t,"); // Break the row up into user IDs
        Text userA = new Text(oneRow[0]); // This is the main user for this row
        for (int i = 1; i < oneRow.length; i++) { // Create an array of the rest of the users in this row
            friendsOfA[i - 1] = oneRow[i];
        }
        for (int i = 0; i < oneRow.length; i++) { // Output the main user in pairs with all friends plus a large negative #
            // We do not want to recommend them as friends because they are friends
            Text FOA = new Text(friendsOfA[i]);
            friendsAB = (userA + " " + FOA);
            Text pair = new Text(friendsAB);
            context.write(pair, oneLess);
            System.out.println(pair + " " + oneLess);
        }
        for (int i = 0; i < friendsOfA.length; i++) { // Output each friend pair plus the number 1
            // We want to recommend them as potential friends
            for (int j = i + 1; j < friendsOfA.length; j++) {
                Text FOA = new Text(friendsOfA[i]);
                Text FOB = new Text(friendsOfA[j]);
                friendsAB = (FOA + " " + FOB);
                friendsBA = (FOB + " " + FOA);
                pairA = new Text(friendsAB);
                pairB = new Text(friendsBA);
                context.write(pairA, one);
                context.write(pairB, one);
                System.out.println(pairA + " " + one);
                System.out.println(pairB + " " + one);
            }
        }
    }
}
And this is the new set of errors.
17/11/16 11:59:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/16 11:59:27 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/16 11:59:27 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/11/16 11:59:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/11/16 11:59:27 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/11/16 11:59:27 INFO input.FileInputFormat: Total input paths to process : 2
17/11/16 11:59:29 INFO mapred.JobClient: Running job: job_local1899187381_0001
17/11/16 11:59:29 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/11/16 11:59:29 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/11/16 11:59:29 INFO mapred.LocalJobRunner: Waiting for map tasks
17/11/16 11:59:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1899187381_0001_m_000000_0
17/11/16 11:59:29 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/16 11:59:29 INFO util.ProcessTree: setsid exited with exit code 0
17/11/16 11:59:29 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4f94aaa1
17/11/16 11:59:29 INFO mapred.MapTask: Processing split: file:/home/cloudera/workspace/Assignment4/input/Sample4.txt:0+4106187
17/11/16 11:59:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/16 11:59:29 INFO mapred.MapTask: io.sort.mb = 100
17/11/16 11:59:29 INFO mapred.MapTask: data buffer = 79691776/99614720
17/11/16 11:59:29 INFO mapred.MapTask: record buffer = 262144/327680
17/11/16 11:59:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1899187381_0001_m_000001_0
17/11/16 11:59:29 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/16 11:59:29 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@622ecc38
17/11/16 11:59:29 INFO mapred.MapTask: Processing split: file:/home/cloudera/workspace/Assignment4/input/Sample4.txt~:0+0
17/11/16 11:59:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/16 11:59:29 INFO mapred.MapTask: io.sort.mb = 100
17/11/16 11:59:30 INFO mapred.JobClient: map 0% reduce 0%
17/11/16 11:59:30 INFO mapred.MapTask: data buffer = 79691776/99614720
17/11/16 11:59:30 INFO mapred.MapTask: record buffer = 262144/327680
17/11/16 11:59:30 INFO mapred.LocalJobRunner:
17/11/16 11:59:30 INFO mapred.MapTask: Starting flush of map output
17/11/16 11:59:30 INFO mapred.Task: Task:attempt_local1899187381_0001_m_000001_0 is done. And is in the process of commiting
17/11/16 11:59:30 INFO mapred.LocalJobRunner:
17/11/16 11:59:30 INFO mapred.Task: Task 'attempt_local1899187381_0001_m_000001_0' done.
17/11/16 11:59:30 INFO mapred.LocalJobRunner: Finishing task: attempt_local1899187381_0001_m_000001_0
17/11/16 11:59:30 INFO mapred.LocalJobRunner: Map task executor complete.
17/11/16 11:59:30 WARN mapred.LocalJobRunner: job_local1899187381_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    at Friends$FriendsMap.map(Friends.java:41)
    at Friends$FriendsMap.map(Friends.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
17/11/16 11:59:31 INFO mapred.JobClient: Job complete: job_local1899187381_0001
17/11/16 11:59:31 INFO mapred.JobClient: Counters: 16
17/11/16 11:59:31 INFO mapred.JobClient:   File System Counters
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of bytes read=4674
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of bytes written=139416
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of read operations=0
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of write operations=0
17/11/16 11:59:31 INFO mapred.JobClient:   Map-Reduce Framework
17/11/16 11:59:31 INFO mapred.JobClient:     Map input records=0
17/11/16 11:59:31 INFO mapred.JobClient:     Map output records=0
17/11/16 11:59:31 INFO mapred.JobClient:     Map output bytes=0
17/11/16 11:59:31 INFO mapred.JobClient:     Input split bytes=125
17/11/16 11:59:31 INFO mapred.JobClient:     Combine input records=0
17/11/16 11:59:31 INFO mapred.JobClient:     Combine output records=0
17/11/16 11:59:31 INFO mapred.JobClient:     Spilled Records=0
17/11/16 11:59:31 INFO mapred.JobClient:     CPU time spent (ms)=0
17/11/16 11:59:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
17/11/16 11:59:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
17/11/16 11:59:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=363618304
You've declared the classes as inner classes, which might be causing issues. An inner class can only exist within an instance of the enclosing class, so it has no no-argument constructor that Hadoop can call by reflection, which is what the NoSuchMethodException: Friends$FriendsMap.<init>() is complaining about.
It's probably easier to change them to static nested classes.
public class Friends {

    public static class FriendsMap extends Mapper<...> {}

    public static class FriendReducer extends Reducer<...> {}

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Friends");
        job.setJarByClass(Friends.class);
        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));
        job.setMapperClass(FriendsMap.class);
        job.setCombinerClass(FriendReducer.class);
        job.setReducerClass(FriendReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
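Beyond the missing static modifier, the NullPointerException at Friends.java:36 and the ArrayIndexOutOfBoundsException: 0 at Friends.java:41 both point at friendsOfA: it is declared as null (and, in the updated code, as a zero-length array) but then written to by index, and the first output loop iterates over oneRow.length even though friendsOfA is one element shorter. Here is a minimal, untested sketch of the map method with the indexing fixed, to be placed inside the static FriendsMap class above. The split pattern is an assumption about the input format: the original "\t," is a regex that only matches a tab immediately followed by a comma, whereas "[\t,]" splits on either character.

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Split on tabs OR commas (assumed input: one user ID followed by friend IDs).
    String[] oneRow = value.toString().split("[\t,]");
    if (oneRow.length < 2) {
        return; // skip rows with no friends listed, e.g. the empty Sample4.txt~ split
    }
    String userA = oneRow[0];

    // Allocate the array; leaving it null or zero-length is what caused
    // the NullPointerException / ArrayIndexOutOfBoundsException.
    String[] friendsOfA = new String[oneRow.length - 1];
    for (int i = 1; i < oneRow.length; i++) {
        friendsOfA[i - 1] = oneRow[i];
    }

    IntWritable one = new IntWritable(1);
    IntWritable oneLess = new IntWritable(-999999999);

    // Existing friendships: iterate over friendsOfA.length, not oneRow.length.
    for (int i = 0; i < friendsOfA.length; i++) {
        context.write(new Text(userA + " " + friendsOfA[i]), oneLess);
    }

    // Candidate friend pairs, in both orders.
    for (int i = 0; i < friendsOfA.length; i++) {
        for (int j = i + 1; j < friendsOfA.length; j++) {
            context.write(new Text(friendsOfA[i] + " " + friendsOfA[j]), one);
            context.write(new Text(friendsOfA[j] + " " + friendsOfA[i]), one);
        }
    }
}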
I have a program that prints the average of the balance and counts the number of customers. Everything was working fine until I noticed that the part-r-00000 file is empty. It is very strange because I haven't changed anything in the Hadoop configuration. I will post the stack trace from cmd below.
17/04/14 14:21:31 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/04/14 14:21:31 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/04/14 14:21:31 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/04/14 14:21:31 INFO input.FileInputFormat: Total input paths to process : 1
17/04/14 14:21:31 INFO mapreduce.JobSubmitter: number of splits:1
17/04/14 14:21:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1656799721_0001
17/04/14 14:21:32 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/04/14 14:21:32 INFO mapreduce.Job: Running job: job_local1656799721_0001
17/04/14 14:21:32 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/04/14 14:21:32 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/04/14 14:21:32 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/04/14 14:21:32 INFO mapred.LocalJobRunner: Waiting for map tasks
17/04/14 14:21:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1656799721_0001_m_000000_0
17/04/14 14:21:32 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/04/14 14:21:32 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
17/04/14 14:21:32 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@7c8cb1b6
17/04/14 14:21:32 INFO mapred.MapTask: Processing split: hdfs://localhost:19000/datagen/data/customer.tbl:0+2411114
17/04/14 14:21:32 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/04/14 14:21:32 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/04/14 14:21:32 INFO mapred.MapTask: soft limit at 83886080
17/04/14 14:21:32 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/04/14 14:21:32 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/04/14 14:21:32 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/04/14 14:21:32 INFO mapred.LocalJobRunner:
17/04/14 14:21:32 INFO mapred.MapTask: Starting flush of map output
17/04/14 14:21:32 INFO mapred.Task: Task:attempt_local1656799721_0001_m_000000_0 is done. And is in the process of committing
17/04/14 14:21:32 INFO mapred.LocalJobRunner: map
17/04/14 14:21:32 INFO mapred.Task: Task 'attempt_local1656799721_0001_m_000000_0' done.
17/04/14 14:21:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1656799721_0001_m_000000_0
17/04/14 14:21:32 INFO mapred.LocalJobRunner: map task executor complete.
17/04/14 14:21:32 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/04/14 14:21:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1656799721_0001_r_000000_0
17/04/14 14:21:32 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/04/14 14:21:32 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
17/04/14 14:21:32 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@25135c4c
17/04/14 14:21:32 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2d7e552d
17/04/14 14:21:32 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334338464, maxSingleShuffleLimit=83584616, mergeThreshold=220663392, ioSortFactor=10, memToMemMergeOutputsThreshold=10
17/04/14 14:21:32 INFO reduce.EventFetcher: attempt_local1656799721_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
17/04/14 14:21:32 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1656799721_0001_m_000000_0 decomp: 2 len: 6 to MEMORY
17/04/14 14:21:32 INFO reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_local1656799721_0001_m_000000_0
17/04/14 14:21:32 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->2
17/04/14 14:21:32 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/04/14 14:21:32 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/04/14 14:21:32 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
17/04/14 14:21:32 INFO mapred.Merger: Merging 1 sorted segments
17/04/14 14:21:32 INFO mapred.Merger: Down to the last merge-pass, with 0 segments left of total size: 0 bytes
17/04/14 14:21:32 INFO reduce.MergeManagerImpl: Merged 1 segments, 2 bytes to disk to satisfy reduce memory limit
17/04/14 14:21:32 INFO reduce.MergeManagerImpl: Merging 1 files, 6 bytes from disk
17/04/14 14:21:32 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
17/04/14 14:21:32 INFO mapred.Merger: Merging 1 sorted segments
17/04/14 14:21:32 INFO mapred.Merger: Down to the last merge-pass, with 0 segments left of total size: 0 bytes
17/04/14 14:21:32 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/04/14 14:21:32 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
17/04/14 14:21:32 INFO mapred.Task: Task:attempt_local1656799721_0001_r_000000_0 is done. And is in the process of committing
17/04/14 14:21:32 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/04/14 14:21:32 INFO mapred.Task: Task attempt_local1656799721_0001_r_000000_0 is allowed to commit now
17/04/14 14:21:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1656799721_0001_r_000000_0' to hdfs://localhost:19000/out19/_temporary/0/task_local1656799721_0001_r_000000
17/04/14 14:21:32 INFO mapred.LocalJobRunner: reduce > reduce
17/04/14 14:21:32 INFO mapred.Task: Task 'attempt_local1656799721_0001_r_000000_0' done.
17/04/14 14:21:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1656799721_0001_r_000000_0
17/04/14 14:21:32 INFO mapred.LocalJobRunner: reduce task executor complete.
17/04/14 14:21:33 INFO mapreduce.Job: Job job_local1656799721_0001 running in uber mode : false
17/04/14 14:21:33 INFO mapreduce.Job: map 100% reduce 100%
17/04/14 14:21:33 INFO mapreduce.Job: Job job_local1656799721_0001 completed successfully
17/04/14 14:21:33 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=17482
FILE: Number of bytes written=591792
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=4822228
HDFS: Number of bytes written=0
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=15000
Map output records=0
Map output bytes=0
Map output materialized bytes=6
Input split bytes=113
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=6
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=546308096
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2411114
File Output Format Counters
Bytes Written=0
Code
public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Text segment = new Text();
    //private ThreeWritableValues cust = new ThreeWritableValues();
    private Text word = new Text();
    private float balance = 0;

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] line = value.toString().split("\\|");
        String cust_key = line[1];
        int nation = Integer.parseInt(line[3]);
        if ((balance > 8000) && (nation < 15) && (nation > 1)) {
            segment.set(line[6]);
            word.set(cust_key + "," + balance);
            context.write(segment, word);
        }
    }
}

public static class AvgReducer extends Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        context.write(key, values.iterator().next());
    }
}

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(MapReduceTest.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(AvgReducer.class);
    job.setReducerClass(AvgReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Please help if anyone knows something.
There is no output generated by your map phase:
Map output records=0
Map output bytes=0
In your TokenizerMapper class, the value of balance is defined as 0.
private float balance = 0;
and in the map method the value of balance is still 0, but it is checked against > 8000:
if ((balance > 8000) && (nation < 15) && (nation > 1)) {
    segment.set(line[6]);
    word.set(cust_key + "," + balance);
    context.write(segment, word);
}
The if condition is never met, and thus there is no mapper output and no reducer output.
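A minimal sketch of the fix: parse the balance out of the current record inside map() instead of testing the field that is initialized to 0 and never updated. The column index 5 is an assumption based on the TPC-H-style customer.tbl shown in the log (where the account balance is the sixth pipe-delimited field); adjust it to your actual schema:

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] line = value.toString().split("\\|");
    String cust_key = line[1];
    int nation = Integer.parseInt(line[3]);
    // Read the balance from this record; index 5 is assumed (TPC-H
    // customer.tbl account balance column), not confirmed.
    float balance = Float.parseFloat(line[5]);
    if ((balance > 8000) && (nation < 15) && (nation > 1)) {
        segment.set(line[6]);
        word.set(cust_key + "," + balance);
        context.write(segment, word);
    }
}

Note also that AvgReducer as posted writes only the first value for each key; to print an actual average it would need to sum the balances and divide by their count in reduce().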
I am new to the world of big data and Hadoop. I am trying to run code available on Google; it consists of four steps, such as putting the data into the Hadoop file system, then adding an index to the data, then the major step of creating the reduced data using map and reduce.
I was able to run the first two steps.
The code uses XML to handle the locations.
The code I used is from http://asterixdb.ics.uci.edu/fuzzyjoin/.
When I do the final step, the fuzzy join, it gives me a series of errors.
I am attaching the trace here:
hduser@ubuntu:/home/midhu/fuzzyjoin$ cd fuzzyjoin-hadoop
hduser@ubuntu:/home/midhu/fuzzyjoin/fuzzyjoin-hadoop$ hadoop jar target/fuzzyjoin-hadoop-0.0.2-SNAPSHOT.jar fuzzyjoin -conf src/main/resources/fuzzyjoin/dblp.quickstart.xml
16/04/03 13:55:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Complete-Job started: Sun Apr 03 13:55:42 IST 2016
Multi-Job started: Sun Apr 03 13:55:42 IST 2016
FuzzyJoinDriver(TokensBasic.phase1)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/records-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:55:42 IST 2016
16/04/03 13:55:42 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/04/03 13:55:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/04/03 13:55:42 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:55:43 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:55:43 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:55:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1780986358_0001
16/04/03 13:55:44 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:55:44 INFO mapreduce.Job: Running job: job_local1780986358_0001
16/04/03 13:55:44 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:55:44 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:55:45 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:55:45 INFO mapred.LocalJobRunner: Starting task: attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:46 INFO mapreduce.Job: Job job_local1780986358_0001 running in uber mode : false
16/04/03 13:55:46 INFO mapreduce.Job: map 0% reduce 0%
16/04/03 13:55:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:55:46 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:55:46 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:55:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:55:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:55:49 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:55:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:55:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:55:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:55:52 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 > map
16/04/03 13:55:54 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 > map
16/04/03 13:55:54 INFO mapred.MapTask: Starting flush of map output
16/04/03 13:55:54 INFO mapred.MapTask: Spilling map output
16/04/03 13:55:54 INFO mapred.MapTask: bufstart = 0; bufend = 15588; bufvoid = 104857600
16/04/03 13:55:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26209408(104837632); length = 4989/6553600
16/04/03 13:55:54 INFO mapred.MapTask: Finished spill 0
16/04/03 13:55:54 INFO mapred.Task: Task:attempt_local1780986358_0001_m_000000_0 is done. And is in the process of committing
16/04/03 13:55:54 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:55:54 INFO mapred.Task: Task 'attempt_local1780986358_0001_m_000000_0' done.
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:54 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1780986358_0001_r_000000_0
16/04/03 13:55:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:55:54 INFO mapreduce.Job: map 100% reduce 0%
16/04/03 13:55:54 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@3209e0
16/04/03 13:55:54 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/04/03 13:55:54 INFO reduce.EventFetcher: attempt_local1780986358_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/04/03 13:55:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1780986358_0001_m_000000_0 decomp: 9062 len: 9066 to MEMORY
16/04/03 13:55:56 INFO reduce.InMemoryMapOutput: Read 9062 bytes from map-output for attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 9062, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->9062
16/04/03 13:55:57 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/04/03 13:55:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/04/03 13:55:57 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:55:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merged 1 segments, 9062 bytes to disk to satisfy reduce memory limit
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merging 1 files, 9066 bytes from disk
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/04/03 13:55:57 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:55:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:55:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:00 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:00 INFO mapreduce.Job: map 100% reduce 100%
16/04/03 13:56:01 INFO mapred.Task: Task:attempt_local1780986358_0001_r_000000_0 is done. And is in the process of committing
16/04/03 13:56:01 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:01 INFO mapred.Task: Task attempt_local1780986358_0001_r_000000_0 is allowed to commit now
16/04/03 13:56:02 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1780986358_0001_r_000000_0' to hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/_temporary/0/task_local1780986358_0001_r_000000
16/04/03 13:56:02 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:02 INFO mapred.Task: Task 'attempt_local1780986358_0001_r_000000_0' done.
16/04/03 13:56:02 INFO mapred.LocalJobRunner: Finishing task: attempt_local1780986358_0001_r_000000_0
16/04/03 13:56:02 INFO mapred.LocalJobRunner: reduce task executor complete.
16/04/03 13:56:02 INFO mapreduce.Job: Job job_local1780986358_0001 completed successfully
16/04/03 13:56:03 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=1080562
FILE: Number of bytes written=1589660
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=73374
HDFS: Number of bytes written=12847
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=18
Map-Reduce Framework
Map input records=100
Map output records=1248
Map output bytes=15588
Map output materialized bytes=9066
Input split bytes=120
Combine input records=1248
Combine output records=597
Reduce input groups=597
Reduce shuffle bytes=9066
Reduce input records=597
Reduce output records=597
Spilled Records=1194
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=176
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=241836032
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=36687
File Output Format Counters
Bytes Written=12847
Job ended: Sun Apr 03 13:56:04 IST 2016
The job took 21.44 seconds.
FuzzyJoinDriver(TokensBasic.phase2)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/tokens-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:56:04 IST 2016
16/04/03 13:56:04 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:04 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:05 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:56:05 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:56:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local954589393_0002
16/04/03 13:56:05 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:56:05 INFO mapreduce.Job: Running job: job_local954589393_0002
16/04/03 13:56:05 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:56:05 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:56:05 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:56:05 INFO mapred.LocalJobRunner: Starting task: attempt_local954589393_0002_m_000000_0
16/04/03 13:56:05 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:05 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/part-00000:0+12847
16/04/03 13:56:05 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:56:06 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:56:06 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:56:06 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:56:06 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:56:06 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:56:06 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:56:06 INFO mapred.LocalJobRunner:
16/04/03 13:56:06 INFO mapred.MapTask: Starting flush of map output
16/04/03 13:56:06 INFO mapred.MapTask: Spilling map output
16/04/03 13:56:06 INFO mapred.MapTask: bufstart = 0; bufend = 7866; bufvoid = 104857600
16/04/03 13:56:06 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26212012(104848048); length = 2385/6553600
16/04/03 13:56:06 INFO mapred.MapTask: Finished spill 0
16/04/03 13:56:06 INFO mapred.Task: Task:attempt_local954589393_0002_m_000000_0 is done. And is in the process of committing
16/04/03 13:56:06 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/part-00000:0+12847
16/04/03 13:56:06 INFO mapred.Task: Task 'attempt_local954589393_0002_m_000000_0' done.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local954589393_0002_m_000000_0
16/04/03 13:56:06 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Starting task: attempt_local954589393_0002_r_000000_0
16/04/03 13:56:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:06 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@4950dd
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/04/03 13:56:06 INFO reduce.EventFetcher: attempt_local954589393_0002_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/04/03 13:56:06 INFO reduce.LocalFetcher: localfetcher#2 about to shuffle output of map attempt_local954589393_0002_m_000000_0 decomp: 9062 len: 9066 to MEMORY
16/04/03 13:56:06 INFO reduce.InMemoryMapOutput: Read 9062 bytes from map-output for attempt_local954589393_0002_m_000000_0
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 9062, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->9062
16/04/03 13:56:06 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/04/03 13:56:06 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:56:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merged 1 segments, 9062 bytes to disk to satisfy reduce memory limit
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merging 1 files, 9066 bytes from disk
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/04/03 13:56:06 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:56:06 INFO mapreduce.Job: Job job_local954589393_0002 running in uber mode : false
16/04/03 13:56:06 INFO mapreduce.Job: map 100% reduce 0%
16/04/03 13:56:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO mapred.Task: Task:attempt_local954589393_0002_r_000000_0 is done. And is in the process of committing
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO mapred.Task: Task attempt_local954589393_0002_r_000000_0 is allowed to commit now
16/04/03 13:56:06 INFO output.FileOutputCommitter: Saved output of task 'attempt_local954589393_0002_r_000000_0' to hdfs://localhost:54310/user/hduser/dblp-small/tokens-000/_temporary/0/task_local954589393_0002_r_000000
16/04/03 13:56:06 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:06 INFO mapred.Task: Task 'attempt_local954589393_0002_r_000000_0' done.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local954589393_0002_r_000000_0
16/04/03 13:56:06 INFO mapred.LocalJobRunner: reduce task executor complete.
16/04/03 13:56:07 INFO mapreduce.Job: map 100% reduce 100%
16/04/03 13:56:07 INFO mapreduce.Job: Job job_local954589393_0002 completed successfully
16/04/03 13:56:07 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=2179300
FILE: Number of bytes written=3182466
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=99068
HDFS: Number of bytes written=31172
HDFS: Number of read operations=45
HDFS: Number of large read operations=0
HDFS: Number of write operations=30
Map-Reduce Framework
Map input records=597
Map output records=597
Map output bytes=7866
Map output materialized bytes=9066
Input split bytes=126
Combine input records=0
Combine output records=0
Reduce input groups=18
Reduce shuffle bytes=9066
Reduce input records=597
Reduce output records=597
Spilled Records=1194
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=488
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=336207872
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=12847
File Output Format Counters
Bytes Written=5478
Job ended: Sun Apr 03 13:56:07 IST 2016
The job took 3.563 seconds.
Multi-Job ended: Sun Apr 03 13:56:07 IST 2016
The multi-job took 25.128 seconds.
FuzzyJoinDriver(RIDPairsImproved)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/records-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/ridpairs-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=dblp-small/tokens-000/part-00000
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:56:08 IST 2016
16/04/03 13:56:08 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:08 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:09 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:56:09 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:56:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1951342027_0003
16/04/03 13:56:16 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/mapred/local/1459671970648/part-00000 <- /home/midhu/fuzzyjoin/fuzzyjoin-hadoop/part-00000
16/04/03 13:56:16 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:54310/user/hduser/dblp-small/tokens-000/part-00000 as file:/tmp/mapred/local/1459671970648/part-00000
16/04/03 13:56:17 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:56:17 INFO mapreduce.Job: Running job: job_local1951342027_0003
16/04/03 13:56:17 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:56:17 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:56:17 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:56:17 INFO mapred.LocalJobRunner: Starting task: attempt_local1951342027_0003_m_000000_0
16/04/03 13:56:17 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:17 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:56:17 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:56:17 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:56:17 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:56:17 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:56:17 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:56:17 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:56:17 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:56:17 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:56:17 WARN mapred.LocalJobRunner: job_local1951342027_0003
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 10 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 15 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 18 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: file:/tmp/mapred/local/1459671970648/part-00000 (No such file or directory)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:60)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:40)
at edu.uci.ics.fuzzyjoin.hadoop.ridpairs.token.MapSelfJoin.configure(MapSelfJoin.java:98)
... 23 more
Caused by: java.io.FileNotFoundException: file:/tmp/mapred/local/1459671970648/part-00000 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:45)
... 25 more
16/04/03 13:56:18 INFO mapreduce.Job: Job job_local1951342027_0003 running in uber mode : false
16/04/03 13:56:18 INFO mapreduce.Job: map 0% reduce 0%
16/04/03 13:56:18 INFO mapreduce.Job: Job job_local1951342027_0003 failed with state FAILED due to: NA
16/04/03 13:56:18 INFO mapreduce.Job: Counters: 0
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoinDriver.run(FuzzyJoinDriver.java:179)
at edu.uci.ics.fuzzyjoin.hadoop.ridpairs.RIDPairsImproved.main(RIDPairsImproved.java:108)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoin.bib(FuzzyJoin.java:39)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoin.main(FuzzyJoin.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoinDriver.main(FuzzyJoinDriver.java:121)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I think this is a Hadoop configuration error on Ubuntu; I used the configuration from this tutorial:
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
Finally I succeeded in running the code and correcting the error. The error was due to running the MapReduce program locally on the machine; I changed it to run on YARN, and the code now works fine for all types of data.
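For reference, switching from the LocalJobRunner to YARN comes down to one standard property in mapred-site.xml (a minimal sketch of the relevant stanza; the rest of the site configuration from the tutorial is assumed to already be in place):
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of the local runner -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
The failure above is consistent with that change mattering: the local runner localized the distributed-cache file under file:/tmp/mapred/local/... and the symlinked copy was not found there at task setup time.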
I am trying to run the sample example from this tutorial about Hadoop Pipes:
Compiling and building succeed. However, when the job runs it throws a NullPointerException. I have tried many approaches and read many similar questions, but was not able to find an actual solution to this problem.
Note: I am running on a single machine in a pseudo-distributed environment.
hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriters=true -input /input -output /output -program /bin/wordcount
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
15/02/18 01:09:02 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/02/18 01:09:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/02/18 01:09:02 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/02/18 01:09:03 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
15/02/18 01:09:04 INFO mapred.FileInputFormat: Total input paths to process : 1
15/02/18 01:09:04 INFO mapreduce.JobSubmitter: number of splits:1
15/02/18 01:09:04 INFO Configuration.deprecation: hadoop.pipes.java.recordreader is deprecated. Instead, use mapreduce.pipes.isjavarecordreader
15/02/18 01:09:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local143452495_0001
15/02/18 01:09:06 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:9000/bin/wordcount as file:/tmp/hadoop-abdulrahman/mapred/local/1424214545411/wordcount
15/02/18 01:09:06 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/02/18 01:09:06 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/02/18 01:09:06 INFO mapreduce.Job: Running job: job_local143452495_0001
15/02/18 01:09:06 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
15/02/18 01:09:06 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/18 01:09:06 INFO mapred.LocalJobRunner: Starting task: attempt_local143452495_0001_m_000000_0
15/02/18 01:09:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
15/02/18 01:09:06 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/data.txt:0+68
15/02/18 01:09:07 INFO mapred.MapTask: numReduceTasks: 1
15/02/18 01:09:07 INFO mapreduce.Job: Job job_local143452495_0001 running in uber mode : false
15/02/18 01:09:07 INFO mapreduce.Job: map 0% reduce 0%
15/02/18 01:09:07 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/02/18 01:09:07 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
15/02/18 01:09:07 INFO mapred.MapTask: soft limit at 83886080
15/02/18 01:09:07 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/02/18 01:09:07 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/02/18 01:09:07 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/02/18 01:09:08 INFO mapred.LocalJobRunner: map task executor complete.
15/02/18 01:09:08 WARN mapred.LocalJobRunner: job_local143452495_0001
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:104)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:69)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/18 01:09:08 INFO mapreduce.Job: Job job_local143452495_0001 failed with state FAILED due to: NA
15/02/18 01:09:08 INFO mapreduce.Job: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:264)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:503)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:518)
Edit: I downloaded the Hadoop source code and tracked down where the exception happens. It seems to occur during the initialization stage, so the code inside the mapper/reducer isn't really the problem.
The function in Hadoop that produces the exception is this one:
/** Run a set of tasks and waits for them to complete. */
private void runTasks(List<RunnableWithThrowable> runnables,
    ExecutorService service, String taskType) throws Exception {
  // Start populating the executor with work units.
  // They may begin running immediately (in other threads).
  for (Runnable r : runnables) {
    service.submit(r);
  }

  try {
    service.shutdown(); // Instructs queue to drain.

    // Wait for tasks to finish; do not use a time-based timeout.
    // (See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6179024)
    LOG.info("Waiting for " + taskType + " tasks");
    service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
  } catch (InterruptedException ie) {
    // Cancel all threads.
    service.shutdownNow();
    throw ie;
  }

  LOG.info(taskType + " task executor complete.");

  // After waiting for the tasks to complete, if any of these
  // have thrown an exception, rethrow it now in the main thread context.
  for (RunnableWithThrowable r : runnables) {
    if (r.storedException != null) {
      throw new Exception(r.storedException);
    }
  }
}
The problem, though, is that it stores the exception and then rethrows it wrapped in a new Exception, which prevents me from seeing the actual source of the exception.
Any help?
Also, if you need me to post more details please let me know.
Thank you,
So after a lot of research, I found out that the problem was actually caused by this line in pipes/Application.java (line 104):
byte[] password = jobToken.getPassword();
I changed the code and recompiled Hadoop:
byte[] password = "no password".getBytes();
if (jobToken != null) {
    password = jobToken.getPassword();
}
I got this from here
This solved the problem, and my program now runs; the guard matters because jobToken is evidently null when the job runs under the LocalJobRunner. However, I am now facing another problem where the program hangs at map 0% reduce 0%.
I will open another topic for that question.
Thank you,
My input is many text files. I want my MapReduce program to write all the file names and their associated sentences to one output file: the mapper should emit just the file name (key) and the associated sentences (value), and the reducer should collect each key with all of its values and write the file name and its associated sentences to the output.
Here is the code of my mapper and reducer:
public class WordCount {
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value, OutputCollector<Text,Text> output, Reporter reporter) throws IOException {
String filename = new String();
FileSplit filesplit = (FileSplit)reporter.getInputSplit();
filename=filesplit.getPath().getName();
output.collect(new Text(filename), value);
}
}
public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
StringBuilder builder = new StringBuilder();
for(Text value : values) {
String str = value.toString();
builder.append(str);
}
String valueToWrite=builder.toString();
output.collect(key, new Text(valueToWrite));
}
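// Note: the old mapred API's Reducer interface declares
// reduce(Text, Iterator<Text>, OutputCollector, Reporter), so the framework
// invokes the empty Iterator-based overload below; the Iterable-based method
// above is never called, which matches "Reduce output records=0" in the log.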
@Override
public void reduce(Text arg0, Iterator<Text> arg1, OutputCollector<Text, Text> arg2, Reporter arg3) throws IOException {
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setJarByClass(WordCount.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
conf.setNumReduceTasks(1);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
The output is as follows:
14/03/21 00:38:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/21 00:38:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/21 00:38:27 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/03/21 00:38:27 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/21 00:38:27 INFO mapred.FileInputFormat: Total input paths to process : 2
14/03/21 00:38:27 INFO mapred.JobClient: Running job: job_local_0001
14/03/21 00:38:27 INFO util.ProcessTree: setsid exited with exit code 0
14/03/21 00:38:27 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4911b910
14/03/21 00:38:27 INFO mapred.MapTask: numReduceTasks: 1
14/03/21 00:38:27 INFO mapred.MapTask: io.sort.mb = 100
14/03/21 00:38:27 INFO mapred.MapTask: data buffer = 79691776/99614720
14/03/21 00:38:27 INFO mapred.MapTask: record buffer = 262144/327680
14/03/21 00:38:27 INFO mapred.MapTask: Starting flush of map output
14/03/21 00:38:27 INFO mapred.MapTask: Finished spill 0
14/03/21 00:38:27 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
14/03/21 00:38:28 INFO mapred.JobClient: map 0% reduce 0%
14/03/21 00:38:30 INFO mapred.LocalJobRunner: file:/root/Desktop/wordcount/sample.txt:0+5371
14/03/21 00:38:30 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
14/03/21 00:38:30 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1f8166e5
14/03/21 00:38:30 INFO mapred.MapTask: numReduceTasks: 1
14/03/21 00:38:30 INFO mapred.MapTask: io.sort.mb = 100
14/03/21 00:38:30 INFO mapred.MapTask: data buffer = 79691776/99614720
14/03/21 00:38:30 INFO mapred.MapTask: record buffer = 262144/327680
14/03/21 00:38:30 INFO mapred.MapTask: Starting flush of map output
14/03/21 00:38:30 INFO mapred.MapTask: Finished spill 0
14/03/21 00:38:30 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
14/03/21 00:38:31 INFO mapred.JobClient: map 100% reduce 0%
14/03/21 00:38:33 INFO mapred.LocalJobRunner: file:/root/Desktop/wordcount/sample.txt~:0+587
14/03/21 00:38:33 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
14/03/21 00:38:33 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3963b3e
14/03/21 00:38:33 INFO mapred.LocalJobRunner:
14/03/21 00:38:33 INFO mapred.Merger: Merging 2 sorted segments
14/03/21 00:38:33 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 7549 bytes
14/03/21 00:38:33 INFO mapred.LocalJobRunner:
14/03/21 00:38:33 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
14/03/21 00:38:33 INFO mapred.LocalJobRunner:
14/03/21 00:38:33 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
14/03/21 00:38:33 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/root/Desktop/wordcount/output
14/03/21 00:38:36 INFO mapred.LocalJobRunner: reduce > reduce
14/03/21 00:38:36 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
14/03/21 00:38:37 INFO mapred.JobClient: map 100% reduce 100%
14/03/21 00:38:37 INFO mapred.JobClient: Job complete: job_local_0001
14/03/21 00:38:37 INFO mapred.JobClient: Counters: 21
14/03/21 00:38:37 INFO mapred.JobClient: File Input Format Counters
14/03/21 00:38:37 INFO mapred.JobClient: Bytes Read=5958
14/03/21 00:38:37 INFO mapred.JobClient: File Output Format Counters
14/03/21 00:38:37 INFO mapred.JobClient: Bytes Written=8
14/03/21 00:38:37 INFO mapred.JobClient: FileSystemCounters
14/03/21 00:38:37 INFO mapred.JobClient: FILE_BYTES_READ=26020
14/03/21 00:38:37 INFO mapred.JobClient: FILE_BYTES_WRITTEN=117337
14/03/21 00:38:37 INFO mapred.JobClient: Map-Reduce Framework
14/03/21 00:38:37 INFO mapred.JobClient: Map output materialized bytes=7557
14/03/21 00:38:37 INFO mapred.JobClient: Map input records=122
14/03/21 00:38:37 INFO mapred.JobClient: Reduce shuffle bytes=0
14/03/21 00:38:37 INFO mapred.JobClient: Spilled Records=244
14/03/21 00:38:37 INFO mapred.JobClient: Map output bytes=7301
14/03/21 00:38:37 INFO mapred.JobClient: Total committed heap usage (bytes)=954925056
14/03/21 00:38:37 INFO mapred.JobClient: CPU time spent (ms)=0
14/03/21 00:38:37 INFO mapred.JobClient: Map input bytes=5958
14/03/21 00:38:37 INFO mapred.JobClient: SPLIT_RAW_BYTES=185
14/03/21 00:38:37 INFO mapred.JobClient: Combine input records=0
14/03/21 00:38:37 INFO mapred.JobClient: Reduce input records=0
14/03/21 00:38:37 INFO mapred.JobClient: Reduce input groups=2
14/03/21 00:38:37 INFO mapred.JobClient: Combine output records=0
14/03/21 00:38:37 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
14/03/21 00:38:37 INFO mapred.JobClient: Reduce output records=0
14/03/21 00:38:37 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
14/03/21 00:38:37 INFO mapred.JobClient: Map output records=122
When I run the above mapper and reducer with KeyValueTextInputFormat configured as the input format, nothing is written to the output.
What should I change to achieve my goal?
KeyValueTextInputFormat is not the correct input format for your case. If you want to use that input format, each line of your input must contain a key/value pair separated by a user-specified delimiter (a tab by default). But in your case the input is a set of files, and you want the job's output to be "filename, content of file".
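For example, with the default tab delimiter, KeyValueTextInputFormat expects input lines shaped like this (illustrative data, not from your files; \t stands for a tab character):
key1\tthis is the first value
key2\tthis is the second value
Each line is split at the first delimiter: the part before the tab becomes the key, and the rest becomes the value handed to the mapper.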
One way to achieve this is to use TextInputFormat as the input format. I have tested the code below and it works.
Get the file name and the contents of the file in the map function:
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String filename = new String();
FileSplit filesplit = (FileSplit)context.getInputSplit();
filename=filesplit.getPath().getName();
context.write(new Text(filename), new Text(value));
}
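For context.getInputSplit and context.write to compile, this method needs to sit in a subclass of the new-API org.apache.hadoop.mapreduce.Mapper. A sketch of the assumed declaration (myMapper is the name the driver snippet at the end refers to, not code from the original answer):
public static class myMapper extends Mapper<LongWritable, Text, Text, Text> {
// map(...) as shown above
}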
In the reduce function, we build a string of all the values, which will be the contents of the file:
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
{
StringBuilder builder= new StringBuilder();
for (Text value : values)
{
String str = value.toString();
builder.append(str);
}
String valueToWrite= builder.toString();
context.write(key, new Text(valueToWrite));
}
Finally, in the job driver class, set the input format to TextInputFormat and the number of reducers to 1:
job.setInputFormatClass(TextInputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(myMapper.class);
job.setReducerClass(myReducer.class);
job.setNumReduceTasks(1);
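Putting the pieces together, a minimal driver for this job might look like the sketch below. This is an assumed completion, not verbatim code from the answer: the class name FileToContent and the job name are hypothetical, myMapper and myReducer are the names used above, the input and output paths come from the command line, and the style follows the new Job(conf, name) constructor used elsewhere in this document.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class FileToContent {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "file-to-content");
job.setJarByClass(FileToContent.class);
job.setMapperClass(myMapper.class);
job.setReducerClass(myReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
// TextInputFormat hands each line to the mapper as the value; the key is the byte offset.
job.setInputFormatClass(TextInputFormat.class);
// A single reducer yields a single output file, as the question asks for.
job.setNumReduceTasks(1);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}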