I am working on a demo that uses MapReduce to transform a delimited file into a file of serialized JSON records. I am using Jackson, but when I run my job, the map portion fails after emitting several seemingly Jackson-related errors:
$ hadoop jar target/map-demo.jar input output
2013-09-16 15:27:25.046 java[7250:1703] Unable to load realm info from SCDynamicStore
13/09/16 15:27:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/16 15:27:25 INFO input.FileInputFormat: Total input paths to process : 1
13/09/16 15:27:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/09/16 15:27:25 WARN snappy.LoadSnappy: Snappy native library not loaded
13/09/16 15:27:25 INFO mapred.JobClient: Running job: job_201309161312_0011
13/09/16 15:27:26 INFO mapred.JobClient: map 0% reduce 0%
13/09/16 15:27:30 INFO mapred.JobClient: Task Id : attempt_201309161312_0011_m_000000_0, Status : FAILED
Error: org.codehaus.jackson.map.ObjectMapper.setVisibility(Lorg/codehaus/jackson/annotate/JsonMethod;Lorg/codehaus/jackson/annotate/JsonAutoDetect$Visibility;)Lorg/codehaus/jackson/map/ObjectMapper;
attempt_201309161312_0011_m_000000_0: 2013-09-16 15:27:27.856 java[7286:1703] Unable to load realm info from SCDynamicStore
13/09/16 15:27:32 INFO mapred.JobClient: Task Id : attempt_201309161312_0011_m_000000_1, Status : FAILED
Error: org.codehaus.jackson.map.ObjectMapper.setVisibility(Lorg/codehaus/jackson/annotate/JsonMethod;Lorg/codehaus/jackson/annotate/JsonAutoDetect$Visibility;)Lorg/codehaus/jackson/map/ObjectMapper;
attempt_201309161312_0011_m_000000_1: 2013-09-16 15:27:30.566 java[7304:1703] Unable to load realm info from SCDynamicStore
13/09/16 15:27:35 INFO mapred.JobClient: Task Id : attempt_201309161312_0011_m_000000_2, Status : FAILED
Error: org.codehaus.jackson.map.ObjectMapper.setVisibility(Lorg/codehaus/jackson/annotate/JsonMethod;Lorg/codehaus/jackson/annotate/JsonAutoDetect$Visibility;)Lorg/codehaus/jackson/map/ObjectMapper;
attempt_201309161312_0011_m_000000_2: 2013-09-16 15:27:33.298 java[7334:1703] Unable to load realm info from SCDynamicStore
13/09/16 15:27:39 INFO mapred.JobClient: Job complete: job_201309161312_0011
13/09/16 15:27:40 INFO mapred.JobClient: Counters: 7
13/09/16 15:27:40 INFO mapred.JobClient: Job Counters
13/09/16 15:27:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6476
13/09/16 15:27:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/16 15:27:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/16 15:27:40 INFO mapred.JobClient: Launched map tasks=4
13/09/16 15:27:40 INFO mapred.JobClient: Data-local map tasks=4
13/09/16 15:27:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/09/16 15:27:40 INFO mapred.JobClient: Failed map tasks=1
I have a unit test that does the exact same thing as the MapReduce job, but single-threaded against the local file system, and it works fine.
Here is my job setup:
import java.io.IOException;
import com.example.text.Parser;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MapDemo {
public static class Map extends Mapper<Object, Text, Text, NullWritable> {
private Text text = new Text();
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String json = Parser.toJson(line);
text.set(json);
context.write(text, NullWritable.get());
}
}
public static class Reduce extends Reducer<Text, NullWritable, Text, NullWritable> {
public void reduce(Text key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
context.write(key, NullWritable.get());
}
}
public static void main(String[] args) throws Exception {
Configuration configuration = new Configuration();
Job job = new Job(configuration, "MapDemo");
job.setJarByClass(MapDemo.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setNumReduceTasks(1);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
My toJson method is as follows:
public String toJson() {
mapper.setVisibility(JsonMethod.FIELD, Visibility.ANY);
try {
return mapper.writeValueAsString(this);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
I am not sure which log files to look at, etc. Is there something obvious here that I'm doing wrong? What should I do next?
The problem is caused by the call to
mapper.setVisibility(JsonMethod.FIELD, Visibility.ANY);
Removing it makes the job work. The failure is a NoSuchMethodError in disguise: Hadoop 1.x puts its own, much older org.codehaus.jackson jars on the task classpath, and that older ObjectMapper has no setVisibility(JsonMethod, Visibility) overload, so the code compiles against your newer Jackson but fails inside the task JVM. The single-threaded unit test passes because it runs with your project's Jackson on the classpath instead of the cluster's.
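If you actually need field visibility, one workaround is to declare it on the serialized class instead of configuring the ObjectMapper at runtime. A minimal sketch, assuming a simple Record POJO like the one implied by the question and a Jackson 1.x on the task classpath recent enough to honor the annotation:
import java.io.IOException;
import org.codehaus.jackson.annotate.JsonAutoDetect;
import org.codehaus.jackson.annotate.JsonAutoDetect.Visibility;
import org.codehaus.jackson.map.ObjectMapper;
//Hypothetical Record class: the class-level annotation replaces the
//mapper.setVisibility(JsonMethod.FIELD, Visibility.ANY) call that is
//missing from the Jackson bundled with Hadoop.
@JsonAutoDetect(fieldVisibility = Visibility.ANY)
public class Record {
private String field1;
private int field2;
public String toJson() {
ObjectMapper mapper = new ObjectMapper();
try {
return mapper.writeValueAsString(this);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
The other route is to make the task JVM see your Jackson first: bundle the newer jackson-core-asl/jackson-mapper-asl jars inside your job jar's lib/ directory, or ship them with -libjars, so they travel with the job.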
The input file contains the adjacency list and has multiple lines in the following format:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Friends
{
public class FriendsMap extends Mapper < LongWritable, Text, Text, IntWritable >
{
private Text friendsAB;
private Text friendsBA;
private IntWritable one = new IntWritable(1);
private IntWritable oneLess = new IntWritable(-999999999);
//@SuppressWarnings("null")
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String friendsOfA[] = null; //This will be all of the friends of the user in this row
String oneRow[] = value.toString().split("\t,"); //Break the row up into users IDs
String userA = oneRow[0]; //This is the main user for this row
for (int i=1; i < oneRow.length; i++) //Create an array of the rest of the users in this row
{
friendsOfA[i-1] = oneRow[i];
}
for (int i=0; i < oneRow.length; i++) //Output the main user in pairs with all friends plus a large negative #
{
friendsAB.set(userA + " " + friendsOfA[i]);
context.write(friendsAB, oneLess);
System.out.println(friendsAB + " " + oneLess);
}
for (int i = 0; i < friendsOfA.length; i++) //Output each friend pair plus the number 1
{
for (int j = i + 1; j < friendsOfA.length; j++)
{
friendsAB.set(friendsOfA[i] + " " + friendsOfA[j]);
friendsBA.set(friendsOfA[j] + " " + friendsOfA[i]);
context.write(friendsAB, one);
context.write(friendsBA, one);
System.out.println(friendsAB + " " + one);
System.out.println(friendsBA + " " + one);
}
}
}
}
class FriendReducer extends Reducer < Text, IntWritable, Text, IntWritable >
{
private IntWritable result = new IntWritable();
@Override
public void reduce( Text key, Iterable < IntWritable > values, Context context) throws IOException, InterruptedException
{
int sum = 0;
for (IntWritable val : values)
{
sum += val.get();
}
if (sum > 1)
{
result.set( sum);
context.write( key, result);
}
//At this point I have all pairs of users with recommended friends and a count of how many times each
//friend has been recommended to a user.
//I need to sort by user and then by number of recommendations.
//Then print the user <tab> all recommendations with commas between them.
}
}
public static void main( String[] args) throws Exception
{
Configuration conf = new Configuration();
Job job = Job.getInstance( conf, "Friends");
job.setJarByClass(Friends.class);
FileInputFormat.addInputPath( job, new Path("input"));
FileOutputFormat.setOutputPath( job, new Path("output"));
job.setMapperClass( FriendsMap.class);
job.setCombinerClass( FriendReducer.class);
job.setReducerClass( FriendReducer.class);
job.setOutputKeyClass( Text.class);
job.setOutputValueClass( IntWritable.class);
System.exit( job.waitForCompletion( true) ? 0 : 1);
}
}
These are the errors I am getting in the console.
17/11/15 16:05:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/15 16:06:54 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/15 16:06:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/11/15 16:06:54 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/11/15 16:06:55 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/11/15 16:06:55 INFO input.FileInputFormat: Total input paths to process : 2
17/11/15 16:07:05 INFO mapred.JobClient: Running job: job_local426825952_0001
17/11/15 16:07:05 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/11/15 16:07:05 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/11/15 16:07:05 INFO mapred.LocalJobRunner: Waiting for map tasks
17/11/15 16:07:05 INFO mapred.LocalJobRunner: Starting task: attempt_local426825952_0001_m_000000_0
17/11/15 16:07:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/15 16:07:05 INFO util.ProcessTree: setsid exited with exit code 0
17/11/15 16:07:05 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@670217f0
17/11/15 16:07:05 INFO mapred.LocalJobRunner: Starting task: attempt_local426825952_0001_m_000001_0
17/11/15 16:07:05 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/15 16:07:05 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1688e9ff
17/11/15 16:07:05 INFO mapred.LocalJobRunner: Map task executor complete.
17/11/15 16:07:05 WARN mapred.LocalJobRunner: job_local426825952_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.NoSuchMethodException: Friends$FriendsMap.<init>()
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: Friends$FriendsMap.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: Friends$FriendsMap.<init>()
    at java.lang.Class.getConstructor0(Class.java:2849)
    at java.lang.Class.getDeclaredConstructor(Class.java:2053)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
    ... 8 more
17/11/15 16:07:06 INFO mapred.JobClient:  map 0% reduce 0%
17/11/15 16:07:06 INFO mapred.JobClient: Job complete: job_local426825952_0001
17/11/15 16:07:06 INFO mapred.JobClient: Counters: 0
After changing the classes to static, these are the new errors.
17/11/16 04:28:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/16 04:28:52 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/16 04:28:52 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/11/16 04:28:52 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/11/16 04:28:52 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/11/16 04:28:53 INFO input.FileInputFormat: Total input paths to process : 2
17/11/16 04:28:54 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/11/16 04:28:54 INFO mapred.JobClient: Running job: job_local1593958162_0001
17/11/16 04:28:54 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/11/16 04:28:54 INFO mapred.LocalJobRunner: Waiting for map tasks
17/11/16 04:28:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1593958162_0001_m_000000_0
17/11/16 04:28:54 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/16 04:28:54 INFO util.ProcessTree: setsid exited with exit code 0
17/11/16 04:28:54 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@57d51956
17/11/16 04:28:54 INFO mapred.MapTask: Processing split: file:/home/cloudera/workspace/Assignment4/input/Sample4.txt:0+4106187
17/11/16 04:28:54 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/16 04:28:54 INFO mapred.MapTask: io.sort.mb = 100
17/11/16 04:28:55 INFO mapred.MapTask: data buffer = 79691776/99614720
17/11/16 04:28:55 INFO mapred.MapTask: record buffer = 262144/327680
17/11/16 04:28:55 INFO mapred.LocalJobRunner: Starting task: attempt_local1593958162_0001_m_000001_0
17/11/16 04:28:55 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/16 04:28:55 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@774140b3
17/11/16 04:28:55 INFO mapred.MapTask: Processing split: file:/home/cloudera/workspace/Assignment4/input/Sample4.txt~:0+0
17/11/16 04:28:55 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/16 04:28:55 INFO mapred.MapTask: io.sort.mb = 100
17/11/16 04:28:55 INFO mapred.JobClient:  map 0% reduce 0%
17/11/16 04:28:55 INFO mapred.MapTask: data buffer = 79691776/99614720
17/11/16 04:28:55 INFO mapred.MapTask: record buffer = 262144/327680
17/11/16 04:28:55 INFO mapred.LocalJobRunner:
17/11/16 04:28:55 INFO mapred.MapTask: Starting flush of map output
17/11/16 04:28:55 INFO mapred.Task: Task:attempt_local1593958162_0001_m_000001_0 is done. And is in the process of commiting
17/11/16 04:28:55 INFO mapred.LocalJobRunner:
17/11/16 04:28:55 INFO mapred.Task: Task 'attempt_local1593958162_0001_m_000001_0' done.
17/11/16 04:28:55 INFO mapred.LocalJobRunner: Finishing task: attempt_local1593958162_0001_m_000001_0
17/11/16 04:28:55 INFO mapred.LocalJobRunner: Map task executor complete.
17/11/16 04:28:55 WARN mapred.LocalJobRunner: job_local1593958162_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.NullPointerException
    at Friends$FriendsMap.map(Friends.java:36)
    at Friends$FriendsMap.map(Friends.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
17/11/16 04:28:56 INFO mapred.JobClient: Job complete: job_local1593958162_0001
17/11/16 04:28:56 INFO mapred.JobClient: Counters: 16
17/11/16 04:28:56 INFO mapred.JobClient:   File System Counters
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of bytes read=4674
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of bytes written=139416
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of read operations=0
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/11/16 04:28:56 INFO mapred.JobClient:     FILE: Number of write operations=0
17/11/16 04:28:56 INFO mapred.JobClient:   Map-Reduce Framework
17/11/16 04:28:56 INFO mapred.JobClient:     Map input records=0
17/11/16 04:28:56 INFO mapred.JobClient:     Map output records=0
17/11/16 04:28:56 INFO mapred.JobClient:     Map output bytes=0
17/11/16 04:28:56 INFO mapred.JobClient:     Input split bytes=125
17/11/16 04:28:56 INFO mapred.JobClient:     Combine input records=0
17/11/16 04:28:56 INFO mapred.JobClient:     Combine output records=0
17/11/16 04:28:56 INFO mapred.JobClient:     Spilled Records=0
17/11/16 04:28:56 INFO mapred.JobClient:     CPU time spent (ms)=0
17/11/16 04:28:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
17/11/16 04:28:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
17/11/16 04:28:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=363696128
I think this is the part that is the most troublesome.
This is the updated code.
public static class FriendsMap extends Mapper < LongWritable, Text, Text, IntWritable >
{
//@SuppressWarnings("null")
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String friendsOfA[]; //This will be all of the friends of the user in this row
friendsOfA = new String[] {};
String friendsAB = "1"; //This will be used to create pairs of users
String friendsBA = "2"; //This will be used to create pairs of users
Text pairA;
Text pairB;
IntWritable one = new IntWritable(1); //1 if they are not an existing pair here
IntWritable oneLess = new IntWritable(-999999999); // if they are an existing pair
String oneRow[] = value.toString().split("\t,"); //Break the row up into users IDs
Text userA = new Text(oneRow[0]); //This is the main user for this row
for (int i=1; i < oneRow.length; i++) //Create an array of the rest of the users in this row
{
friendsOfA[i-1] = oneRow[i];
}
for (int i=0; i < oneRow.length; i++) //Output the main user in pairs with all friends plus a large negative #
{ //We do not want to recommend them as friends because they are friends
Text FOA = new Text (friendsOfA[i]);
friendsAB = (userA + " " + FOA);
Text pair = new Text (friendsAB);
context.write(pair, oneLess);
System.out.println(pair + " " + oneLess);
}
for (int i = 0; i < friendsOfA.length; i++) //Output each friend pair plus the number 1
{ //We want to recommend them as potential friends
for (int j = i + 1; j < friendsOfA.length; j++)
{
Text FOA = new Text (friendsOfA[i]);
Text FOB = new Text (friendsOfA[j]);
friendsAB = (FOA + " " + FOB);
friendsBA = (FOB + " " + FOA);
pairA = new Text (friendsAB);
pairB = new Text (friendsBA);
context.write(pairA, one);
context.write(pairB, one);
System.out.println(pairA + " " + one);
System.out.println(pairB + " " + one);
}
}
}
}
And this is the new set of errors.
17/11/16 11:59:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/16 11:59:27 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/16 11:59:27 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/11/16 11:59:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/11/16 11:59:27 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/11/16 11:59:27 INFO input.FileInputFormat: Total input paths to process : 2
17/11/16 11:59:29 INFO mapred.JobClient: Running job: job_local1899187381_0001
17/11/16 11:59:29 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/11/16 11:59:29 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/11/16 11:59:29 INFO mapred.LocalJobRunner: Waiting for map tasks
17/11/16 11:59:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1899187381_0001_m_000000_0
17/11/16 11:59:29 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/16 11:59:29 INFO util.ProcessTree: setsid exited with exit code 0
17/11/16 11:59:29 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4f94aaa1
17/11/16 11:59:29 INFO mapred.MapTask: Processing split: file:/home/cloudera/workspace/Assignment4/input/Sample4.txt:0+4106187
17/11/16 11:59:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/16 11:59:29 INFO mapred.MapTask: io.sort.mb = 100
17/11/16 11:59:29 INFO mapred.MapTask: data buffer = 79691776/99614720
17/11/16 11:59:29 INFO mapred.MapTask: record buffer = 262144/327680
17/11/16 11:59:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1899187381_0001_m_000001_0
17/11/16 11:59:29 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/11/16 11:59:29 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@622ecc38
17/11/16 11:59:29 INFO mapred.MapTask: Processing split: file:/home/cloudera/workspace/Assignment4/input/Sample4.txt~:0+0
17/11/16 11:59:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/16 11:59:29 INFO mapred.MapTask: io.sort.mb = 100
17/11/16 11:59:30 INFO mapred.JobClient:  map 0% reduce 0%
17/11/16 11:59:30 INFO mapred.MapTask: data buffer = 79691776/99614720
17/11/16 11:59:30 INFO mapred.MapTask: record buffer = 262144/327680
17/11/16 11:59:30 INFO mapred.LocalJobRunner:
17/11/16 11:59:30 INFO mapred.MapTask: Starting flush of map output
17/11/16 11:59:30 INFO mapred.Task: Task:attempt_local1899187381_0001_m_000001_0 is done. And is in the process of commiting
17/11/16 11:59:30 INFO mapred.LocalJobRunner:
17/11/16 11:59:30 INFO mapred.Task: Task 'attempt_local1899187381_0001_m_000001_0' done.
17/11/16 11:59:30 INFO mapred.LocalJobRunner: Finishing task: attempt_local1899187381_0001_m_000001_0
17/11/16 11:59:30 INFO mapred.LocalJobRunner: Map task executor complete.
17/11/16 11:59:30 WARN mapred.LocalJobRunner: job_local1899187381_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 0
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    at Friends$FriendsMap.map(Friends.java:41)
    at Friends$FriendsMap.map(Friends.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
17/11/16 11:59:31 INFO mapred.JobClient: Job complete: job_local1899187381_0001
17/11/16 11:59:31 INFO mapred.JobClient: Counters: 16
17/11/16 11:59:31 INFO mapred.JobClient:   File System Counters
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of bytes read=4674
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of bytes written=139416
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of read operations=0
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/11/16 11:59:31 INFO mapred.JobClient:     FILE: Number of write operations=0
17/11/16 11:59:31 INFO mapred.JobClient:   Map-Reduce Framework
17/11/16 11:59:31 INFO mapred.JobClient:     Map input records=0
17/11/16 11:59:31 INFO mapred.JobClient:     Map output records=0
17/11/16 11:59:31 INFO mapred.JobClient:     Map output bytes=0
17/11/16 11:59:31 INFO mapred.JobClient:     Input split bytes=125
17/11/16 11:59:31 INFO mapred.JobClient:     Combine input records=0
17/11/16 11:59:31 INFO mapred.JobClient:     Combine output records=0
17/11/16 11:59:31 INFO mapred.JobClient:     Spilled Records=0
17/11/16 11:59:31 INFO mapred.JobClient:     CPU time spent (ms)=0
17/11/16 11:59:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
17/11/16 11:59:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
17/11/16 11:59:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=363618304
You've declared the classes as inner classes, which is causing the NoSuchMethodException: Hadoop instantiates your Mapper by reflection through a no-argument constructor, and a (non-static) inner class can only exist within an instance of the enclosing class, so no such constructor is reachable.
It's probably easier to change them to static nested classes.
public class Friends {
public static class FriendsMap extends Mapper <...> {}
public static class FriendReducer extends Reducer <...> {}
public static void main( String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Friends");
job.setJarByClass(Friends.class);
FileInputFormat.addInputPath(job, new Path("input"));
FileOutputFormat.setOutputPath(job, new Path("output"));
job.setMapperClass(FriendsMap.class);
job.setCombinerClass(FriendReducer.class);
job.setReducerClass(FriendReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit( job.waitForCompletion( true) ? 0 : 1);
}
}
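That fixes the NoSuchMethodException. The follow-up NullPointerException and ArrayIndexOutOfBoundsException are separate bugs inside map() itself: friendsAB and friendsBA are Text fields that are never initialized before .set() is called, the updated code allocates friendsOfA as a zero-length array (new String[] {}) and then writes into it, and the first output loop runs to oneRow.length while indexing friendsOfA. A sketch of the mapper with those three spots fixed (nested inside Friends as above; whether the delimiter should be "\t" or something else depends on your actual input format):
public static class FriendsMap extends Mapper<LongWritable, Text, Text, IntWritable> {
private final Text pair = new Text(); //Allocated once, never null
private final IntWritable one = new IntWritable(1);
private final IntWritable oneLess = new IntWritable(-999999999);
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String oneRow[] = value.toString().split("\t"); //split("\t,") matches a tab followed by a comma; adjust to the real delimiter
String userA = oneRow[0];
String friendsOfA[] = new String[oneRow.length - 1]; //Sized from the row, not new String[] {}
for (int i = 1; i < oneRow.length; i++)
{
friendsOfA[i - 1] = oneRow[i];
}
for (int i = 0; i < friendsOfA.length; i++) //Bound by friendsOfA.length, not oneRow.length
{
pair.set(userA + " " + friendsOfA[i]); //Existing friendship: large negative marker
context.write(pair, oneLess);
}
for (int i = 0; i < friendsOfA.length; i++) //Candidate recommendations, both orders
{
for (int j = i + 1; j < friendsOfA.length; j++)
{
pair.set(friendsOfA[i] + " " + friendsOfA[j]);
context.write(pair, one);
pair.set(friendsOfA[j] + " " + friendsOfA[i]);
context.write(pair, one);
}
}
}
}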
I have written a MapReduce job to read data from a file and insert it into an HBase table. But the problem I am facing is that only 1 record gets inserted into the HBase table. I am not sure whether this is the last record or some random record, since my input file is around 10 GB. From the logic I have written, I am sure the records should be inserted in the thousands. I am sharing only the reducer code and the driver class code, as I am pretty sure the problem lies there. Please find the code below:
public static class Reduce extends TableReducer<Text,Text,ImmutableBytesWritable> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
Set<Text> uniques = new HashSet<Text>();
String vis=key.toString();
String[] arr=vis.split(":");
Put put=null;
for (Text val : values){
if (uniques.add(val)) {
put = new Put(arr[0].getBytes());
put.add(Bytes.toBytes("cf"), Bytes.toBytes("column"),Bytes.toBytes(val.toString()));
}
context.write(new ImmutableBytesWritable(arr[0].getBytes()), put);
}
}
}
My Driver class:
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "Blank");
job.setJarByClass(Class_name.class);
job.setMapperClass(Map.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setSortComparatorClass(CompositeKeyComprator.class);
Scan scan = new Scan();
scan.setCaching(500);
scan.setCacheBlocks(false);
job.setReducerClass(Reduce.class);
TableMapReduceUtil.initTableReducerJob(
"Table_name",
Reduce.class,
job);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
After running the program in the console, it says that Reduce output records=73579, but in the table only 1 record is inserted.
15/06/19 16:32:41 INFO mapred.JobClient: Job complete: job_201506181703_0020
15/06/19 16:32:41 INFO mapred.JobClient: Counters: 28
15/06/19 16:32:41 INFO mapred.JobClient: Map-Reduce Framework
15/06/19 16:32:41 INFO mapred.JobClient: Spilled Records=147158
15/06/19 16:32:41 INFO mapred.JobClient: Map output materialized bytes=6941462
15/06/19 16:32:41 INFO mapred.JobClient: Reduce input records=73579
15/06/19 16:32:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=7614308352
15/06/19 16:32:41 INFO mapred.JobClient: Map input records=140543
15/06/19 16:32:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=417
15/06/19 16:32:41 INFO mapred.JobClient: Map output bytes=6794286
15/06/19 16:32:41 INFO mapred.JobClient: Reduce shuffle bytes=6941462
15/06/19 16:32:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=892702720
15/06/19 16:32:41 INFO mapred.JobClient: Reduce input groups=1
15/06/19 16:32:41 INFO mapred.JobClient: Combine output records=0
15/06/19 16:32:41 INFO mapred.JobClient: Reduce output records=73579
15/06/19 16:32:41 INFO mapred.JobClient: Map output records=73579
15/06/19 16:32:41 INFO mapred.JobClient: Combine input records=0
15/06/19 16:32:41 INFO mapred.JobClient: CPU time spent (ms)=10970
15/06/19 16:32:41 INFO mapred.JobClient: Total committed heap usage (bytes)=829947904
15/06/19 16:32:41 INFO mapred.JobClient: File Input Format Counters
15/06/19 16:32:41 INFO mapred.JobClient: Bytes Read=204120920
15/06/19 16:32:41 INFO mapred.JobClient: FileSystemCounters
15/06/19 16:32:41 INFO mapred.JobClient: HDFS_BYTES_READ=204121337
15/06/19 16:32:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=14198205
15/06/19 16:32:41 INFO mapred.JobClient: FILE_BYTES_READ=6941450
15/06/19 16:32:41 INFO mapred.JobClient: Job Counters
And when I write the reducer output to a file, I get the correct output, but not in the HBase table.
Do let me know what I am missing here. Thanks in advance.
You are inserting data into HBase using the same row key, under the same column family and column qualifier. As per your counter statistics, you have only 1 reduce input group, so every Put from the reducer targets the same cell and each write overwrites the previous one. That's why you end up with only one row (and one cell) in the HBase table.
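If the intent is one cell per distinct value, here is a sketch of one way to restructure the reducer. The schema choice (value as column qualifier) is an assumption, not the only option; you could equally fold the value into the row key. Note also that Hadoop reuses the Text instance handed out by the values iterator, so the uniqueness set must store a copy rather than the Text object itself:
public static class Reduce extends TableReducer<Text, Text, ImmutableBytesWritable> {
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
Set<String> uniques = new HashSet<String>();
String[] arr = key.toString().split(":");
byte[] row = arr[0].getBytes();
for (Text val : values) {
String v = val.toString(); //copy the contents; the framework reuses the Text object
if (uniques.add(v)) {
Put put = new Put(row);
//a distinct qualifier per value gives each Put its own cell instead of
//overwriting cf:column on the same row
put.add(Bytes.toBytes("cf"), Bytes.toBytes(v), Bytes.toBytes(v));
context.write(new ImmutableBytesWritable(row), put);
}
}
}
}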
We are trying to design a simple program whose goal is to read patent data from a file and check whether other countries have cited that patent or not. The exercise is from the textbook 'Hadoop in Action' by Chuck Lam, from which we are trying to learn advanced MapReduce programming.
The Hadoop distribution we have set up runs on a single local node, and we are executing the program in a Windows environment using Cygwin.
We downloaded the files apat63_99.txt and cite75_99.txt from http://www.nber.org/patents/.
We are using 'apat63_99.txt' as the distributed cache file, and 'cite75_99.txt' is in the input folder, which we pass as a command line parameter.
The problem is that the program generates no output; the output files we see have no data in them.
We have tried emitting from the mapper phase alone as well as with the reducer phase, and both outputs are blank.
Here is the code which we have developed for this task:
package com.sample.patent;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Hashtable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class country_cite {
private static Hashtable<String, String> joinData
= new Hashtable<String, String>();
public static class Country_Citation_Class extends
Mapper<Text, Text, Text, Text> {
Path[] cacheFiles;
public void configure(JobConf conf) {
try {
cacheFiles = DistributedCache.getLocalCacheArchives(conf);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public void map(Text key, Text value, Context context)
throws IOException, InterruptedException {
if (cacheFiles != null && cacheFiles.length > 0) {
String line;
String[] tokens;
BufferedReader joinReader = new BufferedReader(new FileReader(
cacheFiles[0].toString()));
try {
while ((line = joinReader.readLine()) != null) {
tokens = line.split(",");
joinData.put(tokens[0], tokens[4]);
}
} finally {
joinReader.close();
}
}
if (joinData.get(key) != null)
context.write(key, new Text(joinData.get(key)));
}
}
public static class MyReduceClass extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String patent_country = joinData.get(key);
if (patent_country != null) {
for (Text val : values) {
String cited_country = joinData.get(val);
if (cited_country != null
&& !cited_country.equals(patent_country)) {
context.write(key, new Text(cited_country));
}
}
}
}
}
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new Path(args[0]).toUri(),
conf);
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length != 3) {
System.err.println("Usage: country_cite <in> <out>");
System.exit(2);
}
Job job = new Job(conf,"country_cite");
job.setJarByClass(country_cite.class);
job.setMapperClass(Country_Citation_Class.class);
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat.class);
// job.setReducerClass(MyReduceClass.class);
job.setNumReduceTasks(0);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
The IDE is Eclipse, and the Hadoop version we are using is 1.2.1.
These are the command line parameters to run the job:
/cygdrive/c/cygwin64/usr/local/hadoop
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output
This is the trace which gets generated while the program executes:
/cygdrive/c/cygwin64/usr/local/hadoop
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output
Patch for HADOOP-7682: Instantiating workaround file system
14/06/22 12:39:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging to 0700
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001 to 0700
14/06/22 12:39:21 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/06/22 12:39:21 INFO input.FileInputFormat: Total input paths to process : 1
14/06/22 12:39:21 WARN snappy.LoadSnappy: Snappy native library not loaded
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.split": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.split to 0644
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.splitmetainfo": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.splitmetainfo to 0644
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.xml": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.xml to 0644
14/06/22 12:39:23 INFO filecache.TrackerDistributedCacheManager: Creating fileapat63_99.txt in /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806 with rwxr-xr-x
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "/tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\local\archive\7067728792316735217_-679065598_1881640498-work-5016028422992714806 to 0755
14/06/22 12:40:06 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt
14/06/22 12:40:08 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt
14/06/22 12:40:09 INFO mapred.JobClient: Running job: job_local1277400315_0001
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Waiting for map tasks
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000000_0
14/06/22 12:40:10 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:10 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:0+33554432
14/06/22 12:40:10 INFO mapred.JobClient: map 0% reduce 0%
14/06/22 12:40:15 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000000_0 is done. And is in the process of commiting
14/06/22 12:40:15 INFO mapred.LocalJobRunner:
14/06/22 12:40:15 INFO mapred.Task: Task attempt_local1277400315_0001_m_000000_0 is allowed to commit now
14/06/22 12:40:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000000_0' to output
14/06/22 12:40:15 INFO mapred.LocalJobRunner:
14/06/22 12:40:15 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000000_0' done.
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000000_0
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000001_0
14/06/22 12:40:15 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:15 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:33554432+33554432
14/06/22 12:40:16 INFO mapred.JobClient: map 12% reduce 0%
14/06/22 12:40:21 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000001_0 is done. And is in the process of commiting
14/06/22 12:40:21 INFO mapred.LocalJobRunner:
14/06/22 12:40:21 INFO mapred.Task: Task attempt_local1277400315_0001_m_000001_0 is allowed to commit now
14/06/22 12:40:21 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000001_0' to output
14/06/22 12:40:21 INFO mapred.LocalJobRunner:
14/06/22 12:40:21 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000001_0' done.
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000001_0
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000002_0
14/06/22 12:40:21 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:21 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:67108864+33554432
14/06/22 12:40:21 INFO mapred.JobClient: map 25% reduce 0%
14/06/22 12:40:26 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000002_0 is done. And is in the process of commiting
14/06/22 12:40:26 INFO mapred.LocalJobRunner:
14/06/22 12:40:26 INFO mapred.Task: Task attempt_local1277400315_0001_m_000002_0 is allowed to commit now
14/06/22 12:40:26 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000002_0' to output
14/06/22 12:40:26 INFO mapred.LocalJobRunner:
14/06/22 12:40:26 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000002_0' done.
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000002_0
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000003_0
14/06/22 12:40:26 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:26 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:100663296+33554432
14/06/22 12:40:26 INFO mapred.JobClient: map 37% reduce 0%
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.JobClient: map 42% reduce 0%
14/06/22 12:40:29 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000003_0 is done. And is in the process of commiting
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.Task: Task attempt_local1277400315_0001_m_000003_0 is allowed to commit now
14/06/22 12:40:29 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000003_0' to output
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000003_0' done.
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000003_0
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000004_0
14/06/22 12:40:29 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:29 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:134217728+33554432
14/06/22 12:40:30 INFO mapred.JobClient: map 50% reduce 0%
14/06/22 12:40:30 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000004_0 is done. And is in the process of commiting
14/06/22 12:40:30 INFO mapred.LocalJobRunner:
14/06/22 12:40:30 INFO mapred.Task: Task attempt_local1277400315_0001_m_000004_0 is allowed to commit now
14/06/22 12:40:30 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000004_0' to output
14/06/22 12:40:30 INFO mapred.LocalJobRunner:
14/06/22 12:40:30 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000004_0' done.
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000004_0
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000005_0
14/06/22 12:40:30 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:30 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:167772160+33554432
14/06/22 12:40:31 INFO mapred.JobClient: map 62% reduce 0%
14/06/22 12:40:31 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000005_0 is done. And is in the process of commiting
14/06/22 12:40:31 INFO mapred.LocalJobRunner:
14/06/22 12:40:31 INFO mapred.Task: Task attempt_local1277400315_0001_m_000005_0 is allowed to commit now
14/06/22 12:40:31 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000005_0' to output
14/06/22 12:40:31 INFO mapred.LocalJobRunner:
14/06/22 12:40:31 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000005_0' done.
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000005_0
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000006_0
14/06/22 12:40:31 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:31 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:201326592+33554432
14/06/22 12:40:32 INFO mapred.JobClient: map 75% reduce 0%
14/06/22 12:40:32 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000006_0 is done. And is in the process of commiting
14/06/22 12:40:32 INFO mapred.LocalJobRunner:
14/06/22 12:40:32 INFO mapred.Task: Task attempt_local1277400315_0001_m_000006_0 is allowed to commit now
14/06/22 12:40:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000006_0' to output
14/06/22 12:40:32 INFO mapred.LocalJobRunner:
14/06/22 12:40:32 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000006_0' done.
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000006_0
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000007_0
14/06/22 12:40:32 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:33 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:234881024+29194407
14/06/22 12:40:33 INFO mapred.JobClient: map 87% reduce 0%
14/06/22 12:40:35 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000007_0 is done. And is in the process of commiting
14/06/22 12:40:35 INFO mapred.LocalJobRunner:
14/06/22 12:40:35 INFO mapred.Task: Task attempt_local1277400315_0001_m_000007_0 is allowed to commit now
14/06/22 12:40:35 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000007_0' to output
14/06/22 12:40:35 INFO mapred.LocalJobRunner:
14/06/22 12:40:35 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000007_0' done.
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000007_0
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Map task executor complete.
14/06/22 12:40:35 INFO mapred.JobClient: map 100% reduce 0%
14/06/22 12:40:35 INFO mapred.JobClient: Job complete: job_local1277400315_0001
14/06/22 12:40:35 INFO mapred.JobClient: Counters: 9
14/06/22 12:40:35 INFO mapred.JobClient: File Output Format Counters
14/06/22 12:40:35 INFO mapred.JobClient: Bytes Written=64
14/06/22 12:40:35 INFO mapred.JobClient: FileSystemCounters
14/06/22 12:40:35 INFO mapred.JobClient: FILE_BYTES_READ=5009033659
14/06/22 12:40:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3820489832
14/06/22 12:40:35 INFO mapred.JobClient: File Input Format Counters
14/06/22 12:40:35 INFO mapred.JobClient: Bytes Read=264104103
14/06/22 12:40:35 INFO mapred.JobClient: Map-Reduce Framework
14/06/22 12:40:35 INFO mapred.JobClient: Map input records=16522439
14/06/22 12:40:35 INFO mapred.JobClient: Spilled Records=0
14/06/22 12:40:35 INFO mapred.JobClient: Total committed heap usage (bytes)=708313088
14/06/22 12:40:35 INFO mapred.JobClient: Map output records=0
14/06/22 12:40:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=952
Kindly let us know where we are going wrong; in case I have missed any vital information, let me know.
Thanks and regards
I think the error is in the line if (joinData.get(key) != null). joinData uses String as its key type, and you pass a Text as the argument to get, so get returns null every time. Try replacing this line with if (joinData.get(key.toString()) != null).
Another mistake is that each Mapper and each Reducer runs in its own JVM, so Mappers and Reducers can't communicate through static objects, and joinData is empty for every Reducer.
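A sketch that combines both fixes: load the lookup table once per task in setup() (the new-API counterpart of the old configure(JobConf), which org.apache.hadoop.mapreduce.Mapper never invokes), keep it as an instance field rather than a shared static, and convert the Text key to a String before the lookup. It assumes the same imports as the question; getLocalCacheFiles is the getter that matches the DistributedCache.addCacheFile call in the driver:
public static class Country_Citation_Class extends Mapper<Text, Text, Text, Text> {
private Hashtable<String, String> joinData = new Hashtable<String, String>(); //per-task copy, not static
@Override
protected void setup(Context context) throws IOException, InterruptedException {
//setup() runs once per task before map(); configure(JobConf) never does with this API
Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
if (cacheFiles != null && cacheFiles.length > 0) {
BufferedReader joinReader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
try {
String line;
while ((line = joinReader.readLine()) != null) {
String[] tokens = line.split(",");
joinData.put(tokens[0], tokens[4]); //patent id -> country
}
} finally {
joinReader.close();
}
}
}
@Override
public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
String country = joinData.get(key.toString()); //Text -> String before the lookup
if (country != null) {
context.write(key, new Text(country));
}
}
}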
My input is many text files. I want my MapReduce program to write all the file names and their associated sentences to one output file, where the mapper emits just the file name (key) and the associated sentences (value). The reducer will collect each key with all of its values and write the file name and the associated sentences to the output.
Here is the code of my mapper and reducer:
public class WordCount {
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
String filename = new String();
FileSplit filesplit = (FileSplit)reporter.getInputSplit();
filename=filesplit.getPath().getName();
output.collect(new Text(filename), value);
}
}
public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
StringBuilder builder = new StringBuilder();
for(Text value : values) {
String str = value.toString();
builder.append(str);
}
String valueToWrite=builder.toString();
output.collect(key, new Text(valueToWrite));
}
@Override
public void reduce(Text arg0, Iterator<Text> arg1, OutputCollector<Text, Text> arg2, Reporter arg3) throws IOException {
}
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setJarByClass(WordCount.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
conf.setNumReduceTasks(1);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
The output is as follows:
14/03/21 00:38:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/21 00:38:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/21 00:38:27 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/03/21 00:38:27 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/21 00:38:27 INFO mapred.FileInputFormat: Total input paths to process : 2
14/03/21 00:38:27 INFO mapred.JobClient: Running job: job_local_0001
14/03/21 00:38:27 INFO util.ProcessTree: setsid exited with exit code 0
14/03/21 00:38:27 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4911b910
14/03/21 00:38:27 INFO mapred.MapTask: numReduceTasks: 1
14/03/21 00:38:27 INFO mapred.MapTask: io.sort.mb = 100
14/03/21 00:38:27 INFO mapred.MapTask: data buffer = 79691776/99614720
14/03/21 00:38:27 INFO mapred.MapTask: record buffer = 262144/327680
14/03/21 00:38:27 INFO mapred.MapTask: Starting flush of map output
14/03/21 00:38:27 INFO mapred.MapTask: Finished spill 0
14/03/21 00:38:27 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
14/03/21 00:38:28 INFO mapred.JobClient:  map 0% reduce 0%
14/03/21 00:38:30 INFO mapred.LocalJobRunner: file:/root/Desktop/wordcount/sample.txt:0+5371
14/03/21 00:38:30 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
14/03/21 00:38:30 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1f8166e5
14/03/21 00:38:30 INFO mapred.MapTask: numReduceTasks: 1
14/03/21 00:38:30 INFO mapred.MapTask: io.sort.mb = 100
14/03/21 00:38:30 INFO mapred.MapTask: data buffer = 79691776/99614720
14/03/21 00:38:30 INFO mapred.MapTask: record buffer = 262144/327680
14/03/21 00:38:30 INFO mapred.MapTask: Starting flush of map output
14/03/21 00:38:30 INFO mapred.MapTask: Finished spill 0
14/03/21 00:38:30 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
14/03/21 00:38:31 INFO mapred.JobClient:  map 100% reduce 0%
14/03/21 00:38:33 INFO mapred.LocalJobRunner: file:/root/Desktop/wordcount/sample.txt~:0+587
14/03/21 00:38:33 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
14/03/21 00:38:33 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3963b3e
14/03/21 00:38:33 INFO mapred.LocalJobRunner:
14/03/21 00:38:33 INFO mapred.Merger: Merging 2 sorted segments
14/03/21 00:38:33 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 7549 bytes
14/03/21 00:38:33 INFO mapred.LocalJobRunner:
14/03/21 00:38:33 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
14/03/21 00:38:33 INFO mapred.LocalJobRunner:
14/03/21 00:38:33 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
14/03/21 00:38:33 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/root/Desktop/wordcount/output
14/03/21 00:38:36 INFO mapred.LocalJobRunner: reduce > reduce
14/03/21 00:38:36 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
14/03/21 00:38:37 INFO mapred.JobClient:  map 100% reduce 100%
14/03/21 00:38:37 INFO mapred.JobClient: Job complete: job_local_0001
14/03/21 00:38:37 INFO mapred.JobClient: Counters: 21
14/03/21 00:38:37 INFO mapred.JobClient:   File Input Format Counters
14/03/21 00:38:37 INFO mapred.JobClient:     Bytes Read=5958
14/03/21 00:38:37 INFO mapred.JobClient:   File Output Format Counters
14/03/21 00:38:37 INFO mapred.JobClient:     Bytes Written=8
14/03/21 00:38:37 INFO mapred.JobClient:   FileSystemCounters
14/03/21 00:38:37 INFO mapred.JobClient:     FILE_BYTES_READ=26020
14/03/21 00:38:37 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=117337
14/03/21 00:38:37 INFO mapred.JobClient:   Map-Reduce Framework
14/03/21 00:38:37 INFO mapred.JobClient:     Map output materialized bytes=7557
14/03/21 00:38:37 INFO mapred.JobClient:     Map input records=122
14/03/21 00:38:37 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/03/21 00:38:37 INFO mapred.JobClient:     Spilled Records=244
14/03/21 00:38:37 INFO mapred.JobClient:     Map output bytes=7301
14/03/21 00:38:37 INFO mapred.JobClient:     Total committed heap usage (bytes)=954925056
14/03/21 00:38:37 INFO mapred.JobClient:     CPU time spent (ms)=0
14/03/21 00:38:37 INFO mapred.JobClient:     Map input bytes=5958
14/03/21 00:38:37 INFO mapred.JobClient:     SPLIT_RAW_BYTES=185
14/03/21 00:38:37 INFO mapred.JobClient:     Combine input records=0
14/03/21 00:38:37 INFO mapred.JobClient:     Reduce input records=0
14/03/21 00:38:37 INFO mapred.JobClient:     Reduce input groups=2
14/03/21 00:38:37 INFO mapred.JobClient:     Combine output records=0
14/03/21 00:38:37 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
14/03/21 00:38:37 INFO mapred.JobClient:     Reduce output records=0
14/03/21 00:38:37 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
14/03/21 00:38:37 INFO mapred.JobClient:     Map output records=122
When I run the above mapper and reducer with the input format set to KeyValueTextInputFormat.class, nothing is written to the output.
What should I change to achieve my goal?
KeyValueTextInputFormat is not the correct input format for your case. If you want to use this input format, each line of your input should contain a key/value pair separated by a user-specified delimiter (tab by default). But in your case the input is a set of files, and you want the output of the job to be "filename, content of file".
One way to achieve this is to use TextInputFormat as the input format. I have tested the code below and it works.
Get the file name and the file contents in the map function:
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String filename = new String();
FileSplit filesplit = (FileSplit)context.getInputSplit();
filename=filesplit.getPath().getName();
context.write(new Text(filename), new Text(value));
}
In the reduce function, we build a string of all the values, which will be the contents of the file:
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
{
StringBuilder builder= new StringBuilder();
for (Text value : values)
{
String str = value.toString();
builder.append(str);
}
String valueToWrite= builder.toString();
context.write(key, new Text(valueToWrite));
}
Finally, in the job driver class, set the input format to TextInputFormat and the number of reducers to 1:
job.setInputFormatClass(TextInputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(myMapper.class);
job.setReducerClass(myReducer.class);
job.setNumReduceTasks(1);
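For reference, a sketch of a complete driver wrapping those settings; the class name FileConcat is a placeholder, and myMapper/myReducer stand for the mapper and reducer shown above:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class FileConcat {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "file concat");
job.setJarByClass(FileConcat.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(myMapper.class);
job.setReducerClass(myReducer.class);
job.setNumReduceTasks(1); //a single reducer produces a single output file
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}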
I am trying to find out why the Java code below does not work when I try to run it on Hadoop.
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class PageStat implements Tool {
private Configuration conf;
@Override
public int run(String[] args) throws Exception {
Job job = new Job(getConf());
String jobName = "Page visit statistics MR";
job.setJobName(jobName);
job.setJarByClass(PageStat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(PageStat.PageStatMapper.class);
job.setReducerClass(PageStat.PageStatReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(job.getConfiguration().getInt("num.reducer", 1));
int status = job.waitForCompletion(true) ? 0 : 1;
return status;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new PageStat(), args);
System.exit(exitCode);
}
public void setConf(Configuration conf) {
this.conf = conf;
}
public Configuration getConf() {
return conf;
}
public static class PageStatMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private Text keyHolder = new Text();
private IntWritable valueHolder = new IntWritable();
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String[] items = value.toString().split(",");
if (items.length == 3) {
String url = items[1];
keyHolder.set(url);
Integer duration = Integer.parseInt(items[2]);
valueHolder.set(duration);
context.write(keyHolder, valueHolder);
} else {
context.getCounter("Error", "invalidData").increment(1);
}
}
}
public static class PageStatReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private Text keyHolder = new Text();
private IntWritable valueHolder = new IntWritable();
private String statType;
private int count;
private int totalTime;
private int avTime;
protected void setup(Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
statType = conf.get("page.stat");
}
protected void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
count = 0;
totalTime = 0;
for (IntWritable value : values){
++count;
totalTime += value.get();
}
avTime = totalTime / count;
keyHolder.set(key);
if (statType.equals("average")){
valueHolder.set(avTime);
} else {
valueHolder.set(totalTime);
}
context.write(keyHolder, valueHolder);
}
}
}
The error is:
c:\hadoop-training\tutorial02-jobtracker>hadoop jar PageStat.jar PageStat jobtracker/input/visit_5000000.txt jobtracker/output
13/07/29 11:24:50 INFO input.FileInputFormat: Total input paths to process : 1
log4j:ERROR Failed to rename [c:\Hadoop\hadoop-1.1.0-SNAPSHOT\logs/hadoop.log] to [c:\Hadoop\hadoop-1.1.0-SNAPSHOT\logs/hadoop.log.2013-07-26].
13/07/29 11:24:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/29 11:24:51 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/29 11:24:54 INFO mapred.JobClient: Running job: job_201307261340_0001
13/07/29 11:24:55 INFO mapred.JobClient: map 0% reduce 0%
13/07/29 11:25:24 INFO mapred.JobClient: map 1% reduce 0%
13/07/29 11:25:27 INFO mapred.JobClient: map 6% reduce 0%
13/07/29 11:25:30 INFO mapred.JobClient: map 14% reduce 0%
13/07/29 11:25:35 INFO mapred.JobClient: map 22% reduce 0%
13/07/29 11:25:38 INFO mapred.JobClient: map 31% reduce 0%
13/07/29 11:25:41 INFO mapred.JobClient: map 35% reduce 0%
13/07/29 11:25:44 INFO mapred.JobClient: map 44% reduce 0%
13/07/29 11:25:47 INFO mapred.JobClient: map 50% reduce 0%
13/07/29 11:26:03 INFO mapred.JobClient: map 60% reduce 0%
13/07/29 11:26:06 INFO mapred.JobClient: map 64% reduce 0%
13/07/29 11:26:09 INFO mapred.JobClient: map 69% reduce 0%
13/07/29 11:26:12 INFO mapred.JobClient: map 76% reduce 0%
13/07/29 11:26:15 INFO mapred.JobClient: map 81% reduce 0%
13/07/29 11:26:18 INFO mapred.JobClient: map 85% reduce 0%
13/07/29 11:26:21 INFO mapred.JobClient: map 87% reduce 0%
13/07/29 11:26:24 INFO mapred.JobClient: map 92% reduce 0%
13/07/29 11:26:27 INFO mapred.JobClient: map 94% reduce 0%
13/07/29 11:26:30 INFO mapred.JobClient: map 96% reduce 0%
13/07/29 11:26:33 INFO mapred.JobClient: map 97% reduce 0%
13/07/29 11:26:37 INFO mapred.JobClient: map 99% reduce 8%
13/07/29 11:26:40 INFO mapred.JobClient: map 100% reduce 8%
13/07/29 11:26:46 INFO mapred.JobClient: map 100% reduce 25%
13/07/29 11:26:54 INFO mapred.JobClient: Task Id : attempt_201307261340_0001_r_000000_0, Status : FAILED
java.lang.NullPointerException
at PageStat$PageStatReducer.reduce(PageStat.java:120)
at PageStat$PageStatReducer.reduce(PageStat.java:96)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:651)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
at org.apache.hadoop.mapred.Child.main(Child.java:265)
13/07/29 11:26:56 INFO mapred.JobClient: map 100% reduce 0%
13/07/29 11:27:05 INFO mapred.JobClient: map 100% reduce 8%
13/07/29 11:27:08 INFO mapred.JobClient: map 100% reduce 33%
13/07/29 11:27:10 INFO mapred.JobClient: Task Id : attempt_201307261340_0001_r_000000_1, Status : FAILED
java.lang.NullPointerException
at PageStat$PageStatReducer.reduce(PageStat.java:120)
at PageStat$PageStatReducer.reduce(PageStat.java:96)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:651)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
at org.apache.hadoop.mapred.Child.main(Child.java:265)
13/07/29 11:27:11 INFO mapred.JobClient: map 100% reduce 0%
13/07/29 11:27:20 INFO mapred.JobClient: map 100% reduce 8%
13/07/29 11:27:23 INFO mapred.JobClient: map 100% reduce 25%
13/07/29 11:27:25 INFO mapred.JobClient: Task Id : attempt_201307261340_0001_r_000000_2, Status : FAILED
java.lang.NullPointerException
at PageStat$PageStatReducer.reduce(PageStat.java:120)
at PageStat$PageStatReducer.reduce(PageStat.java:96)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:651)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
at org.apache.hadoop.mapred.Child.main(Child.java:265)
13/07/29 11:27:26 INFO mapred.JobClient: map 100% reduce 0%
13/07/29 11:27:38 INFO mapred.JobClient: map 100% reduce 25%
13/07/29 11:27:41 INFO mapred.JobClient: map 100% reduce 0%
13/07/29 11:27:43 INFO mapred.JobClient: Job complete: job_201307261340_0001
13/07/29 11:27:43 INFO mapred.JobClient: Counters: 24
13/07/29 11:27:43 INFO mapred.JobClient: Job Counters
13/07/29 11:27:43 INFO mapred.JobClient: Launched reduce tasks=4
13/07/29 11:27:43 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=179086
13/07/29 11:27:43 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/29 11:27:43 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/29 11:27:43 INFO mapred.JobClient: Launched map tasks=4
13/07/29 11:27:43 INFO mapred.JobClient: Data-local map tasks=4
13/07/29 11:27:43 INFO mapred.JobClient: Failed reduce tasks=1
13/07/29 11:27:43 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=106513
13/07/29 11:27:43 INFO mapred.JobClient: FileSystemCounters
13/07/29 11:27:43 INFO mapred.JobClient: FILE_BYTES_READ=179504086
13/07/29 11:27:43 INFO mapred.JobClient: HDFS_BYTES_READ=254931072
13/07/29 11:27:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=359099432
13/07/29 11:27:43 INFO mapred.JobClient: File Input Format Counters
13/07/29 11:27:43 INFO mapred.JobClient: Bytes Read=254930544
13/07/29 11:27:43 INFO mapred.JobClient: Map-Reduce Framework
13/07/29 11:27:43 INFO mapred.JobClient: Map output materialized bytes=179499502
13/07/29 11:27:43 INFO mapred.JobClient: Combine output records=0
13/07/29 11:27:43 INFO mapred.JobClient: Map input records=5000000
13/07/29 11:27:43 INFO mapred.JobClient: Physical memory (bytes) snapshot=851607552
13/07/29 11:27:43 INFO mapred.JobClient: Spilled Records=10000000
13/07/29 11:27:43 INFO mapred.JobClient: Map output bytes=169499478
13/07/29 11:27:43 INFO mapred.JobClient: CPU time spent (ms)=81308
13/07/29 11:27:43 INFO mapred.JobClient: Total committed heap usage (bytes)=746323968
13/07/29 11:27:43 INFO mapred.JobClient: Virtual memory (bytes) snapshot=988401664
13/07/29 11:27:43 INFO mapred.JobClient: Combine input records=0
13/07/29 11:27:43 INFO mapred.JobClient: Map output records=5000000
13/07/29 11:27:43 INFO mapred.JobClient: SPLIT_RAW_BYTES=528
Thanks!!!
I had a similar problem: you need to pass the property to the job with the -D flag when you execute it:
-Dpage.stat=total
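For example, the full command would look something like this (paths copied from the run above; generic options such as -D must come before the job's own arguments, which works here because PageStat is submitted through ToolRunner):
hadoop jar PageStat.jar PageStat -Dpage.stat=total jobtracker/input/visit_5000000.txt jobtracker/output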
You'll likely still see a warning:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
log4j:WARN Please initialize the log4j system properly.
That's not the full answer; I'm still getting to the bottom of it myself.
The line numbers in the stack trace don't line up with the source code you posted. Has the code changed since this run?
The NullPointerException is most likely thrown on the if (statType.equals("average")) line. Nothing sets "page.stat" in the configuration: it is neither hard-coded in the run method nor passed as an argument at job submission, so conf.get("page.stat") returns null and statType stays null.
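A minimal defensive fix, assuming a total-time default is acceptable (the "total" fallback is my assumption, not from the original code), is to supply a default in setup() so statType can never be null:
protected void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    // Configuration.get(name, defaultValue) returns the default instead of null
    // when the property was not supplied with -D or set in the driver
    statType = conf.get("page.stat", "total");
}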