My program is generating an empty output file. Can anyone please suggest where I am going wrong?
Any help will be highly appreciated. I tried job.setNumReduceTasks(0) since I am not using a reducer, but the output file is still empty.
public static class PrizeDisMapper extends Mapper<LongWritable, Text, Text, Pair>{
int rating = 0;
Text CustID;
IntWritable r;
Text MovieID;
public void map(LongWritable key, Text line, Context context
) throws IOException, InterruptedException {
String line1 = line.toString();
String [] fields = line1.split(":");
if(fields.length > 1)
{
String Movieid = fields[0];
String line2 = fields[1];
String [] splitline = line2.split(",");
String Custid = splitline[0];
int rate = Integer.parseInt(splitline[1]);
r = new IntWritable(rate);
CustID = new Text(Custid);
MovieID = new Text(Movieid);
Pair P = new Pair();
context.write(MovieID,P);
}
else
{
return;
}
}
}
public static class IntSumReducer extends Reducer<Text,Pair,Text,Pair> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<Pair> values,
Context context
) throws IOException, InterruptedException {
for (Pair val : values) {
context.write(key, val);
}
}
public class Pair implements Writable
{
String key;
int value;
public void write(DataOutput out) throws IOException {
out.writeInt(value);
out.writeChars(key);
}
public void readFields(DataInput in) throws IOException {
key = in.readUTF();
value = in.readInt();
}
public void setVal(String aKey, int aValue)
{
key = aKey;
value = aValue;
}
Main class:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setInputFormatClass (TextInputFormat.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Pair.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
Thanks @Pathmanaban Palsamy and @Chris Gerken for your suggestions. I have modified the code as per your suggestions, but I am still getting an empty output file. Can anyone please suggest what configuration my main class needs for input and output? Do I need to specify the Pair class as input to the mapper, and if so, how?
I'm guessing the reduce method should be declared as
public void reduce(Text key, Iterable<Pair> values,
Context context
) throws IOException, InterruptedException
You get passed an Iterable (an object from which you can get an Iterator) which you use to iterate over all of the values that were mapped to the given key.
Since no reducer is required, I suspect the lines below
Pair P = new Pair();
context.write(MovieID,P);
are the issue: the Pair is written out while it is still empty.
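If the mapper is meant to emit the rating data, the Pair has to be populated before it is written; a minimal sketch using the setVal method you already defined:

Pair P = new Pair();
P.setVal(Custid, rate);      // fill in the key/value fields instead of writing an empty Pair
context.write(MovieID, P);

Note also that Pair.write writes the int first and then the characters of the key, while readFields reads a UTF string first and then the int; write and readFields must mirror each other (for example writeUTF(key) followed by writeInt(value), read back in the same order), otherwise the values will not deserialize correctly.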
Also, please check that your driver class sets the correct key and value classes, like:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Pair.class);
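For the map-only setup you mentioned (no reducer), a minimal driver sketch based on the classes in your post might look like this:

Job job = new Job(conf, "prize distribution");   // job name is arbitrary
job.setJarByClass(WordCount.class);
job.setMapperClass(PrizeDisMapper.class);
job.setNumReduceTasks(0);                        // mapper output goes straight to the output files
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Pair.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Pair.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);

With zero reduce tasks the combiner and reducer settings are ignored, and since Pair does not override toString() the text output will show the default object representation unless you add one.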
Related
I'm working on a simple MapReduce program using the Kaggle data set
https://www.kaggle.com/datasnaek/youtube-new
The dataset contains 40,950 video records with 16 variables such as video_id, trending_date, title, channel_title, category_id, publish_time, tags, views, likes, dislikes, comment_count, description, etc.
The purpose of my MapReduce program is to find all videos that contain "iPhoneX" in their description and have at least 10,000 likes. The final output should only contain (title, video count).
Driver class
package solution;
public class Driver extends Configured implements Tool{
@Override
public int run(String[] args) throws Exception{
if(args.length != 2){
System.out.printf("Usage: Driver <input dir> <output dir> \n");
return -1;
}
Job job = new Job(getConf());
job.setJarByClass(Driver.class);
job.setJobName("iPhoneX");
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
//Specify Combiner as the combiner class
job.setCombinerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
if(job.getCombinerClass() == null){
throw new Exception("Combiner not set");
}
boolean success = job.waitForCompletion(true);
return success ? 0 : 1;
}
/* The main method calls the ToolRunner.run method,
* which calls the options parser that interprets Hadoop terminal
* options and puts them into a config object
* */
public static void main(String[] args) throws Exception{
int exitCode = ToolRunner.run(new Configuration(), new Driver(),args);
System.exit(exitCode);
}
}
Reducer class
package solution;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class Reducer extends Reducer<Text, IntWritable, Text, IntWritable>{
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException{
int video_count = 0;
for(IntWritable value : values){
video_count += value.get();
}
context.write(key, new IntWritable(video_count));
}
}
Mapper class
public class Mapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private Text description = new Text();
private IntWritable likes = new IntWritable();
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException{
String line = value.toString();
String str[] = line.split("\t");
if(str.length > 3){
description.set(str[8]);
}
// Testing how many times the iPhoneX word is located in the data set
// StringTokenizer itr = new StringTokenizer(line);
//
// while(itr.hasMoreTokens()){
// String token = itr.nextToken();
// if(token.contains("iPhoneX")){
// word.set("iPhoneX Count");
// context.write(word, new IntWritable(1));
// }
// }
}
}
Your code looks fine, but you're going to need to uncomment the part of the mapper that actually outputs data. However, your mapper key should just be "iPhoneX", and you probably want to tokenize the description, not the entire line.
You'll also want to extract the number of likes and keep only the records that satisfy the condition in the problem statement.
By the way, you need at least 9 elements in the split array to access that position, not just three, so change the condition here:
if(str.length >= 9){
    description.set(str[8]);
    int likes = Integer.parseInt(str[...]);   // placeholder: index of the likes column
    if (likes >= 10000) {
        // TODO: check whether the description string contains "iPhoneX"
        context.write(new Text("iPhoneX"), new IntWritable(1));
    }
} else {
    return; // skip line
}
Alternatively, rather than pre-aggregating in the mapper, you could just write out (token, 1) for every token that is "iPhoneX", then let the combiner and reducer do the summation for you
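A rough sketch of that token-based alternative, placed inside the likes filter from the snippet above and assuming (as in your mapper) that the description is str[8]:

StringTokenizer itr = new StringTokenizer(str[8]);     // tokenize the description field only
while (itr.hasMoreTokens()) {
    if (itr.nextToken().contains("iPhoneX")) {
        context.write(new Text("iPhoneX"), new IntWritable(1));   // emit (token, 1)
    }
}

The combiner and reducer you already have would then sum those ones into the final count.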
I have a MapReduce program whose reduce method outputs a Text as the key and a FloatArrayWritable as the value. However, the values are printed as an object address instead of the values from the toString() method.
The output I am getting is:
IYE marketDataPackage.MarketData#69204998
IYE marketDataPackage.MarketData#69204998
The output should be:
IYE 38.89, 38.50, etc.
Could someone please point out the error in my code? Thanks.
public static class Map extends Mapper<LongWritable, Text, Text, MarketData> {
private Text symbol = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
StringTokenizer tokenizer2 = new StringTokenizer(tokenizer.nextToken().toString(), ",");
symbol.set(tokenizer2.nextToken());
context.write(symbol, new MarketData(tokenizer2.nextToken(), Float.parseFloat(tokenizer2.nextToken())));
}
}
}
public static class Reduce extends Reducer<Text, FloatWritable, Text, FloatArrayWritable> {
public void reduce(Text key, Iterable<MarketData> values, Context context) throws IOException, InterruptedException, ParseException {
Calendar today = Calendar.getInstance();
today.add(Calendar.DAY_OF_MONTH, -45);
Calendar testDate = Calendar.getInstance();
SimpleDateFormat sdf = new SimpleDateFormat("yyyy/m/d");
List<FloatWritable> prices = new ArrayList<FloatWritable>();
for (MarketData m : values) {
testDate.setTime(sdf.parse(m.getTradeDate()));
if (testDate.after(today)) {
prices.add(new FloatWritable(m.getPrice()));
}
}
context.write(key, new FloatArrayWritable(prices.toArray(new FloatWritable[prices.size()])));
}
}
public static void main(String[] args) {
Configuration conf = new Configuration();
Job job = new Job(conf, "Security_Closing_Prices");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(MarketData.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
FloatArrayWritable class:
public class FloatArrayWritable extends ArrayWritable {
public FloatArrayWritable() {
super(FloatWritable.class);
}
public FloatArrayWritable(FloatWritable[] values) {
super(FloatWritable.class, values);
}
@Override
public FloatWritable[] get() {
return (FloatWritable[]) super.get();
}
@Override
public String toString() {
FloatWritable[] values = get();
String prices = "";
for (FloatWritable f : values) {
prices = prices + f.toString() + ", ";
}
if (prices != null && !prices.isEmpty()) {
prices = prices.substring(0, prices.length() - 2);
}
return prices;
}
}
The MarketData class should override toString(). You don't provide code for that class, but I suspect that it doesn't.
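A minimal sketch of such an override, assuming MarketData exposes the getTradeDate() and getPrice() accessors used in your reducer:

@Override
public String toString() {
    // Without this override you get the default Object representation (class name + hash code)
    return getTradeDate() + ", " + getPrice();
}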
Below is the code for my implementation of a simple MapReduce job using a custom WritableComparable.
public class MapReduceKMeans {
public static class MapReduceKMeansMapper extends
Mapper<Object, Text, SongDataPoint, Text> {
public void map(Object key, Text value, Context context)
throws InterruptedException, IOException {
String str = value.toString();
// Reading Line one by one from the input CSV.
String split[] = str.split(",");
String trackId = split[0];
String title = split[1];
String artistName = split[2];
SongDataPoint songDataPoint =
new SongDataPoint(new Text(trackId), new Text(title),
new Text(artistName));
context.write(songDataPoint, new Text());
}
}
public static class MapReduceKMeansReducer extends
Reducer<SongDataPoint, Text, Text, NullWritable> {
public void reduce(SongDataPoint key, Iterable<Text> values,
Context context) throws IOException, InterruptedException {
StringBuilder sb = new StringBuilder();
sb.append(key.getTrackId()).append("\t").
append(key.getTitle()).append("\t")
.append(key.getArtistName()).append("\t");
String write = sb.toString();
context.write(new Text(write), NullWritable.get());
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length != 2) {
System.err
.println("Usage:<CsV Out Path> <Final Out Path>");
System.exit(2);
}
Job job = new Job(conf, "Song Data Trial");
job.setJarByClass(MapReduceKMeans.class);
job.setMapperClass(MapReduceKMeansMapper.class);
job.setReducerClass(MapReduceKMeansReducer.class);
job.setOutputKeyClass(SongDataPoint.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
When I debug, the code reads all the rows in the CSV file, but it does not enter the reduce phase at all.
I have also made use of SongDataPoint as my custom writable.
Its code is as below.
public class SongDataPoint implements WritableComparable<SongDataPoint> {
Text trackId;
Text title;
Text artistName;
public SongDataPoint() {
this.trackId = new Text();
this.title = new Text();
this.artistName = new Text();
}
public SongDataPoint(Text trackId, Text title, Text artistName) {
this.trackId = trackId;
this.title = title;
this.artistName = artistName;
}
@Override
public void readFields(DataInput in) throws IOException {
this.trackId.readFields(in);
this.title.readFields(in);
this.artistName.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
}
public Text getTrackId() {
return trackId;
}
public void setTrackId(Text trackId) {
this.trackId = trackId;
}
public Text getTitle() {
return title;
}
public void setTitle(Text title) {
this.title = title;
}
public Text getArtistName() {
return artistName;
}
public void setArtistName(Text artistName) {
this.artistName = artistName;
}
@Override
public int compareTo(SongDataPoint o) {
// TODO Auto-generated method stub
int compare = getTrackId().compareTo(o.getTrackId());
return compare;
}
}
Any help is appreciated. Thanks.
Your output key class as given in the driver is SongDataPoint.class and your output value class is Text.class, but you are actually writing Text as the key and NullWritable as the value in the reducer.
You should also specify the mapper output types, as follows:
job.setMapOutputKeyClass(SongDataPoint.class);
job.setMapOutputValueClass(Text.class);
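The reducer's output types also need to match what it actually writes, so alongside the two lines above the driver would presumably set:

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);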
My write method in my custom Writable class was left blank by mistake. Writing the proper serialization code in it solved the problem.
public void write(DataOutput out) throws IOException {
}
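For reference, a filled-in version that mirrors readFields (serializing the same three Text fields in the same order) would look roughly like this:

@Override
public void write(DataOutput out) throws IOException {
    // Fields must be written in the same order readFields reads them back
    trackId.write(out);
    title.write(out);
    artistName.write(out);
}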
I want to calculate the average temperature from the HBase table test (info:date, info:temp) and put the result into the table result (info:date, info:avg).
However, when running the program I get an error.
The code is:
public static class mapper1 extends TableMapper<Text,FloatWritable>
{
public static final byte[] Info = "info".getBytes();
public static final byte[] Date = "date".getBytes();
public static final byte[] Temp = "temp".getBytes();
private static Text key=new Text();
public void map(ImmutableBytesWritable row,Result value,Context context)
throws IOException
{
String k1 = new String(value.getValue(Info, Date));
key.set(k1);
byte[] val=value.getValue(Info,Temp);
try
{
context.write(key,new
FloatWritable(Float.parseFloat(Bytes.toString(val))));
}
catch(InterruptedException e)
{
throw new IOException(e);
}
}}
//********************************************************************
public static class reducer1 extends TableReducer<Text,Result,Text>
{
public static final byte[] info = "info".getBytes();
public static final byte[] date = "date".getBytes();
byte[] avg ;
public void reduce(Text key,Iterable<FloatWritable>values, Context context)
throws IOException, InterruptedException
{
float sum=0;
int count=0;
float average=0;
for(FloatWritable val:values)
{
sum+=val.get();
count++;
}
average=(sum/count);
Put put = new Put(Bytes.toBytes(key.toString()));
put.add(info, date, Bytes.toBytes(average));
System.out.println("For\t"+count+"\t average is:"+average);
context.write(key,put);
}
}
//*********************************************************************
public static void main(String args[]) throws
IOException,ClassNotFoundException, InterruptedException, NullPointerException
{
Configuration config=HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost");
HTable table1 = new HTable(config, "test");
HTable table2 = new HTable(config, "result");
Job job=new Job(config,"AVG");
Scan scan=new Scan();
scan.addFamily("info".getBytes());
scan.setFilter(new FirstKeyOnlyFilter());
TableMapReduceUtil.initTableMapperJob(
"test",
scan,
mapper1.class,
Text.class,
FloatWritable.class,
job);
TableMapReduceUtil.initTableReducerJob(
"result",
reducer1.class,
job);
job.setNumReduceTasks(1);
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
}
}
The error message is:
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.net.DNS.reverseDns(DNS.java:92)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.reverseDNS(TableInputFormatBase.java:223)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:189)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:452)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:469)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1236)
at TempVar.AVG.main(AVG.java:126)
Can you help me?
It seems the host name returned by ZooKeeper for the machine where HBase runs cannot be resolved.
Either configure DNS correctly or, if you don't use DNS, add a mapping from host names to IP addresses in the /etc/hosts file (for example, a line like 192.168.1.10 hbase-host, with your own IP address and host name).
I need to load data from a text file into HBase using MapReduce. I have searched the web but didn't find the right solution for my task.
Is there any method or class that reads a text/CSV file from the file system and stores the data into an HBase table?
To read from a text file, the file must first be in HDFS (for example, copied there with hdfs dfs -put). You then need to specify the input format and output format for the job:
Job job = new Job(conf, "example");
FileInputFormat.addInputPath(job, new Path("PATH to text file"));
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(YourMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
TableMapReduceUtil.initTableReducerJob("hbase_table_name", YourReducer.class, job);
job.waitForCompletion(true);
YourReducer should extend org.apache.hadoop.hbase.mapreduce.TableReducer<Text, Text, Text>.
Sample reducer code:
public class YourReducer extends TableReducer<Text, Text, Text> {
private byte[] rawUpdateColumnFamily = Bytes.toBytes("colName");
/**
* Called once at the beginning of the task.
*/
@Override
protected void setup(Context context) throws IOException, InterruptedException {
// something that need to be done at start of reducer
}
@Override
public void reduce(Text keyin, Iterable<Text> values, Context context) throws IOException, InterruptedException {
// aggregate counts
int valuesCount = 0;
for (Text val : values) {
valuesCount += 1;
// put date in table
Put put = new Put(keyin.toString().getBytes());
long explicitTimeInMs = new Date().getTime();
put.add(rawUpdateColumnFamily, Bytes.toBytes("colName"), explicitTimeInMs,val.toString().getBytes());
context.write(keyin, put);
}
}
}
Sample mapper class
public static class YourMapper extends Mapper<LongWritable, Text, Text, Text> {
    // The value type is Text so it matches job.setMapOutputValueClass(Text.class) and the reducer above
    private final static Text one = new Text("1");
    private Text word = new Text();
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}