Below is the code for my implementation of a simple MapReduce job using a custom WritableComparable.
public class MapReduceKMeans {
public static class MapReduceKMeansMapper extends
Mapper<Object, Text, SongDataPoint, Text> {
public void map(Object key, Text value, Context context)
throws InterruptedException, IOException {
String str = value.toString();
// Read each line of the input CSV and split it on commas.
String[] split = str.split(",");
String trackId = split[0];
String title = split[1];
String artistName = split[2];
SongDataPoint songDataPoint =
new SongDataPoint(new Text(trackId), new Text(title),
new Text(artistName));
context.write(songDataPoint, new Text());
}
}
public static class MapReduceKMeansReducer extends
Reducer<SongDataPoint, Text, Text, NullWritable> {
public void reduce(SongDataPoint key, Iterable<Text> values,
Context context) throws IOException, InterruptedException {
StringBuilder sb = new StringBuilder();
sb.append(key.getTrackId()).append("\t")
.append(key.getTitle()).append("\t")
.append(key.getArtistName()).append("\t");
String write = sb.toString();
context.write(new Text(write), NullWritable.get());
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: <CSV In Path> <Final Out Path>");
System.exit(2);
}
Job job = new Job(conf, "Song Data Trial");
job.setJarByClass(MapReduceKMeans.class);
job.setMapperClass(MapReduceKMeansMapper.class);
job.setReducerClass(MapReduceKMeansReducer.class);
job.setOutputKeyClass(SongDataPoint.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
When I debug, my code reads all the rows in the CSV file, but it never enters the reduce phase at all.
I have also made use of SongDataPoint as my custom writable.
Its code is below.
public class SongDataPoint implements WritableComparable<SongDataPoint> {
Text trackId;
Text title;
Text artistName;
public SongDataPoint() {
this.trackId = new Text();
this.title = new Text();
this.artistName = new Text();
}
public SongDataPoint(Text trackId, Text title, Text artistName) {
this.trackId = trackId;
this.title = title;
this.artistName = artistName;
}
@Override
public void readFields(DataInput in) throws IOException {
this.trackId.readFields(in);
this.title.readFields(in);
this.artistName.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
}
public Text getTrackId() {
return trackId;
}
public void setTrackId(Text trackId) {
this.trackId = trackId;
}
public Text getTitle() {
return title;
}
public void setTitle(Text title) {
this.title = title;
}
public Text getArtistName() {
return artistName;
}
public void setArtistName(Text artistName) {
this.artistName = artistName;
}
@Override
public int compareTo(SongDataPoint o) {
// TODO Auto-generated method stub
int compare = getTrackId().compareTo(o.getTrackId());
return compare;
}
}
Any help is appreciated. Thanks.
Your output key class as per the driver is SongDataPoint.class and your output value class is Text.class, but you are actually writing Text as the key and NullWritable as the value in the reducer.
You should also specify the mapper output types, as follows:
job.setMapOutputKeyClass(SongDataPoint.class);
job.setMapOutputValueClass(Text.class);
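Concretely, the final output declarations then need to match what the reducer actually writes. A minimal sketch, assuming the reducer shown above:
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);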
My write method in my custom writable class was left blank by mistake. Writing the proper code in it solved the problem:
public void write(DataOutput out) throws IOException {
// Serialize the fields in the same order that readFields() reads them.
this.trackId.write(out);
this.title.write(out);
this.artistName.write(out);
}
I am trying to implement a secondary sort, and I have seen this URL as an example: https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch01.html
But my problem is different. I have a list of products, each with a year and month and a price, like this:
201505011000######PEN DRIVE00951
201505011000######PEN DRIVE00952
201505011000######PEN DRIVE00458
201505011000######PEN DRIVE00459
201505011000#######NOTEBOOK11470
201605011000#######NOTEBOOK21471
201705011000#######NOTEBOOK21472
201705011000###GAVETA DE HD01472
201703011000###GAVETA DE HD01473
201705011000###GAVETA DE HD01474
Where, for example, 201505 represents the year and month, after the # signs comes the product name, and at the end the price 01470 represents 14,70.
What I need to do is get the lowest price for each product and show the year and month of that price. But I don't know how to do that; what I can show so far are the lowest price and the product.
Here is my program:
MAPPER
public class GroupMR {
public static class GroupMapper extends Mapper<LongWritable, Text, Product, IntWritable> {
Product prdt = new Product();
Text cntText = new Text();
Text YearMonthText = new Text();
IntWritable price = new IntWritable();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String produto = line.substring(13, 27); // product name
produto = produto.substring(produto.lastIndexOf("#")+1);
String ano = line.substring(0, 6);
int valor = Integer.parseInt(line.substring(27, 32));
cntText.set(new Text(produto));
YearMonthText.set(ano);
price.set(valor);
Product prdt = new Product(cntText, YearMonthText);
context.write(prdt, price);
}
}
REDUCER
public static class GroupReducer extends Reducer<Product, IntWritable, Product, IntWritable> {
public void reduce(Product key, Iterator<IntWritable> values, Context context) throws IOException,
InterruptedException {
int minValue = Integer.MAX_VALUE;
while (values.hasNext()) {
minValue = Math.min(minValue,values.next().get());
}
context.write(key, new IntWritable(minValue));
}
}
COMPARABLE
private static class Product implements WritableComparable<Product> {
Text Product;
Text YearMonth;
public Product(Text Product, Text YearMonth) {
this.Product = Product;
this.YearMonth = YearMonth;
}
public Product() {
this.Product = new Text();
this.YearMonth = new Text();
}
public void write(DataOutput out) throws IOException {
this.Product.write(out);
this.YearMonth.write(out);
}
public void readFields(DataInput in) throws IOException {
this.Product.readFields(in);
this.YearMonth.readFields(in);
}
public int compareTo(Product pric) {
if (pric == null)
return 0;
int intcnt = Product.compareTo(pric.Product);
return intcnt;
}
@Override
public String toString() {
return Product.toString() + " DATA: " + YearMonth.toString();
}
}
DRIVER
public static void main(String[] args)
throws IOException, ClassNotFoundException, InterruptedException {
FileUtils.deleteDirectory(new File("/Local/data/output"));
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "GroupMR");
job.setJarByClass(GroupMR.class);
job.setMapperClass(GroupMapper.class);
job.setReducerClass(GroupReducer.class);
job.setOutputKeyClass(Product.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
RESULT
201605011000######PEN DRIVE00950
201505011000######PEN DRIVE00951
201505011000######PEN DRIVE00952
201505011000######PEN DRIVE00458
201505011000######PEN DRIVE00459
201505011000#######NOTEBOOK11470
201605011000#######NOTEBOOK21471
201705011000#######NOTEBOOK21472
201705011000###GAVETA DE HD01472
201703011000###GAVETA DE HD01473
201705011000###GAVETA DE HD01474
I think the problem is in the reduce and in the compareTo, but I have no idea how to fix it. Could someone help me with it?
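A hedged note on the posted code: the reduce method takes Iterator<IntWritable> rather than Iterable<IntWritable>, so it never overrides Reducer.reduce() and Hadoop silently runs the default identity reduce instead, which would explain output that simply echoes the input. Separately, because compareTo() compares only the product, all months of a product do group together, but the key instance the reducer sees then carries an arbitrary YearMonth. One way around both problems is to key on the product name alone and carry the year/month alongside the price in the value. A minimal sketch under those assumptions, where the mapper is assumed to emit pairs like ("PEN DRIVE", "201505,951"):
public static class MinPriceReducer extends Reducer<Text, Text, Text, Text> {
@Override // Iterable, not Iterator -- otherwise this never overrides reduce()
public void reduce(Text product, Iterable<Text> values, Context context) throws IOException, InterruptedException {
int minPrice = Integer.MAX_VALUE;
String minYearMonth = "";
for (Text value : values) {
String[] parts = value.toString().split(","); // assumed "yearMonth,price" format
int price = Integer.parseInt(parts[1]);
if (price < minPrice) {
minPrice = price;
minYearMonth = parts[0];
}
}
// Emit the product with the year/month at which the minimum price occurred.
context.write(product, new Text(minYearMonth + "\t" + minPrice));
}
}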
I'm trying to use Time_Ant10s (a custom ArrayWritable class) as the reducer's output.
I referred to this nice question: MapReduce Output ArrayWritable, but I get a NullPointerException in context.write() in the last line of the reducer.
I suppose get() in Time_Ant10s.toString() may return null, but I have no idea why this happens. Could you help me?
Main Method
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "something");
// general
job.setJarByClass(CommutingTime1.class);
job.setMapperClass(Mapper1.class);
job.setReducerClass(Reducer1.class);
job.setNumReduceTasks(1);
job.setInputFormatClass (TextInputFormat.class);
// mapper output
job.setMapOutputKeyClass(Date_Uid.class);
job.setMapOutputValueClass(Time_Ant10.class);
// reducer output
job.setOutputFormatClass(CommaTextOutputFormat.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Time_Ant10s.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Mapper
public static class Mapper1 extends Mapper<LongWritable, Text, Date_Uid, Time_Ant10> {
/* map as <date_uid, time_ant10> */
// omitted
}
}
Reducer
public static class Reducer1 extends Reducer<Date_Uid, Time_Ant10, IntWritable, Time_Ant10s> {
/* <date_uid, time_ant10> -> <date, time_ant10s> */
private IntWritable date = new IntWritable();
@Override
protected void reduce(Date_Uid date_uid, Iterable<Time_Ant10> time_ant10s, Context context) throws IOException, InterruptedException {
date.set(date_uid.getDate());
// count ants
int num = 0;
for(Time_Ant10 time_ant10 : time_ant10s){
num++;
}
if(num>=1){
Time_Ant10[] temp = new Time_Ant10[num];
int i=0;
for(Time_Ant10 time_ant10 : time_ant10s){
String time = time_ant10.getTimeStr();
int ant10 = time_ant10.getAnt10();
temp[i] = new Time_Ant10(time, ant10);
i++;
}
context.write(date, new Time_Ant10s(temp));
}
}
}
Writer
public static class CommaTextOutputFormat extends TextOutputFormat<IntWritable, Time_Ant10s> {
@Override
public RecordWriter<IntWritable, Time_Ant10s> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
Configuration conf = job.getConfiguration();
String extension = ".txt";
Path file = getDefaultWorkFile(job, extension);
FileSystem fs = file.getFileSystem(conf);
FSDataOutputStream fileOut = fs.create(file, false);
return new LineRecordWriter<IntWritable, Time_Ant10s>(fileOut, ",");
}
}
Custom Writables
// Time
public static class Time implements Writable {
private int h, m, s;
public Time() {}
public Time(int h, int m, int s) {
this.h = h;
this.m = m;
this.s = s;
}
public Time(String time) {
String[] hms = time.split(":", 0);
this.h = Integer.parseInt(hms[0]);
this.m = Integer.parseInt(hms[1]);
this.s = Integer.parseInt(hms[2]);
}
public void set(int h, int m, int s) {
this.h = h;
this.m = m;
this.s = s;
}
public void set(String time) {
String[] hms = time.split(":", 0);
this.h = Integer.parseInt(hms[0]);
this.m = Integer.parseInt(hms[1]);
this.s = Integer.parseInt(hms[2]);
}
public int[] getTime() {
int[] time = new int[3];
time[0] = this.h;
time[1] = this.m;
time[2] = this.s;
return time;
}
public String getTimeStr() {
return String.format("%1$02d:%2$02d:%3$02d", this.h, this.m, this.s);
}
public int getTimeInt() {
return this.h * 10000 + this.m * 100 + this.s;
}
@Override
public void readFields(DataInput in) throws IOException {
h = in.readInt();
m = in.readInt();
s = in.readInt();
}
@Override
public void write(DataOutput out) throws IOException {
out.writeInt(h);
out.writeInt(m);
out.writeInt(s);
}
}
// Time_Ant10
public static class Time_Ant10 implements Writable {
private Time time;
private int ant10;
public Time_Ant10() {
this.time = new Time();
}
public Time_Ant10(Time time, int ant10) {
this.time = time;
this.ant10 = ant10;
}
public Time_Ant10(String time, int ant10) {
this.time = new Time(time);
this.ant10 = ant10;
}
public void set(Time time, int ant10) {
this.time = time;
this.ant10 = ant10;
}
public void set(String time, int ant10) {
this.time = new Time(time);
this.ant10 = ant10;
}
public int[] getTime() {
return this.time.getTime();
}
public String getTimeStr() {
return this.time.getTimeStr();
}
public int getTimeInt() {
return this.time.getTimeInt();
}
public int getAnt10() {
return this.ant10;
}
@Override
public void readFields(DataInput in) throws IOException {
time.readFields(in);
ant10 = in.readInt();
}
@Override
public void write(DataOutput out) throws IOException {
time.write(out);
out.writeInt(ant10);
}
}
// Time_Ant10s
public static class Time_Ant10s extends ArrayWritable {
public Time_Ant10s(){
super(Time_Ant10.class);
}
public Time_Ant10s(Time_Ant10[] time_ant10s){
super(Time_Ant10.class, time_ant10s);
}
@Override
public Time_Ant10[] get() {
return (Time_Ant10[]) super.get();
}
@Override
public String toString() {
int time, ant10;
Time_Ant10[] time_ant10s = get();
String output = "";
for(Time_Ant10 time_ant10: time_ant10s){
time = time_ant10.getTimeInt();
ant10 = time_ant10.getAnt10();
output += time + "," + ant10 + ",";
}
return output;
}
}
// Data_Uid
public static class Date_Uid implements WritableComparable<Date_Uid> {
// omitted
}
Error Message
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.NullPointerException
at CommutingTime1$Time_Ant10s.toString(CommutingTime1.java:179)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:85)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:104)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at CommutingTime1$Reducer1.reduce(CommutingTime1.java:323)
at CommutingTime1$Reducer1.reduce(CommutingTime1.java:291)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I found that the problem is that the Iterable passed to reduce() can't be iterated twice: the second loop over time_ant10s never ran, so the Time_Ant10[] stayed full of nulls and toString() threw the NullPointerException. So I referred to this page and changed the reducer and Time_Ant10s as below. Now everything is working fine.
@redflar3: Thank you very much for giving me hints. I totally misunderstood where my code had its bug.
Reducer
public static class Reducer1 extends Reducer<Date_Uid, Time_Ant10, IntWritable, Time_Ant10s> {
private IntWritable date = new IntWritable();
@Override
protected void reduce(Date_Uid date_uid, Iterable<Time_Ant10> time_ant10s, Context context) throws IOException, InterruptedException {
String time = "";
int ant10;
date.set(date_uid.getDate());
ArrayList<Time_Ant10> temp_list = new ArrayList<Time_Ant10>();
for (Time_Ant10 time_ant10 : time_ant10s){
time = time_ant10.getTimeStr();
ant10 = time_ant10.getAnt10();
temp_list.add(new Time_Ant10(time, ant10));
}
if(temp_list.size() >= 1){
Time_Ant10[] temp_array = temp_list.toArray(new Time_Ant10[temp_list.size()]);
context.write(date, new Time_Ant10s(temp_array));
}
}
}
Time_Ant10s
public static class Time_Ant10s extends ArrayWritable {
public Time_Ant10s(){
super(Time_Ant10.class);
}
public Time_Ant10s(Time_Ant10[] time_ant10s){
super(Time_Ant10.class, time_ant10s);
}
@Override
public Time_Ant10[] get() {
return (Time_Ant10[]) super.get();
}
@Override
public String toString() {
int time, ant10;
Time_Ant10[] time_ant10s = get();
String output = "";
for(Time_Ant10 time_ant10: time_ant10s){
time = time_ant10.getTimeInt();
ant10 = time_ant10.getAnt10();
output += time + "," + ant10 + ",";
}
return output;
}
}
I have a MapReduce program whose reduce method outputs a Text as the key and a FloatArrayWritable as the values. However, the values are outputting the array address instead of the values from the toString() method.
The output I am getting is:
IYE marketDataPackage.MarketData#69204998
IYE marketDataPackage.MarketData#69204998
The output should be:
IYE 38.89, 38.50, etc.
Could someone please advise on the error in my code? Thanks.
public static class Map extends Mapper<LongWritable, Text, Text, MarketData> {
private Text symbol = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
StringTokenizer tokenizer2 = new StringTokenizer(tokenizer.nextToken().toString(), ",");
symbol.set(tokenizer2.nextToken());
context.write(symbol, new MarketData(tokenizer2.nextToken(), Float.parseFloat(tokenizer2.nextToken())));
}
}
}
public static class Reduce extends Reducer<Text, FloatWritable, Text, FloatArrayWritable> {
public void reduce(Text key, Iterable<MarketData> values, Context context) throws IOException, InterruptedException, ParseException {
Calendar today = Calendar.getInstance();
today.add(Calendar.DAY_OF_MONTH, -45);
Calendar testDate = Calendar.getInstance();
SimpleDateFormat sdf = new SimpleDateFormat("yyyy/m/d");
List<FloatWritable> prices = new ArrayList<FloatWritable>();
for (MarketData m : values) {
testDate.setTime(sdf.parse(m.getTradeDate()));
if (testDate.after(today)) {
prices.add(new FloatWritable(m.getPrice()));
}
}
context.write(key, new FloatArrayWritable(prices.toArray(new FloatWritable[prices.size()])));
}
}
public static void main(String[] args) {
Configuration conf = new Configuration();
Job job = new Job(conf, "Security_Closing_Prices");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(MarketData.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
FloatArrayWritable class:
public class FloatArrayWritable extends ArrayWritable {
public FloatArrayWritable() {
super(FloatWritable.class);
}
public FloatArrayWritable(FloatWritable[] values) {
super(FloatWritable.class, values);
}
@Override
public FloatWritable[] get() {
return (FloatWritable[]) super.get();
}
@Override
public String toString() {
FloatWritable[] values = get();
String prices = "";
for (FloatWritable f : values) {
prices = prices + f.toString() + ", ";
}
if (prices != null && !prices.isEmpty()) {
prices = prices.substring(0, prices.length() - 2);
}
return prices;
}
}
The MarketData class should override toString(). You don't provide code for that class, but I suspect that it doesn't.
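For illustration, a hedged sketch of such an override, using the getters that already appear in the reducer (getTradeDate() and getPrice()); the exact formatting is an assumption. Note also that the reducer is declared Reducer<Text, FloatWritable, Text, FloatArrayWritable> while its reduce method takes Iterable<MarketData>, so that method never overrides reduce() and the default identity reduce passes the MarketData values straight to the output, which matches the output shown.
// Inside MarketData -- a sketch, not the actual class:
@Override
public String toString() {
return getTradeDate() + ", " + getPrice();
}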
My program is generating an empty output file. Can anyone please suggest where I am going wrong?
Any help will be highly appreciated. I tried job.setNumReduceTasks(0) since I am not using a reducer, but the output file is still empty.
public static class PrizeDisMapper extends Mapper<LongWritable, Text, Text, Pair>{
int rating = 0;
Text CustID;
IntWritable r;
Text MovieID;
public void map(LongWritable key, Text line, Context context
) throws IOException, InterruptedException {
String line1 = line.toString();
String [] fields = line1.split(":");
if(fields.length > 1)
{
String Movieid = fields[0];
String line2 = fields[1];
String [] splitline = line2.split(",");
String Custid = splitline[0];
int rate = Integer.parseInt(splitline[1]);
r = new IntWritable(rate);
CustID = new Text(Custid);
MovieID = new Text(Movieid);
Pair P = new Pair();
context.write(MovieID,P);
}
else
{
return;
}
}
}
public static class IntSumReducer extends Reducer<Text,Pair,Text,Pair> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<Pair> values,
Context context
) throws IOException, InterruptedException {
for (Pair val : values) {
context.write(key, val);
}
}
public class Pair implements Writable
{
String key;
int value;
public void write(DataOutput out) throws IOException {
out.writeInt(value);
out.writeChars(key);
}
public void readFields(DataInput in) throws IOException {
key = in.readUTF();
value = in.readInt();
}
public void setVal(String aKey, int aValue)
{
key = aKey;
value = aValue;
}
Main class:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setInputFormatClass (TextInputFormat.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Pair.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
Thanks @Pathmanaban Palsamy and @Chris Gerken for your suggestions. I have modified the code as per your suggestions, but I am still getting an empty output file. Can anyone please suggest configurations in my main class for input and output? Do I need to specify the Pair class as input to the mapper, and how?
I'm guessing the reduce method should be declared as
public void reduce(Text key, Iterable<Pair> values,
Context context
) throws IOException, InterruptedException
You get passed an Iterable (an object from which you can get an Iterator) which you use to iterate over all of the values that were mapped to the given key.
Since no reducer is required, I suspect the lines below:
Pair P = new Pair();
context.write(MovieID,P);
An empty Pair would be the issue.
Also, please check that your driver class sets the correct key class and value class, like:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Pair.class);
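Taken together, a hedged sketch of both fixes. Note too that Pair's write() and readFields() don't mirror each other: write() emits the int value first and then the key with writeChars(), while readFields() reads the key first with readUTF(), so the round trip is broken in both field order and encoding. A symmetric pair such as writeUTF()/readUTF() is needed:
// In the mapper: populate the Pair before emitting it.
Pair P = new Pair();
P.setVal(Custid, rate);
context.write(MovieID, P);
// In Pair: write and read the same fields, in the same order and encoding.
public void write(DataOutput out) throws IOException {
out.writeUTF(key);
out.writeInt(value);
}
public void readFields(DataInput in) throws IOException {
key = in.readUTF();
value = in.readInt();
}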
I've been working with Hadoop for a short time and am trying to implement a join in Java. It doesn't matter whether it's map-side or reduce-side; I took the reduce-side join since it was supposed to be easier to implement. I know that Java is not the best choice for joins, aggregations, etc., and that Hive or Pig would be a better pick, which I have already done. However, I'm working on a research project and have to use all three of those languages in order to deliver a comparison.
Anyway, I have two input files with different structures. One is key|value and the other is key|value1;value2;value3;value4. One record from each input file looks like the following:
Input1: 1;2010-01-10T00:00:01
Input2: 1;23;Blue;2010-01-11T00:00:01;9999-12-31T23:59:59
I followed the example in the Hadoop Definitive Guide book, but it didn't work for me. I'm posting my Java files here so you can see what's wrong.
public class LookupReducer extends Reducer<TextPair,Text,Text,Text> {
private String result = "";
private String msisdn;
private String attribute, product;
private long trans_dt_long, start_dt_long, end_dt_long;
private String trans_dt, start_dt, end_dt;
@Override
public void reduce(TextPair key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
context.progress();
//value without key to remember
Iterator<Text> iter = values.iterator();
for (Text val : values) {
Text recordNoKey = val; //new Text(iter.next());
String valSplitted[] = recordNoKey.toString().split(";");
//if the input is coming from CDR set corresponding values
if(key.getSecond().toString().equals(CDR.CDR_TAG))
{
trans_dt = recordNoKey.toString();
trans_dt_long = dateToLong(recordNoKey.toString());
}
//if the input is coming from Attributes set corresponding values
else if(key.getSecond().toString().equals(Attribute.ATT_TAG))
{
attribute = valSplitted[0];
product = valSplitted[1];
start_dt = valSplitted[2];
start_dt_long = dateToLong(valSplitted[2]);
end_dt = valSplitted[3];
end_dt_long = dateToLong(valSplitted[3]);
}
Text record = val; //iter.next();
//System.out.println("RECORD: " + record);
Text outValue = new Text(recordNoKey.toString() + ";" + record.toString());
if(start_dt_long < trans_dt_long && trans_dt_long < end_dt_long)
{
//concat output columns
result = attribute + ";" + product + ";" + trans_dt;
context.write(key.getFirst(), new Text(result));
System.out.println("KEY: " + key);
}
}
}
private static long dateToLong(String date){
DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Date parsedDate = null;
try {
parsedDate = formatter.parse(date);
} catch (ParseException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
long dateInLong = parsedDate.getTime();
return dateInLong;
}
public static class TextPair implements WritableComparable<TextPair> {
private Text first;
private Text second;
public TextPair(){
set(new Text(), new Text());
}
public TextPair(String first, String second){
set(new Text(first), new Text(second));
}
public TextPair(Text first, Text second){
set(first, second);
}
public void set(Text first, Text second){
this.first = first;
this.second = second;
}
public Text getFirst() {
return first;
}
public void setFirst(Text first) {
this.first = first;
}
public Text getSecond() {
return second;
}
public void setSecond(Text second) {
this.second = second;
}
@Override
public void readFields(DataInput in) throws IOException {
// TODO Auto-generated method stub
first.readFields(in);
second.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
// TODO Auto-generated method stub
first.write(out);
second.write(out);
}
@Override
public int hashCode(){
return first.hashCode() * 163 + second.hashCode();
}
@Override
public boolean equals(Object o){
if(o instanceof TextPair)
{
TextPair tp = (TextPair) o;
return first.equals(tp.first) && second.equals(tp.second);
}
return false;
}
@Override
public String toString(){
return first + ";" + second;
}
@Override
public int compareTo(TextPair tp) {
// TODO Auto-generated method stub
int cmp = first.compareTo(tp.first);
if(cmp != 0)
return cmp;
return second.compareTo(tp.second);
}
public static class FirstComparator extends WritableComparator {
protected FirstComparator(){
super(TextPair.class, true);
}
@Override
public int compare(WritableComparable comp1, WritableComparable comp2){
TextPair pair1 = (TextPair) comp1;
TextPair pair2 = (TextPair) comp2;
int cmp = pair1.getFirst().compareTo(pair2.getFirst());
if(cmp != 0)
return cmp;
return -pair1.getSecond().compareTo(pair2.getSecond());
}
}
public static class GroupComparator extends WritableComparator {
protected GroupComparator()
{
super(TextPair.class, true);
}
@Override
public int compare(WritableComparable comp1, WritableComparable comp2)
{
TextPair pair1 = (TextPair) comp1;
TextPair pair2 = (TextPair) comp2;
return pair1.compareTo(pair2);
}
}
}
}
public class Joiner extends Configured implements Tool {
public static final String DATA_SEPERATOR =";"; //Define the symbol for separating the output data
public static final String NUMBER_OF_REDUCER = "1"; //Define the number of the used reducer jobs
public static final String COMPRESS_MAP_OUTPUT = "true"; //if the output from the mapping process should be compressed, set COMPRESS_MAP_OUTPUT = "true" (if not set it to "false")
public static final String
USED_COMPRESSION_CODEC = "org.apache.hadoop.io.compress.SnappyCodec"; //set the used codec for the data compression
public static final boolean JOB_RUNNING_LOCAL = true; //if you run the Hadoop job on your local machine, you have to set JOB_RUNNING_LOCAL = true
//if you run the Hadoop job on the Telefonica Cloud, you have to set JOB_RUNNING_LOCAL = false
public static final String OUTPUT_PATH = "/home/hduser"; //set the folder, where the output is saved. Only needed, if JOB_RUNNING_LOCAL = false
public static class KeyPartitioner extends Partitioner<TextPair, Text> {
@Override
public int getPartition(TextPair key, Text value, int numPartitions) {
System.out.println("numPartitions: " + numPartitions);
return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
}
}
private static Configuration hadoopconfig() {
Configuration conf = new Configuration();
conf.set("mapred.textoutputformat.separator", DATA_SEPERATOR);
conf.set("mapred.compress.map.output", COMPRESS_MAP_OUTPUT);
//conf.set("mapred.map.output.compression.codec", USED_COMPRESSION_CODEC);
conf.set("mapred.reduce.tasks", NUMBER_OF_REDUCER);
return conf;
}
@Override
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
if ((args.length != 3) && (JOB_RUNNING_LOCAL)) {
System.err.println("Usage: Lookup <CDR-inputPath> <Attribute-inputPath> <outputPath>");
System.exit(2);
}
//starting the Hadoop job
Configuration conf = hadoopconfig();
Job job = new Job(conf, "Join cdrs and attributes");
job.setJarByClass(Joiner.class);
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, CDRMapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, AttributeMapper.class);
//FileInputFormat.addInputPath(job, new Path(otherArgs[0])); //expecting a folder instead of a file
if(JOB_RUNNING_LOCAL)
FileOutputFormat.setOutputPath(job, new Path(args[2]));
else
FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
job.setPartitionerClass(KeyPartitioner.class);
job.setGroupingComparatorClass(TextPair.FirstComparator.class);
job.setReducerClass(LookupReducer.class);
job.setMapOutputKeyClass(TextPair.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
int exitCode = ToolRunner.run(new Joiner(), args);
System.exit(exitCode);
}
}
public class Attribute {
public static final String ATT_TAG = "1";
public static class AttributeMapper
extends Mapper<LongWritable, Text, TextPair, Text>{
private static Text values = new Text();
//private Object output = new Text();
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//partition the input line by the separator semicolon
String[] attributes = value.toString().split(";");
String valuesInString = "";
if(attributes.length != 5)
System.err.println("Input column number not correct. Expected 5, provided " + attributes.length
+ "\n" + "Check the input file");
if(attributes.length == 5)
{
//setting the values with the input values read above
valuesInString = attributes[1] + ";" + attributes[2] + ";" + attributes[3] + ";" + attributes[4];
values.set(valuesInString);
//writing out the key and value pair
context.write( new TextPair(new Text(String.valueOf(attributes[0])), new Text(ATT_TAG)), values);
}
}
}
}
public class CDR {
public static final String CDR_TAG = "0";
public static class CDRMapper
extends Mapper<LongWritable, Text, TextPair, Text>{
private static Text values = new Text();
private Object output = new Text();
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//partition the input line by the separator semicolon
String[] cdr = value.toString().split(";");
//setting the values with the input values read above
values.set(cdr[1]);
//output = CDR_TAG + cdr[1];
//writing out the key and value pair
context.write( new TextPair(new Text(String.valueOf(cdr[0])), new Text(CDR_TAG)), values);
}
}
}
I would be glad if anyone could at least post a link to a tutorial or a simple example where such join functionality is implemented. I have searched a lot, but either the code was not complete or there was not enough explanation.
To be honest, I have no idea what your code is trying to do, but that's probably because I'd do it in a different way and I'm not familiar with the APIs you're using.
I would start from scratch as follows:
Create a mapper to read one of the files. For each line read, write a key value pair to the context. The key is a Text created from the key and the value is another Text created by concatenating a "1" with the entire input record.
Create another mapper for the other file. This mapper acts just like the first mapper, but the value is a Text created by concatenating a "2" with the entire input record.
Write a reducer to do the join. The reduce() method will get all records written for a specific key. You can tell which input file (and therefore the data format for the record) by looking to see whether the value starts with a "1" or a "2". Once you know whether or not you have one, the other or both record types, you can write whatever logic you need to merge the data from the one or two records.
By the way, you use the MultipleInputs class to configure more than one mapper in your job/driver class.
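To make that concrete, here is a minimal sketch of the approach just described. All class and variable names are illustrative, and the merge logic assumes at most one Input1 record per key (Input1 being the key|value file); adapt the merge step to your real cardinalities:
public static class Input1Mapper extends Mapper<LongWritable, Text, Text, Text> {
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
// Tag the entire record with "1" so the reducer knows its source.
String joinKey = line.substring(0, line.indexOf(';'));
context.write(new Text(joinKey), new Text("1" + line));
}
}
// Input2Mapper is identical except that it tags records with "2".
public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
@Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
String input1Record = null;
List<String> input2Records = new ArrayList<String>();
for (Text val : values) {
String tagged = val.toString();
// The first character is the source tag; the rest is the original record.
if (tagged.charAt(0) == '1') input1Record = tagged.substring(1);
else input2Records.add(tagged.substring(1));
}
// Merge: emit one joined line per Input2 record that has an Input1 partner.
if (input1Record != null) {
for (String input2Record : input2Records) {
context.write(key, new Text(input1Record + ";" + input2Record));
}
}
}
}
// In the driver, wire both mappers to their input paths via MultipleInputs:
MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Input1Mapper.class);
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Input2Mapper.class);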