Using String.split() failed in Hadoop mapper - Java

The code below is written in a Hadoop Mapper:
String[] s = value.toString().split("\\s+");
String date = s[1];
An ArrayIndexOutOfBoundsException occurs at s[1].
Does the regex not work in Hadoop?

This happens when a blank or whitespace-only line comes in as input; you have to filter it out:
String[] s = value.toString().split("\\s+");
if (s.length > 1) {
    String date = s[1];
}
A possible solution to your problem:
// Map function:
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, DoubleWritable> {
    // private final static IntWritable one = new IntWritable(1);
    // private Text word = new Text();
    double temp;

    public void map(LongWritable key, Text value, OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        line = line.replaceAll("U", "");
        int a = line.length();
        if (a > 2) {
            int spec = line.indexOf(' ');
            String s = line.substring(spec, spec + 9);
            String b = line.substring(spec + 10, a);
            StringTokenizer tokenizer = new StringTokenizer(b);
            while (tokenizer.hasMoreTokens()) {
                temp = Double.valueOf(tokenizer.nextToken());
                output.collect(new Text(s), new DoubleWritable(temp));
            }
        }
    }
}
// Reduce function:
public static class Reduce extends MapReduceBase implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    public void reduce(Text key, Iterator<DoubleWritable> values, OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException {
        Double maxValue = -Double.MAX_VALUE; // Double.MIN_VALUE is the smallest positive double, not the most negative
        Double minvalue = Double.MAX_VALUE;
        Double a;
        while (values.hasNext()) {
            a = values.next().get();
            maxValue = Math.max(maxValue, a);
            minvalue = Math.min(minvalue, a);
            if (maxValue > 40) {
                output.collect(key, new DoubleWritable(maxValue));
            }
            /* if (minvalue < 10) {
                output.collect(key, new DoubleWritable(a));
            } */
        }
        output.collect(new Text(key + "Max"), new DoubleWritable(maxValue));
        output.collect(new Text(key + "Min"), new DoubleWritable(minvalue));
    }
}

Related

Map Reduce - How to group and aggregate multiple attributes in a single job

I am currently struggling a bit with MapReduce.
I have the following dataset:
1,John,Computer
2,Anne,Computer
3,John,Mobile
4,Julia,Mobile
5,Jack,Mobile
6,Jack,TV
7,John,Computer
8,Jack,TV
9,Jack,TV
10,Anne,Mobile
11,Anne,Computer
12,Julia,Mobile
Now I want to apply MapReduce with grouping and
aggregation on this data set, so that the output
not only shows how many times each person bought something,
but also which product that person ordered most often.
So the output should look like:
John 3 Computer
Anne 3 Mobile
Jack 4 TV
Julia 2 Mobile
My current implementation of the mapper and reducer
looks like this; it correctly returns how many orders
each person made, but I am really clueless about how
to get the desired output.
static class CountMatchesMapper extends Mapper<Object, Text, Text, IntWritable> {
    @Override
    protected void map(Object key, Text value, Context ctx) throws IOException, InterruptedException {
        String row = value.toString();
        String[] row_part = row.split(",");
        try {
            ctx.write(new Text(row_part[1]), new IntWritable(1));
        } catch (IOException e) {
        } catch (InterruptedException e) {
        }
    }
}
static class CountMatchesReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException, InterruptedException {
        int i = 0;
        for (IntWritable value : values) i += value.get();
        try {
            ctx.write(key, new IntWritable(i));
        } catch (IOException e) {
        } catch (InterruptedException e) {
        }
    }
}
I would really appreciate any efficient solution and help.
Thanks in advance!
If I understand correctly what you want, I think the 2nd output line should be:
Anne 3 Computer
based on the input. Anne has bought 3 products in total: 2 Computers and 1 Mobile.
I have here a very basic and simplistic approach, which doesn't take into account edge cases etc, but could give you some direction:
static class CountMatchesMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Text outputKey = new Text();
    private Text outputValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
        String row = value.toString();
        String[] row_part = row.split(",");
        outputKey.set(row_part[1]);
        outputValue.set(row_part[2]);
        ctx.write(outputKey, outputValue);
    }
}
static class CountMatchesReducer extends Reducer<Text, Text, Text, NullWritable> {
    private Text output = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx) throws IOException, InterruptedException {
        HashMap<String, Integer> productCounts = new HashMap<>();
        int totalProductsBought = 0;
        for (Text value : values) {
            String productBought = value.toString();
            int count = 0;
            if (productCounts.containsKey(productBought)) {
                count = productCounts.get(productBought);
            }
            productCounts.put(productBought, count + 1);
            totalProductsBought += 1;
        }
        String topProduct = getTopProductForPerson(productCounts);
        output.set(key.toString() + " " + totalProductsBought + " " + topProduct);
        ctx.write(output, NullWritable.get());
    }

    private String getTopProductForPerson(Map<String, Integer> productCounts) {
        String topProduct = "";
        int maxCount = 0;
        for (Map.Entry<String, Integer> productCount : productCounts.entrySet()) {
            if (productCount.getValue() > maxCount) {
                maxCount = productCount.getValue();
                topProduct = productCount.getKey();
            }
        }
        return topProduct;
    }
}
The above will give the output that you described.
If you want a proper solution that scales, you probably need a composite key and a custom grouping comparator; that way you can also add a Combiner and make the job much more efficient. A minimal sketch of that approach follows, but the approach above should work for an average case.
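For illustration only (a hedged sketch, not the full solution: the class names are made up here, and the partitioner and job wiring are omitted), the composite key and grouping comparator could look roughly like this:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Composite key: sorts by (person, product) but can be grouped by person alone.
public class PersonProductKey implements WritableComparable<PersonProductKey> {
    private Text person = new Text();
    private Text product = new Text();

    public PersonProductKey() { }               // no-arg constructor required by Hadoop

    public void set(String personName, String productName) {
        person.set(personName);
        product.set(productName);
    }

    public Text getPerson() {
        return person;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        person.write(out);
        product.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        person.readFields(in);
        product.readFields(in);
    }

    @Override
    public int compareTo(PersonProductKey o) {
        int cmp = person.compareTo(o.person);
        return cmp != 0 ? cmp : product.compareTo(o.product);
    }
}

// Grouping comparator: compares only the person part, so one reduce() call
// receives all products bought by that person.
class PersonGroupingComparator extends WritableComparator {
    protected PersonGroupingComparator() {
        super(PersonProductKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        return ((PersonProductKey) a).getPerson().compareTo(((PersonProductKey) b).getPerson());
    }
}
You would then partition by person only and call job.setGroupingComparatorClass(PersonGroupingComparator.class), so every (person, product) key for one person reaches a single reduce() call.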

Hadoop: How to start 2 Mappers and 2 Reducers

I'm trying to develop a Hadoop app. I want to start 2 Mappers and 2 Reducers in my main method, but I keep getting a cast error, which brings me to ask: how can I do this?
Mapper1:
@SuppressWarnings("javadoc")
public class IntervallMapper1 extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static Logger logger = Logger.getLogger(IntervallMapper1.class.getName());
    private static Category categoriy;
    private static Value value;
    private String[] values = new String[4];
    private final static LongWritable one = new LongWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        if (!this.categoriy.valueIsMissing(value.toString())) { // air pressure and wind strength present...
            this.logger.info("Key: " + values[0] + values[1]);
            values = this.value.getValues(value.toString());
            context.write(new Text(values[0] + values[1]), this.one); // station + date as key, value = 1
        }
    }
}
Reducer1:
@SuppressWarnings("javadoc")
public class IntervallReducer1 extends Reducer<Text, LongWritable, Text, LongWritable> {
    private static Logger logger = Logger.getLogger(IntervallReducer1.class.getName());
    private String key = null;
    private static LongWritable result = new LongWritable();
    private long sum;

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        for (LongWritable value : values) {
            if (this.key == null) {
                logger.info("First run");
                System.out.println("---> " + value.get());
                sum = value.get();
                this.key = key.toString().substring(0, 10);
            } else if (key.toString().contains(this.key)) { // TODO: key.toString().substring(0, 10)
                logger.info("Key already present");
                System.out.println("---> " + sum);
                sum += value.get();
            } else { // if the key is not already present
                logger.info("Key not present");
                result.set(sum);
                logger.info("Value: " + sum);
                context.write(new Text(this.key), result);
                this.key = key.toString().substring(0, 10);
                sum = value.get();
            }
        }
    }
}
Mapper2:
@SuppressWarnings("javadoc")
public class IntervallMapper1 extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static Logger logger = Logger.getLogger(IntervallMapper1.class.getName());
    private static Category categoriy;
    private static Value value;
    private String[] values = new String[4];
    private final static LongWritable one = new LongWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        if (!this.categoriy.valueIsMissing(value.toString())) { // air pressure and wind strength present...
            this.logger.info("Key: " + values[0] + values[1]);
            values = this.value.getValues(value.toString());
            context.write(new Text(values[0] + values[1]), this.one); // station + date as key, value = 1
        }
    }
}
Main:
@SuppressWarnings("javadoc")
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Job job = Job.getInstance(new Configuration());
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setMapperClass(IntervallMapper1.class);
    // job.setCombinerClass(IntervallReducer1.class);
    job.setReducerClass(IntervallReducer1.class);
    job.setMapperClass(IntervallMapper2.class);
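    // Note: a Job has a single mapper class, so this second setMapperClass() call replaces IntervallMapper1 with IntervallMapper2 for the whole job.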
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setJarByClass(IntervallStart.class);
    job.waitForCompletion(true);
}
Error:
Error: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
at ncdcW03.IntervallMapper2.map(IntervallMapper2.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Reduce does not start after map completes

Below is the code for my implementation of a simple MapReduce job using a custom WritableComparable.
public class MapReduceKMeans {
    public static class MapReduceKMeansMapper extends
            Mapper<Object, Text, SongDataPoint, Text> {
        public void map(Object key, Text value, Context context)
                throws InterruptedException, IOException {
            String str = value.toString();
            // Reading lines one by one from the input CSV.
            String split[] = str.split(",");
            String trackId = split[0];
            String title = split[1];
            String artistName = split[2];
            SongDataPoint songDataPoint =
                    new SongDataPoint(new Text(trackId), new Text(title),
                            new Text(artistName));
            context.write(songDataPoint, new Text());
        }
    }

    public static class MapReduceKMeansReducer extends
            Reducer<SongDataPoint, Text, Text, NullWritable> {
        public void reduce(SongDataPoint key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            StringBuilder sb = new StringBuilder();
            sb.append(key.getTrackId()).append("\t")
                    .append(key.getTitle()).append("\t")
                    .append(key.getArtistName()).append("\t");
            String write = sb.toString();
            context.write(new Text(write), NullWritable.get());
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err
                    .println("Usage: <CSV In Path> <Final Out Path>");
            System.exit(2);
        }
        Job job = new Job(conf, "Song Data Trial");
        job.setJarByClass(MapReduceKMeans.class);
        job.setMapperClass(MapReduceKMeansMapper.class);
        job.setReducerClass(MapReduceKMeansReducer.class);
        job.setOutputKeyClass(SongDataPoint.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
When I debug, my code reads all the rows in the CSV file, but it never enters the reduce phase at all.
I have also made use of SongDataPoint as my custom Writable.
Its code is below.
public class SongDataPoint implements WritableComparable<SongDataPoint> {
    Text trackId;
    Text title;
    Text artistName;

    public SongDataPoint() {
        this.trackId = new Text();
        this.title = new Text();
        this.artistName = new Text();
    }

    public SongDataPoint(Text trackId, Text title, Text artistName) {
        this.trackId = trackId;
        this.title = title;
        this.artistName = artistName;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.trackId.readFields(in);
        this.title.readFields(in);
        this.artistName.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
    }

    public Text getTrackId() {
        return trackId;
    }

    public void setTrackId(Text trackId) {
        this.trackId = trackId;
    }

    public Text getTitle() {
        return title;
    }

    public void setTitle(Text title) {
        this.title = title;
    }

    public Text getArtistName() {
        return artistName;
    }

    public void setArtistName(Text artistName) {
        this.artistName = artistName;
    }

    @Override
    public int compareTo(SongDataPoint o) {
        // TODO Auto-generated method stub
        int compare = getTrackId().compareTo(o.getTrackId());
        return compare;
    }
}
Any help is appreciated. Thanks.
Your output key class as per the driver is SongDataPoint.class and the output value class is Text.class, but you are actually writing Text as the key and NullWritable as the value in the Reducer.
You should also specify the Mapper output classes as follows:
job.setMapOutputKeyClass(SongDataPoint.class);
job.setMapOutputValueClass(Text.class);
My write method in my custom Writable class was left blank by mistake:
public void write(DataOutput out) throws IOException {
}
Writing the proper code in it solved the problem.
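A filled-in version would presumably just mirror readFields(), serializing the three Text fields in the same order:
@Override
public void write(DataOutput out) throws IOException {
    // write fields in the same order readFields() consumes them
    this.trackId.write(out);
    this.title.write(out);
    this.artistName.write(out);
}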

Map Reduce job generating empty output file

The program is generating an empty output file. Can anyone please suggest where I am going wrong?
Any help will be highly appreciated. I tried job.setNumReduceTasks(0) since I am not using a reducer, but the output file is still empty.
public static class PrizeDisMapper extends Mapper<LongWritable, Text, Text, Pair> {
    int rating = 0;
    Text CustID;
    IntWritable r;
    Text MovieID;

    public void map(LongWritable key, Text line, Context context)
            throws IOException, InterruptedException {
        String line1 = line.toString();
        String[] fields = line1.split(":");
        if (fields.length > 1) {
            String Movieid = fields[0];
            String line2 = fields[1];
            String[] splitline = line2.split(",");
            String Custid = splitline[0];
            int rate = Integer.parseInt(splitline[1]);
            r = new IntWritable(rate);
            CustID = new Text(Custid);
            MovieID = new Text(Movieid);
            Pair P = new Pair();
            context.write(MovieID, P);
        } else {
            return;
        }
    }
}
public static class IntSumReducer extends Reducer<Text, Pair, Text, Pair> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<Pair> values, Context context)
            throws IOException, InterruptedException {
        for (Pair val : values) {
            context.write(key, val);
        }
    }
}

public class Pair implements Writable {
    String key;
    int value;

    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
        out.writeChars(key);
    }

    public void readFields(DataInput in) throws IOException {
        key = in.readUTF();
        value = in.readInt();
    }

    public void setVal(String aKey, int aValue) {
        key = aKey;
        value = aValue;
    }
}
Main class:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Pair.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Thanks @Pathmanaban Palsamy and @Chris Gerken for your suggestions. I have modified the code as per your suggestions but am still getting an empty output file. Can anyone please suggest the configuration in my main class for input and output? Do I need to specify the Pair class as input to the mapper, and how?
I'm guessing the reduce method should be declared as
public void reduce(Text key, Iterable<Pair> values, Context context)
        throws IOException, InterruptedException
You get passed an Iterable (an object from which you can get an Iterator) which you use to iterate over all of the values that were mapped to the given key.
Since no reducer is required, I suspect the lines below:
Pair P = new Pair();
context.write(MovieID,P);
An empty Pair would be the issue.
Also, please check that your driver class sets the correct key class and value class, like:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Pair.class);
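As a side note (my own observation, not from the answers above): the posted Pair writes its fields with out.writeInt(value) followed by out.writeChars(key), but reads them back with in.readUTF() followed by in.readInt(), so the serialization would not round-trip even for a populated Pair. A consistent version of the class might look roughly like this (an untested sketch):
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class Pair implements Writable {
    private String key;
    private int value;

    public Pair() { }                      // no-arg constructor required by Hadoop

    public void setVal(String aKey, int aValue) {
        key = aKey;
        value = aValue;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(key);                 // same method and order as readFields()
        out.writeInt(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        key = in.readUTF();
        value = in.readInt();
    }
}
In the mapper you would then populate it before writing it out, e.g. P.setVal(Custid, rate); before context.write(MovieID, P);.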

Read a text file from the system into HBase with MapReduce

I need to load data from a text file into MapReduce. I have searched the web, but I didn't find the right solution for my task.
Is there any method or class that reads a text/CSV file from the filesystem and stores the data into an HBase table?
To read from a text file, the file first needs to be in HDFS.
You then need to specify the input format and output format for the job:
Job job = new Job(conf, "example");
FileInputFormat.addInputPath(job, new Path("PATH to text file"));
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(YourMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
TableMapReduceUtil.initTableReducerJob("hbase_table_name", YourReducer.class, job);
job.waitForCompletion(true);
YourReducer should extend org.apache.hadoop.hbase.mapreduce.TableReducer<Text, Text, Text>.
Sample reducer code:
public class YourReducer extends TableReducer<Text, Text, Text> {
    private byte[] rawUpdateColumnFamily = Bytes.toBytes("colName");

    /**
     * Called once at the beginning of the task.
     */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // anything that needs to be done at the start of the reducer
    }

    @Override
    public void reduce(Text keyin, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // aggregate counts
        int valuesCount = 0;
        for (Text val : values) {
            valuesCount += 1;
            // put the data in the table
            Put put = new Put(keyin.toString().getBytes());
            long explicitTimeInMs = new Date().getTime();
            put.add(rawUpdateColumnFamily, Bytes.toBytes("colName"), explicitTimeInMs, val.toString().getBytes());
            context.write(keyin, put);
        }
    }
}
Sample mapper class
public static class YourMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
