MapReduce ArrayIndexOutOfBoundsException - Java

I am very confused about why this is happening; I have been working on it for some time and I just don't understand.
My map code works, as I am able to verify the output in the directory it is written to.
This is the method:
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String stateKeyword = value.toString();
    String[] pieces = new String[] {stateKeyword};
    for (String element : pieces) {
        String name = element.split(":")[0].trim();
        String id = element.split(":")[1].trim();
        Integer rank = Integer.parseInt(element.split(":")[2].trim());
        context.write(new Text(name), new Text(id + ":" + rank));
    }
}
So my output will have the concatenation of the id and rank fields. I can see it in the output file if I print the value normally.
However, any split manipulation I execute throws an ArrayIndexOutOfBoundsException, and I can't understand why. I even check whether the value contains a ":": the value prints, but it won't split. And when I don't make this check, I get the exception.
Here is my reduce:
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    List<String> elements = new ArrayList<String>();
    Text word = new Text();
    for (Text val : values) {
        if (val.toString().contains(":")) {
            String state = val.toString().split(":")[0];
            word.set(state);
        }
        context.write(key, word);
    }
}
My output in my file looks like this:
Name id:rank
Name id:rank
Name id:rank
...
...
...
But why can't I split off the id and rank?

To avoid the ArrayIndexOutOfBoundsException, check the array length before reading values out of the array. Something like this will be more appropriate:
String[] temp = element.split(":");
if (temp.length >= 2) {
    String name = temp[0].trim();
    String id = temp[1].trim();
}
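Applied to the reduce above, a minimal guarded sketch (assuming each value really arrives as an id:rank string; this is an illustration, not the asker's exact code) could look like:
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    Text word = new Text();
    for (Text val : values) {
        // Only write when the split actually produced an id and a rank
        String[] parts = val.toString().split(":");
        if (parts.length >= 2) {
            word.set(parts[0]); // the id portion
            context.write(key, word);
        }
    }
}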

Related

Hadoop Reducer does not work

I am having trouble with a MapReduce job. My map function runs and produces the desired output. However, the reduce function does not run; it seems like it never gets called. I am using Text as keys and Text as values, but I don't think that causes the problem.
The input file is formatted as follows:
2015-06-06,2015-06-06,40.80239868164062,-73.93379211425781,40.72591781616211,-73.98358154296875,7.71,35.72
2015-06-06,2015-06-06,40.71020126342773,-73.96302032470703,40.72967529296875,-74.00226593017578,3.11,2.19
2015-06-05,2015-06-05,40.68404388427734,-73.97597503662109,40.67932510375977,-73.95581817626953,1.13,1.29
...
I want to extract the second date of a line as Text and use it as key for the reduce. The value for the key will be a combination of the last two float values in the same line.
i.e.: 2015-06-06 7.71 35.72
2015-06-06 9.71 66.72
So that the value part can be viewed as two columns separated by a blank.
That part actually works, and I get an output file with many repeated keys but different values.
Now I want to sum up both of the float columns for each key, so that after the reduce I get a date as the key with the summed-up columns as the value.
Problem: reduce does not run.
See the code below:
Mapper
public class Aggregate {
    public static class EarnDistMapper extends Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] splitResult = value.toString().split(",");
            // dropOffDate at position 1 as the key
            String dropOffDate = splitResult[1];
            // distance at position length-2 and earnings at position length-1 as the value, separated by a space
            String compEarningDist = splitResult[splitResult.length - 2] + " " + splitResult[splitResult.length - 1];
            context.write(new Text(dropOffDate), new Text(compEarningDist));
        }
    }
Reducer
    public static class EarnDistReducer extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values, Context context) throws IOException, InterruptedException {
            float sumDistance = 0;
            float sumEarnings = 0;
            String[] splitArray;
            while (values.hasNext()) {
                splitArray = values.next().toString().split("\\s+");
                // distance first
                sumDistance += Float.parseFloat(splitArray[0]);
                sumEarnings += Float.parseFloat(splitArray[1]);
            }
            // combine result to text
            context.write(key, new Text(Float.toString(sumDistance) + " " + Float.toString(sumEarnings)));
        }
    }
Job
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Taxi dropoff");
        job.setJarByClass(Aggregate.class);
        job.setMapperClass(EarnDistMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setCombinerClass(EarnDistReducer.class);
        job.setReducerClass(EarnDistReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Thank you for your help!!
You have the signature of the reduce method wrong. You have:
public void reduce(Text key, Iterator<Text> values, Context context) {
It should be:
public void reduce(Text key, Iterable<Text> values, Context context) {
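With the Iterator version, the method does not override the base class's reduce(Text, Iterable<Text>, Context), so Hadoop silently falls back to the default identity reduce, which simply copies the map output through. A corrected sketch, with @Override added so the compiler catches this class of mistake:
public static class EarnDistReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        float sumDistance = 0;
        float sumEarnings = 0;
        for (Text value : values) {
            // Each value is "distance earnings", separated by whitespace
            String[] parts = value.toString().split("\\s+");
            sumDistance += Float.parseFloat(parts[0]);
            sumEarnings += Float.parseFloat(parts[1]);
        }
        context.write(key, new Text(sumDistance + " " + sumEarnings));
    }
}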

Text not converting to String Hadoop Java

I am trying to convert Text to String in my reduce function, but it's not working. I tried the same logic in the map function and it worked perfectly, but when I apply it in my reduce function I get the error: java.lang.ArrayIndexOutOfBoundsException: 1
My Map code is like this
public static class OutDegreeMapper2 extends Mapper<Object, Text, Text, Text> {
    private Text word = new Text();
    private Text word2 = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String oneLine = value.toString();
        String[] parts = oneLine.split("\t");
        word.set(parts[0]);
        String join = parts[1] + ",from2";
        word2.set(join);
        context.write(word, word2);
    }
}
My reduce function is like this
public static class OutDegreeReducer extends Reducer<Text, Text, Text, Text> {
    private Text word = new Text();
    String merge = "";

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text val : values) {
            String[] x = val.toString().split(",");
            if (x[1].contains("from2")) {
                merge += x[0];
            }
        }
        word.set(merge);
        context.write(key, word);
    }
}
Kindly tell me why the split works in the map function but not in the reducer?
Very likely here
String[] parts = oneLine.split("\t");
word.set(parts[0]);
String join = parts[1] + ",from2";
or here
String[] x = val.toString().split(",");
if (x[1].contains("from2")) {
    merge += x[0];
}
reading parts[1] or x[1] throws the ArrayIndexOutOfBoundsException, because the string contains no "\t" or "," and the split therefore returns a single-element array.
I suggest checking the length of the array before accessing element 1.
Looking at the stack trace, you should be able to tell where the exception is being thrown.
Instead of
if (x[1].contains("from2")) {
    merge += x[0];
}
do this:
if (x.length > 1 && x[1].contains("from2")) {
    merge += x[0];
}
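The same guard belongs in the mapper, since an input line without a "\t" will fail at parts[1] for exactly the same reason. A defensive sketch of the mapper body, assuming some lines may lack the tab separator:
String[] parts = oneLine.split("\t");
// Skip malformed lines that do not contain a tab separator
if (parts.length > 1) {
    word.set(parts[0]);
    word2.set(parts[1] + ",from2");
    context.write(word, word2);
}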

How to create a custom output format in Hadoop

I am trying to create a variation of the word count Hadoop program in which it reads multiple files in a directory and outputs the frequency of each word. The thing is, I want it to output each word followed by the file name it came from and the frequency within that file. For example:
word1
( file1, 10)
( file2, 3)
( file3, 20)
So for word1 (say the word "and"), it finds it 10 times in file1, 3 times in file2, etc. Right now it is outputting only a key-value pair:
StringTokenizer itr = new StringTokenizer(chapter);
while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    context.write(word, one);
}
I can get the file name by
String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
But I do not understand how to format the way I want. I've been looking into OutputCollector, but I am unsure of how to use it exactly.
EDIT: This is my mapper and reducer:
public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Take out all non-letters and make everything lowercase
        String chapter = value.toString();
        chapter = chapter.toLowerCase();
        chapter = chapter.replaceAll("[^a-z]", " ");
        // This is the file name
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
        StringTokenizer itr = new StringTokenizer(chapter);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, new Text(fileName));
        }
    }
}
public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Map<String, Integer> files = new HashMap<String, Integer>();
        for (Text val : values) {
            if (files.containsKey(val.toString())) {
                files.put(val.toString(), files.get(val.toString()) + 1);
            } else {
                files.put(val.toString(), 1);
            }
        }
        String outputString = "";
        for (String file : files.keySet()) {
            outputString = outputString + "\n<" + file + ", " + files.get(file) + ">";
        }
        context.write(key, new Text(outputString));
    }
}
This is outputting for the word "a" for example:
a
(
(chap02, 53), 1)
(
(chap18, 50), 1)
I am unsure of why it is making a key-value pair the key, with a value of 1 for each entry.
I don't think you need a custom output format at all for this. As long as you pass the filename along to the reducer, you should be able to do this simply by building the output String yourself and letting the default TextOutputFormat write it. The explanation is below.
In the mapper, get the filename and write it out as the map output value, as below:
String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
context.write(key,new Text(fileName));
Then in the reducer do something like the following:
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    Map<String, Integer> files = new HashMap<String, Integer>();
    for (Text val : values) {
        if (files.containsKey(val.toString())) {
            files.put(val.toString(), files.get(val.toString()) + 1);
        } else {
            files.put(val.toString(), 1);
        }
    }
    String outputString = key.toString();
    for (String file : files.keySet()) {
        outputString += "\n( " + file + ", " + files.get(file) + ")";
    }
    context.write(key, new Text(outputString));
}
This reducer prepends "\n" to every file entry, in order to force the display formatting to be exactly what you want.
This seems much simpler than writing your own output format.
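One more thing worth checking, as a likely cause of the nested "(..., 1)" pairs shown in the edit: if the driver still sets a combiner from the stock word count (for example, the reducer itself), the combiner's formatted string gets fed back into the reducer as if it were a filename and counted once. A minimal driver sketch under that assumption (WordCountByFile is a hypothetical class name), with no combiner and the map output value class set to Text:
Job job = Job.getInstance(new Configuration(), "word count by file");
job.setJarByClass(WordCountByFile.class);
job.setMapperClass(TokenizerMapper.class);
// Deliberately no setCombinerClass: the reducer's output is not a valid map output here
job.setReducerClass(IntSumReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);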

Array is Returning NULL

My piece of code:
package com.xchanging.selenium.utility;
import java.io.IOException;
import java.util.LinkedHashMap;
import org.apache.commons.lang3.ArrayUtils;
public class DataProviderConvertor extends ReadExcel {
    public static Object[][] convertData(String sheetName, String testCaseName)
            throws IOException {
        LinkedHashMap<String, String> table = ReadExcel.getData(sheetName, testCaseName);
        String[] myStringArray = new String[table.size()];
        for (String key : table.values()) {
            System.out.println("Keyvalues " + key.toString());
            String value = key.toString();
            ArrayUtils.add(myStringArray, value);
        }
        System.out.println("1st Index: " + myStringArray[0]);
    }
}
It is returning
Keyvalues Y
Keyvalues ie
Keyvalues QC
Keyvalues Yes
Keyvalues Rework Checklist Comments
Keyvalues Yes
Keyvalues MRI Updated Comments
1st Index: null
I am expecting 6 elements in this array, but all of them are NULL. Why is it not returning the expected values?
How about a much simpler way?
public static Object[][] convertData(String sheetName, String testCaseName)
        throws IOException {
    LinkedHashMap<String, String> table = ReadExcel.getData(sheetName, testCaseName);
    String[] myStringArray = table.values().toArray(new String[0]);
    System.out.println("1st Index: " + myStringArray[0]);
}
Try
for (String key : table.values()) {
    System.out.println("Keyvalues " + key.toString());
    String value = key.toString();
    myStringArray = ArrayUtils.addAll(myStringArray, value);
}
or
int cnt = 0;
for (String key : table.values()) {
    System.out.println("Keyvalues " + key.toString());
    String value = key.toString();
    myStringArray[cnt++] = value;
}
The ArrayUtils.add method copies the array and then adds the new element at the end of the new, copied array. So I think that's where the problem lies in your code: myStringArray itself is never modified. The original array is copied, a new string array is formed, and the element is added to that new array, which is then discarded.
You can try this:
int index = 0;
for (String key : table.values()) {
    System.out.println("Keyvalues " + key.toString());
    String value = key.toString();
    myStringArray[index++] = value;
}
ArrayUtils.add(myStringArray, value);
This method creates and returns a new array with the value added to it.
You need to assign the result:
myStringArray = ArrayUtils.add(myStringArray, value);
It has to be done that way because arrays cannot be resized.
Just two changes are required; the code is perfectly fine otherwise.
1.) String[] myStringArray = new String[table.size()]; change to:
String[] myStringArray = null;
2.) ArrayUtils.add(myStringArray, value); change to:
myStringArray = ArrayUtils.add(myStringArray, value);
For the rest, you can read about and debug this method in the API docs.
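Putting the fixes together, a minimal corrected sketch (note that with ArrayUtils.add the array should start empty rather than pre-sized, or the added values land after table.size() null slots; the return statement is a hypothetical addition, since the original snippet never returned):
import java.io.IOException;
import java.util.LinkedHashMap;
import org.apache.commons.lang3.ArrayUtils;

public class DataProviderConvertor extends ReadExcel {
    public static Object[][] convertData(String sheetName, String testCaseName)
            throws IOException {
        LinkedHashMap<String, String> table = ReadExcel.getData(sheetName, testCaseName);
        String[] myStringArray = new String[0];
        for (String value : table.values()) {
            // ArrayUtils.add returns a new, longer array, so the result must be reassigned
            myStringArray = ArrayUtils.add(myStringArray, value);
        }
        System.out.println("1st Index: " + myStringArray[0]);
        return new Object[][] { myStringArray }; // hypothetical: wrap the row for the data provider
    }
}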

MapReduce Program in Java

I have the following Mapper
private Text sentiment = new Text();

public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
    String allPages = value.toString();
    String[] tokens = allPages.split(":::");
    for (int i = 0; i < (tokens.length - 1); i++) {
        sentiment.set(tokens[i].trim());
        String articleID = tokens[0].trim();
        System.out.println("articleID " + articleID);
        Text articleIDValue = new Text(articleID);
        output.collect(sentiment, articleIDValue);
    }
    String line = "";
    for (int j = 1; j < tokens.length; j++) {
        line = line + " " + tokens[j];
        System.out.println("line.... " + line);
    }
    Text lineText = new Text(line.trim());
    output.collect(new Text(tokens[0]), lineText);
}
Sample input: abc ::: In a market that's awash in tech IPOs, this one is different.
This should store the key-value pair as (abc, In a market that's awash in tech IPOs, this one is different.).
Right now it stores (abc, abc). Where am I going wrong?
I suspect you're seeing the result of the first collect() call: with two tokens the first loop runs exactly once, with i equal to 0, so both sentiment and articleID are set from tokens[0] ("abc") and (abc, abc) is emitted.
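If the goal is only the (articleID, article text) pair, a minimal sketch that drops the first loop entirely (assuming the old mapred API used in the question) could be:
public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
    String[] tokens = value.toString().split(":::");
    if (tokens.length >= 2) {
        // tokens[0] is the article ID; everything after the first ":::" is the text
        StringBuilder line = new StringBuilder();
        for (int j = 1; j < tokens.length; j++) {
            line.append(" ").append(tokens[j]);
        }
        output.collect(new Text(tokens[0].trim()), new Text(line.toString().trim()));
    }
}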
