Debug MapReduce code to find the cause of NullPointerException error - java

I am trying to run a MapReduce job to calculate the average satisfaction level per department. Below is the sample data. The file has a header row; to skip it, I placed the header in a separate txt file and used it to check whether each row is equal to the header.
Sample Data
satisfaction_level,last_evaluation,number_project,average_montly_hours,time_spend_company,Work_accident,left,promotion_last_5years,sales,salary
0.38,0.53,2,157,3,0,1,0,sales,low
0.8,0.86,5,262,6,0,1,0,sales,medium
0.11,0.88,7,272,4,0,1,0,sales,medium
0.72,0.87,5,223,5,0,1,0,sales,low
0.37,0.52,2,159,3,0,1,0,sales,low
0.41,0.5,2,153,3,0,1,0,sales,low
0.1,0.77,6,247,4,0,1,0,sales,low
0.92,0.85,5,259,5,0,1,0,sales,low
0.89,1,5,224,5,0,1,0,sales,low
Below is my mapper code.
public class SatisfactionLevelMapper extends Mapper<LongWritable, Text, Text, FloatWritable> {

    String header;

    @Override
    protected void setup(Mapper.Context context) throws IOException, InterruptedException {
        /* Headers are placed in a separate header.txt file for which the location is specified in the driver */
        BufferedReader bufferedReader = new BufferedReader(new FileReader("header.txt"));
        header = bufferedReader.readLine();
    }

    public void map(LongWritable key, Text text, Context context) throws IOException, InterruptedException {
        String row = text.toString();
        String[] values = row.trim().split(","); // dataset is a CSV file with 10 columns
        float satisfaction = 0.0f;
        String dept = null;
        try {
            if (values.length == 9 && header != row) {
                satisfaction = Float.parseFloat(values[0]);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        context.write(new Text(dept), new FloatWritable(satisfaction));
    }
}
I am getting a NullPointerException, as shown below.
Error: java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:450)
at org.apache.hadoop.io.Text.set(Text.java:198)
at org.apache.hadoop.io.Text.<init>(Text.java:88)
at com.df.hra.SatisfactionLevelMapper.map(SatisfactionLevelMapper.java:62)
at com.df.hra.SatisfactionLevelMapper.map(SatisfactionLevelMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Is there a way to figure out which part of my code has the issue?
I am new to Java and trying to figure out a way to debug MapReduce code. Any help would be appreciated. Thanks!

You cannot initialize a Text with null, which is exactly what happens here: dept is declared but never assigned, so new Text(dept) throws the NullPointerException you see at Text.encode. Use a non-null value (an empty String "" at minimum) or, better, assign dept from the parsed row before writing.
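A minimal sketch of how the map method could look once dept is populated, assuming the department is the ninth column (sales) of the 10-column rows shown above. Note also that the sample rows split into 10 fields, so the values.length == 9 check would skip every data row, and that String contents should be compared with equals(), not !=:

public void map(LongWritable key, Text text, Context context) throws IOException, InterruptedException {
    String row = text.toString();
    String[] values = row.trim().split(",");

    // skip the header row and malformed rows (the sample data has 10 columns)
    if (values.length != 10 || row.equals(header)) {
        return;
    }

    try {
        float satisfaction = Float.parseFloat(values[0]); // satisfaction_level
        String dept = values[8];                          // sales (department) column
        context.write(new Text(dept), new FloatWritable(satisfaction));
    } catch (NumberFormatException e) {
        // ignore rows whose satisfaction_level is not a valid number
    }
}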

Related

CSV converter Spring boot

I am writing a handler in my Spring Boot application to convert an object into CSV. If the user wants only the header, only the CSV header should be returned. There are two other options - header_and_body and body_only.
For some reason, header_only returns only an empty file. Why is that? I tried adding p.println() for the header_only option, but still only an empty file is returned.
The complete code is below - well, not quite complete: I kept getting the Stack Overflow error "It looks like your post is mostly code; please add some more details.", so I removed some code that is irrelevant to the topic under discussion.
public class CsvHttpMessageConverter extends AbstractHttpMessageConverter<CsvResponse> {

    @Override
    protected void writeInternal(CsvResponse response, HttpOutputMessage output)
            throws IOException, HttpMessageNotWritableException {
        output.getHeaders().setContentType(MEDIA_TYPE);
        output.getHeaders().set("Content-Disposition", "attachment; filename=\"" + response.getFilename() + "\"");
        OutputStream out = output.getBody();
        writeRecords(response, out, response.getOption());
        out.close();
    }

    private void writeRecords(CsvResponse response, OutputStream out, String option) throws IOException {
        PrintWriter p = new PrintWriter(out);
        List<DynamicMarketData> list = response.getRecords();
        String header = mdhTotemConfiguration.getColumnListForCommodities();
        if (option.equals("header_only") || option.equals("header_and_body")) {
            p.print(header);
        }
        if (option.equals("body_only") || option.equals("header_and_body")) {
            for (DynamicMarketData elem : list) {
                p.println();
                Iterator<Map.Entry<String, String>> entries = elem.getDynamicMarketData().entrySet().iterator();
                do {
                    Map.Entry<String, String> entry = entries.next();
                    p.print(entry.getValue());
                    if (entries.hasNext())
                        p.print(",");
                } while (entries.hasNext());
            }
        }
    }
}
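No answer is shown here, but one likely culprit worth checking (an assumption, not a confirmed fix): writeRecords never flushes the PrintWriter. new PrintWriter(out) buffers internally, and out.close() closes the underlying stream without flushing that buffer, so a short header can be lost entirely while a large body forces intermediate flushes. A minimal sketch:

private void writeRecords(CsvResponse response, OutputStream out, String option) throws IOException {
    PrintWriter p = new PrintWriter(out);
    // ... same header/body writing logic as above ...
    p.flush(); // push any buffered characters to the underlying stream before it is closed
}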

How to solve org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

I am trying to analyze retail store data where I want to work out the breakdown of sales by city. Here is my data:
Date Time City Product-Cat Sale-Value Payment-Mode
2012-01-01 09:20 Fort Worth Women's Clothing 153.57 Visa
2012-01-01 09:00 San Jose Mens Clothing 214.05 Rupee
2012-01-01 09:00 San Diego Music 76.43 Amex
2012-01-01 09:00 New York Cameras 45.76 Visa
Now I want to calculate the sales breakdown by product category across all the stores.
Here are the Mapper, the Reducer, and the main class:
public class RetailDataAnalysis {

    public static class RetailDataAnalysisMapper extends Mapper<Text, Text, Text, Text> {

        // when trying with LongWritable key
        public void map(LongWritable key, Text Value, Context context) throws IOException, InterruptedException {
            String analyser[] = Value.toString().split(",");
            Text productCategory = new Text(analyser[3]);
            Text salesPrice = new Text(analyser[4]);
            context.write(productCategory, salesPrice);
        }

        // when trying with Text key
        public void map(Text key, Text Value, Context context) throws IOException, InterruptedException {
            String analyser[] = Value.toString().split(",");
            Text productCategory = new Text(analyser[3]);
            Text salesPrice = new Text(analyser[4]);
            context.write(productCategory, salesPrice);
        }
    }

    public static class RetailDataAnalysisReducer extends Reducer<Text, Text, Text, Text> {

        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            String csv = "";
            for (Text value : values) {
                if (csv.length() > 0) {
                    csv += ",";
                }
                csv += value.toString();
            }
            context.write(key, new Text(csv));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.out.println("Usage Retail Data ");
            System.exit(2);
        }
        Job job = new Job(conf, "Retail Data Analysis");
        job.setJarByClass(RetailDataAnalysis.class);
        job.setMapperClass(RetailDataAnalysisMapper.class);
        job.setCombinerClass(RetailDataAnalysisReducer.class);
        job.setReducerClass(RetailDataAnalysisReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
And the exception I am getting when using the LongWritable key is:
18/04/11 09:15:40 INFO mapreduce.Job: Task Id : attempt_1523355254827_0008_m_000000_2, Status : FAILED
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
The exception I am getting when trying to use the Text key is:
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
Please help me to solve this; I am very new to Hadoop.
You may need a different input format class. The default is TextInputFormat, which splits the file line by line and gives the byte offset of the line as a LongWritable key and the line itself as a Text value.
You can specify the input format class this way:
job.setInputFormatClass(TextInputFormat.class);
In your case, if you do not need the key, just the values, you can keep LongWritable as the key type:
public static class RetailDataAnalysisMapper extends Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text Value, Context context) throws IOException, InterruptedException {
//...
}
}
Edit:
Here is the whole code after modifying it to use LongWritable as the key:
public class RetailDataAnalysis {

    public static class RetailDataAnalysisMapper extends Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text Value, Context context) throws IOException, InterruptedException {
            String analyser[] = Value.toString().split(",");
            Text productCategory = new Text(analyser[3]);
            Text salesPrice = new Text(analyser[4]);
            context.write(productCategory, salesPrice);
        }
    }

    public static class RetailDataAnalysisReducer extends Reducer<Text, Text, Text, Text> {

        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String csv = "";
            for (Text value : values) {
                if (csv.length() > 0) {
                    csv += ",";
                }
                csv += value.toString();
            }
            context.write(key, new Text(csv));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.out.println("Usage Retail Data ");
            System.exit(2);
        }
        Job job = new Job(conf, "Retail Data Analysis");
        job.setJarByClass(RetailDataAnalysis.class);
        job.setMapperClass(RetailDataAnalysisMapper.class);
        job.setCombinerClass(RetailDataAnalysisReducer.class);
        job.setReducerClass(RetailDataAnalysisReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Also, if you are splitting the data by ",", your data should be a CSV, like this:
2012-01-01 09:20,Fort Worth,Women's Clothing,153.57,Visa
2012-01-01 09:00,San Jose,Mens Clothing,214.05,Rupee
2012-01-01 09:00,San Diego,Music,76.43,Amex
2012-01-01 09:00,New York,Cameras,45.76,Visa
Not space-separated, as you specified in your question.
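As a side note (an assumption about the file layout, not something stated in the question): if the file is actually tab-separated rather than comma-separated, splitting on the tab character would also work, since values such as "Fort Worth" contain spaces and a plain space split would break them apart:

// hypothetical variant: fields separated by a single tab character
String[] analyser = Value.toString().split("\t");
Text productCategory = new Text(analyser[3]); // Product-Cat
Text salesPrice = new Text(analyser[4]);      // Sale-Value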
When you read a file using MapReduce, the default file input format reads an entire line and sends it to the mapper as a (byte offset, line) pair, so the mapper signature becomes:
public static class RetailDataAnalysisMapper extends Mapper<LongWritable,Text,Text,Text>
In case you need to read it as
public static class RetailDataAnalysisMapper extends Mapper<Text,Text,Text,Text>
you would need to change the file input format and use a custom file input format along with a custom record reader.
Then you need to add the following line to the driver code:
job.setInputFormatClass(YourCustomInputFormat.class);
Hadoop understands everything in the form of (key, value) pairs,
so when you read a file, the byte offset becomes the LongWritable key and the line read becomes the Text value.
So you need to use the default signature Mapper<LongWritable, Text, <anything>, <anything>>.
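One more option worth mentioning (not part of the original answer): the built-in KeyValueTextInputFormat already delivers (Text, Text) pairs by splitting each line at the first separator (tab by default, configurable via mapreduce.input.keyvaluelinerecordreader.key.value.separator), so a fully custom input format and record reader may not be needed if that layout fits your data:

// built-in alternative to writing a custom input format / record reader
job.setInputFormatClass(KeyValueTextInputFormat.class);
// the mapper then receives the text before the first separator as the Text key
// and the rest of the line as the Text value, i.e. Mapper<Text, Text, ...>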

ClassCastException:java.lang.Exception: java.lang.ClassCastException in mapred

I am writing a MapReduce application which takes input in (key, value) format and just displays the same data as output from the reducer.
This is the sample input:
1500s 1
1960s 1
Aldus 1
In the code below, I am specifying the input format as KeyValueTextInputFormat and setting the delimiter to tab in main(). When I run the code, I run into this error message:
java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
at cscie63.examples.WordDesc$KVMapper.map(WordDesc.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
I tried different things to debug, but nothing helped.
public class WordDesc {

    public static class KVMapper
            extends Mapper<Text, LongWritable, Text, LongWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Text key, LongWritable value, Context context
                        ) throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static class KVReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, LongWritable value,
                           Context context
                           ) throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word desc");
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setJarByClass(WordDesc.class);
        job.setMapperClass(KVMapper.class);
        job.setCombinerClass(KVReducer.class);
        job.setReducerClass(KVReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
I guess that the line job.setInputFormatClass(KeyValueTextInputFormat.class); tells your program to treat your input as key/value pairs of Text. Therefore, when you require your input value to be a LongWritable, you get this exception.
A quick fix would be to read your input as Text and then, if you want to use a LongWritable, parse it:
public static class KVMapper
        extends Mapper<Text, Text, Text, LongWritable> {
    private final static LongWritable val = new LongWritable();

    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        val.set(Long.parseLong(value.toString()));
        context.write(key, val);
    }
}
What it does is the following: value is a Text, value.toString() gives the String representation of that Text, and Long.parseLong() parses this String as a long. Finally, val.set() turns it into a LongWritable.
By the way, I don't think you need a Reducer for that... You could make it faster by setting the number of reduce tasks to 0.
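For reference, a map-only version of the job (a sketch of the suggestion above, not code from the original answer) would drop the combiner and reducer lines and simply set:

job.setNumReduceTasks(0); // mapper output is written directly; no shuffle/sort or reduce phase runs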

Job failed Exception hadoop

I am using MultipleTextOutputFormat to create multiple files from a single file, i.e. each line goes to a new file.
This is my code:
public class MOFExample extends Configured implements Tool {

    private static double count = 0;

    static class KeyBasedMultipleTextOutputFormat extends
            MultipleTextOutputFormat<Text, Text> {
        @Override
        protected String generateFileNameForKeyValue(Text key, Text value,
                String name) {
            return count++ + "_"; // + name;
        }
    }

    /**
     * The main job driver.
     */
    public int run(final String[] args) throws Exception {
        Path csvInputs = new Path(args[0]);
        Path outputDir = new Path(args[1]);

        JobConf jobConf = new JobConf(super.getConf());
        jobConf.setJarByClass(MOFExample.class);
        jobConf.setMapperClass(IdentityMapper.class);
        jobConf.setInputFormat(KeyValueTextInputFormat.class);
        jobConf.setOutputFormat(KeyBasedMultipleTextOutputFormat.class);
        jobConf.setOutputValueClass(Text.class);
        jobConf.setOutputKeyClass(Text.class);
        FileInputFormat.setInputPaths(jobConf, csvInputs);
        FileOutputFormat.setOutputPath(jobConf, outputDir);
        //jobConf.setNumMapTasks(4);
        jobConf.setNumReduceTasks(4);
        return JobClient.runJob(jobConf).isSuccessful() ? 0 : 1;
    }

    public static void main(final String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new MOFExample(), args);
        System.exit(res);
    }
}
This code runs fine on a small text file, but when the input file has more than 1900 lines (which is still not a large file) it throws an exception:
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at MOFExample.run(MOFExample.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at MOFExample.main(MOFExample.java:61)
I also tried this tutorial, but it returns an empty output directory without any exception when the input file is large; it also worked fine with a small input file.
Note: I am using a single-node cluster.

Error delivering a csv file through the browser

My application creates a large CSV file (it's a report), and the idea is to deliver the contents of the CSV without actually saving a file for it. Here's my code:
String csvData; //this is the string that contains the csv contents
byte[] csvContents = csvData.getBytes();
response.contentType = "text/csv";
response.headers.put("Content-Disposition", new Header(
"Content-Disposition", "attachment;" + "test.csv"));
response.headers.put("Cache-Control", new Header("Cache-Control",
"max-age=0"));
response.out.write(csvContents);
ok();
The CSV files being generated are rather large, and the error I am getting is:
org.jboss.netty.handler.codec.frame.TooLongFrameException: An HTTP line is larger than 4096 bytes.
What's the best way to overcome this issue?
My tech stack is Java 6 with Play Framework 1.2.5.
Note: the response object comes from play.mvc.Controller.response.
Please use a ServletOutputStream, like this:
String csvData; //this is the string that contains the csv contents
byte[] csvContents = csvData.getBytes();
ServletOutputStream sos = response.getOutputStream();
response.setContentType("text/csv");
response.setHeader("Content-Disposition", "attachment; filename=test.csv");
sos.write(csvContents);
We use this to show the results of an action directly in the browser:
window.location='data:text/csv;charset=utf8,' + encodeURIComponent(your-csv-data);
I am not sure about the out of memory error but I would at least try this:
request.format = "csv";
renderBinary(new ByteArrayInputStream(csvContents));
Apparently Netty complains that the HTTP header is too long; maybe it somehow thinks that your file is part of the header. See also
http://lists.jboss.org/pipermail/netty-users/2010-November/003596.html
As nylund states, using renderBinary should do the trick.
We use writeChunk ourselves to output large reports on the fly, like this:
Controller:
public static void getReport() {
    final Report report = new Report(code, from, to);
    try {
        while (report.hasMoreData()) {
            final String data = await(report.getData());
            response.writeChunk(data);
        }
    } catch (final Exception e) {
        final Throwable cause = e.getCause();
        if (cause != null && cause.getMessage().contains("HTTP output stream closed")) {
            logger.warn(e, "user cancelled download");
        } else {
            logger.error(e, "error retrieving data");
        }
    }
}
In the report code:
public class Report {

    public Report(final String code, final Date from, final Date to) {
    }

    public boolean hasMoreData() {
        // find out if there is more data
    }

    public Future<String> getData() {
        final Job<String> queryJob = new Job<String>() {
            @Override
            public String doJobWithResult() throws Exception {
                // grab data (e.g. read from db) and return it
                return data;
            }
        };
        return queryJob.now();
    }
}
