error while read/write (Hadoop) - java

I followed this tutorial to install Hadoop. Everything works fine if I use /usr/local/hadoop as hadoop.tmp.dir. Since I have very little space on that partition, I tried setting this value to /NEW_partition/ (ext4), but I always got a Java error. I guess this is because Hadoop is not able to write to this partition. How can I make it work?
EDIT: Complete execution result:
hadoop@FreeLnx:/usr/local/hadoop-0.20.203.0$ bin/hadoop jar hadoop-examples-0.20.203.0.jar wordcount /MY_STORAGE/tmp1/gutnb /MY_STORAGE/tmp1/gutnbou
12/02/12 02:56:00 INFO input.FileInputFormat: Total input paths to process : 3
12/02/12 02:56:00 INFO mapred.JobClient: Running job: job_201202120255_0001
12/02/12 02:56:01 INFO mapred.JobClient: map 0% reduce 0%
12/02/12 02:56:09 INFO mapred.JobClient: Task Id : attempt_201202120255_0001_m_000004_0, Status : FAILED
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/02/12 02:56:09 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000004_0&filter=stdout
12/02/12 02:56:09 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000004_0&filter=stderr
12/02/12 02:56:15 INFO mapred.JobClient: Task Id : attempt_201202120255_0001_m_000004_1, Status : FAILED
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/02/12 02:56:15 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000004_1&filter=stdout
12/02/12 02:56:15 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000004_1&filter=stderr
12/02/12 02:56:21 INFO mapred.JobClient: Task Id : attempt_201202120255_0001_m_000004_2, Status : FAILED
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/02/12 02:56:21 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000004_2&filter=stdout
12/02/12 02:56:21 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000004_2&filter=stderr
12/02/12 02:56:33 INFO mapred.JobClient: Task Id : attempt_201202120255_0001_m_000003_0, Status : FAILED
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/02/12 02:56:33 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000003_0&filter=stdout
12/02/12 02:56:33 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000003_0&filter=stderr
12/02/12 02:56:39 INFO mapred.JobClient: Task Id : attempt_201202120255_0001_m_000003_1, Status : FAILED
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/02/12 02:56:39 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000003_1&filter=stdout
12/02/12 02:56:39 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000003_1&filter=stderr
12/02/12 02:56:45 INFO mapred.JobClient: Task Id : attempt_201202120255_0001_m_000003_2, Status : FAILED
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
12/02/12 02:56:45 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000003_2&filter=stdout
12/02/12 02:56:45 WARN mapred.JobClient: Error reading task outputhttp://FreeLnx:50060/tasklog?plaintext=true&attemptid=attempt_201202120255_0001_m_000003_2&filter=stderr
12/02/12 02:56:51 INFO mapred.JobClient: Job complete: job_201202120255_0001
12/02/12 02:56:51 INFO mapred.JobClient: Counters: 4
12/02/12 02:56:51 INFO mapred.JobClient: Job Counters
12/02/12 02:56:51 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24115
12/02/12 02:56:51 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/12 02:56:51 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/02/12 02:56:51 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0

You probably need to check the permissions on the configured mapred.local.dir directories, which default to ${hadoop.tmp.dir}/mapred/local. The parent directory and all of its contents must be owned by the user that runs the TaskTracker daemon, so that tasks can write transient data (and do other things) within it.
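As a rough way to test the ownership and permission point above, here is a minimal Java sketch using only the standard library. The class name LocalDirCheck and the default path /NEW_partition/mapred/local are hypothetical; substitute your own mapred.local.dir and the user that runs the TaskTracker. Exit status 126 conventionally means a command was found but could not be executed, which would be consistent with a permission problem, though this check alone cannot confirm that.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalDirCheck {

    /** Returns "ok", or a short description of the first problem found. */
    static String check(Path dir, String expectedOwner) {
        if (!Files.isDirectory(dir)) {
            return "not a directory";
        }
        try {
            // Compare the directory owner with the user expected to run the daemon.
            String owner = Files.getOwner(dir).getName();
            if (!owner.equals(expectedOwner)) {
                return "owned by " + owner;
            }
        } catch (IOException | UnsupportedOperationException e) {
            return "owner unknown: " + e.getMessage();
        }
        // Write and execute (traverse) access, as seen by the user running this program.
        if (!Files.isWritable(dir) || !Files.isExecutable(dir)) {
            return "not writable or not traversable";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Hypothetical default path; pass your real mapred.local.dir as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/NEW_partition/mapred/local");
        // Defaults to the current user; pass the TaskTracker user as the second argument.
        String user = args.length > 1 ? args[1] : System.getProperty("user.name");
        System.out.println(dir + ": " + check(dir, user));
    }
}
```

Run it as the same user that starts the TaskTracker, since the write and traverse checks apply to whoever executes the program.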

Related

Hadoop kNN join algorithm stuck at map 100% reduce 0%

15/06/11 10:31:51 INFO mapreduce.Job: map 100% reduce 0%
I am trying to run the open-source kNN join MapReduce algorithm hbrj on Hadoop 2.6.0 in a single-node cluster (pseudo-distributed operation) installed on my laptop (OS X). (The source can be found here: http://www.cs.utah.edu/~lifeifei/knnj/). The algorithm consists of two MapReduce phases, where the second phase uses the first phase's output files as its input. The first phase maps and reduces successfully; I can also look into the output files and everything seems right. However, when I run the second phase, the job is reported as finishing successfully even though it never reduces, or even enters that stage, I believe.
Here is what gets printed as I run phase 2 (I am including everything in the hope that it is useful):
2015-06-11 10:31:47.526 java[3918:305930] Unable to load realm info from SCDynamicStore
15/06/11 10:31:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/11 10:31:49 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/06/11 10:31:49 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/06/11 10:31:49 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/06/11 10:31:49 INFO mapred.FileInputFormat: Total input paths to process : 64
15/06/11 10:31:49 INFO mapreduce.JobSubmitter: number of splits:64
15/06/11 10:31:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1089761712_0001
15/06/11 10:31:50 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/06/11 10:31:50 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/06/11 10:31:50 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
15/06/11 10:31:50 INFO mapreduce.Job: Running job: job_local1089761712_0001
15/06/11 10:31:50 INFO mapred.LocalJobRunner: Waiting for map tasks
15/06/11 10:31:50 INFO mapred.LocalJobRunner: Starting task: attempt_local1089761712_0001_m_000000_0
15/06/11 10:31:50 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/06/11 10:31:50 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
15/06/11 10:31:50 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/sasha/hbrj/output/part-00042:0+872
15/06/11 10:31:50 INFO mapred.MapTask: numReduceTasks: 0
15/06/11 10:31:50 INFO mapred.LocalJobRunner:
15/06/11 10:31:50 INFO mapred.Task: Task:attempt_local1089761712_0001_m_000000_0 is done. And is in the process of committing
15/06/11 10:31:50 INFO mapred.LocalJobRunner:
15/06/11 10:31:50 INFO mapred.Task: Task attempt_local1089761712_0001_m_000000_0 is allowed to commit now
15/06/11 10:31:50 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1089761712_0001_m_000000_0' to hdfs://localhost:9000/user/sasha/hbrj/output2/_temporary/0/task_local1089761712_0001_m_000000
15/06/11 10:31:50 INFO mapred.MapTask: numReduceTasks: 0
15/06/11 10:31:50 INFO mapred.LocalJobRunner:
15/06/11 10:31:50 INFO mapred.Task: Task:attempt_local1089761712_0001_m_000000_0 is done. And is in the process of committing
15/06/11 10:31:50 INFO mapred.LocalJobRunner:
15/06/11 10:31:50 INFO mapred.Task: Task attempt_local1089761712_0001_m_000000_0 is allowed to commit now
15/06/11 10:31:50 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1089761712_0001_m_000000_0' to hdfs://localhost:9000/user/sasha/hbrj/output2/_temporary/0/task_local1089761712_0001_m_000000
15/06/11 10:31:50 INFO mapred.LocalJobRunner: hdfs://localhost:9000/user/sasha/hbrj/output/part-00042:0+872
15/06/11 10:31:50 INFO mapred.Task: Task 'attempt_local1089761712_0001_m_000000_0' done.
continues in this fashion until...
15/06/11 10:31:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local1089761712_0001_m_000012_0
15/06/11 10:31:51 INFO mapred.LocalJobRunner: Starting task: attempt_local1089761712_0001_m_000013_0
15/06/11 10:31:51 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/06/11 10:31:51 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
15/06/11 10:31:51 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/sasha/hbrj/output/part-00015:0+646
15/06/11 10:31:51 INFO mapred.MapTask: numReduceTasks: 0
15/06/11 10:31:51 INFO mapreduce.Job: Job job_local1089761712_0001 running in uber mode : false
15/06/11 10:31:51 INFO mapreduce.Job: map 100% reduce 0%
15/06/11 10:31:51 INFO mapred.LocalJobRunner:
15/06/11 10:31:51 INFO mapred.Task: Task:attempt_local1089761712_0001_m_000013_0 is done. And is in the process of committing
15/06/11 10:31:51 INFO mapred.LocalJobRunner:
15/06/11 10:31:51 INFO mapred.Task: Task attempt_local1089761712_0001_m_000013_0 is allowed to commit now
15/06/11 10:31:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1089761712_0001_m_000013_0' to hdfs://localhost:9000/user/sasha/hbrj/output2/_temporary/0/task_local1089761712_0001_m_000013
15/06/11 10:31:51 INFO mapred.LocalJobRunner: hdfs://localhost:9000/user/sasha/hbrj/output/part-00015:0+646
15/06/11 10:31:51 INFO mapred.Task: Task 'attempt_local1089761712_0001_m_000013_0' done.
15/06/11 10:31:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local1089761712_0001_m_000013_0
15/06/11 10:31:51 INFO mapred.LocalJobRunner: Starting task: attempt_local1089761712_0001_m_000014_0
Starting task ... Finishing task
repeats in the manner as seen below (which is the last task) and the job is said to complete successfully:
15/06/11 10:31:53 INFO mapred.MapTask: numReduceTasks: 0
15/06/11 10:31:53 INFO mapred.LocalJobRunner:
15/06/11 10:31:53 INFO mapred.Task: Task:attempt_local1089761712_0001_m_000063_0 is done. And is in the process of committing
15/06/11 10:31:53 INFO mapred.LocalJobRunner:
15/06/11 10:31:53 INFO mapred.Task: Task attempt_local1089761712_0001_m_000063_0 is allowed to commit now
15/06/11 10:31:53 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1089761712_0001_m_000063_0' to hdfs://localhost:9000/user/sasha/hbrj/output2/_temporary/0/task_local1089761712_0001_m_000063
15/06/11 10:31:53 INFO mapred.LocalJobRunner: hdfs://localhost:9000/user/sasha/hbrj/output/part-00004:0+178
15/06/11 10:31:53 INFO mapred.Task: Task 'attempt_local1089761712_0001_m_000063_0' done.
15/06/11 10:31:53 INFO mapred.LocalJobRunner: Finishing task: attempt_local1089761712_0001_m_000063_0
15/06/11 10:31:53 INFO mapred.LocalJobRunner: map task executor complete.
15/06/11 10:31:54 INFO mapreduce.Job: Job job_local1089761712_0001 completed successfully
15/06/11 10:31:54 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=96487226
FILE: Number of bytes written=106993472
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1157797
HDFS: Number of bytes written=884212
HDFS: Number of read operations=8576
HDFS: Number of large read operations=0
HDFS: Number of write operations=4224
Map-Reduce Framework
Map input records=793
Map output records=793
Input split bytes=6848
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=42
Total committed heap usage (bytes)=12124160000
File Input Format Counters
Bytes Read=28599
File Output Format Counters
Bytes Written=21848
What I have done so far:
I have found a similar question here: Hadoop WordCount example stuck at map 100% reduce 0% and followed some of the advice given. In particular:
At one point I configured Yarn so that I could go into localhost:8088 and monitor the job. All the mappers worked correctly - no failures, and the job ended abruptly right after the last mapper was successful, that is no reducers were ever started. Mapper was shown as 100% and reduce as 0%.
Using this command:
cat /path/to/logs/*.log | grep ERROR
returned nothing.
As I can see the output of the mapper stage I believe the problem does not lie there.
I have tried debugging by putting print statements in the configure method of the Reduce class and in the reduce method itself. None of them were printed when I reran the job.
Additional note: Since the algorithm I am using was published and is supposed to work, I believe the problem could possibly arise from the fact that the code is 3 years old and was written for Hadoop 0.20.2 version but I guess I should not be too sure of this.
I understand that this is a specific question but I hope that someone could point me in the right direction. I would be happy to include anything else you might find useful. Any help is greatly appreciated!

CDH5.2: MR, Unable to initialize any output collector

Cloudera CDH5.2 Quickstart VM
Cloudera Manager showing all nodes state = GREEN
I've built a JAR in Eclipse for an MR job, including all relevant Cloudera JARs in the Build Path:
avro-1.7.6-cdh5.2.0.jar,
avro-mapred-1.7.6-cdh5.2.0-hadoop2.jar,
hadoop-common-2.5.0-cdh5.2.0.jar,
hadoop-mapreduce-client-core-2.5.0-cdh5.2.0.jar
I've run the following job:
hadoop jar jproject1.jar avro00.AvroUserPrefCount -libjars ${LIBJARS} avro/00/in avro/00/out
I get the following error. Is it a Java heap problem? Any comments? Thank you in advance.
14/11/14 01:02:40 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
14/11/14 01:02:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/14 01:02:43 INFO mapreduce.JobSubmitter: number of splits:1
14/11/14 01:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415950730849_0001
14/11/14 01:02:45 INFO impl.YarnClientImpl: Submitted application application_1415950730849_0001
14/11/14 01:02:45 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1415950730849_0001/
14/11/14 01:02:45 INFO mapreduce.Job: Running job: job_1415950730849_0001
14/11/14 01:03:04 INFO mapreduce.Job: Job job_1415950730849_0001 running in uber mode : false
14/11/14 01:03:04 INFO mapreduce.Job: map 0% reduce 0%
14/11/14 01:03:11 INFO mapreduce.Job: Task Id : attempt_1415950730849_0001_m_000000_0, Status : FAILED
Error: java.io.IOException: Unable to initialize any output collector
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
...
...
Checking the full task log of the failed attempt attempt_1415950730849_0001_m_000000_0 will help tell why you ran into the given exception.
The most common cause of this error is a misconfigured io.sort.mb value in your job. Its value must never be anywhere close to (or higher than) the configured map task heap size, and currently must also not exceed ~2000 MB (the Java array maximum size).
An upstream improvement that makes the error clearer about the true failure was also filed and resolved recently, via MAPREDUCE-6194.
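The two constraints on io.sort.mb described above can be sketched as a small pre-flight check. This is only an illustration: the class SortBufferCheck is hypothetical, the 2000 MB ceiling is taken from the "~2000 MB" figure in this answer, and the "at most half the heap" threshold is an assumed rule of thumb, not a Hadoop constant.

```java
import java.util.ArrayList;
import java.util.List;

public class SortBufferCheck {

    // Ceiling assumed from the "~2000 MB" figure quoted in the answer.
    static final int MAX_SORT_MB = 2000;
    // Assumed rule of thumb (not a Hadoop constant): buffer at most half the map heap.
    static final double MAX_HEAP_FRACTION = 0.5;

    /** Returns a list of problems; an empty list means the values look plausible. */
    static List<String> problems(int sortMb, int mapHeapMb) {
        List<String> out = new ArrayList<>();
        if (sortMb <= 0) {
            out.add("io.sort.mb must be positive");
        }
        if (sortMb > MAX_SORT_MB) {
            out.add("io.sort.mb=" + sortMb + " exceeds ~" + MAX_SORT_MB + " MB");
        }
        if (sortMb > mapHeapMb * MAX_HEAP_FRACTION) {
            out.add("io.sort.mb=" + sortMb + " is too close to the map heap of " + mapHeapMb + " MB");
        }
        return out;
    }

    public static void main(String[] args) {
        // Example values only: a 100 MB buffer and a 1 GB buffer, each with a 1 GB map heap.
        System.out.println("100 MB buffer, 1024 MB heap: " + problems(100, 1024));
        System.out.println("1024 MB buffer, 1024 MB heap: " + problems(1024, 1024));
    }
}
```

If the values pass a check like this, the full task log of the failed attempt is still the place to look for the real cause.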
I encountered the same issue yesterday. I checked the syslog for the particular map task which was failing, which suggested that I was getting another exception in that task which was triggering this error. In my case this was an invalid parsing, and when I corrected that issue, this error was fixed.
Closer examination of the log for the failed task should give you the root cause for the issue.

Why does Hadoop just get stuck after I set the map compression properties?

Here's the code snippet that works fine:
Configuration conf = new Configuration();
//PROBLEM PART!!!!!
//conf.setBoolean("mapred.compress.map.output", true);
//conf.set("mapred.output.compression.type", "BLOCK");
//conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);
Job job = new Job(conf, "WordCount");
job.setJarByClass(WordCount.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(WordCountMap.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
But if I enable the PROBLEM PART in the code snippet above, the console output gets stuck at:
13/12/26 18:08:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/26 18:08:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/26 18:08:06 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/12/26 18:08:06 INFO input.FileInputFormat: Total input paths to process : 20
13/12/26 18:08:06 WARN snappy.LoadSnappy: Snappy native library not loaded
13/12/26 18:08:06 INFO mapred.JobClient: Running job: job_local1943436108_0001
13/12/26 18:08:06 INFO mapred.LocalJobRunner: Waiting for map tasks
13/12/26 18:08:06 INFO mapred.LocalJobRunner: Starting task: attempt_local1943436108_0001_m_000000_0
13/12/26 18:08:07 INFO util.ProcessTree: setsid exited with exit code 0
13/12/26 18:08:07 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@731d2572
13/12/26 18:08:07 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/jude/input/capacity-scheduler.xml:0+7457
13/12/26 18:08:07 INFO mapred.MapTask: io.sort.mb = 100
13/12/26 18:08:07 INFO mapred.MapTask: data buffer = 79691776/99614720
13/12/26 18:08:07 INFO mapred.MapTask: record buffer = 262144/327680
13/12/26 18:08:07 INFO mapred.MapTask: Starting flush of map output
13/12/26 18:08:07 INFO compress.CodecPool: Got brand-new compressor
13/12/26 18:08:07 INFO mapred.MapTask: Starting flush of map output
13/12/26 18:08:07 INFO mapred.JobClient: map 0% reduce 0%
13/12/26 18:08:12 INFO mapred.LocalJobRunner:
13/12/26 18:08:13 INFO mapred.JobClient: map 5% reduce 0%
//no more
I just intend to compress the output of map, is there anything wrong with my code? Thanks a lot!
Using compression requires Hadoop to use native libraries for your platform, but apparently you don't have them (or have not configured the path to the libraries correctly). This is the message that explains the problem:
[...] NativeCodeLoader: Unable to load native-hadoop library for your platform...
Possible solutions:
The most common problem is having the 32-bit libraries on a 64-bit architecture. You can download the pre-compiled 64-bit native libraries, or compile them yourself using mvn package -Pdist,native,docs.
Or, you may need to configure the path to the native libraries correctly; see these other questions on how to do this: use -Djava.library.path, or LD_LIBRARY_PATH.
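To see whether the JVM can find the native Hadoop library on the path it is actually using, something like the following standard-library sketch may help. The class name NativeLibFinder is made up for this example; it only looks for a file with the platform's library name (libhadoop.so on Linux) in each java.library.path entry, and it does not verify that the file matches your architecture.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class NativeLibFinder {

    /** Returns every directory entry of libraryPath that contains a file named fileName. */
    static List<Path> find(String libraryPath, String fileName) {
        List<Path> hits = new ArrayList<>();
        if (libraryPath == null) {
            return hits;
        }
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // mapLibraryName turns "hadoop" into the platform file name, e.g. libhadoop.so on Linux.
        String fileName = System.mapLibraryName("hadoop");
        String libraryPath = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + libraryPath);
        List<Path> hits = find(libraryPath, fileName);
        System.out.println(hits.isEmpty() ? fileName + " not found" : "found: " + hits);
    }
}
```

Run it with the same -Djava.library.path value you pass to Hadoop, so the program sees the same search path.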

Words Count output shows mapred instead of mapreduce

I just configured my Ubuntu 13.10 machine to work in pseudo-distributed mode for my MapReduce code development. I installed Hadoop 0.20.2. Everything is running fine and I am able to start all five daemons as well.
On the same machine I downloaded Eclipse and added all the Hadoop libraries to it. I am also able to run my MapReduce word count example directly from the Eclipse IDE. The only thing bothering me is that when I run my word count example, it prints something like this in the console:
13/09/23 16:11:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/09/23 16:11:05 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/09/23 16:11:05 INFO input.FileInputFormat: Total input paths to process : 1
13/09/23 16:11:06 INFO mapred.JobClient: Running job: job_local_0001
13/09/23 16:11:06 INFO util.ProcessTree: setsid exited with exit code 0
13/09/23 16:11:06 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@c931fc
13/09/23 16:11:06 INFO mapred.MapTask: io.sort.mb = 100
13/09/23 16:11:07 INFO mapred.JobClient: map 0% reduce 0%
13/09/23 16:11:07 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/23 16:11:07 INFO mapred.MapTask: record buffer = 262144/327680
13/09/23 16:11:08 INFO mapred.MapTask: Starting flush of map output
13/09/23 16:11:08 INFO mapred.MapTask: Finished spill 0
13/09/23 16:11:08 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/09/23 16:11:09 INFO mapred.LocalJobRunner:
13/09/23 16:11:09 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/09/23 16:11:09 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1342ba4
13/09/23 16:11:09 INFO mapred.LocalJobRunner:
13/09/23 16:11:09 INFO mapred.Merger: Merging 1 sorted segments
13/09/23 16:11:10 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 48 bytes
13/09/23 16:11:10 INFO mapred.LocalJobRunner:
13/09/23 16:11:10 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/09/23 16:11:10 INFO mapred.LocalJobRunner:
13/09/23 16:11:10 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/09/23 16:11:10 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to outputWords
13/09/23 16:11:10 INFO mapred.JobClient: map 100% reduce 0%
13/09/23 16:11:12 INFO mapred.LocalJobRunner: reduce > reduce
13/09/23 16:11:12 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/09/23 16:11:12 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:284)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 1 more
Exception in thread "Thread-1" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:300)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 1 more
13/09/23 16:11:13 INFO mapred.JobClient: map 100% reduce 100%
13/09/23 16:11:13 INFO mapred.JobClient: Job complete: job_local_0001
13/09/23 16:11:13 INFO mapred.JobClient: Counters: 20
13/09/23 16:11:13 INFO mapred.JobClient: File Output Format Counters
13/09/23 16:11:13 INFO mapred.JobClient: Bytes Written=42
13/09/23 16:11:13 INFO mapred.JobClient: FileSystemCounters
13/09/23 16:11:13 INFO mapred.JobClient: FILE_BYTES_READ=534
13/09/23 16:11:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=63640
13/09/23 16:11:13 INFO mapred.JobClient: File Input Format Counters
13/09/23 16:11:13 INFO mapred.JobClient: Bytes Read=63
13/09/23 16:11:13 INFO mapred.JobClient: Map-Reduce Framework
13/09/23 16:11:13 INFO mapred.JobClient: Map output materialized bytes=52
13/09/23 16:11:13 INFO mapred.JobClient: Map input records=4
13/09/23 16:11:13 INFO mapred.JobClient: Reduce shuffle bytes=0
13/09/23 16:11:13 INFO mapred.JobClient: Spilled Records=8
13/09/23 16:11:13 INFO mapred.JobClient: Map output bytes=110
13/09/23 16:11:13 INFO mapred.JobClient: Total committed heap usage (bytes)=231350272
13/09/23 16:11:13 INFO mapred.JobClient: CPU time spent (ms)=0
13/09/23 16:11:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=124
13/09/23 16:11:13 INFO mapred.JobClient: Combine input records=12
13/09/23 16:11:13 INFO mapred.JobClient: Reduce input records=4
13/09/23 16:11:13 INFO mapred.JobClient: Reduce input groups=4
13/09/23 16:11:13 INFO mapred.JobClient: Combine output records=4
13/09/23 16:11:13 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/09/23 16:11:13 INFO mapred.JobClient: Reduce output records=4
13/09/23 16:11:13 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/09/23 16:11:13 INFO mapred.JobClient: Map output records=12
In the above output, there are a few things I am not sure are correct:
It prints mapred.JobClient. mapred is the old Hadoop library, so how can I make it use mapreduce? (I already added the recent new library to Eclipse and still get the same mapred message.)
Why is this error occurring: java.lang.NoClassDefFoundError?
The output directory is still generated with proper results.
Let me know if need any other details.
Hope to get an answer.
Happy hadooping!!!
You are getting:
Exception in thread "Thread-1" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
because you haven't included some dependent JARs in your classpath.
Try also including the following JARs, located inside your lib/ directory, and retry:
commons-httpclient-3.1.jar
commons-cli-1.2.jar
commons-logging-1.0.4.jar
commons-logging-api-1.0.4.jar
log4j-1.2.15.jar
jackson-core-asl-1.5.2.jar
jackson-mapper-asl-1.5.2.jar
If including these doesn't work, please include all the jars in the lib/ directory.
Furthermore, mapred.JobClient is not deprecated and is used by Hadoop itself (by both the mapred API and the mapreduce API).
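A quick way to confirm whether a JAR is really on the classpath is to try loading one of its classes from the same run configuration. The sketch below uses only the standard library; the class name ClasspathCheck is invented for this example, and the default class to probe is the one named in the stack trace above.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // 'false' means: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the stack trace; pass other class names as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.commons.httpclient.HttpMethod"};
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

Running this from the same Eclipse project (same Build Path) as the word count job should report MISSING until commons-httpclient is added.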

running an elementary mapreduce job with java on hadoop

I am just getting started with linux/java/hadoop/EMR.
I am following this neat book.
The assignment is to run:
bin/hadoop jar hadoop-cookbook-chapter1.jar chapter1.WordCount input output
And this is the response that I get:
alex@HadoopMachine:/usr/share/hadoop$ sudo hadoop jar hadoop-cookbook-chapter1.jar chapter1.WordCount input output
13/05/01 01:01:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/01 01:01:08 INFO input.FileInputFormat: Total input paths to process : 1
13/05/01 01:01:08 WARN snappy.LoadSnappy: Snappy native library not loaded
13/05/01 01:01:09 INFO mapred.JobClient: Running job: job_local_0001
13/05/01 01:01:09 INFO util.ProcessTree: setsid exited with exit code 0
13/05/01 01:01:09 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c04d881
13/05/01 01:01:09 INFO mapred.MapTask: io.sort.mb = 100
13/05/01 01:01:09 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
13/05/01 01:01:10 INFO mapred.JobClient: map 0% reduce 0%
13/05/01 01:01:10 INFO mapred.JobClient: Job complete: job_local_0001
13/05/01 01:01:10 INFO mapred.JobClient: Counters: 0
Frankly, since I have almost no java background, I do not even know where to start debugging.
I would be most grateful for any guidance on how to tackle this issue.
Update:
After following greedybuddha's advice, I am getting:
alex@HadoopMachine:/usr/share/hadoop$ sudo hadoop jar hadoop-cookbook-chapter1.jar chapter1.WordCount -Dmapred.child.java.opts=-Xmx1G input output
[sudo] password for alex:
13/05/01 11:03:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/01 11:03:54 INFO input.FileInputFormat: Total input paths to process : 1
13/05/01 11:03:54 WARN snappy.LoadSnappy: Snappy native library not loaded
13/05/01 11:03:54 INFO mapred.JobClient: Running job: job_local_0001
13/05/01 11:03:54 INFO util.ProcessTree: setsid exited with exit code 0
13/05/01 11:03:54 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@35756b65
13/05/01 11:03:54 INFO mapred.MapTask: io.sort.mb = 100
13/05/01 11:03:54 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
13/05/01 11:03:55 INFO mapred.JobClient: map 0% reduce 0%
13/05/01 11:03:55 INFO mapred.JobClient: Job complete: job_local_0001
13/05/01 11:03:55 INFO mapred.JobClient: Counters: 0
Java needs a certain amount of memory to run programs. When a program needs more heap than the JVM is allowed to use, it throws the error you are seeing. The solution is to tell Java to allow more memory for the program. In this case you should be able to tell Hadoop to do that for you. Try the following:
bin/hadoop jar hadoop-cookbook-chapter1.jar chapter1.WordCount -Dmapred.child.java.opts=-Xmx1G input output
The option -Xmx1G allows the heap to grow up to 1 gigabyte.
This other Stack Overflow question is also very similar: Out of Memory Error in Hadoop
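Since the updated run still fails the same way, it can help to check how much heap a JVM really gets from a given -Xmx setting. The stack traces above go through LocalJobRunner, where map tasks run inside the JVM started by the hadoop command, so it may be that JVM's heap, rather than the child task options, that matters here; treat that as an assumption to verify. The class HeapReport below is a hypothetical standard-library sketch, and the 100 MB figure is the io.sort.mb value from the log.

```java
public class HeapReport {

    /** Maximum heap this JVM may use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // io.sort.mb value printed in the log above ("io.sort.mb = 100").
        int sortMb = 100;
        long heapMb = maxHeapMb();
        System.out.println("Max heap: " + heapMb + " MB");
        // The sort buffer is allocated on the heap, so it needs to be well below the limit.
        System.out.println(sortMb < heapMb
                ? "A " + sortMb + " MB sort buffer is below the heap limit"
                : "A " + sortMb + " MB sort buffer cannot fit in this heap");
    }
}
```

For example, running it as java -Xmx1G HeapReport shows the limit the JVM applies for that flag; comparing that with the heap of the JVM that actually runs the job is left as a check to do on your own setup.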
