I have a simple Apache Spark app where I read files from HDFS and then pipe them to an external process. When I read a large amount of data (in my case the files total about 241 MB) and I either don't specify a minimum number of partitions or set it to 4, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, ip-172-31-36-43.us-west-2.compute.internal): ExecutorLostFailure (executor 6 lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
When I set the minimum number of partitions to 10 or above, I don't get this error. Can anyone tell me what's wrong and how to avoid it? I didn't get an error saying the subprocess exited with a non-zero code, so I think it's a problem with the Spark configuration.
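For context, the app boils down to roughly the following (a minimal sketch, not my exact code; the output path and the external command are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PipeApp {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("PipeApp"));
        // Read every file as a (path, contents) pair; the second argument is the
        // minimum number of partitions (4 fails for me, 10 or more works).
        JavaPairRDD<String, String> files =
                sc.wholeTextFiles("hdfs:///user/root/pepnovo3/largeinputfile2", 4).cache();
        // Feed the file contents to the external process, one record per line.
        JavaRDD<String> results = files.values().pipe("/path/to/external/tool");
        results.saveAsTextFile("hdfs:///user/root/pepnovo3/output");
        sc.stop();
    }
}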
stderr from worker:
15/05/03 10:41:29 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/05/03 10:41:30 INFO spark.SecurityManager: Changing view acls to: root
15/05/03 10:41:30 INFO spark.SecurityManager: Changing modify acls to: root
15/05/03 10:41:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/03 10:41:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/05/03 10:41:30 INFO Remoting: Starting remoting
15/05/03 10:41:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@ip-172-31-36-43.us-west-2.compute.internal:46832]
15/05/03 10:41:31 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 46832.
15/05/03 10:41:31 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/05/03 10:41:31 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/05/03 10:41:31 INFO spark.SecurityManager: Changing view acls to: root
15/05/03 10:41:31 INFO spark.SecurityManager: Changing modify acls to: root
15/05/03 10:41:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/03 10:41:31 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/05/03 10:41:31 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/05/03 10:41:31 INFO Remoting: Starting remoting
15/05/03 10:41:31 INFO util.Utils: Successfully started service 'sparkExecutor' on port 37039.
15/05/03 10:41:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@ip-172-31-36-43.us-west-2.compute.internal:37039]
15/05/03 10:41:31 INFO util.AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver@ip-172-31-35-111.us-west-2.compute.internal:48730/user/MapOutputTracker
15/05/03 10:41:31 INFO util.AkkaUtils: Connecting to BlockManagerMaster: akka.tcp://sparkDriver@ip-172-31-35-111.us-west-2.compute.internal:48730/user/BlockManagerMaster
15/05/03 10:41:31 INFO storage.DiskBlockManager: Created local directory at /mnt/spark/spark-cbaf9bff-4d12-4847-9135-9667ba27dccb/spark-ad82597c-4b55-46fc-9063-5d1196d6e0b0/spark-e99f55c6-5bcb-4d1b-b014-aaec94fe6cc5/blockmgr-cda1922d-ea50-4630-a834-bfb637ecdaa0
15/05/03 10:41:31 INFO storage.DiskBlockManager: Created local directory at /mnt2/spark/spark-0c6c912f-3aa1-4c54-9970-7a75d22899e8/spark-71d64ae7-36bc-49e0-958e-e7e2c1432027/spark-56d9e077-4585-4fd7-8a48-5227943d9004/blockmgr-29c5d068-f19d-4f41-85fc-11960c77a8a3
15/05/03 10:41:31 INFO storage.MemoryStore: MemoryStore started with capacity 445.4 MB
15/05/03 10:41:32 INFO util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@ip-172-31-35-111.us-west-2.compute.internal:48730/user/OutputCommitCoordinator
15/05/03 10:41:32 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver@ip-172-31-35-111.us-west-2.compute.internal:48730/user/CoarseGrainedScheduler
15/05/03 10:41:32 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@ip-172-31-36-43.us-west-2.compute.internal:54983/user/Worker
15/05/03 10:41:32 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@ip-172-31-36-43.us-west-2.compute.internal:54983/user/Worker
15/05/03 10:41:32 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
15/05/03 10:41:32 INFO executor.Executor: Starting executor ID 6 on host ip-172-31-36-43.us-west-2.compute.internal
15/05/03 10:41:32 INFO netty.NettyBlockTransferService: Server created on 33000
15/05/03 10:41:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/05/03 10:41:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/05/03 10:41:32 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@ip-172-31-35-111.us-west-2.compute.internal:48730/user/HeartbeatReceiver
15/05/03 10:41:32 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 6
15/05/03 10:41:32 INFO executor.Executor: Running task 1.3 in stage 0.0 (TID 6)
15/05/03 10:41:32 INFO executor.Executor: Fetching http://172.31.35.111:34347/jars/proteinsApacheSpark-0.0.1.jar with timestamp 1430649374764
15/05/03 10:41:32 INFO util.Utils: Fetching http://172.31.35.111:34347/jars/proteinsApacheSpark-0.0.1.jar to /mnt/spark/spark-cbaf9bff-4d12-4847-9135-9667ba27dccb/spark-ad82597c-4b55-46fc-9063-5d1196d6e0b0/spark-08b3b4ce-960f-488f-99ea-bd66b3277207/fetchFileTemp3079113313084659984.tmp
15/05/03 10:41:32 INFO util.Utils: Copying /mnt/spark/spark-cbaf9bff-4d12-4847-9135-9667ba27dccb/spark-ad82597c-4b55-46fc-9063-5d1196d6e0b0/spark-08b3b4ce-960f-488f-99ea-bd66b3277207/9655652641430649374764_cache to /root/spark/work/app-20150503103615-0002/6/./proteinsApacheSpark-0.0.1.jar
15/05/03 10:41:32 INFO executor.Executor: Adding file:/root/spark/work/app-20150503103615-0002/6/./proteinsApacheSpark-0.0.1.jar to class loader
15/05/03 10:41:32 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1
15/05/03 10:41:32 INFO storage.MemoryStore: ensureFreeSpace(17223) called with curMem=0, maxMem=467081625
15/05/03 10:41:32 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 16.8 KB, free 445.4 MB)
15/05/03 10:41:32 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/05/03 10:41:32 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 274 ms
15/05/03 10:41:32 INFO storage.MemoryStore: ensureFreeSpace(22384) called with curMem=17223, maxMem=467081625
15/05/03 10:41:32 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 21.9 KB, free 445.4 MB)
15/05/03 10:41:33 INFO spark.CacheManager: Partition rdd_0_1 not found, computing it
15/05/03 10:41:33 INFO rdd.WholeTextFileRDD: Input split: Paths:/user/root/pepnovo3/largeinputfile2/largeinputfile2_45.mgf:0+2106005,/user/root/pepnovo3/largeinputfile2/largeinputfile2_46.mgf:0+2105954,/user/root/pepnovo3/largeinputfile2/largeinputfile2_47.mgf:0+2106590,/user/root/pepnovo3/largeinputfile2/largeinputfile2_48.mgf:0+2105696,/user/root/pepnovo3/largeinputfile2/largeinputfile2_49.mgf:0+2105891,/user/root/pepnovo3/largeinputfile2/largeinputfile2_5.mgf:0+2106283,/user/root/pepnovo3/largeinputfile2/largeinputfile2_50.mgf:0+2105559,/user/root/pepnovo3/largeinputfile2/largeinputfile2_51.mgf:0+2106403,/user/root/pepnovo3/largeinputfile2/largeinputfile2_52.mgf:0+2105535,/user/root/pepnovo3/largeinputfile2/largeinputfile2_53.mgf:0+2105615,/user/root/pepnovo3/largeinputfile2/largeinputfile2_54.mgf:0+2105861,/user/root/pepnovo3/largeinputfile2/largeinputfile2_55.mgf:0+2106100,/user/root/pepnovo3/largeinputfile2/largeinputfile2_56.mgf:0+2106265,/user/root/pepnovo3/largeinputfile2/largeinputfile2_57.mgf:0+2105768,/user/root/pepnovo3/largeinputfile2/largeinputfile2_58.mgf:0+2106180,/user/root/pepnovo3/largeinputfile2/largeinputfile2_59.mgf:0+2105751,/user/root/pepnovo3/largeinputfile2/largeinputfile2_6.mgf:0+2106247,/user/root/pepnovo3/largeinputfile2/largeinputfile2_60.mgf:0+2106133,/user/root/pepnovo3/largeinputfile2/largeinputfile2_61.mgf:0+2106224,/user/root/pepnovo3/largeinputfile2/largeinputfile2_62.mgf:0+2106415,/user/root/pepnovo3/largeinputfile2/largeinputfile2_63.mgf:0+2106408,/user/root/pepnovo3/largeinputfile2/largeinputfile2_64.mgf:0+2105702,/user/root/pepnovo3/largeinputfile2/largeinputfile2_65.mgf:0+2106268,/user/root/pepnovo3/largeinputfile2/largeinputfile2_66.mgf:0+2106149,/user/root/pepnovo3/largeinputfile2/largeinputfile2_67.mgf:0+2105846,/user/root/pepnovo3/largeinputfile2/largeinputfile2_68.mgf:0+2105408,/user/root/pepnovo3/largeinputfile2/largeinputfile2_69.mgf:0+2106172,/user/root/pepnovo3/largeinputfile2/largeinputfile2_7.mgf:0+2105517,/user/root/pepnovo3/largeinputfile2/largeinputfile2_70.mgf:0+2105980,/user/root/pepnovo3/largeinputfile2/largeinputfile2_71.mgf:0+2105651,/user/root/pepnovo3/largeinputfile2/largeinputfile2_72.mgf:0+2105936,/user/root/pepnovo3/largeinputfile2/largeinputfile2_73.mgf:0+2105966,/user/root/pepnovo3/largeinputfile2/largeinputfile2_74.mgf:0+2105456,/user/root/pepnovo3/largeinputfile2/largeinputfile2_75.mgf:0+2105786,/user/root/pepnovo3/largeinputfile2/largeinputfile2_76.mgf:0+2106151,/user/root/pepnovo3/largeinputfile2/largeinputfile2_77.mgf:0+2106284,/user/root/pepnovo3/largeinputfile2/largeinputfile2_78.mgf:0+2106163,/user/root/pepnovo3/largeinputfile2/largeinputfile2_79.mgf:0+2106233,/user/root/pepnovo3/largeinputfile2/largeinputfile2_8.mgf:0+2105885,/user/root/pepnovo3/largeinputfile2/largeinputfile2_80.mgf:0+2105979,/user/root/pepnovo3/largeinputfile2/largeinputfile2_81.mgf:0+2105888,/user/root/pepnovo3/largeinputfile2/largeinputfile2_82.mgf:0+2106546,/user/root/pepnovo3/largeinputfile2/largeinputfile2_83.mgf:0+2106322,/user/root/pepnovo3/largeinputfile2/largeinputfile2_84.mgf:0+2106017,/user/root/pepnovo3/largeinputfile2/largeinputfile2_85.mgf:0+2106242,/user/root/pepnovo3/largeinputfile2/largeinputfile2_86.mgf:0+2105543,/user/root/pepnovo3/largeinputfile2/largeinputfile2_87.mgf:0+2106556,/user/root/pepnovo3/largeinputfile2/largeinputfile2_88.mgf:0+2105637,/user/root/pepnovo3/largeinputfile2/largeinputfile2_89.mgf:0+2106130,/user/root/pepnovo3/largeinputfile2/largeinputfile2_9.mgf:0+2105634,/user/root/pepnovo3/largeinputfile2/largeinputfile2_90.mgf:0+2105731,/user/root/pepnovo3/largeinputfile2/largeinputfile2_91.mgf:0+2106401,/user/root/pepnovo3/largeinputfile2/largeinputfile2_92.mgf:0+2105736,/user/root/pepnovo3/largeinputfile2/largeinputfile2_93.mgf:0+2105688,/user/root/pepnovo3/largeinputfile2/largeinputfile2_94.mgf:0+2106436,/user/root/pepnovo3/largeinputfile2/largeinputfile2_95.mgf:0+2105609,/user/root/pepnovo3/largeinputfile2/largeinputfile2_96.mgf:0+2105525,/user/root/pepnovo3/largeinputfile2/largeinputfile2_97.mgf:0+2105603,/user/root/pepnovo3/largeinputfile2/largeinputfile2_98.mgf:0+2106211,/user/root/pepnovo3/largeinputfile2/largeinputfile2_99.mgf:0+2105928
15/05/03 10:41:33 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
15/05/03 10:41:33 INFO storage.MemoryStore: ensureFreeSpace(6906) called with curMem=39607, maxMem=467081625
15/05/03 10:41:33 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 6.7 KB, free 445.4 MB)
15/05/03 10:41:33 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/05/03 10:41:33 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 15 ms
15/05/03 10:41:33 INFO storage.MemoryStore: ensureFreeSpace(53787) called with curMem=46513, maxMem=467081625
15/05/03 10:41:33 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 52.5 KB, free 445.3 MB)
15/05/03 10:41:33 WARN snappy.LoadSnappy: Snappy native library is available
15/05/03 10:41:33 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/05/03 10:41:33 INFO snappy.LoadSnappy: Snappy native library loaded
15/05/03 10:41:36 INFO storage.MemoryStore: ensureFreeSpace(252731448) called with curMem=100300, maxMem=467081625
15/05/03 10:41:36 INFO storage.MemoryStore: Block rdd_0_1 stored as values in memory (estimated size 241.0 MB, free 204.3 MB)
15/05/03 10:41:36 INFO storage.BlockManagerMaster: Updated info of block rdd_0_1
The answer is probably in the executor log, which is different from the worker log. Most likely it runs out of memory and either starts GC thrashing or dies from OOM. You could try running with more memory per executor if this is an option.
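Your executor log supports this theory: a single cached partition of 241 MB against a 445.4 MB MemoryStore leaves little headroom. With spark-submit you can raise the executor heap, for example as below (the 4g figure, master URL, and main class are placeholders to adapt, not values from your setup):

./bin/spark-submit --master spark://your-master:7077 \
  --executor-memory 4g \
  --class your.main.Class \
  proteinsApacheSpark-0.0.1.jar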
Check your system's hard disk space, memory, and network. Spark writes files under $SPARK_HOME/work, so a full disk, no free memory, or a network issue can all cause failures like this.
If there is an exception, you can see it in the web UI at your_machine:4040.
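For example, to check the work directory's disk space and the free memory on a worker node:

df -h $SPARK_HOME/work
free -m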
I wrote a Hadoop program that uses log4j (only the Map step, whose behavior didn't meet my expectations):
package org.myorg;

import java.io.*;
import java.util.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
import org.apache.log4j.Logger;
import org.apache.log4j.LogManager;
import org.apache.log4j.xml.DOMConfigurator;

public class ParallelIndexation {
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, LongWritable> {
        private final static LongWritable zero = new LongWritable(0);
        private Text word = new Text();
        private static final Logger logger =
                LogManager.getLogger(Map.class.getName());

        public void map(LongWritable key, Text value,
                OutputCollector<Text, LongWritable> output, Reporter reporter)
                throws IOException {
            // Configure log4j from a file on the local disk of the task node.
            DOMConfigurator.configure("/folder/log4j.xml");
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path localPath = new Path("/export/hadoop-1.0.1/bin/input/paths.txt");
            Path hdfsPath = new Path("hdfs://192.168.1.8:7000/user/hadoop/paths.txt");
            Path localPath1 = new Path("/usr/countcomputers.txt");
            Path hdfsPath1 = new Path("hdfs://192.168.1.8:7000/user/hadoop/countcomputers.txt");
            if (!fs.exists(hdfsPath)) {
                fs.copyFromLocalFile(localPath, hdfsPath);
            }
            if (!fs.exists(hdfsPath1)) {
                fs.copyFromLocalFile(localPath1, hdfsPath1);
            }
            FSDataInputStream in = fs.open(hdfsPath);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line = br.readLine();
            BufferedReader br1 = new BufferedReader(new InputStreamReader(fs.open(hdfsPath1)));
            int CountComputers;
            String result = br1.readLine();
            CountComputers = Integer.parseInt(result);
            ArrayList<String> paths = new ArrayList<String>();
            StringTokenizer tokenizer = new StringTokenizer(line, "|");
            while (tokenizer.hasMoreTokens()) {
                paths.add(tokenizer.nextToken());
            }
            // These are the debug statements whose output I cannot find.
            for (int i = 0; i < paths.size(); i++) {
                logger.debug("paths[i]=" + paths.get(i) + "\n");
            }
            logger.debug("CountComputers=" + CountComputers + "\n");
            // ... (rest of the map method omitted here)
        }
    }
}
Here is the /folder/log4j.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration debug="true" xmlns:log4j="http://jakarta.apache.org/log4j/">
    <appender name="ConsoleAppender" class="org.apache.log4j.ConsoleAppender">
        <param name="Encoding" value="UTF-8"/>
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d{ISO8601} [%-5p][%-16.16t][%32.32c] - %m%n" />
        </layout>
    </appender>
    <root>
        <priority value="DEBUG"/>
        <appender-ref ref="ConsoleAppender" />
    </root>
</log4j:configuration>
But despite this logging setup, after executing the command
./hadoop jar /export/hadoop-1.0.1/bin/ParallelIndexation.jar org.myorg.ParallelIndexation /export/hadoop-1.0.1/bin/input /export/hadoop-1.0.1/bin/output -D mapred.map.tasks=1 1> resultofexecute.txt 2>&1
there was no output of these variables in the resultofexecute.txt file. How can I get the values of these variables written to a log I can read?
@ChrisWhite Here are the hadoop-hadoop-tasktracker-myhost2.log and hadoop-hadoop-tasktracker-myhost3.log files.
hadoop-hadoop-tasktracker-myhost2.log
2013-04-20 12:35:43,465 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-04-20 12:35:43,583 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-04-20 12:35:43,584 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-04-20 12:35:43,584 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2013-04-20 12:35:43,971 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-04-20 12:35:43,979 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-04-20 12:35:44,168 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-04-20 12:35:44,251 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-04-20 12:35:44,280 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-20 12:35:44,287 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as hadoop
2013-04-20 12:35:44,288 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-hadoop/mapred/local
2013-04-20 12:35:44,302 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-04-20 12:35:44,312 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2013-04-20 12:35:44,312 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2013-04-20 12:35:49,332 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort51172 registered.
2013-04-20 12:35:49,332 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort51172 registered.
2013-04-20 12:35:49,335 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:51172
2013-04-20 12:35:49,335 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_myhost2:localhost/127.0.0.1:51172
2013-04-20 12:35:49,356 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2013-04-20 12:35:49,357 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-04-20 12:35:49,357 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 51172: starting
2013-04-20 12:35:49,358 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 51172: starting
2013-04-20 12:35:49,358 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 51172: starting
2013-04-20 12:35:49,358 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 51172: starting
2013-04-20 12:35:49,358 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 51172: starting
2013-04-20 12:35:50,372 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 0 time(s).
2013-04-20 12:35:51,372 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 1 time(s).
2013-04-20 12:35:52,375 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 2 time(s).
2013-04-20 12:35:53,376 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 3 time(s).
2013-04-20 12:35:54,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 4 time(s).
2013-04-20 12:35:55,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 5 time(s).
2013-04-20 12:35:56,379 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 6 time(s).
2013-04-20 12:35:57,380 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 7 time(s).
2013-04-20 12:35:58,381 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 8 time(s).
2013-04-20 12:35:59,381 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 9 time(s).
2013-04-20 12:35:59,385 INFO org.apache.hadoop.ipc.RPC: Server at 192.168.1.8/192.168.1.8:7001 not available yet, Zzzzz...
2013-04-20 12:36:01,387 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 0 time(s).
2013-04-20 12:36:02,387 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 1 time(s).
2013-04-20 12:36:03,388 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 2 time(s).
2013-04-20 12:36:04,388 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 3 time(s).
2013-04-20 12:36:05,388 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 4 time(s).
2013-04-20 12:36:06,388 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 5 time(s).
2013-04-20 12:36:07,390 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 6 time(s).
2013-04-20 12:36:08,390 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 7 time(s).
2013-04-20 12:36:09,391 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 8 time(s).
2013-04-20 12:36:10,392 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 9 time(s).
2013-04-20 12:36:10,393 INFO org.apache.hadoop.ipc.RPC: Server at 192.168.1.8/192.168.1.8:7001 not available yet, Zzzzz...
2013-04-20 12:36:12,393 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 0 time(s).
2013-04-20 12:36:43,011 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : null
2013-04-20 12:36:43,030 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_myhost2:localhost/127.0.0.1:51172
2013-04-20 12:36:43,031 WARN org.apache.hadoop.util.ProcessTree: setsid is not available on this machine. So not using it.
2013-04-20 12:36:43,031 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-04-20 12:36:43,182 INFO org.apache.hadoop.util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2013-04-20 12:36:43,182 INFO org.apache.hadoop.mapred.TaskTracker: ProcessTree implementation is missing on this system. TaskMemoryManager is disabled.
2013-04-20 12:36:43,206 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2013-04-20 12:36:43,214 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ShuffleServerMetrics registered.
2013-04-20 12:36:43,217 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
2013-04-20 12:36:43,218 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50060 webServer.getConnectors()[0].getLocalPort() returned 50060
2013-04-20 12:36:43,218 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060
2013-04-20 12:36:43,218 INFO org.mortbay.log: jetty-6.1.26
2013-04-20 12:36:43,611 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50060
2013-04-20 12:36:43,611 INFO org.apache.hadoop.mapred.TaskTracker: FILE_CACHE_SIZE for mapOutputServlet set to : 2000
2013-04-20 12:36:43,635 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304112319_0001 for user-log deletion with retainTimeStamp:1366573003186
2013-04-20 12:36:43,635 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304070459_0003 for user-log deletion with retainTimeStamp:1366573003186
2013-04-20 12:36:43,635 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304070459_0001 for user-log deletion with retainTimeStamp:1366573003186
2013-04-20 12:36:43,635 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304112319_0007 for user-log deletion with retainTimeStamp:1366573003186
2013-04-20 12:36:43,635 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304070413_0001 for user-log deletion with retainTimeStamp:1366573003186
2013-04-20 12:36:43,635 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304121018_0001 for user-log deletion with retainTimeStamp:1366573003186
2013-04-20 12:36:43,635 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304192207_0003 for user-log deletion with retainTimeStamp:1366573003186
2013-04-20 12:49:24,662 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201304201135_0001_m_000001_0 task's state:UNASSIGNED
2013-04-20 12:49:24,665 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201304201135_0001_m_000001_0 which needs 1 slots
2013-04-20 12:49:24,665 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201304201135_0001_m_000001_0 which needs 1 slots
2013-04-20 12:49:25,006 INFO org.apache.hadoop.mapred.JobLocalizer: Initializing user hadoop on this TT.
2013-04-20 12:49:25,377 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201304201135_0001_m_-1899194781
2013-04-20 12:49:25,378 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201304201135_0001_m_-1899194781 spawned.
2013-04-20 12:49:25,381 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /tmp/hadoop-hadoop/mapred/local/ttprivate/taskTracker/hadoop/jobcache/job_201304201135_0001/attempt_201304201135_0001_m_000001_0/taskjvm.sh
2013-04-20 12:49:26,284 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201304201135_0001_m_-1899194781 given task: attempt_201304201135_0001_m_000001_0
2013-04-20 12:49:29,803 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201304201135_0001_m_000001_0 0.0% setup
2013-04-20 12:49:29,806 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201304201135_0001_m_000001_0 is done.
2013-04-20 12:49:29,807 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201304201135_0001_m_000001_0 was -1
2013-04-20 12:49:29,809 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2013-04-20 12:49:29,947 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201304201135_0001_m_-1899194781 exited with exit code 0. Number of tasks it ran: 1
2013-04-20 12:49:30,719 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201304201135_0001_r_000000_0 task's state:UNASSIGNED
2013-04-20 12:49:30,719 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201304201135_0001_r_000000_0 which needs 1 slots
2013-04-20 12:49:30,719 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201304201135_0001_r_000000_0 which needs 1 slots
2013-04-20 12:49:30,719 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201304201135_0001_m_000001_0
2013-04-20 12:49:30,719 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201304201135_0001_m_000001_0
2013-04-20 12:49:30,720 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201304201135_0001_m_000001_0 not found in cache
2013-04-20 12:49:30,742 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201304201135_0001_r_-1899194781
2013-04-20 12:49:30,742 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201304201135_0001_r_-1899194781 spawned.
2013-04-20 12:49:30,745 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /tmp/hadoop-hadoop/mapred/local/ttprivate/taskTracker/hadoop/jobcache/job_201304201135_0001/attempt_201304201135_0001_r_000000_0/taskjvm.sh
2013-04-20 12:49:31,611 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201304201135_0001_r_-1899194781 given task: attempt_201304201135_0001_r_000000_0
2013-04-20 12:49:33,219 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201304201135_0001_r_000000_0 0.0%
2013-04-20 12:49:33,226 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201304201135_0001_r_000000_0 0.0%
2013-04-20 12:49:33,296 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201304201135_0001_r_000000_0 is in commit-pending, task state:COMMIT_PENDING
2013-04-20 12:49:33,297 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201304201135_0001_r_000000_0 0.0%
2013-04-20 12:49:33,731 INFO org.apache.hadoop.mapred.TaskTracker: Received commit task action for attempt_201304201135_0001_r_000000_0
2013-04-20 12:49:35,143 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201304201135_0001_r_000000_0 1.0% reduce > reduce
2013-04-20 12:49:35,147 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201304201135_0001_r_000000_0 is done.
2013-04-20 12:49:35,147 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201304201135_0001_r_000000_0 was -1
2013-04-20 12:49:35,155 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2013-04-20 12:49:35,272 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201304201135_0001_r_-1899194781 exited with exit code 0. Number of tasks it ran: 1
2013-04-20 12:49:36,775 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201304201135_0001_m_000000_0 task's state:UNASSIGNED
2013-04-20 12:49:36,783 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201304201135_0001_m_000000_0 which needs 1 slots
2013-04-20 12:49:36,783 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201304201135_0001_m_000000_0 which needs 1 slots
2013-04-20 12:49:36,783 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201304201135_0001_r_000000_0
2013-04-20 12:49:36,784 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201304201135_0001_r_000000_0
2013-04-20 12:49:36,812 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201304201135_0001_m_-653215379
2013-04-20 12:49:36,812 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201304201135_0001_m_-653215379 spawned.
2013-04-20 12:49:36,814 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /tmp/hadoop-hadoop/mapred/local/ttprivate/taskTracker/hadoop/jobcache/job_201304201135_0001/attempt_201304201135_0001_m_000000_0/taskjvm.sh
2013-04-20 12:49:37,718 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201304201135_0001_m_-653215379 given task: attempt_201304201135_0001_m_000000_0
2013-04-20 12:49:38,227 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201304201135_0001_m_000000_0 0.0%
2013-04-20 12:49:41,232 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201304201135_0001_m_000000_0 0.0% cleanup
2013-04-20 12:49:41,238 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201304201135_0001_m_000000_0 0.0% cleanup
2013-04-20 12:49:41,243 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201304201135_0001_m_000000_0 is done.
2013-04-20 12:49:41,243 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201304201135_0001_m_000000_0 was -1
2013-04-20 12:49:41,245 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
2013-04-20 12:49:41,378 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201304201135_0001_m_-653215379 exited with exit code 0. Number of tasks it ran: 1
2013-04-20 12:49:42,821 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201304201135_0001
2013-04-20 12:49:42,822 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201304201135_0001_m_000000_0 not found in cache
2013-04-20 12:49:42,826 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304201135_0001 for user-log deletion with retainTimeStamp:1366573782822
hadoop-hadoop-tasktracker-myhost3.log
2013-04-20 12:35:43,798 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2013-04-20 12:35:44,200 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-04-20 12:35:44,207 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-04-20 12:35:44,382 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-04-20 12:35:44,467 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-04-20 12:35:44,496 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-20 12:35:44,500 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as hadoop
2013-04-20 12:35:44,501 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-hadoop/mapred/local
2013-04-20 12:35:44,506 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-04-20 12:35:44,520 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2013-04-20 12:35:44,521 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source TaskTrackerMetrics registered.
2013-04-20 12:35:49,567 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort39327 registered.
2013-04-20 12:35:49,568 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort39327 registered.
2013-04-20 12:35:49,575 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:39327
2013-04-20 12:35:49,575 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_myhost3:localhost/127.0.0.1:39327
2013-04-20 12:35:49,586 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2013-04-20 12:35:49,587 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-04-20 12:35:49,587 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 39327: starting
2013-04-20 12:35:49,587 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 39327: starting
2013-04-20 12:35:49,587 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 39327: starting
2013-04-20 12:35:49,587 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 39327: starting
2013-04-20 12:35:49,587 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 39327: starting
2013-04-20 12:35:50,617 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 0 time(s).
2013-04-20 12:35:51,617 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 1 time(s).
2013-04-20 12:35:52,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 2 time(s).
2013-04-20 12:35:53,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 3 time(s).
2013-04-20 12:35:54,619 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 4 time(s).
2013-04-20 12:35:55,619 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 5 time(s).
2013-04-20 12:35:56,620 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 6 time(s).
2013-04-20 12:35:57,620 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 7 time(s).
2013-04-20 12:35:58,621 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 8 time(s).
2013-04-20 12:35:59,622 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 9 time(s).
2013-04-20 12:35:59,625 INFO org.apache.hadoop.ipc.RPC: Server at 192.168.1.8/192.168.1.8:7001 not available yet, Zzzzz...
2013-04-20 12:36:01,624 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 0 time(s).
2013-04-20 12:36:02,625 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 1 time(s).
2013-04-20 12:36:03,624 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 2 time(s).
2013-04-20 12:36:04,625 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 3 time(s).
2013-04-20 12:36:05,624 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 4 time(s).
2013-04-20 12:36:06,625 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 5 time(s).
2013-04-20 12:36:07,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 6 time(s).
2013-04-20 12:36:08,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 7 time(s).
2013-04-20 12:36:09,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 8 time(s).
2013-04-20 12:36:10,628 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 9 time(s).
2013-04-20 12:36:10,629 INFO org.apache.hadoop.ipc.RPC: Server at 192.168.1.8/192.168.1.8:7001 not available yet, Zzzzz...
2013-04-20 12:36:12,629 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 192.168.1.8/192.168.1.8:7001. Already tried 0 time(s).
2013-04-20 12:36:42,982 INFO org.apache.hadoop.mapred.TaskTracker: Using ResourceCalculatorPlugin : null
2013-04-20 12:36:43,000 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_myhost3:localhost/127.0.0.1:39327
2013-04-20 12:36:43,002 WARN org.apache.hadoop.util.ProcessTree: setsid is not available on this machine. So not using it.
2013-04-20 12:36:43,002 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-04-20 12:36:43,157 INFO org.apache.hadoop.util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2013-04-20 12:36:43,157 INFO org.apache.hadoop.mapred.TaskTracker: ProcessTree implementation is missing on this system. TaskMemoryManager is disabled.
2013-04-20 12:36:43,165 INFO org.apache.hadoop.mapred.IndexCache: IndexCache created with max memory = 10485760
2013-04-20 12:36:43,173 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ShuffleServerMetrics registered.
2013-04-20 12:36:43,176 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
2013-04-20 12:36:43,176 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50060 webServer.getConnectors()[0].getLocalPort() returned 50060
2013-04-20 12:36:43,176 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060
2013-04-20 12:36:43,176 INFO org.mortbay.log: jetty-6.1.26
2013-04-20 12:36:43,567 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50060
2013-04-20 12:36:43,567 INFO org.apache.hadoop.mapred.TaskTracker: FILE_CACHE_SIZE for mapOutputServlet set to : 2000
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304060623_0002 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304060623_0005 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304070459_0001 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304030513_0001 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304070413_0001 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304012201_0015 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304192207_0001 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304121018_0003 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,576 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304112319_0001 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,577 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304112319_0005 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:36:43,577 INFO org.apache.hadoop.mapred.UserLogCleaner: Adding job_201304070459_0003 for user-log deletion with retainTimeStamp:1366573003161
2013-04-20 12:49:45,627 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201304201135_0001
2013-04-20 12:49:45,628 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201304201135_0001 being deleted.
Log4j output will be written to the task log for each map / reduce task - this doesn't make its way back to the job client's stdout / stderr unless a map / reduce task fails in some way.
You'll need to find the task instance via the job tracker, and then view the logs for a particular map / reduce task instance.
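You can also read these logs straight off each tasktracker's local disk. Assuming the default Hadoop 1.x log layout under your install directory (the job and attempt IDs below are taken from your tasktracker log; adjust them for the attempt you care about), the ConsoleAppender output lands in the attempt's stdout file:

cat /export/hadoop-1.0.1/logs/userlogs/job_201304201135_0001/attempt_201304201135_0001_m_000001_0/stdout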
My goal is to launch the namenode daemon. I need to work with the HDFS file system: copy files into it from the local file system and create folders in HDFS, and that requires the namenode daemon to be running on the port specified in the conf/core-site.xml configuration file.
I ran the command
./hadoop namenode
and received the following output:
2013-02-17 12:29:37,493 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = one/192.168.1.8
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
2013-02-17 12:29:38,325 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-02-17 12:29:38,400 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-02-17 12:29:38,427 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-02-17 12:29:38,427 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2013-02-17 12:29:39,509 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-02-17 12:29:39,542 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-02-17 12:29:39,633 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2013-02-17 12:29:39,635 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source NameNode registered.
2013-02-17 12:29:39,704 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 32-bit
2013-02-17 12:29:39,708 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 19.33375 MB
2013-02-17 12:29:39,708 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^22 = 4194304 entries
2013-02-17 12:29:39,708 INFO org.apache.hadoop.hdfs.util.GSet: recommended=4194304, actual=4194304
2013-02-17 12:29:42,718 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop
2013-02-17 12:29:42,737 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2013-02-17 12:29:42,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2013-02-17 12:29:42,937 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2013-02-17 12:29:42,940 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2013-02-17 12:29:45,820 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
2013-02-17 12:29:46,229 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2013-02-17 12:29:46,836 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2013-02-17 12:29:47,133 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2013-02-17 12:29:47,134 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 112 loaded in 0 seconds.
2013-02-17 12:29:47,134 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-hadoop/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
2013-02-17 12:29:47,163 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 112 saved in 0 seconds.
2013-02-17 12:29:47,375 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 112 saved in 0 seconds.
2013-02-17 12:29:47,479 INFO org.apache.hadoop.hdfs.server.namenode.NameCache: initialized with 0 entries 0 lookups
2013-02-17 12:29:47,480 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 6294 msecs
2013-02-17 12:29:47,919 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
2013-02-17 12:29:47,919 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
2013-02-17 12:29:47,920 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
2013-02-17 12:29:47,920 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
2013-02-17 12:29:47,920 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode termination scan for invalid, over- and under-replicated blocks completed in 430 msec
2013-02-17 12:29:47,920 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 6 secs.
2013-02-17 12:29:47,920 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2013-02-17 12:29:47,920 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2013-02-17 12:29:48,198 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2013-02-17 12:29:48,279 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicateQueue QueueProcessingStatistics: First cycle completed 0 blocks in 129 msec
2013-02-17 12:29:48,279 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicateQueue QueueProcessingStatistics: Queue flush completed 0 blocks in 129 msec processing time, 129 msec clock time, 1 cycles
2013-02-17 12:29:48,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: InvalidateQueue QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec
2013-02-17 12:29:48,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: InvalidateQueue QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec processing time, 0 msec clock time, 1 cycles
2013-02-17 12:29:48,280 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source FSNamesystemMetrics registered.
2013-02-17 12:29:48,711 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2013-02-17 12:29:48,836 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort2000 registered.
2013-02-17 12:29:48,836 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort2000 registered.
2013-02-17 12:29:48,865 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: one/192.168.1.8:2000
2013-02-17 12:30:23,264 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-02-17 12:30:25,326 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-02-17 12:30:25,727 INFO org.apache.hadoop.http.HttpServer: dfs.webhdfs.enabled = false
2013-02-17 12:30:25,997 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
2013-02-17 12:30:26,269 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.net.BindException: Address already in use
2013-02-17 12:30:26,442 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicationMonitor thread received InterruptedException.java.lang.InterruptedException: sleep interrupted
2013-02-17 12:30:26,445 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2013-02-17 12:30:26,446 INFO org.apache.hadoop.ipc.Server: Stopping server on 2000
2013-02-17 12:30:26,446 INFO org.apache.hadoop.ipc.metrics.RpcInstrumentation: shut down
2013-02-17 12:30:26,616 INFO org.apache.hadoop.hdfs.server.namenode.DecommissionManager: Interrupted Monitor
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
at java.lang.Thread.run(Thread.java:722)
2013-02-17 12:30:26,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:344)
at sun.nio.ch.Net.bind(Net.java:336)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:199)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.apache.hadoop.http.HttpServer.start(HttpServer.java:581)
at org.apache.hadoop.hdfs.server.namenode.NameNode$1.run(NameNode.java:445)
at org.apache.hadoop.hdfs.server.namenode.NameNode$1.run(NameNode.java:353)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:353)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:305)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2013-02-17 12:30:26,784 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at one/192.168.1.8
************************************************************/
Please help me start the namenode daemon so that I can then run my Hadoop application.
2013-02-17 12:30:26,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Address already in use
Looks like you already have a process running on the same port as the one the NameNode binds to, which probably means you already have an instance of the namenode process running.
You should be able to use either the jps -v command to list the running Java processes for the current user, or ps axww | grep java to list all running Java processes.
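For example, to see which process is holding the HTTP port the NameNode failed to bind (50070 in your log; run it as root so the owning PID is shown):

netstat -tlnp | grep 50070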
Check that your IP address is mapped correctly in the /etc/hosts file. Verify it with ifconfig and map it to the correct DNS name; this error can also be thrown when that mapping is wrong.
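For reference, a correct /etc/hosts mapping for this cluster would look like the following line (hostname and address taken from the STARTUP_MSG in the log above):

192.168.1.8    one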