Hadoop job is submitted but still not finished after a few hours - java

I built a distributed environment with VirtualBox. It contains 3 nodes: 1 master node and 2 slave nodes.
I've already put some data on HDFS, and I executed the following command:
./hadoop/bin/hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input/ output/
Here is the log I got after a few hours:
15/05/03 20:48:30 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.10.11:8032
15/05/03 20:48:31 INFO input.FileInputFormat: Total input paths to process : 1
15/05/03 20:48:31 INFO mapreduce.JobSubmitter: number of splits:1
15/05/03 20:48:31 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
15/05/03 20:48:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/05/03 20:48:31 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
15/05/03 20:48:31 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
15/05/03 20:48:31 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
15/05/03 20:48:31 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
15/05/03 20:48:31 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
15/05/03 20:48:31 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/05/03 20:48:31 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/05/03 20:48:31 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/05/03 20:48:31 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/05/03 20:48:31 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/05/03 20:48:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430656198359_0002
15/05/03 20:48:32 INFO impl.YarnClientImpl: Submitted application application_1430656198359_0002 to ResourceManager at hadoop01/192.168.10.11:8032
15/05/03 20:48:32 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1430656198359_0002/
15/05/03 20:48:32 INFO mapreduce.Job: Running job: job_1430656198359_0002
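For context, the wordcount job in the examples jar boils down to roughly the following driver and map/reduce classes (a simplified sketch, not the exact shipped source); the console output above stops at "Running job" because waitForCompletion blocks on the client while polling the cluster for progress:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Splits each input line into tokens and emits (word, 1).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the counts for each word; also used as the combiner.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input/
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output/
    // Submits the job, prints "Running job: ...", then blocks polling for progress.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}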

Related

Error while importing table from mysql to hdfs using sqoop

I am running this import command to import the E_INFO table from a MySQL database to HDFS using Sqoop. The job is failing with the error given below.
sqoop import --connect jdbc:mysql://localhost/HADOOP_BANK_DATA_POC --username root --password xxxxxxx --table E_INFO
However, a blank file named E_INFO is created in HDFS without any records.
Your help here is highly appreciated.
h-user#h-primary:~/sqoop-1.4.7/bin$ sqoop import --connect jdbc:mysql://localhost/HADOOP_BANK_DATA_POC --username root --password xxxxxxx --table E_INFO
Warning: /home/h-user/sqoop-1.4.7/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/h-user/sqoop-1.4.7/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/h-user/sqoop-1.4.7/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/h-user/sqoop-1.4.7/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2366: HADOOP_ORG.APACHE.SQOOP.SQOOP_USER: invalid variable name
/usr/local/hadoop/libexec/hadoop-functions.sh: line 2461: HADOOP_ORG.APACHE.SQOOP.SQOOP_OPTS: invalid variable name
2022-12-24 21:33:59,888 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2022-12-24 21:34:00,034 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2022-12-24 21:34:00,355 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2022-12-24 21:34:00,356 INFO tool.CodeGenTool: Beginning code generation
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2022-12-24 21:34:02,279 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `E_INFO` AS t LIMIT 1
2022-12-24 21:34:02,678 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `E_INFO` AS t LIMIT 1
2022-12-24 21:34:02,693 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-h-user/compile/e0e749532f0eaa6daaca522c9759658a/E_INFO.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2022-12-24 21:34:07,748 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-h-user/compile/e0e749532f0eaa6daaca522c9759658a/E_INFO.jar
2022-12-24 21:34:07,798 WARN manager.MySQLManager: It looks like you are importing from mysql.
2022-12-24 21:34:07,798 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2022-12-24 21:34:07,798 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2022-12-24 21:34:07,798 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2022-12-24 21:34:07,830 INFO mapreduce.ImportJobBase: Beginning import of E_INFO
2022-12-24 21:34:07,833 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2022-12-24 21:34:08,191 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2022-12-24 21:34:09,386 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2022-12-24 21:34:09,626 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2022-12-24 21:34:10,080 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2022-12-24 21:34:10,080 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2022-12-24 21:34:10,722 INFO db.DBInputFormat: Using read commited transaction isolation
2022-12-24 21:34:10,793 INFO mapreduce.JobSubmitter: number of splits:1
2022-12-24 21:34:11,333 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local377089840_0001
2022-12-24 21:34:11,334 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-12-24 21:34:12,059 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-h-user/mapred/local/job_local377089840_0001_7cb16b63-36d5-4781-94e7-4bd839e2ca49/libjars <- /home/h-user/sqoop-1.4.7/bin/libjars/*
2022-12-24 21:34:12,098 WARN fs.FileUtil: Command 'ln -s /tmp/hadoop-h-user/mapred/local/job_local377089840_0001_7cb16b63-36d5-4781-94e7-4bd839e2ca49/libjars /home/h-user/sqoop-1.4.7/bin/libjars/*' failed 1 with: ln: failed to create symbolic link '/home/h-user/sqoop-1.4.7/bin/libjars/*': No such file or directory
2022-12-24 21:34:12,098 WARN mapred.LocalDistributedCacheManager: Failed to create symlink: /tmp/hadoop-h-user/mapred/local/job_local377089840_0001_7cb16b63-36d5-4781-94e7-4bd839e2ca49/libjars <- /home/h-user/sqoop-1.4.7/bin/libjars/*
2022-12-24 21:34:12,099 INFO mapred.LocalDistributedCacheManager: Localized file:/tmp/hadoop/mapred/staging/h-user377089840/.staging/job_local377089840_0001/libjars as file:/tmp/hadoop-h-user/mapred/local/job_local377089840_0001_7cb16b63-36d5-4781-94e7-4bd839e2ca49/libjars
2022-12-24 21:34:12,256 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2022-12-24 21:34:12,261 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2022-12-24 21:34:12,264 INFO mapreduce.Job: Running job: job_local377089840_0001
2022-12-24 21:34:12,352 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2022-12-24 21:34:12,353 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2022-12-24 21:34:12,362 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2022-12-24 21:34:12,743 INFO mapred.LocalJobRunner: Waiting for map tasks
2022-12-24 21:34:12,745 INFO mapred.LocalJobRunner: Starting task: attempt_local377089840_0001_m_000000_0
2022-12-24 21:34:12,831 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2022-12-24 21:34:12,841 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2022-12-24 21:34:12,918 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2022-12-24 21:34:13,007 INFO db.DBInputFormat: Using read commited transaction isolation
2022-12-24 21:34:13,012 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
2022-12-24 21:34:13,025 INFO mapred.LocalJobRunner: map task executor complete.
2022-12-24 21:34:13,134 WARN mapred.LocalJobRunner: job_local377089840_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class E_INFO not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class E_INFO not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2638)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:403)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:270)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:266)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:527)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: Class E_INFO not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2542)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2636)
... 12 more
2022-12-24 21:34:13,345 INFO mapreduce.Job: Job job_local377089840_0001 running in uber mode : false
2022-12-24 21:34:13,351 INFO mapreduce.Job: map 0% reduce 0%
2022-12-24 21:34:13,354 INFO mapreduce.Job: Job job_local377089840_0001 failed with state FAILED due to: NA
2022-12-24 21:34:13,389 INFO mapreduce.Job: Counters: 0
2022-12-24 21:34:13,396 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
2022-12-24 21:34:13,402 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 3.998 seconds (0 bytes/sec)
2022-12-24 21:34:13,404 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2022-12-24 21:34:13,404 INFO mapreduce.ImportJobBase: Retrieved 0 records.
2022-12-24 21:34:13,404 ERROR tool.ImportTool: Import failed: Import job failed!

Hadoop mapreduce execution stuck

I'm using Hadoop on a VM. When I try to run a jar, the execution stops because it is unable to find the file resource-types.xml.
How can I solve this? Thank you.
gaia#gaia-virtual-machine:~/hadoop-3.3.2$ bin/hadoop jar erasmus-0.0.1-SNAPSHOT.jar erasmus.MaxPartecipants input output
2022-05-09 10:27:30,069 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2022-05-09 10:27:30,439 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2022-05-09 10:27:30,457 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/gaia/.staging/job_1652083747035_0002
2022-05-09 10:27:30,673 INFO input.FileInputFormat: Total input files to process : 1
2022-05-09 10:27:31,158 INFO mapreduce.JobSubmitter: number of splits:1
2022-05-09 10:27:31,268 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1652083747035_0002
2022-05-09 10:27:31,268 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-05-09 10:27:31,439 INFO conf.Configuration: resource-types.xml not found
2022-05-09 10:27:31,440 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-05-09 10:27:31,516 INFO impl.YarnClientImpl: Submitted application application_1652083747035_0002
2022-05-09 10:27:31,553 INFO mapreduce.Job: The url to track the job: http://gaia-virtual-machine:8088/proxy/application_1652083747035_0002/
2022-05-09 10:27:31,554 INFO mapreduce.Job: Running job: job_1652083747035_0002
The following is the output of the jps command:
gaia#gaia-virtual-machine:~/hadoop-3.3.2$ jps
14998 SecondaryNameNode
14648 NameNode
14779 DataNode
17836 Jps
16780 ResourceManager
In the YARN web UI it says: Total Resource Preempted: <memory:0, vCores:0>
And in the Nodes section it says that there are 0 active nodes.
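The web UI already reports 0 active nodes; here is a small sketch that queries the same node list programmatically, assuming the YarnClient API from hadoop-yarn-client is available (illustrative only):
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListYarnNodes {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());   // picks up yarn-site.xml from the classpath
    yarnClient.start();
    try {
      // NodeManagers currently registered and able to run containers
      List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
      System.out.println("Active NodeManagers: " + nodes.size());
      for (NodeReport node : nodes) {
        System.out.println(node.getNodeId() + " capability=" + node.getCapability());
      }
    } finally {
      yarnClient.stop();
    }
  }
}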

OraOop invoked when running a Sqoop export job

I was running this Sqoop command:
sqoop export -D oraoop.timestamp.string=false --connect jdbc:oracle:thin:@127.0.0.1:1521:XE --username root --password manager --table person_temp4 --columns "id,first_name,last_name,email,tim" --export-dir /sqoop/person_data/avro3/ -m 4
For this command, OraOop was invoked, which is strange, as OraOop doesn't support export or eval jobs.
The output is:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/04/13 04:01:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.1.1.0-385
17/04/13 04:01:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/04/13 04:01:07 INFO manager.SqlManager: Using default fetchSize of 1000
17/04/13 04:01:08 INFO oraoop.OraOopOracleQueries: Current schema is: ROOT
17/04/13 04:01:08 INFO oraoop.OraOopManagerFactory:
***********************************************************************
*** Using Quest® Data Connector for Oracle and Hadoop 1.6.0-cdh4-20 ***
*** Copyright 2012 Quest Software, Inc. ***
*** ALL RIGHTS RESERVED. ***
***********************************************************************
17/04/13 04:01:08 INFO oraoop.OraOopManagerFactory: Oracle Database version: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
17/04/13 04:01:08 INFO oraoop.OraOopManagerFactory: This Oracle database is not a RAC.
17/04/13 04:01:08 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
17/04/13 04:01:08 INFO tool.CodeGenTool: Beginning code generation
17/04/13 04:01:09 INFO manager.SqlManager: Executing SQL statement: SELECT "ID","FIRST_NAME","LAST_NAME","EMAIL","TIM" FROM person_temp4 WHERE 0=1
17/04/13 04:01:09 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/b4748ab65cf1d407986f9bee683d82db/person_temp4.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/04/13 04:01:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b4748ab65cf1d407986f9bee683d82db/person_temp4.jar
17/04/13 04:01:11 INFO mapreduce.ExportJobBase: Beginning export of person_temp4
17/04/13 04:01:11 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/04/13 04:01:13 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/04/13 04:01:13 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/04/13 04:01:13 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/04/13 04:01:13 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
17/04/13 04:01:14 INFO input.FileInputFormat: Total input paths to process : 1
17/04/13 04:01:14 INFO mapreduce.JobSubmitter: number of splits:1
17/04/13 04:01:14 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/04/13 04:01:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492080500165_0005
17/04/13 04:01:15 INFO impl.YarnClientImpl: Submitted application application_1492080500165_0005
17/04/13 04:01:15 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1492080500165_0005/
17/04/13 04:01:15 INFO mapreduce.Job: Running job: job_1492080500165_0005
17/04/13 04:01:23 INFO mapreduce.Job: Job job_1492080500165_0005 running in uber mode : false
17/04/13 04:01:23 INFO mapreduce.Job: map 0% reduce 0%
17/04/13 04:01:32 INFO mapreduce.Job: Task Id : attempt_1492080500165_0005_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: No such field: TIM
at person_temp4.setField(person_temp4.java:337)
at org.apache.sqoop.mapreduce.AvroExportMapper.toSqoopRecord(AvroExportMapper.java:120)
at org.apache.sqoop.mapreduce.AvroExportMapper.map(AvroExportMapper.java:104)
at org.apache.sqoop.mapreduce.AvroExportMapper.map(AvroExportMapper.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
17/04/13 04:01:38 INFO mapreduce.Job: Task Id : attempt_1492080500165_0005_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: No such field: TIM
at person_temp4.setField(person_temp4.java:337)
at org.apache.sqoop.mapreduce.AvroExportMapper.toSqoopRecord(AvroExportMapper.java:120)
at org.apache.sqoop.mapreduce.AvroExportMapper.map(AvroExportMapper.java:104)
at org.apache.sqoop.mapreduce.AvroExportMapper.map(AvroExportMapper.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
17/04/13 04:01:44 INFO mapreduce.Job: Task Id : attempt_1492080500165_0005_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: No such field: TIM
at person_temp4.setField(person_temp4.java:337)
at org.apache.sqoop.mapreduce.AvroExportMapper.toSqoopRecord(AvroExportMapper.java:120)
at org.apache.sqoop.mapreduce.AvroExportMapper.map(AvroExportMapper.java:104)
at org.apache.sqoop.mapreduce.AvroExportMapper.map(AvroExportMapper.java:49)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
17/04/13 04:01:56 INFO mapreduce.Job: map 100% reduce 0%
17/04/13 04:01:57 INFO mapreduce.Job: Job job_1492080500165_0005 failed with state FAILED due to: Task failed task_1492080500165_0005_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/04/13 04:01:57 INFO mapreduce.Job: Counters: 9
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=25923
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=25923
Total vcore-seconds taken by all map tasks=25923
Total megabyte-seconds taken by all map tasks=6480750
17/04/13 04:01:57 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
17/04/13 04:01:57 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 44.3748 seconds (0 bytes/sec)
17/04/13 04:01:57 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/04/13 04:01:57 INFO mapreduce.ExportJobBase: Exported 0 records.
17/04/13 04:01:57 ERROR tool.ExportTool: Error during export: Export job failed!
Why is OraOop being invoked for export jobs?
I am using HDP 2.1 and have installed OraOop separately.

Hadoop kNN join algorithm stuck at map 100% reduce 0%

15/06/11 10:31:51 INFO mapreduce.Job: map 100% reduce 0%
I am trying to run the open-source kNN join MapReduce hbrj algorithm on Hadoop 2.6.0, set up as a single-node cluster (pseudo-distributed operation) installed on my laptop (OS X). (The source can be found here: http://www.cs.utah.edu/~lifeifei/knnj/). This algorithm consists of two MapReduce phases, where the second phase uses the first phase's output files as its input. The first phase maps and reduces successfully - I can also look into the output files and everything seems right. However, when running the second phase, the job is said to finish successfully even though, I believe, it never reduces or even enters that stage.
Here is what gets printed as I run phase 2 (I am including everything in hopes that it can be useful):
2015-06-11 10:31:47.526 java[3918:305930] Unable to load realm info from SCDynamicStore
15/06/11 10:31:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/11 10:31:49 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/06/11 10:31:49 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/06/11 10:31:49 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/06/11 10:31:49 INFO mapred.FileInputFormat: Total input paths to process : 64
15/06/11 10:31:49 INFO mapreduce.JobSubmitter: number of splits:64
15/06/11 10:31:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1089761712_0001
15/06/11 10:31:50 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/06/11 10:31:50 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/06/11 10:31:50 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
15/06/11 10:31:50 INFO mapreduce.Job: Running job: job_local1089761712_0001
15/06/11 10:31:50 INFO mapred.LocalJobRunner: Waiting for map tasks
15/06/11 10:31:50 INFO mapred.LocalJobRunner: Starting task: attempt_local1089761712_0001_m_000000_0
15/06/11 10:31:50 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/06/11 10:31:50 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
15/06/11 10:31:50 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/sasha/hbrj/output/part-00042:0+872
15/06/11 10:31:50 INFO mapred.MapTask: numReduceTasks: 0
15/06/11 10:31:50 INFO mapred.LocalJobRunner:
15/06/11 10:31:50 INFO mapred.Task: Task:attempt_local1089761712_0001_m_000000_0 is done. And is in the process of committing
15/06/11 10:31:50 INFO mapred.LocalJobRunner:
15/06/11 10:31:50 INFO mapred.Task: Task attempt_local1089761712_0001_m_000000_0 is allowed to commit now
15/06/11 10:31:50 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1089761712_0001_m_000000_0' to hdfs://localhost:9000/user/sasha/hbrj/output2/_temporary/0/task_local1089761712_0001_m_000000
15/06/11 10:31:50 INFO mapred.MapTask: numReduceTasks: 0
15/06/11 10:31:50 INFO mapred.LocalJobRunner:
15/06/11 10:31:50 INFO mapred.Task: Task:attempt_local1089761712_0001_m_000000_0 is done. And is in the process of committing
15/06/11 10:31:50 INFO mapred.LocalJobRunner:
15/06/11 10:31:50 INFO mapred.Task: Task attempt_local1089761712_0001_m_000000_0 is allowed to commit now
15/06/11 10:31:50 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1089761712_0001_m_000000_0' to hdfs://localhost:9000/user/sasha/hbrj/output2/_temporary/0/task_local1089761712_0001_m_000000
15/06/11 10:31:50 INFO mapred.LocalJobRunner: hdfs://localhost:9000/user/sasha/hbrj/output/part-00042:0+872
15/06/11 10:31:50 INFO mapred.Task: Task 'attempt_local1089761712_0001_m_000000_0' done.
continues in this fashion until...
15/06/11 10:31:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local1089761712_0001_m_000012_0
15/06/11 10:31:51 INFO mapred.LocalJobRunner: Starting task: attempt_local1089761712_0001_m_000013_0
15/06/11 10:31:51 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
15/06/11 10:31:51 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
15/06/11 10:31:51 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/sasha/hbrj/output/part-00015:0+646
15/06/11 10:31:51 INFO mapred.MapTask: numReduceTasks: 0
15/06/11 10:31:51 INFO mapreduce.Job: Job job_local1089761712_0001 running in uber mode : false
15/06/11 10:31:51 INFO mapreduce.Job: map 100% reduce 0%
15/06/11 10:31:51 INFO mapred.LocalJobRunner:
15/06/11 10:31:51 INFO mapred.Task: Task:attempt_local1089761712_0001_m_000013_0 is done. And is in the process of committing
15/06/11 10:31:51 INFO mapred.LocalJobRunner:
15/06/11 10:31:51 INFO mapred.Task: Task attempt_local1089761712_0001_m_000013_0 is allowed to commit now
15/06/11 10:31:51 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1089761712_0001_m_000013_0' to hdfs://localhost:9000/user/sasha/hbrj/output2/_temporary/0/task_local1089761712_0001_m_000013
15/06/11 10:31:51 INFO mapred.LocalJobRunner: hdfs://localhost:9000/user/sasha/hbrj/output/part-00015:0+646
15/06/11 10:31:51 INFO mapred.Task: Task 'attempt_local1089761712_0001_m_000013_0' done.
15/06/11 10:31:51 INFO mapred.LocalJobRunner: Finishing task: attempt_local1089761712_0001_m_000013_0
15/06/11 10:31:51 INFO mapred.LocalJobRunner: Starting task: attempt_local1089761712_0001_m_000014_0
This "Starting task ... Finishing task" pattern repeats in the manner seen below (this is the last task), and the job is said to complete successfully:
15/06/11 10:31:53 INFO mapred.MapTask: numReduceTasks: 0
15/06/11 10:31:53 INFO mapred.LocalJobRunner:
15/06/11 10:31:53 INFO mapred.Task: Task:attempt_local1089761712_0001_m_000063_0 is done. And is in the process of committing
15/06/11 10:31:53 INFO mapred.LocalJobRunner:
15/06/11 10:31:53 INFO mapred.Task: Task attempt_local1089761712_0001_m_000063_0 is allowed to commit now
15/06/11 10:31:53 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1089761712_0001_m_000063_0' to hdfs://localhost:9000/user/sasha/hbrj/output2/_temporary/0/task_local1089761712_0001_m_000063
15/06/11 10:31:53 INFO mapred.LocalJobRunner: hdfs://localhost:9000/user/sasha/hbrj/output/part-00004:0+178
15/06/11 10:31:53 INFO mapred.Task: Task 'attempt_local1089761712_0001_m_000063_0' done.
15/06/11 10:31:53 INFO mapred.LocalJobRunner: Finishing task: attempt_local1089761712_0001_m_000063_0
15/06/11 10:31:53 INFO mapred.LocalJobRunner: map task executor complete.
15/06/11 10:31:54 INFO mapreduce.Job: Job job_local1089761712_0001 completed successfully
15/06/11 10:31:54 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=96487226
FILE: Number of bytes written=106993472
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1157797
HDFS: Number of bytes written=884212
HDFS: Number of read operations=8576
HDFS: Number of large read operations=0
HDFS: Number of write operations=4224
Map-Reduce Framework
Map input records=793
Map output records=793
Input split bytes=6848
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=42
Total committed heap usage (bytes)=12124160000
File Input Format Counters
Bytes Read=28599
File Output Format Counters
Bytes Written=21848
What I have done so far:
I have found a similar question here: Hadoop WordCount example stuck at map 100% reduce 0% and followed some of the advice given. In particular:
At one point I configured YARN so that I could go to localhost:8088 and monitor the job. All the mappers worked correctly - no failures - and the job ended abruptly right after the last mapper was successful; that is, no reducers were ever started. Map was shown as 100% and reduce as 0%.
Using this command:
cat /path/to/logs/*.log | grep ERROR
returned nothing.
Since I can see the output of the mapper stage, I believe the problem does not lie there.
I have tried debugging: putting print statements in the configure method of the Reducer and in the reduce method itself (see the sketch after this list). None of them got printed when I reran the job.
Additional note: Since the algorithm I am using was published and is supposed to work, I believe the problem could arise from the fact that the code is 3 years old and was written for Hadoop version 0.20.2, but I guess I should not be too sure of this.
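A minimal sketch of the kind of instrumentation described above, using the old org.apache.hadoop.mapred API that the Hadoop 0.20.2-era code and the log lines above suggest; the class name, key/value types, and print messages are illustrative:
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Old-API reducer with print statements in configure() and reduce(),
// to check whether the reduce phase is ever entered.
public class DebugReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  @Override
  public void configure(JobConf job) {
    System.err.println("DebugReducer.configure() called");
  }

  @Override
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    System.err.println("DebugReducer.reduce() called for key " + key);
    while (values.hasNext()) {
      output.collect(key, values.next());
    }
  }
}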
I understand that this is a specific question but I hope that someone could point me in the right direction. I would be happy to include anything else you might find useful. Any help is greatly appreciated!

Can anyone explain my Apache Spark Error SparkException: Job aborted due to stage failure

I have a simple Apache Spark app where I read files from HDFS and then pipe them to an external process. When I read a large amount of data (in my case the files total about 241 MB) and I don't specify a minimum number of partitions, or set the minimum to 4, I get the following error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, ip-172-31-36-43.us-west-2.compute.internal): ExecutorLostFailure (executor 6 lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
When I set the minimum number of partitions to 10 or above, I don't get this error. Can anyone tell me what's wrong and how to avoid it? I didn't get an error saying that the subprocess exited with an error code, so I think it's a problem with the Spark configuration.
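For reference, a minimal sketch of the kind of read-and-pipe job described above, assuming the Java API; the HDFS path, the external command, and the partition count are illustrative, not the actual application code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PipeToExternalProcess {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("pipe-example");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Read whole files from HDFS as (path, content) pairs; the second argument
    // is the minimum number of partitions (the value being varied above).
    int minPartitions = 10;
    JavaPairRDD<String, String> files =
        sc.wholeTextFiles("hdfs:///user/root/pepnovo3/largeinputfile2", minPartitions);

    // Feed each file's content to an external command via stdin and collect
    // its stdout lines. The command path here is a placeholder.
    JavaRDD<String> results = files.values().pipe("/path/to/external_process");

    results.saveAsTextFile("hdfs:///user/root/pipe_output");
    sc.stop();
  }
}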
stderr from worker:
15/05/03 10:41:29 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/05/03 10:41:30 INFO spark.SecurityManager: Changing view acls to: root
15/05/03 10:41:30 INFO spark.SecurityManager: Changing modify acls to: root
15/05/03 10:41:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/03 10:41:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/05/03 10:41:30 INFO Remoting: Starting remoting
15/05/03 10:41:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher#ip-172-31-36-43.us-west-2.compute.internal:46832]
15/05/03 10:41:31 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 46832.
15/05/03 10:41:31 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/05/03 10:41:31 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/05/03 10:41:31 INFO spark.SecurityManager: Changing view acls to: root
15/05/03 10:41:31 INFO spark.SecurityManager: Changing modify acls to: root
15/05/03 10:41:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/05/03 10:41:31 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/05/03 10:41:31 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/05/03 10:41:31 INFO Remoting: Starting remoting
15/05/03 10:41:31 INFO util.Utils: Successfully started service 'sparkExecutor' on port 37039.
15/05/03 10:41:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor#ip-172-31-36-43.us-west-2.compute.internal:37039]
15/05/03 10:41:31 INFO util.AkkaUtils: Connecting to MapOutputTracker: akka.tcp://sparkDriver#ip-172-31-35-111.us-west-2.compute.internal:48730/user/MapOutputTracker
15/05/03 10:41:31 INFO util.AkkaUtils: Connecting to BlockManagerMaster: akka.tcp://sparkDriver#ip-172-31-35-111.us-west-2.compute.internal:48730/user/BlockManagerMaster
15/05/03 10:41:31 INFO storage.DiskBlockManager: Created local directory at /mnt/spark/spark-cbaf9bff-4d12-4847-9135-9667ba27dccb/spark-ad82597c-4b55-46fc-9063-5d1196d6e0b0/spark-e99f55c6-5bcb-4d1b-b014-aaec94fe6cc5/blockmgr-cda1922d-ea50-4630-a834-bfb637ecdaa0
15/05/03 10:41:31 INFO storage.DiskBlockManager: Created local directory at /mnt2/spark/spark-0c6c912f-3aa1-4c54-9970-7a75d22899e8/spark-71d64ae7-36bc-49e0-958e-e7e2c1432027/spark-56d9e077-4585-4fd7-8a48-5227943d9004/blockmgr-29c5d068-f19d-4f41-85fc-11960c77a8a3
15/05/03 10:41:31 INFO storage.MemoryStore: MemoryStore started with capacity 445.4 MB
15/05/03 10:41:32 INFO util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver#ip-172-31-35-111.us-west-2.compute.internal:48730/user/OutputCommitCoordinator
15/05/03 10:41:32 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sparkDriver#ip-172-31-35-111.us-west-2.compute.internal:48730/user/CoarseGrainedScheduler
15/05/03 10:41:32 INFO worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker#ip-172-31-36-43.us-west-2.compute.internal:54983/user/Worker
15/05/03 10:41:32 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker#ip-172-31-36-43.us-west-2.compute.internal:54983/user/Worker
15/05/03 10:41:32 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
15/05/03 10:41:32 INFO executor.Executor: Starting executor ID 6 on host ip-172-31-36-43.us-west-2.compute.internal
15/05/03 10:41:32 INFO netty.NettyBlockTransferService: Server created on 33000
15/05/03 10:41:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/05/03 10:41:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/05/03 10:41:32 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver#ip-172-31-35-111.us-west-2.compute.internal:48730/user/HeartbeatReceiver
15/05/03 10:41:32 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 6
15/05/03 10:41:32 INFO executor.Executor: Running task 1.3 in stage 0.0 (TID 6)
15/05/03 10:41:32 INFO executor.Executor: Fetching http://172.31.35.111:34347/jars/proteinsApacheSpark-0.0.1.jar with timestamp 1430649374764
15/05/03 10:41:32 INFO util.Utils: Fetching http://172.31.35.111:34347/jars/proteinsApacheSpark-0.0.1.jar to /mnt/spark/spark-cbaf9bff-4d12-4847-9135-9667ba27dccb/spark-ad82597c-4b55-46fc-9063-5d1196d6e0b0/spark-08b3b4ce-960f-488f-99ea-bd66b3277207/fetchFileTemp3079113313084659984.tmp
15/05/03 10:41:32 INFO util.Utils: Copying /mnt/spark/spark-cbaf9bff-4d12-4847-9135-9667ba27dccb/spark-ad82597c-4b55-46fc-9063-5d1196d6e0b0/spark-08b3b4ce-960f-488f-99ea-bd66b3277207/9655652641430649374764_cache to /root/spark/work/app-20150503103615-0002/6/./proteinsApacheSpark-0.0.1.jar
15/05/03 10:41:32 INFO executor.Executor: Adding file:/root/spark/work/app-20150503103615-0002/6/./proteinsApacheSpark-0.0.1.jar to class loader
15/05/03 10:41:32 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 1
15/05/03 10:41:32 INFO storage.MemoryStore: ensureFreeSpace(17223) called with curMem=0, maxMem=467081625
15/05/03 10:41:32 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 16.8 KB, free 445.4 MB)
15/05/03 10:41:32 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/05/03 10:41:32 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 274 ms
15/05/03 10:41:32 INFO storage.MemoryStore: ensureFreeSpace(22384) called with curMem=17223, maxMem=467081625
15/05/03 10:41:32 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 21.9 KB, free 445.4 MB)
15/05/03 10:41:33 INFO spark.CacheManager: Partition rdd_0_1 not found, computing it
15/05/03 10:41:33 INFO rdd.WholeTextFileRDD: Input split: Paths:/user/root/pepnovo3/largeinputfile2/largeinputfile2_45.mgf:0+2106005,/user/root/pepnovo3/largeinputfile2/largeinputfile2_46.mgf:0+2105954,/user/root/pepnovo3/largeinputfile2/largeinputfile2_47.mgf:0+2106590,/user/root/pepnovo3/largeinputfile2/largeinputfile2_48.mgf:0+2105696,/user/root/pepnovo3/largeinputfile2/largeinputfile2_49.mgf:0+2105891,/user/root/pepnovo3/largeinputfile2/largeinputfile2_5.mgf:0+2106283,/user/root/pepnovo3/largeinputfile2/largeinputfile2_50.mgf:0+2105559,/user/root/pepnovo3/largeinputfile2/largeinputfile2_51.mgf:0+2106403,/user/root/pepnovo3/largeinputfile2/largeinputfile2_52.mgf:0+2105535,/user/root/pepnovo3/largeinputfile2/largeinputfile2_53.mgf:0+2105615,/user/root/pepnovo3/largeinputfile2/largeinputfile2_54.mgf:0+2105861,/user/root/pepnovo3/largeinputfile2/largeinputfile2_55.mgf:0+2106100,/user/root/pepnovo3/largeinputfile2/largeinputfile2_56.mgf:0+2106265,/user/root/pepnovo3/largeinputfile2/largeinputfile2_57.mgf:0+2105768,/user/root/pepnovo3/largeinputfile2/largeinputfile2_58.mgf:0+2106180,/user/root/pepnovo3/largeinputfile2/largeinputfile2_59.mgf:0+2105751,/user/root/pepnovo3/largeinputfile2/largeinputfile2_6.mgf:0+2106247,/user/root/pepnovo3/largeinputfile2/largeinputfile2_60.mgf:0+2106133,/user/root/pepnovo3/largeinputfile2/largeinputfile2_61.mgf:0+2106224,/user/root/pepnovo3/largeinputfile2/largeinputfile2_62.mgf:0+2106415,/user/root/pepnovo3/largeinputfile2/largeinputfile2_63.mgf:0+2106408,/user/root/pepnovo3/largeinputfile2/largeinputfile2_64.mgf:0+2105702,/user/root/pepnovo3/largeinputfile2/largeinputfile2_65.mgf:0+2106268,/user/root/pepnovo3/largeinputfile2/largeinputfile2_66.mgf:0+2106149,/user/root/pepnovo3/largeinputfile2/largeinputfile2_67.mgf:0+2105846,/user/root/pepnovo3/largeinputfile2/largeinputfile2_68.mgf:0+2105408,/user/root/pepnovo3/largeinputfile2/largeinputfile2_69.mgf:0+2106172,/user/root/pepnovo3/largeinputfile2/largeinputfile2_7.mgf:0+2105517,/user/root/pepnovo3/largeinputfile2/largeinputfile2_70.mgf:0+2105980,/user/root/pepnovo3/largeinputfile2/largeinputfile2_71.mgf:0+2105651,/user/root/pepnovo3/largeinputfile2/largeinputfile2_72.mgf:0+2105936,/user/root/pepnovo3/largeinputfile2/largeinputfile2_73.mgf:0+2105966,/user/root/pepnovo3/largeinputfile2/largeinputfile2_74.mgf:0+2105456,/user/root/pepnovo3/largeinputfile2/largeinputfile2_75.mgf:0+2105786,/user/root/pepnovo3/largeinputfile2/largeinputfile2_76.mgf:0+2106151,/user/root/pepnovo3/largeinputfile2/largeinputfile2_77.mgf:0+2106284,/user/root/pepnovo3/largeinputfile2/largeinputfile2_78.mgf:0+2106163,/user/root/pepnovo3/largeinputfile2/largeinputfile2_79.mgf:0+2106233,/user/root/pepnovo3/largeinputfile2/largeinputfile2_8.mgf:0+2105885,/user/root/pepnovo3/largeinputfile2/largeinputfile2_80.mgf:0+2105979,/user/root/pepnovo3/largeinputfile2/largeinputfile2_81.mgf:0+2105888,/user/root/pepnovo3/largeinputfile2/largeinputfile2_82.mgf:0+2106546,/user/root/pepnovo3/largeinputfile2/largeinputfile2_83.mgf:0+2106322,/user/root/pepnovo3/largeinputfile2/largeinputfile2_84.mgf:0+2106017,/user/root/pepnovo3/largeinputfile2/largeinputfile2_85.mgf:0+2106242,/user/root/pepnovo3/largeinputfile2/largeinputfile2_86.mgf:0+2105543,/user/root/pepnovo3/largeinputfile2/largeinputfile2_87.mgf:0+2106556,/user/root/pepnovo3/largeinputfile2/largeinputfile2_88.mgf:0+2105637,/user/root/pepnovo3/largeinputfile2/largeinputfile2_89.mgf:0+2106130,/user/root/pepnovo3/largeinputfile2/largeinputfile2_9.mgf:0+2105634,/user/root/pepnovo3/largeinputfile2/largeinput
file2_90.mgf:0+2105731,/user/root/pepnovo3/largeinputfile2/largeinputfile2_91.mgf:0+2106401,/user/root/pepnovo3/largeinputfile2/largeinputfile2_92.mgf:0+2105736,/user/root/pepnovo3/largeinputfile2/largeinputfile2_93.mgf:0+2105688,/user/root/pepnovo3/largeinputfile2/largeinputfile2_94.mgf:0+2106436,/user/root/pepnovo3/largeinputfile2/largeinputfile2_95.mgf:0+2105609,/user/root/pepnovo3/largeinputfile2/largeinputfile2_96.mgf:0+2105525,/user/root/pepnovo3/largeinputfile2/largeinputfile2_97.mgf:0+2105603,/user/root/pepnovo3/largeinputfile2/largeinputfile2_98.mgf:0+2106211,/user/root/pepnovo3/largeinputfile2/largeinputfile2_99.mgf:0+2105928
15/05/03 10:41:33 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
15/05/03 10:41:33 INFO storage.MemoryStore: ensureFreeSpace(6906) called with curMem=39607, maxMem=467081625
15/05/03 10:41:33 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 6.7 KB, free 445.4 MB)
15/05/03 10:41:33 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/05/03 10:41:33 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 15 ms
15/05/03 10:41:33 INFO storage.MemoryStore: ensureFreeSpace(53787) called with curMem=46513, maxMem=467081625
15/05/03 10:41:33 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 52.5 KB, free 445.3 MB)
15/05/03 10:41:33 WARN snappy.LoadSnappy: Snappy native library is available
15/05/03 10:41:33 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/05/03 10:41:33 INFO snappy.LoadSnappy: Snappy native library loaded
15/05/03 10:41:36 INFO storage.MemoryStore: ensureFreeSpace(252731448) called with curMem=100300, maxMem=467081625
15/05/03 10:41:36 INFO storage.MemoryStore: Block rdd_0_1 stored as values in memory (estimated size 241.0 MB, free 204.3 MB)
15/05/03 10:41:36 INFO storage.BlockManagerMaster: Updated info of block rdd_0_1
The answer is probably in the executor log, which is different from the worker log. Most likely it runs out of memory and either starts GC thrashing or dies from OOM. You could try running with more memory per executor if this is an option.
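If increasing per-executor memory is an option, a minimal sketch of doing it through the application's configuration (the 2g value is only an example; spark.executor.memory can also be set when submitting the application):
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MoreExecutorMemory {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("pipe-example")
        // Give each executor more heap so large cached partitions
        // (e.g. the ~241 MB rdd_0_1 block above) fit without OOM.
        .set("spark.executor.memory", "2g");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // ... rest of the job ...
    sc.stop();
  }
}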
Check your system's hard disk space, network, and memory. Spark writes files in $SPARK_HOME/work; sometimes the hard disk is full, no memory is free, or there is a network issue.
If there is any exception, you can see it at your_machine:4040.
