never ending job in mapreduce

I set the following MapReduce configuration in my main method:
configuration.set("mapreduce.jobtracker.address", "localhost:54311");
configuration.set("mapreduce.framework.name", "yarn");
configuration.set("yarn.resourcemanager.address", "localhost:8032");
When I launch the MapReduce job, it is tracked (I can see it in the cluster dashboard, the one listening on port 8088), but it never finishes. It stays blocked after the last line of this output:
15/06/30 15:56:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/30 15:56:17 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
15/06/30 15:56:18 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/06/30 15:56:18 INFO input.FileInputFormat: Total input paths to process : 1
15/06/30 15:56:18 INFO mapreduce.JobSubmitter: number of splits:1
15/06/30 15:56:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1435241671439_0008
15/06/30 15:56:19 INFO impl.YarnClientImpl: Submitted application application_1435241671439_0008
15/06/30 15:56:19 INFO mapreduce.Job: The url to track the job: http://10.0.0.10:8088/proxy/application_1435241671439_0008/
15/06/30 15:56:19 INFO mapreduce.Job: Running job: job_1435241671439_0008
Does anyone have an idea?
Edit: my YARN NodeManager log contains this message:
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1435241671439_0003_03_000001
2015-06-30 15:44:38,396 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1435241671439_0002_04_000001
Edit 2:
The YARN log also contains an exception that occurred earlier (during a previous MapReduce run):
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: Address already in use; For more details see:
Solution: I killed all the daemon processes and restarted Hadoop. When I ran jps, Hadoop daemons were still listed even though I had stopped them. This was caused by a HADOOP_PID_DIR mismatch.

8040 is the default port of the YARN NodeManager's localizer, and the error says that this port is already in use. Stop all Hadoop processes; if you have no data to keep, you could also format the NameNode once, then run the job again. Judging from both of your edits, the issue is almost certainly with the NodeManager.
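
Before restarting, it can help to confirm that the port from the BindException is really still held by a leftover process. The following is a minimal sketch in plain Java, assuming it is run on the NodeManager host. The class and method names (PortCheck, isPortFree) are made up for illustration and are not part of Hadoop.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Hadoop: tries to bind a port on all
// interfaces, as a daemon would, and reports whether the bind succeeds.
final class PortCheck {

    // Returns true if a listening socket could be opened on the port.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // A BindException ("Address already in use") ends up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8040 is the port named in the BindException above.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8040;
        System.out.println("port " + port
                + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```

If the port is reported as in use while all daemons are supposedly stopped, a stale process is the likely holder, and jps should still list it.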

Related

Hadoop mapreduce execution stuck

I'm using Hadoop on a VM. When I try to run a jar, the execution appears to stop because it is unable to find the file resource-types.xml.
How can I solve this? Thank you.
gaia@gaia-virtual-machine:~/hadoop-3.3.2$ bin/hadoop jar erasmus-0.0.1-SNAPSHOT.jar erasmus.MaxPartecipants input output
2022-05-09 10:27:30,069 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2022-05-09 10:27:30,439 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2022-05-09 10:27:30,457 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/gaia/.staging/job_1652083747035_0002
2022-05-09 10:27:30,673 INFO input.FileInputFormat: Total input files to process : 1
2022-05-09 10:27:31,158 INFO mapreduce.JobSubmitter: number of splits:1
2022-05-09 10:27:31,268 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1652083747035_0002
2022-05-09 10:27:31,268 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-05-09 10:27:31,439 INFO conf.Configuration: resource-types.xml not found
2022-05-09 10:27:31,440 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-05-09 10:27:31,516 INFO impl.YarnClientImpl: Submitted application application_1652083747035_0002
2022-05-09 10:27:31,553 INFO mapreduce.Job: The url to track the job: http://gaia-virtual-machine:8088/proxy/application_1652083747035_0002/
2022-05-09 10:27:31,554 INFO mapreduce.Job: Running job: job_1652083747035_0002
The following is the output of the jps command:
gaia@gaia-virtual-machine:~/hadoop-3.3.2$ jps
14998 SecondaryNameNode
14648 NameNode
14779 DataNode
17836 Jps
16780 ResourceManager
The YARN web UI shows: Total Resource Preempted: <memory:0, vCores:0>
The Nodes section shows 0 active nodes.
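
The jps listing above contains no NodeManager, which is consistent with the 0 active nodes reported by the web UI: without a registered NodeManager, a submitted job has nowhere to run. The following is a minimal sketch for checking a jps listing against an expected set of daemons. The class JpsCheck and the expected list (a single-node setup) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares the daemon names in a jps listing with the
// daemons expected on a single-node Hadoop setup (an assumption; adjust the
// list for a real cluster layout).
final class JpsCheck {

    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager");

    // Each jps line looks like "<pid> <name>"; returns the expected names
    // that do not appear in the listing.
    static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!running.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The listing from the question above.
        String jps = "14998 SecondaryNameNode\n14648 NameNode\n14779 DataNode\n"
                + "17836 Jps\n16780 ResourceManager\n";
        System.out.println("missing: " + missingDaemons(jps)); // missing: [NodeManager]
    }
}
```

When a daemon is missing, its own log file is the place to look for the reason it failed to start.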

Failed to load an FSImage file! || How to solve

I am trying to list all the services with the jps command, but only the following daemons show up:
3633 SecondaryNameNode
4228 Jps
3493 DataNode
4198 NodeManager
4088 ResourceManager
I start all services with start-dfs.sh and start-yarn.sh, but the result stays the same. I went through the logs and found the exception below.
2018-06-29 16:02:31,414 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2018-06-29 16:02:31,414 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false. Rechecking.
2018-06-29 16:02:31,416 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false
2018-06-29 16:02:31,423 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2018-06-29 16:02:31,425 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2018-06-29 16:02:31,425 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2018-06-29 16:02:31,425 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: Failed to load an FSImage file!
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:673)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1006)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:736)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:531)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:587)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:754)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:738)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1427)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1493)
2018-06-29 16:02:31,428 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-06-29 16:02:31,454 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
I have no clue how to solve this, please help. I am using hadoop-2.5.0-cdh5.3.2.

Follow these steps:
Check the path to your FSImage, i.e. where the NameNode stores the FSImage. In my case it is /hadoop/hdfs/namenode/current
Check the most recently created FSImage in both the NameNode and the Secondary NameNode, and find the latest FSImage available.
Copy the latest FSImage from the Secondary NameNode to the NameNode, keeping the ownership and permissions it had on the Secondary NameNode. In my case the owner is hdfs:hadoop.
After copying, try restarting all the services.
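
Finding the latest FSImage in the second step can be sketched as follows. The sketch assumes the usual naming in the current directory, fsimage_<transaction id> with .md5 checksum files alongside. The class and method names are hypothetical, and the file names in main are made-up examples.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks the fsimage file with the highest transaction id
// from a directory listing.
final class LatestFsImage {

    // Matches "fsimage_<digits>" only, so .md5 companions, edits files and
    // VERSION are skipped.
    private static final Pattern FSIMAGE = Pattern.compile("fsimage_(\\d+)");

    static Optional<String> latest(List<String> fileNames) {
        String best = null;
        long bestTxId = -1;
        for (String name : fileNames) {
            Matcher m = FSIMAGE.matcher(name);
            if (m.matches()) {
                long txId = Long.parseLong(m.group(1));
                if (txId > bestTxId) {
                    bestTxId = txId;
                    best = name;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Made-up listing of a "current" directory.
        List<String> names = Arrays.asList(
                "VERSION",
                "fsimage_0000000000000000012",
                "fsimage_0000000000000000012.md5",
                "fsimage_0000000000000000345",
                "fsimage_0000000000000000345.md5");
        System.out.println(latest(names).orElse("no fsimage found"));
    }
}
```

Running the same check on the listings of both directories shows which side holds the newer image.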

Format the NameNode with "hdfs namenode -format" (this erases the existing HDFS metadata).
Then make sure the clusterID of the NameNode and the DataNode is the same. If not, copy the value from one VERSION file into the other.
In my case, the files are:
/Path_installation_dir/hdata/dfs/name/current/VERSION
/Path_installation_dir/hdata/dfs/data/current/VERSION
All done. Start DFS and YARN.
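
The clusterID comparison can be sketched as below, assuming the VERSION files use the Java properties format (key=value lines with # comments). The class ClusterIdCheck is hypothetical, and main expects the two VERSION paths as arguments.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: reads clusterID from the contents of two VERSION files
// and reports whether they match.
final class ClusterIdCheck {

    // Returns the clusterID value, or null if the key is absent.
    static String clusterId(String versionFileContent) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(versionFileContent));
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new IllegalStateException(e);
        }
        return props.getProperty("clusterID");
    }

    static boolean sameCluster(String nameNodeVersion, String dataNodeVersion) {
        String nn = clusterId(nameNodeVersion);
        return nn != null && nn.equals(clusterId(dataNodeVersion));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: NameNode VERSION path, args[1]: DataNode VERSION path.
        String nn = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        String dn = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.ISO_8859_1);
        System.out.println(sameCluster(nn, dn) ? "clusterID matches" : "clusterID differs");
    }
}
```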

In my case, I had 2 NameNodes running, and after a server reboot the data got corrupted. I was getting "Failed to load image from FSImageFile" in the logs.
namenode-0 was still healthy and namenode-1 had the problem.
I proceeded as follows:
Scale the NameNodes down to 1, leaving only namenode-0.
Delete the namenode-1 PVC.
Make sure the volume is gone with kubectl get pvc -n hadoop.
Scale the NameNodes back up to 2.
namenode-0 took care of the data corruption and made the data available to namenode-1.

Spark-submit fails without an error

I used the following command to run the Spark Java word count example:
time spark-submit --deploy-mode cluster --master spark://192.168.0.7:7077 --class org.apache.spark.examples.JavaWordCount /home/pi/Desktop/example/new/target/javaword.jar /books_500.txt
I have copied the same jar file to the same location on all nodes. (Copying it into HDFS didn't work for me.) When I run it, this is the output:
Running Spark using the REST application submission protocol.
16/07/14 16:32:18 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://192.168.0.7:7077.
16/07/14 16:32:30 WARN rest.RestSubmissionClient: Unable to connect to server spark://192.168.0.7:7077.
Warning: Master endpoint spark://192.168.0.7:7077 was not a REST server. Falling back to legacy submission gateway instead.
16/07/14 16:32:30 WARN util.Utils: Your hostname, master02 resolves to a loopback address: 127.0.1.1; using 192.168.0.7 instead (on interface wlan0)
16/07/14 16:32:30 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/07/14 16:32:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It just stops there, quits the job, and returns to the terminal prompt. I don't understand this failure, since there is no error message. Any help would be appreciated.

CDH5.2: MR, Unable to initialize any output collector

Cloudera CDH5.2 Quickstart VM
Cloudera Manager showing all nodes state = GREEN
I packaged an MR job as a jar in Eclipse, with all the relevant Cloudera jars in the build path:
avro-1.7.6-cdh5.2.0.jar,
avro-mapred-1.7.6-cdh5.2.0-hadoop2.jar,
hadoop-common-2.5.0-cdh5.2.0.jar,
hadoop-mapreduce-client-core-2.5.0-cdh5.2.0.jar
I ran the job as follows:
hadoop jar jproject1.jar avro00.AvroUserPrefCount -libjars ${LIBJARS} avro/00/in avro/00/out
I get the following error. Is it a Java heap problem? Any comments? Thank you in advance.
14/11/14 01:02:40 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
14/11/14 01:02:43 INFO input.FileInputFormat: Total input paths to process : 1
14/11/14 01:02:43 INFO mapreduce.JobSubmitter: number of splits:1
14/11/14 01:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1415950730849_0001
14/11/14 01:02:45 INFO impl.YarnClientImpl: Submitted application application_1415950730849_0001
14/11/14 01:02:45 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1415950730849_0001/
14/11/14 01:02:45 INFO mapreduce.Job: Running job: job_1415950730849_0001
14/11/14 01:03:04 INFO mapreduce.Job: Job job_1415950730849_0001 running in uber mode : false
14/11/14 01:03:04 INFO mapreduce.Job: map 0% reduce 0%
14/11/14 01:03:11 INFO mapreduce.Job: Task Id : attempt_1415950730849_0001_m_000000_0, Status : FAILED
Error: java.io.IOException: Unable to initialize any output collector
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
...
...

Checking the full task log of the failed attempt attempt_1415950730849_0001_m_000000_0 will help explain why you ran into this exception.
The most common cause of this error is a misconfigured io.sort.mb value in your job. The value must never be anywhere close to (or higher than) the configured map task heap size, and it currently must not exceed ~2000 MB (the Java array maximum size).
An upstream improvement that makes the error message reflect the true failure was also filed and resolved recently, as MAPREDUCE-6194.
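
The two constraints on the sort buffer can be written down as a small sanity check. This is a sketch under stated assumptions, not Hadoop code: the class name is made up, the cap of 2047 MB stands in for the "~2000 MB" limit on the assumption that the buffer has to fit in a single Java byte array, and "close to the heap" is interpreted, arbitrarily, as more than half of it. In Hadoop 2 the setting is named mapreduce.task.io.sort.mb (io.sort.mb is the older name), and the map heap usually comes from the -Xmx value in mapreduce.map.java.opts.

```java
// Hypothetical helper: checks a sort buffer size against the map task heap.
final class SortBufferCheck {

    // Assumed hard cap in MB, so the buffer fits in one Java byte array.
    static final int MAX_SORT_MB = 2047;

    // Assumption for illustration only: anything above half of the map heap
    // is treated as "too close" to the heap size.
    static final double MAX_HEAP_FRACTION = 0.5;

    static boolean isPlausible(int sortMb, int mapHeapMb) {
        return sortMb > 0
                && sortMb <= MAX_SORT_MB
                && sortMb <= mapHeapMb * MAX_HEAP_FRACTION;
    }

    public static void main(String[] args) {
        // Made-up values: a 100 MB buffer in a 1024 MB heap, then a buffer
        // as large as the heap itself.
        System.out.println(isPlausible(100, 1024));  // true
        System.out.println(isPlausible(1024, 1024)); // false
    }
}
```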

I encountered the same issue yesterday. I checked the syslog of the failing map task, and it showed that another exception in that task was triggering this error. In my case it was a parsing failure, and once I corrected that, this error went away.
A closer look at the log of the failed task should give you the root cause of the issue.

Unable to start Oryx with Hadoop

I am trying to run Oryx with Hadoop 2.4. Hadoop starts successfully, with this warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable.
Oryx also starts successfully, but when I ingest data into it, the following exception is thrown:
2014-08-22 14:35:05,835 ERROR [IPC Server handler 3 on 37788] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1408697508855_0002_m_000000_0 - exited : org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
2014-08-22 14:35:05,835 INFO [IPC Server handler 3 on 37788] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1408697508855_0002_m_000000_0: Error: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
2014-08-22 14:35:05,837 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1408697508855_0002_m_000000_0: Error: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
2014-08-22 14:35:05,840 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1408697508855_0002_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
Has anyone faced this kind of issue before? Any help would be appreciated!

I'm copying a few items from your thread on the mailing list:
This may be a problem with the installation's Snappy libraries, but that seems to have been resolved.
The YARN containers are being killed for running past virtual memory limits. See the FAQ; this may be a Java issue that you can work around by changing the YARN config.
The final problem seems to be another issue with the YARN config, although it's not clear. I suggest starting from a fresh config and/or a distribution that is preconfigured and known to work, if possible.
