I am trying to list all the Hadoop services with the jps command, but only the following processes show up on the console:
3633 SecondaryNameNode
4228 Jps
3493 DataNode
4198 NodeManager
4088 ResourceManager
I start all services with start-dfs.sh and start-yarn.sh, but the result is the same afterwards. I went through the logs looking for an exception and found the one below:
2018-06-29 16:02:31,414 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2018-06-29 16:02:31,414 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false. Rechecking.
2018-06-29 16:02:31,416 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false
2018-06-29 16:02:31,423 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2018-06-29 16:02:31,425 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2018-06-29 16:02:31,425 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2018-06-29 16:02:31,425 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: Failed to load an FSImage file!
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:673)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1006)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:736)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:531)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:587)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:754)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:738)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1427)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1493)
2018-06-29 16:02:31,428 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-06-29 16:02:31,454 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
I have no clue how to solve this, please help. I am using hadoop-2.5.0-cdh5.3.2.
Follow these steps:
Check the path to your FSImage, i.e. where the NameNode stores it. In my case it is /hadoop/hdfs/namenode/current.
Check the last created FSImage on both the NameNode and the Secondary NameNode, and find the latest one available.
Copy the latest FSImage from the Secondary NameNode to the NameNode with the same ownership and permissions it had on the Secondary NameNode (hdfs:hadoop in my case).
After copying, try restarting all the services (a rough sketch of these steps follows below).
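Concretely, that procedure might look like the sketch below. The /hadoop/hdfs/namesecondary/current checkpoint directory and the fsimage_0000000000000012345 file name are placeholders I am assuming for illustration; use your own dfs.namenode.checkpoint.dir, dfs.namenode.name.dir, and the newest fsimage you actually find there.
# Stop HDFS before touching the metadata directories
stop-dfs.sh
# List the checkpoints on the Secondary NameNode, newest first (assumed path)
ls -lt /hadoop/hdfs/namesecondary/current/fsimage_*
# Copy the newest fsimage and its .md5 companion to the NameNode directory,
# preserving ownership and permissions (hdfs:hadoop in my case)
cp -p /hadoop/hdfs/namesecondary/current/fsimage_0000000000000012345 /hadoop/hdfs/namenode/current/
cp -p /hadoop/hdfs/namesecondary/current/fsimage_0000000000000012345.md5 /hadoop/hdfs/namenode/current/
# Restart the services
start-dfs.sh
start-yarn.sh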
Format the namenode: "hdfs namenode -format"
Now make sure the clusterID of the namenode and the datanode is the same; if not, copy it from one VERSION file to the other.
In my case the files are:
/Path_installation_dir/hdata/dfs/name/current/VERSION
/Path_installation_dir/hdata/dfs/data/current/VERSION
Once the clusterIDs match, start DFS and YARN (a quick check is sketched below).
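A minimal way to compare the two, assuming the VERSION paths above:
# Compare the clusterID lines of the namenode and datanode VERSION files
grep clusterID /Path_installation_dir/hdata/dfs/name/current/VERSION
grep clusterID /Path_installation_dir/hdata/dfs/data/current/VERSION
# If they differ, edit the datanode's VERSION file so its clusterID matches
# the namenode's, then bring the services up
start-dfs.sh
start-yarn.sh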
In my case, I had two namenodes running, and after a server reboot the data got corrupted; I was getting "Failed to load image from FSImageFile" in the logs.
namenode-0 was still healthy and namenode-1 was the one having the problem.
I proceeded as follows:
scale down namenode to 1: leave only namenode-0
delete namenode-1 PVC
make sure the volume is not there with kubectl get pvc -n hadoop
scale namenode back to 2
namenode-0 took care of the data corruption and made the data available to namenode-1 again (the steps are sketched below)
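For reference, a sketch of those steps with kubectl; the StatefulSet name hadoop-namenode and the PVC name data-hadoop-namenode-1 are assumptions based on a typical Helm-style deployment, so substitute whatever your chart uses.
# Scale the namenode StatefulSet down so only namenode-0 remains
kubectl scale statefulset hadoop-namenode --replicas=1 -n hadoop
# Delete the PersistentVolumeClaim that backed the corrupted namenode-1
kubectl delete pvc data-hadoop-namenode-1 -n hadoop
# Confirm the claim is gone
kubectl get pvc -n hadoop
# Scale back to 2; namenode-1 starts empty and syncs from namenode-0
kubectl scale statefulset hadoop-namenode --replicas=2 -n hadoop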
Related
I am using hadoop version 1.2.1. For some unknown reason my namenode went down, and the following log information was obtained:
2017-07-28 15:04:47,422 INFO org.apache.hadoop.hdfs.server.common.Storage: Start loading image file /home/hpcnl/crawler/hadoop-1.2.1/tmp/dfs/name/current/fsimage
2017-07-28 15:04:47,423 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:834)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:378)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
2017-07-28 15:04:47,428 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:834)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:378)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
Then I searched on the internet and found that you should stop the cluster and run the following command:
hadoop namenode -format
After this, when I restarted the cluster, the data no longer appeared in the respective folders in HDFS. Can I recover my data? How should I handle such situations in the future if my namenode goes down?
You can always back up your metadata by using these commands:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
These commands put your namenode into safe mode and merge the pending edits into the FSImage file. You can then take a copy of the image with:
hdfs dfsadmin -fetchImage /path/someFilename
or
cd /namenode/data/current/
tar -cvf /root/nn_backup_data.tar .
Now you can place this data in your namenode metadata directory and restart the namenode.
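Restoring from that tar backup later would look roughly like this (same paths as above; adjust the owner to whichever user runs your namenode):
# Stop the cluster before touching the metadata directory
stop-all.sh
# Unpack the backed-up metadata into the namenode metadata directory
cd /namenode/data/current/
tar -xvf /root/nn_backup_data.tar
# Make sure the namenode user still owns the files, then restart
chown -R hdfs:hadoop /namenode/data/current/
start-all.sh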
Please note that you shouldn't use the command below unless you have no other options:
hadoop namenode -format
I have set some MapReduce configuration in my main method as follows:
configuration.set("mapreduce.jobtracker.address", "localhost:54311");
configuration.set("mapreduce.framework.name", "yarn");
configuration.set("yarn.resourcemanager.address", "localhost:8032");
Now when I launch the mapreduce task, the process is tracked (I can see it in my cluster dashboard (the one listening on port 8088)), but the process never finishes. It remains blocked at the following line:
15/06/30 15:56:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/30 15:56:17 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
15/06/30 15:56:18 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/06/30 15:56:18 INFO input.FileInputFormat: Total input paths to process : 1
15/06/30 15:56:18 INFO mapreduce.JobSubmitter: number of splits:1
15/06/30 15:56:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1435241671439_0008
15/06/30 15:56:19 INFO impl.YarnClientImpl: Submitted application application_1435241671439_0008
15/06/30 15:56:19 INFO mapreduce.Job: The url to track the job: http://10.0.0.10:8088/proxy/application_1435241671439_0008/
15/06/30 15:56:19 INFO mapreduce.Job: Running job: job_1435241671439_0008
Does anyone have an idea?
Edit: in my YARN NodeManager log, I have this message:
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1435241671439_0003_03_000001
2015-06-30 15:44:38,396 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1435241671439_0002_04_000001
Edit 2:
I also have, in the YARN NodeManager log, an exception that happened earlier (for a previous MapReduce call):
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: Address already in use; For more details see:
Solution: I killed all the daemon processes and restarted Hadoop. In fact, when I ran jps I could still see Hadoop daemons even though I had stopped them; this was caused by a mismatch of HADOOP_PID_DIR.
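For anyone hitting the same thing, the clean-up might look like the sketch below; /var/run/hadoop is just an example value, and the pid files live wherever HADOOP_PID_DIR in hadoop-env.sh points (/tmp by default).
# See which Hadoop daemons are really still alive
jps
# Kill each stale daemon by the PID shown by jps
kill <pid_of_stale_daemon>
# Point HADOOP_PID_DIR in hadoop-env.sh at a stable directory so the
# start/stop scripts can find the pid files next time, for example:
# export HADOOP_PID_DIR=/var/run/hadoop
# Then restart everything
start-dfs.sh
start-yarn.sh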
The default localizer port of the YARN NodeManager is 8040. The error says that the port is already in use. Stop all the Hadoop processes; if you don't have data, maybe format the namenode once and try running the job again. From both of your edits, the issue is clearly with the NodeManager.
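To confirm what is holding the port, something like this helps (whether lsof or netstat is installed depends on your distribution):
# Find the process currently bound to port 8040
sudo lsof -i :8040
# or
sudo netstat -tulpn | grep 8040
# Stop the Hadoop daemons, kill any leftover process found above, then restart
stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh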
I installed a Cloudera cluster with a Vagrant box.
I get an error when I launch the following example:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
I went to check the logs in /var/log/hadoop-yarn.
There are several log files; in yarn-yarn-nodemanager-cdh-master.log there is the following stack trace:
2015-06-17 11:42:42,398 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1434535025160_0001_000001 (auth:SIMPLE)
2015-06-17 11:42:42,597 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1434535025160_0001_01_000001 by user vagrant
2015-06-17 11:42:42,762 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1434535025160_0001
2015-06-17 11:42:42,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1434535025160_0001 transitioned from NEW to INITING
2015-06-17 11:42:42,778 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=vagrant IP=10.10.50.5 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1434535025160_0001 CONTAINERID=container_1434535025160_0001_01_000001
2015-06-17 11:42:43,997 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Wrong FS: hdfs://var/log/hadoop-yarn, expected: hdfs://cdh-master:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1124)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1124)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:192)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:319)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)
2015-06-17 11:42:44,000 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1434535025160_0001_01_000001 to application application_1434535025160_0001
2015-06-17 11:42:44,001 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2015-06-17 11:42:44,034 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2015-06-17 11:42:44,035 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Applications still running : [application_14345350
I've seen this error
java.lang.IllegalArgumentException: Wrong FS:
hdfs://var/log/hadoop-yarn, expected: hdfs://cdh-master:8020
in the following post: Failed to start Jobtracker and Tasktracker in CDH pseudo cluster, but it did not help me much.
Does anyone have an idea?
Thanks!
Change the property yarn.nodemanager.remote-app-log-dir in the yarn-site.xml config file to either:
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://cdh-master:8020/var/log/hadoop-yarn/apps</value>
</property>
or
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
The second option will use the default filesystem, which should be set to HDFS anyway.
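You can verify which default filesystem the cluster actually resolves, and which log directory the NodeManager picked up, with something like this (the /etc/hadoop/conf path is the usual CDH location; adjust if yours differs, and restart the NodeManager after editing yarn-site.xml):
# Should print hdfs://cdh-master:8020, the authority the error message expects
hdfs getconf -confKey fs.defaultFS
# Double-check the value actually present in the NodeManager's yarn-site.xml
grep -A1 remote-app-log-dir /etc/hadoop/conf/yarn-site.xml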
I'm trying to create a folder using the command below:
manoj@ubuntu:/usr/local/hadoop/bin$ hadoop dfs -mkdir /tmp
I am encountering, however, the following error:
mkdir: unknown host: hadoop
I have posted the log file below and would appreciate some assistance. I have a single-node Hadoop installation. It looks like a Java UnknownHostException. Please let me know what I need to do to correct this.
manoj@ubuntu:/usr/local/hadoop/logs$ cat hadoop-manoj-datanode-ubuntu.log
2014-10-05 13:08:30,621 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.2.0
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473; compiled by 'hortonfo' on Mon May 6 06:59:37 UTC 2013
STARTUP_MSG: java = 1.7.0_65
************************************************************/
2014-10-05 13:08:32,449 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-10-05 13:08:32,514 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-10-05 13:08:32,519 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-10-05 13:08:32,519 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2014-10-05 13:08:34,173 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-10-05 13:08:34,191 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2014-10-05 13:08:36,439 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.UnknownHostException: unknown host: hadoop
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:233)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1233)
at org.apache.hadoop.ipc.Client.call(Client.java:1087)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:414)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:392)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:374)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:453)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:335)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:300)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:383)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:319)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1698)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1637)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1655)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1781)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1798)
2014-10-05 13:08:36,443 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ubuntu/127.0.1.1
************************************************************/
Looks like a configuration problem.
I'm assuming you use a recent version of Hadoop - if that is the case, you should use the hdfs command instead. So try bin/hdfs dfs -ls to see if any of your fs commands work.
I'm guessing they won't work. In that case you should check your core-site.xml for the HDFS settings (fs.defaultFS).
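The "unknown host: hadoop" in the datanode log suggests the filesystem URI in core-site.xml points at a host named hadoop that the machine cannot resolve. Since the log shows Hadoop 1.2.0, the relevant property is fs.default.name; a minimal single-node sketch (the localhost:9000 value is just the common tutorial default, not something taken from your setup) would be:
<!-- core-site.xml: point the default filesystem at a host that resolves -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Alternatively, keep the existing hdfs://hadoop:... URI and map the name in /etc/hosts (for example, 127.0.0.1 hadoop), then restart the daemons.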
Go to $HADOOP_HOME and try: bin/hadoop fs -mkdir /tmp
I am trying to setup a hadoop single node cluster on my local machine.
I installed hadoop using the following instructions
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
after starting the cluster using
bin/start-all.sh
I get the following output from jps
19623 TaskTracker
19388 SecondaryNameNode
19670 Jps
19479 JobTracker
I can see the NameNode is not running. I pulled the logs from the /logs directory, and they look like this:
2014-01-24 11:30:20,614 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /app/hadoop/tmp/dfs/name does not exist.
2014-01-24 11:30:20,617 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /app/hadoop/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2014-01-24 11:30:20,619 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /app/hadoop/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2014-01-24 11:30:20,620 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ishan-HP-Pavilion-dv6700-Notebook-PC/127.0.1.1
It says the directory /app/hadoop/tmp/dfs/name does not exist. I tried creating this directory path for the hadoop user, but I got the same error again.
Can someone please help me fix this.
Please note: I have read similar posts on here but none of them helped.
Thanks!
I would suggest you check the permissions of the directory /app/hadoop/tmp/dfs/name given to hduser. Alternatively, you could make sure none of the components (secondary namenode etc.) are up and running, and then format the namenode using this command:
$HADOOP_INSTALL/hadoop/bin/hadoop namenode -format
Try to start your cluster again and see if it works.
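Concretely, the permission fix and re-format might look like this; hduser, the hadoop group, and /app/hadoop/tmp come from the Michael Noll tutorial the question follows, so adapt them to your own user and hadoop.tmp.dir:
# Create the storage directory and hand it to the hadoop user
sudo mkdir -p /app/hadoop/tmp
sudo chown -R hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
# With all daemons stopped, format the namenode and start again
stop-all.sh
hadoop namenode -format
start-all.sh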
I think "/app/hadoop/tmp/dfs/name" should automatically get created. This folder keeps current info(cache) of namenode. Similarly there must be another folder "namesecondary" with "name" folder. If these folder are not there check "conf/core-site.xml" again. Id's for each daemons is created during each run and these id's are stored in these folders(and also some other information(i don't know exaclty)).
and
if these folders are available
just remove all content of "name" folder.
format namenode.
instead of "start-all" do start-dfs.sh and start-mapred.sh separately
Expecting this should work.
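A sketch of that sequence, assuming hadoop.tmp.dir is /app/hadoop/tmp as in the tutorial (clearing the name folder throws away the existing namespace, so only do it on a cluster whose data you can afford to lose):
# Clear the namenode image directory
rm -rf /app/hadoop/tmp/dfs/name/*
# Reformat and bring the daemons up one layer at a time
hadoop namenode -format
start-dfs.sh
start-mapred.sh
# The NameNode should now appear in the process list
jps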
You do not have permission to create this directory. Try giving a path in your home directory instead; you should set this path in core-site.xml for the property hadoop.tmp.dir. I have described it at the following link:
http://lets-do-something-big.blogspot.in/2014/01/hadoop-series-single-node-installation.html
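For completeness, a hadoop.tmp.dir entry pointing into a home directory would look roughly like this in core-site.xml (the /home/hduser/hadoop_tmp path is only an example):
<!-- core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/hadoop_tmp</value>
  <description>Base for other temporary directories.</description>
</property>
After changing it, format the namenode once and restart the daemons so the new directory gets populated.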