How to recover data after namenode -format command in Hadoop - java

I am using hadoop 1.2.1 version. due to some unknown reason, my namenode goes down and following log information was obtained
2017-07-28 15:04:47,422 INFO org.apache.hadoop.hdfs.server.common.Storage: Start loading image file /home/hpcnl/crawler/hadoop-1.2.1/tmp/dfs/name/current/fsimage
2017-07-28 15:04:47,423 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:834)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:378)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
2017-07-28 15:04:47,428 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:834)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:378)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:104)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:427)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:395)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:299)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1479)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1488)
Then I search on internet and found that you should stop cluster and run following command
hadoop namenode -format
After this when I restart cluster, data was not appeared in respective folders in HDFS. Can I recover my data? How to handle such situations in future if my namenode goes down?

You can always backup your metadata by using these commands:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
These commands will put your namenode in safemode and push the edits to the FSImage file:
hdfs dfsadmin -fetchImage /path/someFilename
or
cd /namenode/data/current/
tar -cvf /root/nn_backup_data.tar
Now you can place this data in your namenode metadata directory and restart the namenode.
Please note that you shouldn't use the command below until unless you don't have any other options:
hadoop namenode -format

Related

hadoop error: No MD5 file found corresponding to image file

Got an old hadoop system (that haven't been used for years), when trying to restart the cluster (1 master, 2 slaves), all on Linux, got error, on the namenode.
Error output:
2021-03-18 20:18:28,628 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Failed to load image from FSImageFile(file=/home/xxx/tmp/hadoop/name/current/fsimage_0000000000000480607, cpktTxId=0000000000000480607)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:651)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:264)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:627)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:469)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:594)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1235)
Caused by: java.io.IOException: No MD5 file found corresponding to image file /home/xxx/tmp/hadoop/name/current/fsimage_0000000000000480607
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:736)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:632)
... 9 more
2021-03-18 20:18:28,631 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2021-03-18 20:18:28,633 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
More info:
One of the slave's datanode's partition has bad disk blocks, so I have removed that partition from /etc/fstab so that to bring the Linux up. So, that slave's data is lost.
What I have tried:
Start the cluster, including the all 3 nodes, got above error.
Start the cluster, excluding the bad slave, thus only 2 nodes, still got above error.
Questions:
A. What the error means ?
B. Is it relevant to the bad slave?
C. Is there anyway to recover without re-format hdfs filesystem on namenode?
There should be a file called:
/home/xxx/tmp/hadoop/name/current/fsimage_0000000000000480607.md5
In the same location as the image file. It will have contents that look like this:
177e5f4ed0b7f43eb9e274903e069da4 *fsimage_0000000000000014367
Simply get the md5 sum of your fsimage file:
md5sum fsimage_0000000000000480607.md5
Then create a new md5 file that looks like:
xxxxxx *fsimage_0000000000000480607.md5
Where xxxxxx is the md5sum from the md5 command.

Failed to load an FSImage file! || How to solve

I am trying to show all the services using the Jps command, but when i hit the console the below nodes are only showing
3633 SecondaryNameNode
4228 Jps
3493 DataNode
4198 NodeManager
4088 ResourceManager
I am trying to start all services using start-dfs.sh and start-yarn.sh.But after that also the result is same.I went into the logs to find the exception,i saw below exception .
2018-06-29 16:02:31,414 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:50070
2018-06-29 16:02:31,414 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false. Rechecking.
2018-06-29 16:02:31,416 WARN org.apache.hadoop.http.HttpServer2: HttpServer Acceptor: isRunning is false
2018-06-29 16:02:31,423 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2018-06-29 16:02:31,425 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2018-06-29 16:02:31,425 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2018-06-29 16:02:31,425 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: Failed to load an FSImage file!
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:673)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1006)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:736)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:531)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:587)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:754)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:738)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1427)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1493)
2018-06-29 16:02:31,428 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2018-06-29 16:02:31,454 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
I have no clue to solve this , please help.I am using hadoop-2.5.0-cdh5.3.2.
Follow these steps:
Check the path to your FSImage, i.e, where the Namenode is storing the FSImage. In my case it is /hadoop/hdfs/namenode/current
Check the last create FSImage in Namenode and Secondary Namenode. Find the latest FSImage available.
Copy the latest FSImage from Secondary Namenode to Namenode with the same permissions it had in Secondary Namenode. By default, it is hdfs:hadoop in my case
After copying, try restarting all the services.
Format the namenode: "hdfs namenode -format"
Now, ensure the clusterID= of namenode and datanode as same. If
not,replace with one another.
In my case,
/Path_installation_dir/hdata/dfs/name/current/VERSION
/Path_installation_dir/hdata/dfs/data/current/VERSION
All done. start dfs, yarn.
In my case, I had 2 namenodes running and after a server reboot data got corrupted. I was getting "Failed to load image from FSImageFile" in the logs.
In my case, namenode-0 was still healthy and namenode-1 was having the problem
I proceeded as follows:
scale down namenode to 1: leave only namenode-0
delete namenode-1 PVC
make sure the volume is not there with kubectl get pvc -n hadoop
scale namenode back to 2
namenode-0 took care of Data Corruption and made it available to namenode-1

Initialization failed for Block pool <registering> (Datanode Uuid unassigned)

What is the source of this error and how could it be fixed?
2015-11-29 19:40:04,670 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to anmol-vm1-new/10.0.1.190:8020. Exiting.
java.io.IOException: All specified directories are not accessible or do not exist.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:217)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:254)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:974)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:945)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:278)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:220)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
at java.lang.Thread.run(Thread.java:745)
2015-11-29 19:40:04,670 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to anmol-vm1-new/10.0.1.190:8020
2015-11-29 19:40:04,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
there are 2 Possible Solutions to resolve
First:
Your namenode and datanode cluster ID does not match, make sure to make them the same.
In name node, change ur cluster id in the file located in:
$ nano HADOOP_FILE_SYSTEM/namenode/current/VERSION
In data node you cluster id is stored in the file:
$ nano HADOOP_FILE_SYSTEM/datanode/current/VERSION
Second:
Format the namenode at all:
Hadoop 1.x: $ hadoop namenode -format
Hadoop 2.x: $ hdfs namenode -format
I met the same problem and solved it by doing the following steps:
step 1. remove the hdfs directory (for me it was the default directory "/tmp/hadoop-root/")
rm -rf /tmp/hadoop-root/*
step 2. run
bin/hdfs namenode -format
to format the directory
The root cause of this is datanode and namenode clusterID different, please unify them with namenode clusterID then restart hadoop then it should be resolved.
The issue arises because of mismatch of cluster ID's of datanode and namenode.
Follow these steps:
GO to Hadoop_home/data/namenode/CURRENT and copy cluster ID from "VERSION".
GO to Hadoop_home/data/datanode/CURRENT and paste this cluster ID in "VERSION" replacing the one present there.
then format the namenode
start datanode and namenode again.
The issue arises because of mismatch of cluster ID's of datanode and namenode.
Follow these steps:
1- GO to Hadoop_home/ delete folder Data
2- create folder with anthor name data123
3- create two folder namenode and datanode
4-go to hdfs-site and past your path
<name>dfs.namenode.name.dir</name>
<value>........../data123/namenode</value>
<name>dfs.datanode.data.dir</name>
<value>............../data123/datanode</value>
.
This problem may occur when there are some storage i/o errors. In this scenario, the VERSION file is not available hence appearing as the error above.
You may need to exclude the storage locations on those bad drives in hdfs-site.xml.
For me, this worked -
Delete (or make a backup) of HADOOP_FILE_SYSTEM/namenode/current directory
restart the datanode service
This should create the current directory again, with the correct clusterID in the VERSION file
Source - https://community.pivotal.io/s/article/Cluster-Id-is-incompatible-error-reported-when-starting-datanode-service?language=en_US

Couldn't start hadoop datanode normally

I'd started datanode successfully before, but when I tried today it shows the following info. It sounds like I have not mkdir the /home/hadoop/appdata/hadoopdata directory,but I confirmed that the directory already exists in my computer. So what's the problem? Why I couldn't start the datanode normally?
Ex:I've tried to delete /home/hadoop/appdata/ and mkdir a new one, but it still doesn't work.
I've also deleted /home/hadoop/tmp/hadoop_tmp and mkdir a new one, it still doesn't work...
2014-03-04 09:30:30,106 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2014-03-04 09:30:30,349 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot access storage directory /home/hadoop/appdata/hadoopdata
2014-03-04 09:30:30,350 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/hadoop/appdata/hadoopdata does not exist
2014-03-04 09:30:30,453 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
java.io.IOException: All specified directories are not accessible or do not exist.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:139)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
Stop all hadoop services
Delete dfs/namenode
Delete dfs/datanode from both slaves and masters
Check the premission of the Hadoop folder:
sudo chmod –R 755 /usr/local/hadoop
Restart Hadoop
Check/verify the folder permission.
sudo chmod –R 755 /home/hadoop/appdata
If you still have the problem check the log files
Try to formate your namenode
**
use hadoop namenode -format
or
hdfs namenode -format
**
you will get clear picture what is not configure as expected.

nameNode doesnt start - hadoop-1.0.3 ubuntu 13.10 single node cluster

I am trying to setup a hadoop single node cluster on my local machine.
I installed hadoop using the following instructions
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
after starting the cluster using
bin/start-all.sh
I get the following output from jps
19623 TaskTracker
19388 SecondaryNameNode
19670 Jps
19479 JobTracker
I can see the nameNode is notrunning. I pulled out the logs from the /logs directory and look like this.
2014-01-24 11:30:20,614 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /app/hadoop/tmp/dfs/name does not exist.
2014-01-24 11:30:20,617 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /app/hadoop/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2014-01-24 11:30:20,619 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /app/hadoop/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:362)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:276)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:496)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
2014-01-24 11:30:20,620 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ishan-HP-Pavilion-dv6700-Notebook-PC/127.0.1.1
It says the direcoty path /app/hadoop/tmp/dfs/name I tried creating this directory path for hadoop user but I got same error again.
Can someone please help me fix this.
Please note: I have read similar posts on here but none of them helped.
Thnks !
I would suggest you check permissions of the directory "/app/hadoop/tmp/dfs/name" given to hduser. Alternatively, you could make sure none of the components (secondary name node etc.) are up and running, and then format the namenode using this command:
$HADOOP_INSTALL/hadoop/bin/hadoop namenode -format
Try to start your cluster again and see if it works.
I think "/app/hadoop/tmp/dfs/name" should automatically get created. This folder keeps current info(cache) of namenode. Similarly there must be another folder "namesecondary" with "name" folder. If these folder are not there check "conf/core-site.xml" again. Id's for each daemons is created during each run and these id's are stored in these folders(and also some other information(i don't know exaclty)).
and
if these folders are available
just remove all content of "name" folder.
format namenode.
instead of "start-all" do start-dfs.sh and start-mapred.sh separately
Expecting this should work.
You donot have permissions to create this direcory. Try giving a path in your home directory. you should mention this path in core-site.xml for property hadoop.tmp.dir. I have mentioned it in following link
http://lets-do-something-big.blogspot.in/2014/01/hadoop-series-single-node-installation.html

Categories