I have a single node where I run MR jobs frequently. The system had been up and running fine for two days. Suddenly all the Hadoop processes stopped. It is annoying that all my running jobs have failed since then. The logs say:
For the secondary namenode:
java.io.IOException: Cannot lock storage /hdfs/namesecondary. The directory is already locked
For the namenode:
java.io.IOException: Cannot lock storage /hdfs/name. The directory is already locked
I tried leaving safe mode and formatting the namenode, but that also throws the same exception.
How can I start the Hadoop processes? There was no disk space issue: it is a 900 GB disk and 300 GB was free at the time of the shutdown.
What should I verify now? I have not found any thread on this anywhere.
Thanks
I solved it by removing the in_use.lock file in /hdfs/name/ and in the secondary namenode directory (/hdfs/namesecondary). After that, I formatted both the namenode and the secondary namenode.
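For the record, the recovery sequence was roughly the following. This is only a sketch using the paths from this setup; note that hadoop namenode -format wipes the existing HDFS metadata, so it only makes sense here because the filesystem was being rebuilt anyway.
# stop the daemons first (stop-all.sh / stop-dfs.sh depending on version), then remove the stale lock files
stop-all.sh
rm /hdfs/name/in_use.lock
rm /hdfs/namesecondary/in_use.lock
# re-format (destroys existing HDFS metadata) and bring the daemons back up
hadoop namenode -format
start-all.sh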
In my organization, we have a two-cluster setup for Apigee 4.19.01.
On one of the components, i.e. the management server, there is a service called edge-management-server which has gone dead after the node was rebooted for a scheduled patch.
When we tried to restart the service, an error occurred, and I got these error logs referring to /tmp/snappy-1.0.5-libsnappyjava.so:
""java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:322)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
at com.datastax.driver.core.FrameCompressor$SnappyCompressor.<init>(FrameCompressor.java:55)
at com.datastax.driver.core.FrameCompressor$SnappyCompressor.<clinit>(FrameCompressor.java:41)
at com.datastax.driver.core.ProtocolOptions$Compression.<clinit>(ProtocolOptions.java:30)
at com.datastax.driver.core.Cluster$Builder.<init>(Cluster.java:573)
at com.datastax.driver.core.Cluster.builder(Cluster.java:197)
at com.apigee.datastore.pooling.CqlDriverConnectionPool.createSession(CqlDriverConnectionPool.java:126)
at com.apigee.datastore.pooling.CqlDriverConnectionPool.<init>(CqlDriverConnectionPool.java:37)
at com.apigee.datastore.DataStoreServiceImpl.createInstance(DataStoreServiceImpl.java:182)
at com.apigee.zones.datastore.IdentityZonesDataStore.init(IdentityZonesDataStore.java:167)
at com.apigee.registration.events.BasicDispatcher.registerListener(BasicDispatcher.java:189)
at com.apigee.registration.ServerRegistrationServiceImpl.register(ServerRegistrationServiceImpl.java:644)
at com.apigee.registration.ServerRegistrationServiceImpl.register(ServerRegistrationServiceImpl.java:629)
at com.apigee.zones.datastore.IdentityZonesDataStore.init(IdentityZonesDataStore.java:91)
at com.apigee.zones.datastore.DataStoreFactory.init(DataStoreFactory.java:53)
at com.apigee.zones.service.IdentityZoneServiceImpl.start(IdentityZoneServiceImpl.java:50)
at com.apigee.kernel.service.deployment.ServiceDeployer.startService(ServiceDeployer.java:168)
at com.apigee.kernel.service.deployment.ServiceDeployer.deploy(ServiceDeployer.java:71)
at com.apigee.kernel.service.deployment.ServiceDeployer.deployDependantServices(ServiceDeployer.java:356)
at com.apigee.kernel.service.deployment.ServiceDeployer.deploy(ServiceDeployer.java:77)
at com.apigee.kernel.service.deployment.ServiceDeployer.deployDependantServices(ServiceDeployer.java:356)
at com.apigee.kernel.service.deployment.ServiceDeployer.deploy(ServiceDeployer.java:77)
at com.apigee.kernel.service.deployment.ServiceDeployer.deployDependantServices(ServiceDeployer.java:356)
at com.apigee.kernel.service.deployment.ServiceDeployer.deploy(ServiceDeployer.java:77)
at com.apigee.kernel.module.deployment.ModuleDeployer.deploy(ModuleDeployer.java:58)
at com.apigee.kernel.MicroKernel.deployAll(MicroKernel.java:190)
at com.apigee.kernel.MicroKernel.start(MicroKernel.java:151)
at com.apigee.kernel.MicroKernel.start(MicroKernel.java:146)
at com.apigee.kernel.MicroKernel.main(MicroKernel.java:95)
Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.0.5-libsnappyjava.so: /tmp/snappy-1.0.5-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1817)
at java.lang.Runtime.load0(Runtime.java:810)
at java.lang.System.load(System.java:1088)
at org.xerial.snappy.SnappyNativeLoader.load(SnappyNativeLoader.java:39)
The /tmp partition has the noexec option set in fstab by default; I changed it to exec and remounted the partition.
After that, the only thing that changed is that the InvocationTargetException no longer appears, but the edge-management service still goes dead after the snappy file is created in /tmp.
I observed that when I delete the snappy file from /tmp and restart the edge-management service, the service looks OK, but once the snappy file (snappy-1.0.5-libsnappyjava.so) is created in /tmp, the service goes dead again.
The Java version running is 1.8.0_292
I want to know if there is a mechanism by which I could change the path where this snappy file is created, so that it is not created in /tmp and does not kill the edge-management process.
Or any other resolution to this issue?
This problem is currently on the staging management server, but next month the production edge management server is due for a scheduled patch reboot as well, so I don't want this problem to occur there.
It would be really helpful if anyone could help with this.
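One avenue worth checking, as far as the snappy file location goes: snappy-java extracts its native library to java.io.tmpdir unless the org.xerial.snappy.tempdir system property points somewhere else, so giving the JVM a temp directory that is not mounted noexec should keep the file out of /tmp. A rough sketch, where the directory, its ownership, and the way your Apigee install exposes JVM options are all assumptions to verify:
# create an exec-friendly temp directory for the management server JVM
# (directory path and owner are examples, not Apigee defaults)
mkdir -p /opt/apigee/tmp
chown apigee:apigee /opt/apigee/tmp
# then add one of these flags to the edge-management-server JVM options:
#   -Dorg.xerial.snappy.tempdir=/opt/apigee/tmp
#   -Djava.io.tmpdir=/opt/apigee/tmp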
I have this project to do with Hadoop and I have installed Hadoop just as described here: https://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform I'm trying to run the same map-reduce job Recipe.java on the dataset recipeitems-latest.json
I have created a .jar file from this Recipe.java code, and I've started YARN and DFS. I have also created the directory /in and copied recipeitems-latest.json to it.
Now, I start the job by calling:
hadoop jar c:\Hwork\Recipe.jar Recipe /in /out
The job starts and says it is running, but no progress is made, as you can see here: https://i.stack.imgur.com/QSifC.png
I also tried tracking the job by clicking on the given link; its status is accepted but the progress bar shows nothing.
I started using Hadoop only a day ago and I really don't know what is going wrong. Why is there no progress in the job I started?
The problem is resolved. Apparently the EOL characters in \sbin\start-yarn (as well as in \bin\hadoop.cmd) must be changed from '\n' to '\r\n', and then it worked like a charm!
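For reference, one quick way to do the conversion is with unix2dos, assuming that tool is available (for example via Git Bash or Cygwin) and that Hadoop lives under C:\hadoop; adjust the paths and script names to your install:
# convert LF line endings to CRLF in the Windows launch scripts
unix2dos /c/hadoop/sbin/start-yarn.cmd
unix2dos /c/hadoop/bin/hadoop.cmd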
I have two machines named: ubuntu1 and ubuntu2.
On ubuntu1, I started the master node of a Spark standalone cluster, and on ubuntu2 I started a worker (slave).
I am trying to execute the wordCount example available on GitHub.
When I submit the application, the worker sends an error message:
java.io.FileNotFoundException: File file:/home/ubuntu1/demo/test.txt does not exist.
My command line is
./spark-submit --master spark://ubuntu1-VirtualBox:7077 --deploy-mode cluster --class br.com.wordCount.App -v --name "Word Count" /home/ubuntu1/demo/wordCount.jar /home/ubuntu1/demo/test.txt
Does the file test.txt only need to be on one machine?
Note: the master and the worker are on different machines.
Thank you
I had the same problem while loading a JSON file. I realized that by default Windows saves the file as a text file regardless of the name. Identify the actual file format and then you can load it easily.
Example: say you saved the file as test.JSON, but by default Windows appends .txt to it.
Check that and try running it again.
I hope your problem will get resolved with this idea.
Thank you.
You should put your file on HDFS by going to the folder and typing:
hdfs dfs -put <file>
Otherwise, each node has to have access to it, which means the same folder path must exist on every machine.
Don't forget to change file:/ to hdfs:/ after you do that.
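Putting both steps together, a minimal sketch (the /demo directory and the namenode host/port are placeholders; use whatever your fs.defaultFS points at, and note that in cluster deploy mode the application jar also has to be reachable from the worker):
# copy the input file into HDFS so every node can read it
hdfs dfs -mkdir -p /demo
hdfs dfs -put /home/ubuntu1/demo/test.txt /demo/test.txt
# then pass the HDFS path to the application instead of a local path
./spark-submit --master spark://ubuntu1-VirtualBox:7077 --deploy-mode cluster --class br.com.wordCount.App --name "Word Count" /home/ubuntu1/demo/wordCount.jar hdfs://<namenode-host>:9000/demo/test.txt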
My PHP server is hosted on the JobTracker machine and I am trying to run a map-reduce job from my web page by executing the jar command on the command line,
but I am getting no response and the job is not starting.
However, if I run a command that lists HDFS contents using the same approach, it runs fine. Please guide me.
The following command gives me no response and the job does not run:
exec("HADOOP_DIR/bin/hadoop jar /usr/local/MapReduce.jar Mapreduce [input Path] [output Path]");
But if I do this:
exec("HADOOP_DIR/bin/hadoop dfs -ls /user/hadoop");
It is running fine.
I solved this problem by changing the PHP server user to hduser (the user which has permission to write files to HDFS). Without changing this user, only the commands that read from HDFS were working, not the ones that need to create files or write to HDFS.
When I tried to run the command for creating a directory in HDFS through my PHP script, I got the following error in my PHP server logs (/var/log/apache2/error.log):
mkdir: org.apache.hadoop.security.AccessControlException: Permission denied: user=www-data, access=WRITE, inode="hduser":hduser:supergroup:rwxr-xr-x
And on running the jar command to trigger the MapReduce program, I got the following error:
Exception in thread "main" java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:1879)
at org.apache.hadoop.util.RunJar.main(RunJar.java:115)
Then what I did was change the user in /etc/apache2/apache2.conf to my Hadoop user, restart the server, and everything worked fine.
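Roughly, the change amounts to something like the following. This is only a sketch: on Debian/Ubuntu the running user is normally set in /etc/apache2/envvars (which apache2.conf references), and hduser is the Hadoop user from this setup.
# in /etc/apache2/envvars (referenced by the User/Group directives in apache2.conf)
export APACHE_RUN_USER=hduser
export APACHE_RUN_GROUP=hduser
# then restart Apache so the worker processes run as that user
sudo service apache2 restart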
I should reference the post "Execute hadoop jar from PHP Server fails. Permission denied", which helped me a lot in solving this problem. I hope this post helps others too.
I am trying to set up a Hadoop install on Ubuntu 11.04 with Sun Java 6. I was working with the hadoop 0.20.203 rc1 build. I am repeatedly running into an issue on Ubuntu 11.04 with java-6-sun: when I try to start Hadoop, the datanode doesn't start due to "Cannot lock storage".
2011-12-22 22:09:20,874 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /home/hadoop/work/dfs_blk/hadoop. The directory is already locked.
2011-12-22 22:09:20,896 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /home/hadoop/work/dfs_blk/hadoop. The directory is already locked.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
I have tried upgrading and downgrading to a couple of versions in the 0.20 branch from Apache, even Cloudera, and also deleting and reinstalling Hadoop, but I am still running into this issue. Typical workarounds such as deleting the *.pid files in the /tmp directory are not working either. Could anybody point me to a solution for this?
Yes, I formatted the namenode. The problem was in one of the rogue templates for hdfs-site.xml that I copy-pasted: dfs.data.dir and dfs.name.dir pointed to the same directory location, resulting in the locked-storage error. They should be different directories. Unfortunately, the Hadoop documentation is not clear enough about this subtle detail.
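For illustration, the relevant part of a corrected hdfs-site.xml would look roughly like this; the directory paths are just examples, the point is only that the two properties must not share a location:
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/work/dfs_name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/work/dfs_data</value>
</property>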