DataNode failing to Start in Hadoop - java

I'm trying to set up a Hadoop installation on Ubuntu 11.04 with Sun Java 6, working with the hadoop 0.20.203 rc1 build. I repeatedly run into the same issue on Ubuntu 11.04 with java-6-sun: when I try to start Hadoop, the datanode doesn't start, failing with "Cannot lock storage".
2011-12-22 22:09:20,874 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /home/hadoop/work/dfs_blk/hadoop. The directory is already locked.
2011-12-22 22:09:20,896 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /home/hadoop/work/dfs_blk/hadoop. The directory is already locked.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
I have tried upgrading and downgrading to a couple of versions on the 0.20 branch from Apache, and even Cloudera's build, as well as deleting and reinstalling Hadoop, but I still run into this issue. Typical workarounds such as deleting the *.pid files in the /tmp directory are not working either. Could anybody point me to a solution for this?

Yes, I formatted the namenode. The problem turned out to be one of the rogue hdfs-site.xml templates that I copy-pasted: dfs.data.dir and dfs.name.dir pointed to the same directory location, resulting in the locked-storage error. They should be different directories. Unfortunately, the Hadoop documentation is not clear enough on this subtle detail.
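For reference, here is a minimal hdfs-site.xml sketch with the two properties split into separate directories (the paths are placeholders for illustration, not the exact ones from my setup):
<configuration>
  <!-- namenode metadata directory -->
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/work/dfs_name</value>
  </property>
  <!-- datanode block storage; must differ from dfs.name.dir -->
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/work/dfs_blk</value>
  </property>
</configuration>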

Related

Tomcat9 creates files in /tmp/systemd-private*** path instead of my given path [duplicate]

I'd like my webapp, deployed as ROOT.war, to have write access to /var/www/html/static/images so that it can write uploaded and converted images to that folder and nginx can serve them statically. Currently this doesn't work and triggers a java.nio.file.FileSystemException together with the message "Filesystem is read-only".
But the filesystem is not read-only and is in great condition. The folder has already been chmodded 777.
Extra info:
The tomcat setup is running on an Ubuntu 18.04 Azure VM with managed disk. The folder is residing on an Ext4 formatted drive
Let's start with: chmod 777 is great for testing, but absolutely unfit for the real world, and you shouldn't get used to this setting. Rather, set the owner and group correctly before you give world-write permissions.
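For example (a sketch, assuming the service runs as a tomcat user and nginx reads as group www-data; substitute your actual user and group):
sudo chown -R tomcat:www-data /var/www/html/static/images
sudo chmod -R 775 /var/www/html/static/images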
Edit: A similar question just came up on the Tomcat mailing list, and Emmanuel Bourg pointed out that Debian Tomcat is sandboxed by systemd. Read your /usr/share/doc/tomcat9/README.Debian which contains this paragraph:
Tomcat is sandboxed by systemd and only has write access to the
following directories:
/var/lib/tomcat9/conf/Catalina (actually /etc/tomcat9/Catalina)
/var/lib/tomcat9/logs (actually /var/log/tomcat9)
/var/lib/tomcat9/webapps
/var/lib/tomcat9/work (actually /var/cache/tomcat9)
If write access to other directories is required the service settings
have to be overridden. This is done by creating an override.conf file
in /etc/systemd/system/tomcat9.service.d/ containing:
[Service]
ReadWritePaths=/path/to/the/directory/
The service has to be restarted afterward with:
systemctl daemon-reload
systemctl restart tomcat9
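Applied to the directory from the question, the override file would look roughly like this (assuming the upload path stays /var/www/html/static/images), followed by the same daemon-reload and restart as above:
[Service]
ReadWritePaths=/var/www/html/static/images/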
Edit 2022: Note that these are the 2019 paths - validate the file locations for later versions. From the comments to this answer (thank you to V H and Ng Sek Long) here are some updates:
In current Ubuntu file is here: sudo vi /etc/systemd/system/multi-user.target.wants/tomcat9.service – V H Feb 26, 2022 at 19:55
Mine (Ubuntu 20) is installed here /lib/systemd/system/tomcat9.service smh everybody use a different path. – Ng Sek Long Mar 28, 2022 at 8:36
End of edit, continuing with the passage that didn't solve OP's problem, but should stay in:
If - all things tested - Tomcat should have write access to that directory, but doesn't have it, the error message points me to an assumption: Could it be that
Tomcat is running as root?
The directory is mounted through NFS?
The default configuration for NFS is that root has no permissions whatsoever on that external filesystem (or was it no write-permission? this is ancient historical memory - look up "NFS root squash" to get the full story)
If this is a condition that matches what you are running, you should stop running Tomcat as root, and rather run it as an unprivileged user. Then you can set the permissions on the directory in question to be writeable by your tomcat-user, and readable by nginx, and you're done.
Running Tomcat as root is a recipe for disaster: You don't want a process that's available from the internet to run as root.
If these conditions don't match your configuration, elaborate on the configuration. I'd still stand by this description for others who might find this question/answer later.

Can't create a Hadoop sequence file on a local file system

I found this example of how to write to a local file system, but it throws this exception:
Exception in thread "main" java.io.IOException: (null) entry in command string: null chmod 0644 C:\temp\test.seq
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:770)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1168)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
I'm running this on a Windows 10 box. I even tried using the MSYS Git Bash shell, thinking maybe that would help the JVM simulate a chmod operation. It didn't change anything. Any suggestions on how to do this on Windows?
I faced this error too, and it was resolved after following the steps below. (Note: I am using Spark 2.0.2 and Hadoop 2.7.)
Verify whether you are getting "java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries." You can check by running the spark-shell command.
I got the above-mentioned error. It occurred because I hadn't added HADOOP_HOME to the environment variables. After adding HADOOP_HOME (in my case the same as SPARK_HOME), the issue was resolved.
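For instance, from a Command Prompt (a sketch; C:\hadoop-2.7.3 is a placeholder for wherever your Hadoop binaries actually live), or set the same values through the System environment-variables dialog:
setx HADOOP_HOME "C:\hadoop-2.7.3"
setx PATH "%PATH%;C:\hadoop-2.7.3\bin"
Then open a fresh console and run spark-shell again to confirm the winutils error is gone.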
Running a Hadoop program using only jars on Windows requires a few steps beyond just referencing the jars.
Credit to Professor Lu at University of Helsinki for posting a Hadoop on Windows guide for his students.
Here is a rundown of steps I had to take using Windows 10 and Hadoop 2.7.3:
Download and extract Hadoop binaries to somewhere like C:\hadoop-2.7.3.
Download patch files from https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip and extract them to your %HADOOP_HOME%\bin directory.
Set a HADOOP_HOME environment variable. For example, C:\hadoop-2.7.3.
Download the Hadoop source code, copy hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java to your project, and modify line 609 from
return access0(path, desiredAccess.accessRight());
to
return true;
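With winutils.exe in %HADOOP_HOME%\bin and HADOOP_HOME set, a minimal local writer along the lines of the failing example might look like this (a sketch against the Hadoop 2.7 API; the class name and output path are just for illustration):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LocalSeqFileWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // force the plain local file system so no HDFS cluster is needed
        conf.set("fs.defaultFS", "file:///");
        Path path = new Path("C:/temp/test.seq");
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            // write a single key/value pair as a smoke test
            writer.append(new IntWritable(1), new Text("hello"));
        }
    }
}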
One of the solutions is as follows.
In the Project Structure (IntelliJ), under SDKs, ensure there is no other version of Hadoop referenced. In my case I had run Spark earlier and it was referencing Hadoop JARs, which was causing access issues. Once I removed them and ran the MR job, it ran fine.

Some problems with Flume

I have 2 CDH4 clusters: one on CentOS 6.4 (real hardware), the other on Ubuntu 12.04 (Amazon EC2).
I made all the configuration files the same manually (with Cloudera Manager).
I'm trying to start the Cloudera twitter example. When I start Flume on the CentOS cluster it works without any problems, but on the Ubuntu cluster Flume gives this error in the log file:
2013-09-11 15:04:54,491 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
2013-09-11 15:04:54,527 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource
{name:Twitter,state:IDLE} } - Exception follows.
java.lang.NoSuchMethodError: twitter4j.FilterQuery.setIncludeEntities(Z)Ltwitter4j/FilterQuery;
After some googling I found this solution in a comment by Suresh E Gopalan (August 20, 2013 at 2:43 AM):
So we have another JAR file,
search-contrib-0.9.1-cdh4.3.0-SNAPSHOT-jar-with-dependencies.jar, with
the same class, conflicting with the correct one in FLUME_CLASSPATH.
Temporarily rename it to a .org extension so that it will be excluded
from the classpath at startup.
After renaming this jar, Flume started to work on the Ubuntu cluster.
On the CentOS cluster I have the same jar with the same classes, but it doesn't need to be renamed.
Why does this happen, and what should I change on the Ubuntu cluster to get the same behaviour without renaming?
Rebuild the flume-sources jar yourself; don't download the prebuilt SNAPSHOT jar.
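Roughly, assuming you have the cdh-twitter-example source checked out and Maven installed (the directory name below comes from that repository):
cd flume-sources
mvn package
Then put the jar produced under flume-sources/target on the Flume classpath in place of the downloaded snapshot.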

Service will not start: error 1067: the process terminated unexpectedly

We have a custom service that we install with our application. The only problem is that after it is installed, it will not start, generating the error above. I have tried to diagnose what the problem is, but can't seem to find any useful information as to why it is quitting. I have tried the same service on a non-"R2" 2008 server, and run manually it worked fine.
The service is a simple Java program run via a batch file, as a daemon service.
Has anyone had any experience troubleshooting this type of problem, where there are so few clues?
Go to:
Registry -> HKEY_LOCAL_MACHINE -> SYSTEM -> CurrentControlSet -> Services.
Find the service concerned and delete it. Close regedit, reboot the PC, and re-install the service. The error should now be gone.
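If you prefer not to delete registry keys by hand, removing the service registration from an elevated command prompt should have the same effect (replace MyService with the actual service name):
sc delete MyService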
This is a permission-related problem.
Make sure that the current user has access to the folder which contains the installation files.
I resolved the problem. This is for the EAServer Windows service.
The resolution is:
Open regedit from the Run prompt.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\EAServer,
in Parameters, set the SERVERNAME entry to EAServer.
[It is sometimes overwritten with the environment variable Path value.]
This error message appears if the Windows service launcher has quit immediately after being started.
This problem usually happens because the license key has not been correctly deployed (the license.txt file in the license folder).
If the service is not starting with the correct key, put in an incorrect key and try to start it. Once it has started, place the correct key and it will work.
I had this error. I looked into the log file C:\...\mysql\data\VM-IIS-Server.err and found this:
2016-06-07 17:56:07 160c InnoDB: Error: unable to create temporary file; errno: 2
2016-06-07 17:56:07 3392 [ERROR] Plugin 'InnoDB' init function returned error.
2016-06-07 17:56:07 3392 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2016-06-07 17:56:07 3392 [ERROR] Unknown/unsupported storage engine: InnoDB
2016-06-07 17:56:07 3392 [ERROR] Aborting
The first line says "unable to create temporary file", which sounds like insufficient privileges. First I tried giving my current user access to the mysql folder - no effect. Then, after some wandering around, I went to Control Panel -> Administration -> Services -> right-clicked the MySQL service -> Properties -> Log On, switched to "This account", entered my username/password, clicked OK, and it worked!
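The same "Log On" change can also be made from an elevated command prompt with sc (a sketch; the service name and credentials here are placeholders, and the space after each = is required):
sc config MySQL obj= ".\myuser" password= "mypassword"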
In my case the error 1067 was caused by a specific version of Tomcat, 7.0.96 32-bit, in combination with AdoptOpenJDK. I spent two hours on it, uninstalling, reinstalling, and trying different Java settings, but Tomcat would not start. See
ASF Bugzilla – Bug 63625
which seems to point at the issue, though they refer to seeing a different error.
I tried 7.0.99 32-bit and it started straight away with the same AdoptOpenJDK 32-bit binary install.
I solved this issue using the Monitor Tomcat application. I ran it, and after a few seconds its icon appeared in my system tray. I right-clicked the icon and clicked the start button, and after a few seconds Apache Tomcat started.

Hudson cannot launch slave - hudson-slave.exe not being copied

I am trying to add a node to my Hudson master.
The node runs Windows Server 2008 Enterprise edition and it has Java, Ant and .NET installed on it.
The connection log of that machine shows the following output, and the node is never able to connect.
Connecting to machine01
Checking if Java exists
java full version "1.6.0_25-b06"
Copying slave.jar
Starting the service
Connecting to machine01
Checking if Java exists
java full version "1.6.0_25-b06"
Copying slave.jar
Starting the service
Connecting to machine01
The message keeps on repeating and never connects.
Upon further investigation, I see that the "Hudson Slave at <FS Root>" service is registered, but "hudson-slave.exe" is not present in the FS root. That means this .exe file is not being copied onto the slave at all. I have checked the entire hudson.war, but no exe file exists in it - maybe it is supposed to be generated? Only slave.jar is being copied.
I wonder why no error is reported and the master keeps trying. Can anyone suggest a solution for this?
Try this:
Convert your slave into a JNLP (Java Web Start) slave, start the web service from your slave, and then use it to install the service (File > Install as Service).
Also, check to make sure the folder you have assigned as FS Root is writeable by the user you have specified.
