Can't create a Hadoop sequence file on a local file system

Can't create a Hadoop sequence file on a local file system - java

I found this example of how to write to a local file system, but it throws this exception:
Exception in thread "main" java.io.IOException: (null) entry in command string: null chmod 0644 C:\temp\test.seq
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:770)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1168)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
Running this on a Windows 10 box. I even tried using the msys git bash shell thinking maybe that would help the JVM simulate a chmod operation. Didn't change anything. Any suggestions on how to do this on Windows?

I too faced this error and it was resolved after following the steps. (Note : I am using Spark 2.0.2 and Hadoop 2.7)
Verify whether you are getting "java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.". You check it by running "spark-shell" command.
I got the above mentioned error. It occurred because I didn't add "HADOOP_HOME" in environment var. After adding the "HADOOP_HOME", in my case same as "SPARK_HOME", the issue was resolved.

Running a Hadoop program using only jars on Windows requires a few steps beyond just referencing the jars.
Credit to Professor Lu at University of Helsinki for posting a Hadoop on Windows guide for his students.
Here is a rundown of steps I had to take using Windows 10 and Hadoop 2.7.3:
Download and extract Hadoop binaries to somewhere like C:\hadoop-2.7.3.
Download patch files from https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip and extract them to your %HADOOP_HOME%\bin directory.
Set a HADOOP_HOME environment variable. For example, C:\hadoop-2.7.3.
Download the Hadoop source code, copy hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java to your project, and modify line 609 from
return access0(path, desiredAccess.accessRight());
to
return true;

One of the solutions is as follows.
In the Project Structure (Intelij), under SDK's ensure there is no other version of Hadoop referenced. In my case - I was running Spark earlier and it was referring Hadoop JAR's and this was causing access issues. Once I removed them and ran the MR job it ran fine.

Related

Hadoop 1.2.1: Put jars in hdfs on classpath

I have a hadoop job which requires several 3rd party jars. I have put them on the classpath with conf/hadoop-env.sh
export HADOOP_CLASSPATH=hdfs://name.node.private.ip:9000/home/ec2-user/hadoop-gremlin-libs/
When I run $ bin/hadoop classpath this path is included, as you can see here. However, when I go to run a job, it throws an error in initialization:
Error: java.lang.ClassNotFoundException: com.google.common.collect.Lists
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.giraph.conf.AllOptions.<clinit>(AllOptions.java:37)
at org.apache.giraph.conf.ClassConfOption.<init>(ClassConfOption.java:47)
at org.apache.giraph.conf.ClassConfOption.create(ClassConfOption.java:60)
at org.apache.giraph.conf.GiraphConstants.<clinit>(GiraphConstants.java:62)
at org.apache.giraph.conf.GiraphClasses.readFromConf(GiraphClasses.java:152)
at org.apache.giraph.conf.GiraphClasses.<init (GiraphClasses.java:142)
at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.<init>(ImmutableClassesGiraphConfiguration.java:93)
at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:56)
at org.apache.hadoop.mapred.Task.initialize(Task.java:515)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
This particular class should be packaged in guava, which is included on the classpath:
[ec2-user]$ bin/hadoop dfs -ls /home/ec2-user/hadoop-gremlin-libs | grep guava
-rw-r--r-- 3 ec2-user supergroup 0 2017-04-20 17:57 /home/ec2-user/hadoop-gremlin-libs/guava-18.0.jar
I am submitting the job from gremlin as follows:
graph = GraphFactory.open('conf/hadoop.properties')
result = graph.compute().program(MyVertexProgram.build().create()).submit().get()
I have also tried putting the jars on the local filesystem and receive the same error. Does anyone know how to solve this issue?

I can't tell exactly what kind of job are you doing, but looking at those classes it appears to be a Mapreduce2 maptask it is trying to setup when you hit that exception.
I think you are updating the wrong classpath value probably. You are updating the Hadoop classpath not the mapreduce classpath.
More than likely you need to update the hadoop clusters yarn/mapreduce2 application classpath values in the cluster manager application, or their site xml files the cluster is using. You should have a mapred-site.xml file which has property named mapreduce.application.classpath that has its own classpath to point to its own jars it needs to execute its jobs, add your path to the classpath in the value of the mapreduce.application.classpath value instead.
The second goes for yarn, update the yarn.application.classpath property if yarn needs any other jars, as the yarn classpath points to yarn jars that help yarn run. You can update this easily in a cluster manager application if you have it, or edit the yarn-site.xml manually to add this classpath.
The only other option is if your client software program has its own dedicated mapred-site.xml file it reads to get the mapreduce.application.classpath from for you. If so it is possible you can just modify the mapreduce.application.classpath on the client site if your software supports it. Some client programs may have their own classpaths, or read the hadoop clusters site xml files to connect to the cluster.
I am pretty sure from what it shows in the exception you need this jar somehow in the mapreduce.application.path not the hadoop classpath.

Running a script in apache storm

Is there a way to simply run a python script in Apache Storm?
I'm trying to figure out how to use storm to run scripts but am having trouble. It seems like I need to create a Java program to call the script and use it as a bolt but I simply want to send a very basic python script to storm to see if it is possible.
I read that the following command is helpful in sending topologies to storm but am having trouble understanding the syntax and if I am allowed to send any python code to storm or if it needs to have specific syntax.
Can someone clarify whether or not I can submit any python script to storm and if so what the following line of code means.
storm shell resources/ python topology.py arg1 arg2
When I try to submit a basic python script using the above code i get the following output.
956 [main] INFO backtype.storm.StormSubmitter - Uploading topology jar stormshell8691441.jar to assigned location: /home/scix3/apache/storm/data/nimbus/inbox/stormjar-ae0739f9-7c93-4f00-a02b-c4eceba3b005.jar
966 [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /home/scix3/apache/storm/data/nimbus/inbox/stormjar-ae0739f9-7c93-4f00-a02b-c4eceba3b005.jar
Exception in thread "main" java.io.IOException: Cannot run program "simple.py" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at java.lang.Runtime.exec(Runtime.java:617)
at org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58)
at org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319)
at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160)
at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147)
at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386)
at backtype.storm.command.shell_submission$_main.doInvoke(shell_submission.clj:29)
at clojure.lang.RestFn.applyTo(RestFn.java:139)
at backtype.storm.command.shell_submission.main(Unknown Source)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 10 more
The exact command I'm using (possibly incorrect) is storm shell resources/ simple.py
simple.py is merely a print 'Hello, world script.
I'm using storm version 0.9.4

Yes, you can run python on Storm. In fact you can run just about code from just about any language on a storm cluster, its just a matter of implementing the API.
However, there are some requirements for that to work, and so far as I can tell said requirements are not spelled out in the storm documentation. The fastest path to get up an running would be to take the splitsentence.py example from the storm source and run with it.

try pyleus (https://github.com/Yelp/pyleus) or streamparse (https://github.com/Parsely/streamparse), i will recommend using pyleus as it is simple.

bin/hive giving issue with the errors

I have installed the hive using source and run ant package.
as per cwiki.apache.org document, I have added PATH var also i.e $HIVE_HOME and $PATH but running the command from base directory (bin/hive or hive)
It give the following error.
I have added the patch (HIVE-3606.1.patch) to resolve it but still it's not working.
Command to add patch:
hive-0.10.0-bin]$ patch -p0 < ~/Downloads/HIVE-3606.1.patch
To run Hive:
hive-0.10.0-bin]$ bin/hive
Exception in thread "main" java.lang.NoSuchFieldError: ALLOW_UNQUOTED_CONTROL_CHARS
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.<clinit>(GenericUDTFJSONTuple.java:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:545)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerGenericUDTF(FunctionRegistry.java:539)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:472)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:202)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:86)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:635)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Can anyone help here?

It's most probably because your Hadoop uses a different (older) version of Jackson libraries than Hive. As a quick workaround you can replace jackson-core-asl-X-X-X.jar and jackson-mapper-asl-X.X.X.jar in $HADOOP_HOME/lib with the newer ones in $HIVE_HOME/lib

Its because you are working with old version of Hadoop.
If you have Hadoop, its better to compile the source code yourself with the following command for old version of Hadoop:
$ svn co http://svn.apache.org/repos/asf/hive/trunk hive
$ cd hive
$ mvn clean install -Phadoop-2,dist
Check this link for more info: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Then, change the jackson* file names in $HADOOP_HOME/lib and add an .old postfix to them (Its a good practice not to delete them, as we may want them in future):
$ mv jackson-core-asl-1.0.1.jar jackson-core-asl-1.0.1.jar.old
$ mv jackson-mapper-asl-1.0.1.jar jackson-mapper-asl-1.0.1.jar.old
You can find the new jackson compiled files somewhere around Hive's packaging folder, mine is in:
packaging/target/apache-hive-0.14.0-SNAPSHOT-bin/apache-hive-0.14.0-SNAPSHOT/bin/hcatalog/share/webhcat/svr/lib
If you can't find it, its ok. use the following command in your hive directory.
$ find ./ -iname "*jackson*"
It will show you all the jackson* files that it can find. Then go to that specific folder that contains them and copy all of them to the $HADOOP_HOME/lib (currently we may just need the "jackson-core-*" but we copy all for future use):
$ cp jackson* $HADOOP_HOME/lib
Ask if you have more enquiries.

deploy java app on Heroku launch.main error

I want to deploy a java appa on HEROKU. I am new to maven. I made all set up as per in the deploy a sample app to heroku (java+maven+ tomcat7)documentation.
and i made a bulid,It was successfull. But as soon i typed sh /target/bin/weapps i am getting the following error. Can anybody help me to sort out this issue.
Exception in thread "main" java.lang.NoClassDefFoundError: launch/Main
Caused by: java.lang.ClassNotFoundException: launch.Main</br>
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: launch.Main. Program will exit.
regards,
arunbv

I had a similar problem, I (well, almost) solved it this way, just remove:
<packaging>war</packaging>
from pom.xml. I know this is not an ideal solution, but somehow the embedded tomcat plugin doesn't work with war packaging!

I don't have enough points to comment, but what article are you following? Can you link it?
Based on the error, I suspect your Main.java is not in src/main/java/launch. If you follow these instructions, I think you should be able to clear up the error:
http://devcenter.heroku.com/articles/create-a-java-web-application-using-embedded-tomcat
If that's the article you're following and you're still having trouble, it may ease the problems you're having if you simply clone the sample repo:
git clone git://github.com/heroku/devcenter-embedded-tomcat.git

Since you said you are new to maven, I think the reason of this error is that you simply put your *.java code under the root folder of the project, while you are rather expected to put it under ./src/main/java/
This should work for you. I experienced the same problem and figured this out.

This is probably dead, but in case it helps people of the future.
I was getting up and running with a java app on Heroku, using an Ubuntu machine for development - and got this same Exception in thread "main" java.lang.NoClassDefFoundError.
Double check that you're using the same java compiler version as your java sdk version.
type java -version & javac -version. Make sure they are the same version.
If not, you can fix this by reconfiguring your environment to use the same for both.
Assuming you have downloaded and installed the java version you want to use, type :
`sudo update-alternatives --config java`
You'll get a nice menu with options
Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java 1061 auto mode
1 /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java 1061 manual mode
2 /usr/lib/jvm/jdk1.7.0/bin/java 1
Select the version you wish to use. Done.
Now try to build.

DataNode failing to Start in Hadoop

I trying setup Hadoop install on Ubuntu 11.04 and Java 6 sun. I was working with hadoop 0.20.203 rc1 build. I am repeatedly running into an issue on Ubuntu 11.04 with java-6-sun. When I try to start the hadoop, the datanode doesn't start due to "Cannot access storage".
2011-12-22 22:09:20,874 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /home/hadoop/work/dfs_blk/hadoop. The directory is already locked.
2011-12-22 22:09:20,896 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage /home/hadoop/work/dfs_blk/hadoop. The directory is already locked.
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:602)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:455)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:111)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
I have tried upgrading and downgrading to couple of versions in 0.20 branch from Apache, even cloudera, also deleting and installing hadoop again. But Still I am running into this issue. Typical workarounds such as deleting *.pid files in /tmp directory is also not working. Could anybody point me to solution for this?

Yes I formatted the namenode , the problem was in of the rogue templates for hdfs-site.xml that i copy pasted , the dfs.data.dir and dfs.name.dir pointed to the same directory location resulting Locked storage error. They should be different directories. Unfortunately, the hadoop documentation is not clear enough in this subtle details.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.