Is there a way to simply run a python script in Apache Storm?
I'm trying to figure out how to use Storm to run scripts but am having trouble. It seems I need to create a Java program that calls the script and uses it as a bolt, but I simply want to send a very basic Python script to Storm to see whether that is possible.
I read that the following command is helpful for sending topologies to Storm, but I am having trouble understanding the syntax, and I don't know whether I can send any Python code to Storm or whether it needs a specific structure.
Can someone clarify whether I can submit any Python script to Storm and, if so, what the following line means?
storm shell resources/ python topology.py arg1 arg2
When I try to submit a basic Python script using the above command, I get the following output.
956 [main] INFO backtype.storm.StormSubmitter - Uploading topology jar stormshell8691441.jar to assigned location: /home/scix3/apache/storm/data/nimbus/inbox/stormjar-ae0739f9-7c93-4f00-a02b-c4eceba3b005.jar
966 [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /home/scix3/apache/storm/data/nimbus/inbox/stormjar-ae0739f9-7c93-4f00-a02b-c4eceba3b005.jar
Exception in thread "main" java.io.IOException: Cannot run program "simple.py" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at java.lang.Runtime.exec(Runtime.java:617)
at org.apache.commons.exec.launcher.Java13CommandLauncher.exec(Java13CommandLauncher.java:58)
at org.apache.commons.exec.DefaultExecutor.launch(DefaultExecutor.java:254)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:319)
at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:160)
at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:147)
at backtype.storm.util$exec_command_BANG_.invoke(util.clj:386)
at backtype.storm.command.shell_submission$_main.doInvoke(shell_submission.clj:29)
at clojure.lang.RestFn.applyTo(RestFn.java:139)
at backtype.storm.command.shell_submission.main(Unknown Source)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 10 more
The exact command I'm using (possibly incorrect) is storm shell resources/ simple.py
simple.py is merely a print 'Hello, world' script.
I'm using Storm version 0.9.4.
Yes, you can run Python on Storm. In fact, you can run code from just about any language on a Storm cluster; it's just a matter of implementing the API.
However, there are some requirements for that to work, and so far as I can tell they are not spelled out in the Storm documentation. The fastest path to get up and running would be to take the splitsentence.py example from the Storm source and run with it.
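For orientation, the API in question is Storm's multilang protocol: a non-JVM component exchanges JSON messages with its parent process over stdin and stdout, and each message is followed by a line containing only end. The sketch below illustrates just that framing in plain Python. The helper names read_message, format_message and split_sentence are made up for this example, the initial handshake is omitted, and a real bolt would normally build on the storm module that splitsentence.py imports rather than hand-rolling any of this.

```python
import io
import json


def read_message(stream):
    """Read lines up to the 'end' marker and decode them as one JSON message."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


def format_message(message):
    """Encode a message as JSON followed by the 'end' marker line."""
    return json.dumps(message) + "\nend\n"


def split_sentence(tuple_message):
    """Build one 'emit' command per word of the incoming tuple (hypothetical helper)."""
    sentence = tuple_message["tuple"][0]
    return [{"command": "emit", "tuple": [word]} for word in sentence.split(" ")]


if __name__ == "__main__":
    # Simulated input; in a running topology this would arrive on sys.stdin.
    incoming = io.StringIO('{"id": "1", "tuple": ["hello storm"]}\nend\n')
    for command in split_sentence(read_message(incoming)):
        print(format_message(command), end="")
```

This also suggests why an arbitrary script such as a bare print statement is not enough on its own: the script has to speak this protocol (or construct and submit a topology) for Storm to do anything useful with it.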
Try pyleus (https://github.com/Yelp/pyleus) or streamparse (https://github.com/Parsely/streamparse). I would recommend pyleus, as it is simple.
I'm trying to run a command on a Windows agent with cmd using the Groovy String.execute() in a Jenkins pipeline.
I know that the bat step exists, but I intentionally don't want to use it in this case, since it bloats the logs and the Blue Ocean plugin has a limit on the number of steps it can show for any stage.
What I have is a cleanup function that I call very often, which does a lot of checks and runs multiple commands (at the moment with sh and bat, in addition to isUnix() and similar). The result is usually very bloated logs full of sh and bat steps, which makes the logs larger and more difficult to analyze.
I decided to use a native way of running shell commands "silently" and only print stdout and stderr if the command fails.
I have written the following function to execute the command:
@NonCPS
def exec(cmd) {
    def isUnix = Jenkins.instance.getNodes().find { it.getNodeName() == env.NODE_NAME }.toComputer().isUnix()
    cmd = (isUnix ? ["/bin/bash", "-c"] : ["cmd.exe", "/c"]) + [cmd]
    def proc = cmd.execute()
    proc.waitFor()
    def result = [out: proc.in.text.trim(), err: proc.err.text.trim(), exitCode: proc.exitValue()]
    return result
}
The above function works perfectly for Linux agents but not on Windows. I always get the following error:
java.io.IOException: error=2, No such file or directory
at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:271)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1107)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at java.base/java.lang.Runtime.exec(Runtime.java:591)
at java.base/java.lang.Runtime.exec(Runtime.java:415)
at java.base/java.lang.Runtime.exec(Runtime.java:312)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:47)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
at com.cloudbees.groovy.cps.sandbox.DefaultInvoker.methodCall(DefaultInvoker.java:20)
Caused: java.io.IOException: Cannot run program "cmd.exe": error=2, No such file or directory
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
at java.base/java.lang.Runtime.exec(Runtime.java:591)
at java.base/java.lang.Runtime.exec(Runtime.java:415)
at java.base/java.lang.Runtime.exec(Runtime.java:312)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
The key line is: Cannot run program "cmd.exe": error=2, No such file or directory.
I tried replacing "cmd.exe" with the full path "C:\Windows\System32\cmd.exe" or "C:\Windows\SysWOW64\cmd.exe", replacing cmd.exe with ["start", "cmd.exe", "/C"], using Runtime.getRuntime().exec(), and many other variations, and I cannot make it work.
If I do "git version".execute(), that works fine, but I cannot run more complex commands like "echo hi && git version".execute(), which prints "hi && git version" instead of "hi" followed by the real git version. I thought the PATH environment variable might not be set properly, so I ran "set".execute(), which resulted in the same error as above ("set": no such file or directory).
I have been trying various combinations of this for the past 2 days and it is not going anywhere.
Any help is much appreciated. Thanks in advance.
Update:
I found the reason why it fails on Windows but not on Linux nodes with Jenkins, and hit another, harder wall.
The problem is that the String.execute() method behaves internally like Runtime.getRuntime().exec(), and this uses the runtime environment of the master node even when encapsulated within the node() directive. This means that when it appeared to run fine on Linux, it was actually running on the master node all along, which is also Linux. The reason it cannot run cmd, start, or anything similar is that those commands (obviously) do not exist on Linux.
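The "hi && git version" output mentioned in the question has a separate cause: the command string is split into words and started directly, so no shell is present to interpret &&. The effect can be reproduced outside Jenkins. The Python sketch below is only an illustration under stated assumptions: the function name run_quiet is made up, it assumes /bin/sh (or cmd.exe on Windows) exists on the machine running it, and it does nothing about the process starting on the master node instead of the agent.

```python
import os
import subprocess


def run_quiet(command):
    """Run a command line through the platform shell; print output only on failure."""
    shell = ["cmd.exe", "/c"] if os.name == "nt" else ["/bin/sh", "-c"]
    proc = subprocess.run(shell + [command], capture_output=True, text=True)
    result = {
        "out": proc.stdout.strip(),
        "err": proc.stderr.strip(),
        "exit_code": proc.returncode,
    }
    if proc.returncode != 0:
        # Only a failing command produces log output.
        print(result["out"])
        print(result["err"])
    return result


if __name__ == "__main__":
    if os.name != "nt":
        # Without a shell, '&&' is handed to echo as an ordinary argument.
        direct = subprocess.run(
            ["echo", "hi", "&&", "echo", "there"], capture_output=True, text=True
        )
        print(direct.stdout.strip())
    # With a shell, '&&' chains the two commands.
    print(run_quiet("echo hi && echo there")["out"])
```

The design mirrors the Groovy function in the question: pick the shell wrapper by platform, capture both streams, and return them with the exit code so the caller decides what to log.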
I found this example of how to write to a local file system, but it throws this exception:
Exception in thread "main" java.io.IOException: (null) entry in command string: null chmod 0644 C:\temp\test.seq
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:770)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1168)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
Running this on a Windows 10 box. I even tried using the msys git bash shell thinking maybe that would help the JVM simulate a chmod operation. Didn't change anything. Any suggestions on how to do this on Windows?
I faced this error too, and it was resolved by the following steps. (Note: I am using Spark 2.0.2 and Hadoop 2.7.)
Verify whether you are getting "java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.". You can check this by running the "spark-shell" command.
I got the above-mentioned error. It occurred because I hadn't added "HADOOP_HOME" to the environment variables. After adding "HADOOP_HOME" (in my case the same as "SPARK_HOME"), the issue was resolved.
Running a Hadoop program using only jars on Windows requires a few steps beyond just referencing the jars.
Credit to Professor Lu at University of Helsinki for posting a Hadoop on Windows guide for his students.
Here is a rundown of steps I had to take using Windows 10 and Hadoop 2.7.3:
1. Download and extract Hadoop binaries to somewhere like C:\hadoop-2.7.3.
2. Download patch files from https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip and extract them to your %HADOOP_HOME%\bin directory.
3. Set a HADOOP_HOME environment variable. For example, C:\hadoop-2.7.3.
4. Download the Hadoop source code, copy hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java to your project, and modify line 609 from
return access0(path, desiredAccess.accessRight());
to
return true;
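Before launching the JVM, the environment part of these steps can be sanity-checked. The following Python sketch is a hypothetical helper, not part of Hadoop: check_hadoop_home is a made-up name, and it only verifies that HADOOP_HOME is set and that bin\winutils.exe (the executable named in the "Could not locate executable" message) exists underneath it.

```python
import os
import tempfile


def check_hadoop_home(env=None):
    """Return a list of problems with the HADOOP_HOME setup (empty if none found)."""
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]
    problems = []
    winutils = os.path.join(home, "bin", "winutils.exe")
    if not os.path.isfile(winutils):
        problems.append("missing " + winutils)
    return problems


if __name__ == "__main__":
    # Check the real environment of the current process.
    for problem in check_hadoop_home():
        print(problem)

    # Demonstrate a passing setup using a throwaway directory.
    with tempfile.TemporaryDirectory() as fake_home:
        os.mkdir(os.path.join(fake_home, "bin"))
        open(os.path.join(fake_home, "bin", "winutils.exe"), "w").close()
        print(check_hadoop_home({"HADOOP_HOME": fake_home}))
```

Note that an environment variable set after an IDE or terminal was opened is usually not visible to it until that program is restarted, so the check should run from the same session that starts the job.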
One possible solution is as follows.
In the Project Structure dialog (IntelliJ), under SDKs, ensure no other version of Hadoop is referenced. In my case, I had been running Spark earlier and it referenced Hadoop JARs, which caused access issues. Once I removed them, the MR job ran fine.
I am trying to copy a file from HDFS to the local Linux file system using the Hadoop FileSystem class.
I have permission to create folders in the path I am copying to; I checked using the mkdir command.
I also tried the shell command hadoop fs -copyToLocal hdfsFilePath localFilepath, and it worked.
I am running this on a YARN cluster.
I tried the approaches below, but I get the error java.io.IOException: Mkdirs failed to create file:/home/user.
Error log:
16/01/14 01:09:36 ERROR util.FileUtil:
java.io.IOException: Mkdirs failed to create /home/user (exists=false, cwd=file:/hdfs4/yarn/nm/usercache/user/appcache/application_1452126203792_8862/container_e2457_1452126203792_8862_01_000001)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1970)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1939)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1915)
at com.batch.util.FileUtil.copyToLocalFileSystem(FileUtil.java:66)
at com.batch.dao.impl.DaoImpl.writeFile(DaoImpl.java:108)
at com.batch.JobDriver.runJob(JobDriver.java:79)
at com.batch.JobDriver.main(JobDriver.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:480)
I am actually passing localFilePath as /home/user/test, but the error says it failed to create file:/home/user.
fs.copyToLocalFile(hdfsFilePath, localFilePath);
fs.copyToLocalFile(false, hdfsFilePath, localFilePath, true);
I faced the same thing this week. The problem was that I was deploying the job in cluster mode, so the machine where the job ran did not have that directory. Is it possible you are deploying the job in cluster mode? If so, try deploying it in client mode (the output directory still has to exist, though).
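Because the answer depends on which machine the code actually runs on, it can help to run a small check on that same machine (in cluster mode, the node hosting the driver) rather than on the machine where mkdir was tested by hand. The Python sketch below is a rough diagnostic under that assumption; the function names are made up, and it only inspects local file system permissions for the current user.

```python
import os
import tempfile


def nearest_existing_ancestor(path):
    """Walk up from path until a directory that exists is found."""
    current = os.path.abspath(path)
    while not os.path.isdir(current):
        parent = os.path.dirname(current)
        if parent == current:
            break
        current = parent
    return current


def can_create(path):
    """True if the nearest existing ancestor of path is writable and searchable."""
    ancestor = nearest_existing_ancestor(path)
    return os.access(ancestor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # The destination from the question; the result depends on the local machine.
    target = "/home/user/test"
    print(nearest_existing_ancestor(target), can_create(target))

    # A location that is expected to be writable, for comparison.
    with tempfile.TemporaryDirectory() as tmp:
        print(can_create(os.path.join(tmp, "a", "b", "c")))
```

If the nearest existing ancestor turns out to be / or /home on the node in question, the parent directory simply is not there for the user the container runs as, which matches the exists=false part of the exception.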
For anyone looking for this exact error, but maybe not from YARN:
I had this exact error when trying to run org.apache.hadoop.fs.FileSystem.copyToLocalFile on my local (Mac) machine, with local FS configured using the job.local.dir attribute.
This was the exception:
java.io.IOException: Mkdirs failed to create file:/User/yossiv/algo-resources/AWS/QuerySearchEngine.blacklistVersionFile (exists=false, cwd=file:/Users/yossiv/git/c2s-algo)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:441)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:928)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:368)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2066)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2035)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2011)
What fixed it was changing job.local.dir to be under the current directory, which is listed in the exception text after cwd=. In my case that's /Users/yossiv/git/c2s-algo.
Broke my head two days over this, hope this helps someone.
I am setting up an Apache Storm system but am having problems getting the program to run consistently. I have set up Storm on three servers, but it only works consistently on one. I think the issue lies somewhere in the path of the command.
I have been using storm-starter to set up the program and have tested it locally with RollingTopWords. When I run the command $ storm jar storm-starter-*.jar storm.starter.RollingTopWords, the computer stalls for a second and then I get the following error:
Could not find or load main class storm.starter.RollingTopWords
The jar is stored in the directory /apache/storm/examples/storm-starter/target . Let me know if there is any other information I can provide that would be of help because I'm feeling a little desperate at this point.
The following is the entire output for the program that doesn't work.
Running: /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -client -Dstorm.options= -Dstorm.home=/home/scix3/apache/storm -Dstorm.log.dir=/home/scix3/apache/storm/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /home/scix3/apache/storm/lib/kryo-2.21.jar:/home/scix3/apache/storm/lib/core.incubator-0.1.0.jar:/home/scix3/apache/storm/lib/commons-fileupload-1.2.1.jar:/home/scix3/apache/storm/lib/ring-servlet-0.3.11.jar:/home/scix3/apache/storm/lib/clj-stacktrace-0.2.2.jar:/home/scix3/apache/storm/lib/jline-2.11.jar:/home/scix3/apache/storm/lib/servlet-api-2.5.jar:/home/scix3/apache/storm/lib/disruptor-2.10.1.jar:/home/scix3/apache/storm/lib/log4j-over-slf4j-1.6.6.jar:/home/scix3/apache/storm/lib/clojure-1.5.1.jar:/home/scix3/apache/storm/lib/commons-exec-1.1.jar:/home/scix3/apache/storm/lib/logback-core-1.0.13.jar:/home/scix3/apache/storm/lib/jetty-util-6.1.26.jar:/home/scix3/apache/storm/lib/slf4j-api-1.7.5.jar:/home/scix3/apache/storm/lib/carbonite-1.4.0.jar:/home/scix3/apache/storm/lib/compojure-1.1.3.jar:/home/scix3/apache/storm/lib/minlog-1.2.jar:/home/scix3/apache/storm/lib/commons-lang-2.5.jar:/home/scix3/apache/storm/lib/tools.macro-0.1.0.jar:/home/scix3/apache/storm/lib/reflectasm-1.07-shaded.jar:/home/scix3/apache/storm/lib/tools.cli-0.2.4.jar:/home/scix3/apache/storm/lib/math.numeric-tower-0.0.1.jar:/home/scix3/apache/storm/lib/logback-classic-1.0.13.jar:/home/scix3/apache/storm/lib/tools.logging-0.2.3.jar:/home/scix3/apache/storm/lib/asm-4.0.jar:/home/scix3/apache/storm/lib/jetty-6.1.26.jar:/home/scix3/apache/storm/lib/snakeyaml-1.11.jar:/home/scix3/apache/storm/lib/hiccup-0.3.6.jar:/home/scix3/apache/storm/lib/clj-time-0.4.1.jar:/home/scix3/apache/storm/lib/jgrapht-core-0.9.0.jar:/home/scix3/apache/storm/lib/clout-1.0.1.jar:/home/scix3/apache/storm/lib/chill-java-0.3.5.jar:/home/scix3/apache/storm/lib/commons-io-2.4.jar:/home/scix3/apache/storm/lib/joda-time-2.0.jar:/home/scix3/apache/storm/lib/storm-core-0.9.4.jar:/home/
scix3/apache/storm/lib/objenesis-1.2.jar:/home/scix3/apache/storm/lib/commons-logging-1.1.3.jar:/home/scix3/apache/storm/lib/ring-core-1.1.5.jar:/home/scix3/apache/storm/lib/ring-jetty-adapter-0.3.11.jar:/home/scix3/apache/storm/lib/commons-codec-1.6.jar:/home/scix3/apache/storm/lib/json-simple-1.1.jar:/home/scix3/apache/storm/lib/ring-devel-0.3.11.jar:storm-starter-.jar:/home/scix3/apache/storm/conf:/home/scix3/apache/storm/bin -Dstorm.jar=storm-starter-.jar storm.starter.RollingTopWords
Error: Could not find or load main class storm.starter.RollingTopWords
The main causes of the error
Could not find or load main class storm.starter.RollingTopWords could be the following.
Check the launch configuration used while building the jar.
Be careful when building the jar: the export asks you to choose a destination folder and a launch configuration, and the launch configuration should belong to the same project.
You might have missed the main class in your project.
Before using StormSubmitter on a remote cluster, check whether the topology works properly with LocalCluster.
To check whether the problem is Storm being unable to find the jar, you can try issuing
storm jar /fullpath/my-storm-jar.jar Classname
A few other things to check:
The jar is compiled properly and contains the RollingTopWords class.
storm.yaml points to the correct Nimbus. (This seems less probable, as the connection is being made and there is an attempt to load the topology.)
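Whether the jar really contains the class can be checked without Storm, since a jar is a zip archive and a class named storm.starter.RollingTopWords is stored as the entry storm/starter/RollingTopWords.class. The Python sketch below shows one way to do that; jar_has_class is a made-up helper name, and the demo builds a throwaway jar instead of assuming where the real one lives.

```python
import os
import tempfile
import zipfile


def jar_has_class(jar_path, class_name):
    """Check whether a jar holds the .class entry for a fully qualified class name."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny stand-in jar with a single (empty) class entry.
        demo_jar = os.path.join(tmp, "demo.jar")
        with zipfile.ZipFile(demo_jar, "w") as jar:
            jar.writestr("storm/starter/RollingTopWords.class", b"")
        print(jar_has_class(demo_jar, "storm.starter.RollingTopWords"))
        print(jar_has_class(demo_jar, "storm.starter.Missing"))
```

Pointing jar_has_class at the full path of the real jar also confirms that the path resolves to an existing file, which is the same thing the full-path storm jar invocation above is meant to test.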
My PHP server is hosted on the JobTracker machine, and I am trying to run a MapReduce job from my web page by executing the jar command on the command line, but I get no response and the job does not start.
However, if I run a command to list HDFS using the same approach, it runs fine. Please guide me.
The following command returns nothing and the job does not run:
exec("HADOOP_DIR/bin/hadoop jar /usr/local/MapReduce.jar Mapreduce [input Path] [output Path]");
But if I do this:
exec("HADOOP_DIR/bin/hadoop dfs -ls /user/hadoop");
It is running fine.
I solved this problem by changing the PHP server user to hduser (a user that has permission to write files in HDFS). Without this change, only commands that read from HDFS worked, not those that need to create files or write to HDFS.
When I tried to run the command for creating a directory in HDFS through my PHP script, I got the following error in my PHP server logs (/var/log/apache2/error.log):
mkdir: org.apache.hadoop.security.AccessControlException: Permission denied: user=www-data, access=WRITE, inode="hduser":hduser:supergroup:rwxr-xr-x
And on running the jar command to trigger the MapReduce program, I got the following error:
Exception in thread "main" java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:1879)
at org.apache.hadoop.util.RunJar.main(RunJar.java:115)
Then I changed the user in /etc/apache2/apache2.conf to my Hadoop user and restarted the server, and everything worked fine.
I should reference the post "Execute hadoop jar from PHP Server fails. Permission denied", which helped me a lot in solving this problem. I hope this post helps others too.