Apache Storm - ClassNotFoundException on storm-starter - java

I’m trying to get the storm-starter to work. I tried the mailing list, and that does not seem to be gaining traction. When I run the following:
$ mvn compile exec:java -Dstorm.topology=storm.starter.ExclamationTopology
I get an error:
Error on initialization of server mk-worker
java.lang.RuntimeException: java.lang.ClassNotFoundException: backtype.storm.testing.TestWordSpout
I’m not a java developer, so I’m not sure exactly how imports are supposed to work. I do see storm-core/src/jvm/backtype/storm/testing/TestWordSpout.java.
When I find any jar files, I see:
./target/original-storm-starter-0.11.0-SNAPSHOT.jar
./target/storm-starter-0.11.0-SNAPSHOT.jar
When I inspect those jar files, TestWordSpout is not there. I am running my commands from ./examples/storm-starter as per the documentation linked above.
To the best of my knowledge, I've followed the tutorial exactly. OSX El Capitan 10.11.2, Java 1.8.0, Storm 0.9.5, Maven 3.3.3. Any help would be great; I'd enjoy being able to get started :)

Running a Storm topology via Maven is not the way to go. You should use bin/storm jar myJarFile.jar on the command line to submit a topology to a cluster (this also works for local mode).
The files ./target/original-storm-starter-0.11.0-SNAPSHOT.jar and ./target/storm-starter-0.11.0-SNAPSHOT.jar are standard Maven artifacts and cannot be used to submit a topology to a cluster.
You can use maven-jar-plugin (which I would recommend to get started -- you might need to use maven-dependency-plugin, too), maven-assembly-plugin, or maven-shade-plugin to assemble a correct jar file for submission. There are a bunch of SO questions about this, so I will not include further details here. For an example, you can have a look at my git repository at https://github.com/mjsax/aeolus/blob/master/monitoring/pom.xml
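The missing TestWordSpout class lives in storm-core, which the plain Maven artifact does not bundle. As a rough sketch, a maven-shade-plugin setup along these lines builds a self-contained jar that bin/storm jar can submit (the plugin version is an assumption; pick a current one, and note that storm-core is typically given provided scope for cluster submission so Storm's own classes are not bundled):

<!-- sketch: bundle compile-scope dependencies into the topology jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.1</version> <!-- assumed version -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>

After mvn package, submission then looks like:

bin/storm jar target/storm-starter-0.11.0-SNAPSHOT.jar storm.starter.ExclamationTopology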

Related

Java-Eclipse Dependency Errors with Spark

So I could be misunderstanding how Spark works completely... but here goes.
I am working on a project in (pure) Java, using Eclipse as my IDE. I wanted to use Spark for this, as I have some sections that I need Machine Learning for.
I tried downloading Spark as provided, but there was no convenient jar file I could import, so that didn't work. Trying to import single Java files fails completely, since it can't find critical pieces...
So I learned Maven, and tried downloading Core using Maven, the way listed:
groupId = org.apache.spark
artifactId = spark-core_2.10
version = 1.1.0
However, my project won't compile because these have dependency issues, among other problems...
For example, if you import core (like I did), it can't find "Accumulator". This class SHOULD exist, because it's visible in the API docs (http://spark.apache.org/docs/latest/api/java/index.html), but I can't find it anywhere else...
No idea how to proceed, help would be appreciated.
PS: Windows. I'm sorry.
I wanted to also be able to run Spark using pure Java on Windows, so here is what you need
Download the Spark 1.3.0 binaries from http://spark.apache.org/downloads.html
choose Pre-built for Hadoop 2.4 and later
unzip it and add ..\spark-1.3.0-bin-hadoop2.4\spark-assembly-1.3.0-hadoop2.4.0.jar as an external jar reference to your Java project
The fastest way to start with Spark using Java is to run the JavaWordCount example which is found on
..\spark-1.3.0-bin-hadoop2.4\examples\src\main\java\org\apache\spark\examples\JavaWordCount.java
To fix the error
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
replace the line SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount"); with SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("local[2]").set("spark.executor.memory","1g");
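In context, the change at the top of JavaWordCount's main method looks roughly like this ("local[2]" means two local worker threads; the executor memory setting is optional):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// run Spark in local mode instead of requiring an external master URL
SparkConf sparkConf = new SparkConf()
        .setAppName("JavaWordCount")
        .setMaster("local[2]")
        .set("spark.executor.memory", "1g");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);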
And that's it; try running it from Eclipse and it should succeed. If you see the error below:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
just ignore it; scroll down the console and you'll see your input text file printed line by line, followed by the word counts.
This is a fast way to get started with Spark on Windows without worrying about installing Hadoop; you just need JDK 6 and Eclipse on Windows.
Do you have Maven support installed in eclipse (M2Eclipse)? Did you add:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0</version>
</dependency>
to the <dependencies> element of the <project> element of your pom.xml file?
When all of this is true, you should see a "Maven Dependencies" classpath container that contains various Spark dependencies, along with spark-core_2.10-1.1.0.jar, which indeed contains the org.apache.spark.Accumulator class.
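A quick way to confirm the dependency resolved is a tiny program that touches the class the question could not find; a minimal sketch, assuming the pom.xml entry above:

import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkClasspathCheck {
    public static void main(String[] args) {
        // if this compiles, spark-core_2.10 is on the classpath
        SparkConf conf = new SparkConf().setAppName("check").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        Accumulator<Integer> acc = sc.accumulator(0); // the "missing" class
        acc.add(1);
        System.out.println("Accumulator value: " + acc.value());
        sc.stop();
    }
}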
Okay, I'm just stupid. Anyway, when I selected something to download using the Maven dependency adder, I needed to go into the sub-tabs and select the version of the jar I wanted. If I didn't do that (which I didn't), then it simply... didn't download the actual jar, just everything else...

Hadoop Eclipse configuration

I am new to Hadoop. I successfully installed Hadoop 2.2.0 in pseudo-distributed mode and successfully executed some example programs, like word count and PI, from the command prompt. Now I want to practice some MapReduce programs using Eclipse, so I first installed Eclipse Juno, but I do not know how to configure Eclipse for Hadoop. Can anyone tell me the steps to configure Eclipse Juno for Hadoop 2.2.0?
Thanks in advance.
It is easy to get Eclipse configured for Hadoop. Basically you need to set up the build path and configure Ant and Maven. There is a good write-up here; check that out and come back with any questions once you get started.
Even though that link references Cloudera, the Eclipse configuration is the same for a manually installed Hadoop release. You will need to follow at least steps 1-4 in order to get the correct build path, Ant configuration, and Hadoop runtime jars on the correct path.
The easiest way to make sure you have configured Eclipse correctly is to create a Java project and copy/paste the wordcount Java file into the project. Once saved, take a look at any errors in the console. If you have everything correctly configured, you will be able to compile wordcount and should have wordcount.class in your project's bin directory.
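For reference, the wordcount file mentioned above is essentially the canonical Hadoop example; here is a sketch against the Hadoop 2.x MapReduce API (package and class names are up to you):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // emits (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // sums the counts for each word; also used as the combiner
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

If this compiles in your Eclipse project without errors, the build path is set up correctly.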
1. Build the project with Maven (m2e plugin) and the required jar files in the classpath.
2. Export the jar for the project.
3. Use the Hadoop command-line utility to execute the MapReduce job (see the example command below).
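For step 3, the submit command typically looks like this (jar name, main class, and HDFS paths are placeholders):

hadoop jar wordcount.jar WordCount /user/me/input /user/me/output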

Bash Script to execute Java Project

I usually compile and execute my Java project using the Eclipse IDE. Now I need to write a bash script to submit this task to a cluster. My project contains 3 packages and has been laid out as per the Maven architecture. Can anyone give me a hint as to how I can include the task of executing the Java program in my bash script?
I tried different things, like first trying to execute the project from the command line, but it did not work. I have already read the replies in this post: Compiling and running multiple packages using the command line in Java, and some others, but they didn't help.
Thank you for your help.
I strongly suggest you look into something like the Maven App Assembler plugin. I don't know about your cluster, but the plugin can create run scripts for most major OSes and is highly configurable.
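A minimal sketch of that plugin in a pom.xml (version number, script name, and main class are placeholders):

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>appassembler-maven-plugin</artifactId>
  <version>1.10</version> <!-- assumed version -->
  <configuration>
    <programs>
      <program>
        <mainClass>com.example.Main</mainClass> <!-- your entry point -->
        <id>myapp</id> <!-- name of the generated script -->
      </program>
    </programs>
  </configuration>
</plugin>

Running mvn package appassembler:assemble should then generate target/appassembler/bin/myapp (plus a .bat variant), which your bash script can invoke directly.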

Checkstyle & Findbugs Install

I have javac version 1.6.0_16 already installed on Windows XP and I'm using both Dr.Java and command prompt to compile and run Java programs.
I downloaded and extracted Checkstyle 5.5 and Findbugs 2.0.1. I'm trying to install Checkstyle and the instructions stated that I need to include checkstyle-5.5-all.jar in the classpath.
My question is, should I place the Checkstyle directory in the lib folder of the jdk1.6.0_16 directory and set the classpath as follows:
C:>set classpath=%C:\Program Files\Java\jdk1.6.0_16\lib\checkstyle-5.5\checkstyle-5.5-all.jar
Is this correct? Should I do the same for Findbugs? Thanks in advance
EDIT: When I added the above path using the environment variables and ran checkstyle hello.java, I got the error: 'checkstyle' is not recognized as an internal or external command, operable program or batch file
Maven will solve this problem for you
It sounds like you're just getting started in the world of Java. To that end, I'd suggest that you look into Maven for your build process. Also, you should be using at least JDK1.6.0_33 at the time of writing.
Essentially, Maven will manage the process of running Checkstyle and Findbugs (and you should also consider PMD) via standard plugins against your code. It will also manage the creation of the Javadocs and linked source code, and generate a website for your project. Further, Maven promotes a good release process whereby you work against snapshots until you are ready to share your work with the wider world.
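For instance, wiring both tools into the build might look roughly like this (plugin versions are assumptions; pick current ones):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-checkstyle-plugin</artifactId>
      <version>2.9.1</version> <!-- assumed version -->
    </plugin>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>findbugs-maven-plugin</artifactId>
      <version>2.5.2</version> <!-- assumed version -->
    </plugin>
  </plugins>
</build>

mvn checkstyle:check findbugs:check then runs both tools against your code.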
And if I don't use Maven?
Well, just create a /lib folder in your project and stuff your dependencies into it. Over time you will accumulate more and more of these, and they will get intertwined. After a while you will enter JAR Hell and turn to Maven to solve the problem.
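Until then, compiling and running against that /lib folder looks roughly like this on Windows (file and class names are placeholders; the * classpath wildcard needs Java 6+):

javac -cp "lib/*" -d bin src\HelloWorld.java
java -cp "bin;lib/*" HelloWorld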
We've all been there.

Error org.apache.xerces.jaxp.DocumentBuilderFactoryImpl not found

I have a problem including the jars for reading a file from Hadoop. If I run the application from NetBeans, it works. But if I run it from the command line, the jar builds successfully but I cannot run it, and I get the following exception. When I execute the program I also pass the path of the jars.
javax.xml.parsers.FactoryConfigurationError: Provider org.apache.xerces.jaxp.DocumentBuilderFactoryImpl not found
If I add xercesImpl-2.9.1.jar to the list of jars, I get the following exception instead.
java.lang.NoClassDefFoundError: com/sun/security/auth/UnixPrincipal
Does anyone have a clue how I can solve this?
This might be due to the fact that you're running the IBM JVM; com.sun.security.auth.UnixPrincipal is an internal Sun/Oracle JDK class, so it does not exist on other vendors' JVMs. Either:
switch to the Sun JVM, which is the only one that Hadoop is rigorously tested on,
OR
go to the IBM web site and download their slightly modified version of Hadoop that works with their JVM.
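You can check which vendor's JVM you are running with:

java -version

(the output names the vendor, e.g. "IBM J9 VM" versus "Java HotSpot(TM) 64-Bit Server VM").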
I think that the issue might be fixed with the latest hadoop revisions.
Jira issues: one and two.
See this, this and this
