Java-Eclipse Dependency Errors with Spark

So I could be misunderstanding how Spark works completely... but here goes.
I am working on a project in (pure) Java, using Eclipse as my IDE. I wanted to use Spark for this, as I have some sections that I need Machine Learning for.
I tried downloading Spark as provided, but there was no convenient jar file I could import, so that didn't work. Trying to import single Java files fails completely, since it can't find critical pieces...
So I learned Maven, and tried downloading spark-core using Maven, the way listed:
groupId = org.apache.spark
artifactId = spark-core_2.10
version = 1.1.0
However, my project won't compile because these have dependency issues, and other issues...
For example, if you import core (like I did), it can't find "Accumulator". This class SHOULD exist, because it's visible in the API docs (http://spark.apache.org/docs/latest/api/java/index.html), but not anywhere else...
No idea how to proceed, help would be appreciated.
PS: Windows. I'm sorry.

I wanted to be able to run Spark using pure Java on Windows too, so here is what you need:
Download the Spark 1.3.0 binaries from http://spark.apache.org/downloads.html
Choose "Pre-built for Hadoop 2.4 and later"
Unzip it and add ..\spark-1.3.0-bin-hadoop2.4\spark-assembly-1.3.0-hadoop2.4.0.jar as an external jar reference to your Java project
The fastest way to start with Spark using Java is to run the JavaWordCount example, which is found at
..spark-1.3.0-bin-hadoop2.4\examples\src\main\java\org\apache\spark\examples\JavaWordCount.java
To fix the error
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
replace the line SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount"); with SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("local[2]").set("spark.executor.memory", "1g");
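For context, the top of the example's main method would then look roughly like this (just a sketch of the affected lines, not the full JavaWordCount):
SparkConf sparkConf = new SparkConf()
        .setAppName("JavaWordCount")
        .setMaster("local[2]")               // embedded local master with 2 worker threads, no cluster needed
        .set("spark.executor.memory", "1g"); // memory hint for the local executor
JavaSparkContext ctx = new JavaSparkContext(sparkConf);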
And that's it. Try running it from Eclipse and you should get success. If you see the error below:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
just ignore it, scroll the console down, and you'll see your input text file line by line followed by a count of words.
This is a fast way to get started with Spark on Windows without worrying about getting Hadoop installed; you just need JDK 6 and Eclipse on Windows.

Do you have Maven support installed in Eclipse (M2Eclipse)? Did you add:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.1.0</version>
</dependency>
to the <dependencies> element of the <project> element of your pom.xml file?
When all of this is true, you should see a "Maven Dependencies" classpath container that contains various Spark dependencies, along with spark-core_2.10-1.1.0.jar, which indeed contains the org.apache.spark.Accumulator class.
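To double-check the classpath, a quick throwaway class like the following (the class name and values are made up; written against the Spark 1.1.0 Java API) should compile and run once the Maven Dependencies container is in place:
import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class AccumulatorSmokeTest {
    public static void main(String[] args) {
        // local[2] runs Spark in-process, so no cluster is needed for this check
        SparkConf conf = new SparkConf().setAppName("AccumulatorSmokeTest").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // If this compiles, org.apache.spark.Accumulator was found in spark-core_2.10
        Accumulator<Integer> counter = sc.accumulator(0);
        counter.add(42);
        System.out.println("Accumulator value: " + counter.value());

        sc.stop();
    }
}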

Okay, I'm just stupid. Anyway, when I selected something to download using the Maven dependency adder, I needed to go into the subtabs and select the version of the jar I wanted. If I didn't do that (which I didn't), then it simply... didn't download the actual jar, just everything else...

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

I'm trying to run an MR program (version 2.7) on Windows 7 64-bit in Eclipse, and the above exception occurs while running it.
I verified that I'm using the 64-bit Java 1.8 version and observed that all the Hadoop daemons are running.
Any suggestions are highly appreciated.
In addition to the other solutions, please download winutils.exe and hadoop.dll and add them to $HADOOP_HOME/bin. It works for me.
https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
Note: I'm using Hadoop version 2.7.3.
After putting hadoop.dll and winutils in the hadoop/bin folder and adding the hadoop folder to PATH, we also need to put hadoop.dll into the C:\Windows\System32 folder.
This issue occurred for me, and the cause was that I forgot to append %HADOOP_HOME%/bin to PATH in my environment variables.
In my case I was having this issue when running unit tests on my local machine after upgrading dependencies to CDH6. I already had the HADOOP_HOME and PATH variables configured properly, but I had to copy hadoop.dll to C:\Windows\System32 as suggested in the other answer.
After trying all of the above, things worked after putting hadoop.dll into Windows/System32.
For me this issue was resolved by downloading winutils.exe and hadoop.dll from https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin and putting them in the hadoop/bin folder.
Adding hadoop.dll and winutils.exe fixed the error; support for the latest versions can be found here
I already had %HADOOP_HOME%/bin in my PATH and my code had previously run without errors. Restarting my machine made it work again.
A version mismatch is the main cause of this issue. Using the native library that matches your full Hadoop version will solve it, and if you still face the issue while working on a Hadoop 3.1.x version, use this repository to download the bin folder:
https://github.com/s911415/apache-hadoop-3.1.0-winutils/tree/master/bin
I already had %HADOOP_HOME%/bin in my PATH. Adding hadoop.dll to the Hadoop/bin directory made it work again.
In IntelliJ, under Run/Debug Configurations, open the application you are trying to run and, under the Configuration tab, specify the exact working directory; using a variable to represent the working directory also creates this problem. When I changed the working directory in the configuration, it started working again.
Yes, this issue arose when I was using PigUnit for the automation of Pig scripts. Two things need to be done in sequence:
Copy both files, as mentioned above, to a location and add that location to the PATH environment variable.
To pick up the change you have just made, you have to restart your machine so the files are loaded.
Under JUnit I was getting this error, which might help others as well:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias XXXXX. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.pig.PigServer.openIterator(PigServer.java:925)
This is what worked for me: download the latest winutils from https://github.com/kontext-tech/winutils, or check your Spark RELEASE text file; it shows the version of Hadoop it is using.
Steps
Download the repo
Create a folder named hadoop anywhere (e.g. desktop/hadoop)
Paste the bin folder into it (you will then have hadoop/bin)
Copy hadoop.dll to Windows/System32
Set the system environment variables (or set hadoop.home.dir programmatically, as sketched after these steps):
set HADOOP_HOME=c:/desktop/hadoop
set PATH=%PATH%;%HADOOP_HOME%/bin;
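If you would rather not touch the system environment (for example when running from an IDE), the same location can be supplied programmatically before any Hadoop classes are used. A small sketch; the path is only an example and must be the parent of the bin folder that contains winutils.exe and hadoop.dll:
public class HadoopHomeSetup {
    public static void main(String[] args) {
        // Example path only; point this at the folder whose bin subfolder holds winutils.exe
        System.setProperty("hadoop.home.dir", "C:\\desktop\\hadoop");
        // ... create your SparkConf / Job after this point ...
    }
}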
For me, I had to download winutils from https://github.com/kontext-tech/winutils as it has the latest version, 3.3.
It is important to make sure the version matches the Spark version you downloaded; otherwise, you can run into some weird error messages.
Both hadoop.dll and winutils.exe are fine to be in the same folder, C:/hadoop/bin. I didn't copy either to the system folder and it works.
Note: I followed this, except for the download page of the winutils tool.
After downloading and configuring hadoop.dll and winutils.exe as in the previous answer, you need to restart Windows to make it work.
In my case (pyspark = 3.3.1, Spark version = 3.3.1, Hadoop version = 3.3.2)
I set the environment variables in Python code:
import os, sys
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['HADOOP_HOME'] = "C:\\Program Files\\Hadoop\\"
I added the latest version of the Hadoop files (hadoop-3.4.0-win10-x64) from https://github.com/kontext-tech/winutils to the bin folder and added hadoop.dll to C:\Windows\System32.
I'm answering this because I've had the same issue.
I'm using MinIO as object storage and Spark as the processing engine, version 3.1.2 with Hadoop 3.2 (spark-3.1.2-bin-hadoop3.2).
I solved this by just downloading the hadoop.dll file from the GitHub page https://github.com/cdarlint/winutils
and saving it to the bin folder within my Spark folder. Then I just ran spark-submit in the VS Code terminal and it went smoothly.
I hope it can help anyone here!
This might be old, but if it's still not working for someone: Step 1 - double-click winutils.exe. If it says some DLL file is missing, download that .dll file and place it in the appropriate location.
In my case, msvcr100.dll was missing and I had to install the Microsoft Visual C++ 2010 Service Pack 1 Redistributable Package to make it work.
All the best
I had everything configured correctly, but while using pyspark.SparkContext.wholeTextFiles I specified the path as "/directory/" instead of "/directory/*", which gave me this issue.

Apache Storm - ClassNotFoundException on storm-starter

I’m trying to get the storm-starter to work. I tried the mailing list, and that does not seem to be gaining traction. When I run the following:
$ mvn compile exec:java -Dstorm.topology=storm.starter.ExclamationTopology
I get an error:
Error on initialization of server mk-worker
java.lang.RuntimeException: java.lang.ClassNotFoundException: backtype.storm.testing.TestWordSpout
I'm not a Java developer, so I'm not sure exactly how imports are supposed to work. I do see storm-core/src/jvm/backtype/storm/testing/TestWordSpout.java.
When I look for jar files, I see:
./target/original-storm-starter-0.11.0-SNAPSHOT.jar
./target/storm-starter-0.11.0-SNAPSHOT.jar
When I inspect those jar files, TestWordSpout is not there. I am running my commands from ./examples/storm-starter as per the documentation linked above.
To the best of my knowledge, I've followed the tutorial exactly. OS X El Capitan 10.11.2, Java 1.8.0, Storm 0.9.5, Maven 3.3.3. Any help would be great; I'd enjoy being able to get started :)
Running a Storm topology via Maven is not the way to go. You should use bin/storm jar myJarFile.jar on the command line to submit a topology to a cluster (which also works for local mode).
The files ./target/original-storm-starter-0.11.0-SNAPSHOT.jar and ./target/storm-starter-0.11.0-SNAPSHOT.jar are standard Maven artifacts and cannot be used to submit a topology to a cluster.
You can use maven-jar-plugin (which I would recommend to get started; you might need to use maven-dependency-plugin, too), maven-assembly-plugin, or maven-shade-plugin to assemble a correct jar file for submission. There are a bunch of SO questions about this, so I will not include further details here. For an example, you can have a look at my git repository at https://github.com/mjsax/aeolus/blob/master/monitoring/pom.xml
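For reference, here is the usual shape of such a topology main class, sketched against the 0.9.x backtype.storm API (the class, bolt, and topology names here are made up; TestWordSpout comes from storm-core's testing package). It runs in an in-process LocalCluster when no argument is given and goes through StormSubmitter otherwise, which is the path bin/storm jar exercises:
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class ExclamationDemo {

    // Minimal bolt that appends "!!!" to each incoming word.
    public static class ExclaimBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            collector.emit(new Values(tuple.getString(0) + "!!!"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word", new TestWordSpout(), 2);
        builder.setBolt("exclaim", new ExclaimBolt(), 2).shuffleGrouping("word");

        Config conf = new Config();
        if (args.length > 0) {
            // Cluster mode: reached via `bin/storm jar your-assembled.jar ExclamationDemo someName`
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Local mode: an in-process cluster, handy for trying things out
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("exclamation-local", conf, builder.createTopology());
            Thread.sleep(10000);
            cluster.shutdown();
        }
    }
}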

Hadoop Eclipse configuration

I am new to Hadoop. I successfully installed Hadoop 2.2.0 in pseudo-distributed mode and successfully executed some example programs, like word count and PI, through the command prompt. Now I want to practice some MapReduce programs using Eclipse, so I installed Eclipse Juno first, but I do not know how to configure Eclipse for Hadoop. Can anyone tell me the steps to configure Eclipse Juno for Hadoop 2.2.0?
Thanks in advance.
It is easy to get Eclipse configured for Hadoop. Basically you need to set up the Build Path and configure Ant and Maven. There is a good write-up here. Check that out and come back with any questions once you get started.
Even though the above link references Cloudera, the Eclipse configuration is the same as for a manually installed Hadoop release as far as getting Eclipse working is concerned. You will need to follow steps 1 - 4 at least in order to get the correct Build Path, Ant configuration and Hadoop runtime jars on the correct path.
The easiest way to make sure you have configured Eclipse correctly is to create a Java project and copy/paste the wordcount Java file into the project (a standard WordCount is sketched below for reference). Once it is saved, take a look at any errors in the console. If you have everything correctly configured, you will be able to compile wordcount and should have wordcount.class in your project's bin dir.
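For that smoke test, here is the standard Hadoop 2.x WordCount, essentially the version that ships with the Hadoop examples, reproduced as a single-file sketch:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Splits each input line into tokens and emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}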
1. Build the project with Maven (m2e plugin) and the required jar files on the classpath.
2. Export the jar for the project.
3. Use the Hadoop command-line utility to execute the MapReduce job.

How to create a RPM package with Redline RPM Java library?

I would like to create an RPM package for my Java game (currently packaged as JARs + a JNLP file). I use Ant as a build tool. I cannot use platform-dependent tools, as the few developers who use my source code use several operating systems, not necessarily GNU/Linux unlike me.
At first, I tried to use JDIC, but its source code hasn't been maintained for years and I had to modify tons of things just to make it compile anew. Moreover, it just calls the native RPM tools under the hood. Then, I found the RPM Ant task, but it uses the native RPM tools under the hood too. After that, I found the RPM Maven plugin, but I don't want to switch to another build tool now just to create an RPM package.
Finally, I found the Redline RPM pure Java library, which has an Ant task; there is an example here. I still don't understand how to use it. I understand the role of a few basic fields (group, version, release, name), and I know that I have to use "depends" to indicate that my game requires at least Java 1.7, but I don't know what to do with my JARs, where to put the .desktop file for the desktop shortcut, and where to put the bash script that calls the main class to run my game. As a first step, I'd like to create a binary package. I have found another example using this library here. Do I have to provide an uninstall script too? Should I use a postinstall script to copy the .desktop file into the desktop directory? Should I use a tarfileset for the third-party libraries? I know it would be better to put the JARs into several RPMs, but I want to succeed in doing something simple before doing more elaborate but cleaner things.
I wrote a simple tutorial on how to use Redline here.
Basically, everything you have to do to build an empty rpm is this:
org.redline_rpm.Builder builder = new Builder();
File directory = new File(".");
builder.setType(RpmType.BINARY);
builder.setPlatform(Architecture.X86_64, Os.LINUX);
builder.setPackage("name", "1", "1");
builder.setDescription("Description");
builder.setSummary("Summary");
builder.build(directory);
You can add dependencies on certain commands, for example:
builder.addDependencyMore("tar", "0");
builder.addDependencyMore("python", "0");
builder.addDependencyMore("wget", "0");
Then you can add a pre-install or post-install script, and files too.
builder.setPostInstallScript(xxx);
File tarball = new File("/the/dir/of/your/file/file.tar.gz");
builder.addFile("/where/to/put/it/file.tar.gz", tarball);
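To address the original question about the launcher script and the .desktop file, here is a hedged sketch along the same lines. All paths and file names are only examples, and the three-argument addFile overload is used to set the Unix mode (0755) on the launcher script:
import java.io.File;
import org.redline_rpm.Builder;
import org.redline_rpm.header.Architecture;
import org.redline_rpm.header.Os;
import org.redline_rpm.header.RpmType;

public class BuildGameRpm {
    public static void main(String[] args) throws Exception {
        Builder builder = new Builder();
        builder.setType(RpmType.BINARY);
        builder.setPlatform(Architecture.NOARCH, Os.LINUX);
        builder.setPackage("mygame", "1.0", "1");
        builder.setSummary("My Java game");
        builder.setDescription("A Java game packaged as JARs plus a launcher script.");
        builder.addDependencyMore("java", "1.7"); // require at least Java 1.7

        // Game JARs under /usr/share/mygame/
        builder.addFile("/usr/share/mygame/mygame.jar", new File("dist/mygame.jar"));

        // Launcher script that calls the main class, installed executable (mode 0755)
        builder.addFile("/usr/bin/mygame", new File("packaging/mygame.sh"), 0755);

        // .desktop entry so desktop environments pick up the shortcut
        builder.addFile("/usr/share/applications/mygame.desktop",
                new File("packaging/mygame.desktop"));

        builder.build(new File("target/rpm"));
    }
}
Installing the .desktop file directly under /usr/share/applications (rather than copying it in a post-install script) is the more conventional approach, but both work.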
Redline Maven dependency:
<dependency>
<groupId>org.redline-rpm</groupId>
<artifactId>redline</artifactId>
<version>1.2.1</version>
</dependency>
I solved my problem here. I just had to add a short script to run the application.
P.S.: By the way, I now use my own tool (which uses Redline RPM under the hood); it's fully documented, free software (under the GPL), and works for DEB, APP and EXE (via NSIS) too. It's called the Java Native Deployment Toolkit.

Checkstyle & Findbugs Install

I have javac version 1.6.0_16 already installed on Windows XP, and I'm using both DrJava and the command prompt to compile and run Java programs.
I downloaded and extracted Checkstyle 5.5 and Findbugs 2.0.1. I'm trying to install Checkstyle and the instructions stated that I need to include checkstyle-5.5-all.jar in the classpath.
My question is, should I place the Checkstyle directory in the lib folder of the jdk1.6.0_16 directory and set the classpath as follows:
C:>set classpath=%C:\Program Files\Java\jdk1.6.0_16\lib\checkstyle-5.5\checkstyle-5.5-all.jar
Is this correct? Should I do the same for Findbugs? Thanks in advance
EDIT: When I added the above path using the environment variables and ran checkstyle hello.java, I got the error: 'checkstyle' is not recognized as an internal or external command, operable program or batch file
Maven will solve this problem for you
It sounds like you're just getting started in the world of Java. To that end, I'd suggest that you look into Maven for your build process. Also, you should be using at least JDK 1.6.0_33 at the time of writing.
Essentially, Maven will manage the process of running Checkstyle, Findbugs (and you should also consider PMD) via standard plugins against your code. It will also manage the creation of the Javadocs, linked source code and generate a website for your project. Further, Maven promotes a good release process whereby you work against snapshots until ready to share your work to the wider world.
And if I don't use Maven?
Well, just create a /lib folder in your project and stuff your dependencies into it. Over time you will create more and more, and these will get intertwined. After a while you will enter JAR Hell and turn to Maven to solve the problem.
We've all been there.
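As for the EDIT in the question: extracting Checkstyle does not give you a checkstyle command; the all-in-one jar is invoked through java. Something along these lines should work (the jar path is the one from the question, and /sun_checks.xml is the standard configuration shipped with Checkstyle; adjust the names to your setup):
java -jar "C:\Program Files\Java\jdk1.6.0_16\lib\checkstyle-5.5\checkstyle-5.5-all.jar" -c /sun_checks.xml hello.java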
