Classpath to use for MapR/Hadoop/Hive - java

I'm trying to compile some Java code for Hadoop and need to know which classpath to specify. For Cloudera I use the command below, but what do I use for a MapR installation? Surprisingly, searching Google only turned up how to set the classpath, not what to set it to.
javac -classpath "/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/client/*" mr.java -d mr

Found the answer by trial and error. Oddly, Google is very silent on this, and all the books and examples I've read appear to assume it is too obvious to bother printing.
mkdir MyClass
javac -classpath "/opt/mapr/hadoop/hadoop-0.20.2/lib/*" MyClass.java -d MyClass
jar -cvf MyClass.jar -C MyClass .
Additionally, if you want the Hive libraries, e.g. for compiling a Hive UDF:
javac -classpath "/opt/mapr/hadoop/hadoop-0.20.2/lib/*:/opt/mapr/hive/hive-0.12/lib/*" MyClass.java -d MyClass
EDIT: one thing I would add is to make sure you put quotes around the path; otherwise the shell expands the * on the command line, which is not what you want. The * in the path needs to be passed to javac as is.
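A quick way to see the effect of the quotes is to count the arguments the shell actually passes along. This is only a sketch: count_args and the throwaway directory with two empty .jar files are made up for the demonstration and have nothing to do with a real MapR install.

```sh
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

# Throwaway directory with two fake jars (illustrative names only).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.jar" "$demo_dir/b.jar"

# Unquoted: the shell expands the * before the command ever sees it.
unquoted=$(count_args $demo_dir/*)
# Quoted: the literal string ending in * is passed through unchanged.
quoted=$(count_args "$demo_dir/*")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -rf "$demo_dir"
```

With two jars in the directory, the unquoted form hands over two separate arguments, so javac would treat the second jar as something other than part of the classpath. The quoted form hands over a single argument that javac can expand itself.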

Related

Run a Java Class with Command-Line

I have 4 Java files:
Common.java
Constants.java
KeywordsEditor.java
ExecutionEngine.java (this one contains the main method)
I successfully compiled them on the command line with this command, run from the project directory (C:\ProjectDemo\src\main\java\ValueInput):
javac -cp "C:\Users\ABC\selenium-java-2.48.2\selenium-2.48.2\selenium-java-2.48.2.jar;selenium-server-standalone-3.141.59.jar" *.java
This produced 4 .class files in the same directory. Now I want to run them with this command:
java -cp "C:\Users\ABC\selenium-java-2.48.2\selenium-2.48.2\selenium-java-2.48.2.jar;selenium-server-standalone-3.141.59.jar" ExecutionEngine
But I got this error:
Error: ExecutionEngine main class could not be found or loaded
I've also tried some similar commands:
```
java -cp "C:\Users\ABC\selenium-java-2.48.2\selenium-2.48.2\selenium-java-2.48.2.jar;libs\*;selenium-server-standalone-3.141.59.jar" ExecutionEngine
java -cp "C:\Users\ABC\selenium-java-2.48.2\selenium-2.48.2\selenium-java-2.48.2.jar;libs/*;selenium-server-standalone-3.141.59.jar" ExecutionEngine
```
And some more, but they don't work. Can somebody help me?
Update
From your comment, I learnt that you have package ValueInput; mentioned in ExecutionEngine.java. Therefore, you should use the switch -d when compiling:
javac -d . -cp "C:\Users\ABC\selenium-java-2.48.2\selenium-2.48.2\selenium-java-2.48.2.jar;selenium-server-standalone-3.141.59.jar" *.java
The option -d . asks the compiler to place the generated class files at the current directory. Now, if you use the command ls in Mac/Unix or dir in Windows, you will see a directory, ValueInput has been created and all the .class files have been placed inside this directory. Learn more about the switches by simply using the command javac
In order to execute ExecutionEngine.class, you can now use the following command:
java -cp ".;C:\Users\ABC\selenium-java-2.48.2\selenium-2.48.2\selenium-java-2.48.2.jar;selenium-server-standalone-3.141.59.jar" ValueInput.ExecutionEngine
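The class has to be named as ValueInput.ExecutionEngine because the launcher looks for a .class file whose path under a classpath root mirrors the package name. A small sketch of that mapping follows; class_file_path is a hypothetical helper written for illustration, not part of the JDK.

```sh
# Hypothetical helper: map a fully qualified class name to the path of its
# .class file relative to a classpath root (dots become directory separators).
class_file_path() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

class_file_path ValueInput.ExecutionEngine   # prints ValueInput/ExecutionEngine.class
```

So with . on the classpath, the file must be at ./ValueInput/ExecutionEngine.class, which is exactly where javac -d . puts it.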
You can also check this answer for a similar solution.
Side note: You should follow the Java naming conventions. As per the convention, the name of the package should be something like value.input.
Original answer
The root cause of the problem is that only jars are listed with -cp. Your ExecutionEngine.class is not in those jars; it is in the current directory, which is denoted by a dot (.) and which you did not include in the classpath.
Thus, the correct command will be:
java -cp ".;C:\Users\ABC\selenium-java-2.48.2\selenium-2.48.2\selenium-java-2.48.2.jar;selenium-server-standalone-3.141.59.jar" ExecutionEngine
It doesn't matter where you put . (the current directory); e.g. the following will also work for you:
java -cp "C:\Users\ABC\selenium-java-2.48.2\selenium-2.48.2\selenium-java-2.48.2.jar;selenium-server-standalone-3.141.59.jar;." ExecutionEngine
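If you build the classpath in a script, forgetting the dot is easy to guard against. The following is a minimal sketch, assuming a POSIX shell (for example Git Bash on Windows); with_current_dir is a made-up helper name.

```sh
# Hypothetical helper: append "." to a classpath unless it is already an entry.
# $1 = classpath, $2 = separator (";" on Windows, ":" on Unix/macOS)
with_current_dir() {
  cp_in=$1
  sep=$2
  case "$sep$cp_in$sep" in
    *"$sep.$sep"*) printf '%s\n' "$cp_in" ;;        # "." is already listed
    *)             printf '%s\n' "$cp_in$sep." ;;   # add it at the end
  esac
}

with_current_dir "a.jar;b.jar" ";"   # prints a.jar;b.jar;.
```

Wrapping the input in separators before matching means that . is only recognised as a whole entry, not as part of a file name such as a.jar.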
Note for Mac:
The separator used for this purpose in Mac is : instead of ; e.g.
javac -cp mysql-connector-java-5.1.49.jar MysqlDemo.java
java -cp mysql-connector-java-5.1.49.jar:. MysqlDemo
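For a script that has to run on more than one platform, the separator can be chosen at run time. This is a rough sketch: cp_separator_for is a hypothetical helper, and it assumes that a Cygwin, MSYS or MinGW shell is launching a Windows JVM (which expects ;) while everything else uses :.

```sh
# Hypothetical helper: choose the classpath separator from a system name
# as reported by `uname -s`.
# Assumption: Cygwin/MSYS/MinGW shells launch a Windows JVM, which expects ";".
cp_separator_for() {
  case "$1" in
    CYGWIN*|MINGW*|MSYS*) echo ';' ;;
    *)                    echo ':' ;;
  esac
}

sep=$(cp_separator_for "$(uname -s)")
# Print the classpath that would be passed to java -cp on this machine.
echo "mysql-connector-java-5.1.49.jar${sep}."
```

Taking the system name as a parameter, rather than calling uname inside the function, keeps the choice easy to check for platforms you are not currently on.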
Note for Java-11 onwards:
Java-11 allows launching Single-File Source-Code programs without compiling e.g.
java -cp mysql-connector-java-5.1.49.jar MysqlDemo.java
You can learn more about it from this article.

classpath relative to current working directory when using javac

I haven't been able to find a good answer to this yet, but when using javac, I find that for classpath I can't seem to get it to work using a relative path to the current working directory, although it seems to work for the target directory and source file location. Here is what we're currently using:
javac java/src/com/<our-company>/ecommerce/<vendor-tool>/executor/*.java java/src/com/<our-company>/ecommerce/<vendor-tool>/util/*.java -d ./target/java/bin -cp /c/Git/<our-api>/java/lib/some.jar:/c/Git/<our-api>/java/lib/another.jar -verbose
And we would like to use something like this:
javac java/src/com/<our-company>/ecommerce/<vendor-tool>/executor/*.java java/src/com/<our-company>/ecommerce/<vendor-tool>/util/*.java -d ./target/java/bin -cp java/lib/some.jar:java/lib/another.jar -verbose
or perhaps:
javac java/src/com/<our-company>/ecommerce/<vendor-tool>/executor/*.java java/src/com/<our-company>/ecommerce/<vendor-tool>/util/*.java -d ./target/java/bin -cp ./java/lib/some.jar:./java/lib/another.jar -verbose
The first example seems to work, but the other two get an error that the packages within those jars cannot be found. Since we are using Git, I would prefer the whole build path be relative to allow for wherever the individual's Git repo is located locally--as well as be flexible concerning which operating system they may be using (this is being run in a shell which should operate the same for each of the platforms).

What does the cp in java -cp ... mean in a unix script?

I'm debugging a unix script. It makes a call to a java program with an option of -cp. Does anyone know what that does? I've never seen the -cp option before. Nor am I able to google an answer. It looks like this:
java -cp ../myboot.jar -Djava.security.policy=$SOME_POLICY...
Thanks for your help.
It's the shortcut for classpath ;)
From
Windows:
http://docs.oracle.com/javase/7/docs/technotes/tools/windows/java.html
Unix:
http://docs.oracle.com/javase/7/docs/technotes/tools/solaris/java.html
-cp classpath
Specifies a list of directories, JAR files, and ZIP archives to search for class files. Separate class path entries with semicolons (;) on Windows or colons (:) on Unix. Specifying -classpath or -cp overrides any setting of the CLASSPATH environment variable.

What are the common errors you see when you run 'java -cp ...' or 'java -classpath'? How do you set a directory of jars in classpath? [duplicate]

Is there a way to include all the jar files within a directory in the classpath?
I'm trying java -classpath lib/*.jar:. my.package.Program and it is not able to find class files that are certainly in those jars. Do I need to add each jar file to the classpath separately?
Using Java 6 or later, the classpath option supports wildcards. Note the following:
Use straight quotes (")
Use *, not *.jar
Windows
java -cp "Test.jar;lib/*" my.package.MainClass
Unix
java -cp "Test.jar:lib/*" my.package.MainClass
This is similar to Windows, but uses : instead of ;. If you cannot use wildcards, bash allows the following syntax (where lib is the directory containing all the Java archive files):
java -cp "$(printf %s: lib/*.jar)"
(Note that using a classpath is incompatible with the -jar option. See also: Execute jar file with multiple classpath libraries from command prompt)
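The printf one-liner above leaves a trailing separator on the result. If that bothers you, a small function can join the jars without it. This is a sketch under the assumption of a POSIX shell; join_jars is a made-up name, and it only looks at files ending in .jar directly inside the given directory.

```sh
# Hypothetical helper: join every *.jar directly inside a directory
# into one classpath string, using the given separator.
# $1 = directory, $2 = separator (defaults to ":")
join_jars() {
  dir=$1
  sep=${2:-:}
  result=
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue          # skip the unexpanded pattern when nothing matches
    result=${result:+$result$sep}$jar  # add the separator only between entries
  done
  printf '%s\n' "$result"
}

# Example use (not run here): java -cp "$(join_jars lib):." my.package.MainClass
```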
Understanding Wildcards
From the Classpath document:
Class path entries can contain the basename wildcard character *, which is considered equivalent to specifying a list of all the files
in the directory with the extension .jar or .JAR. For example, the
class path entry foo/* specifies all JAR files in the directory named
foo. A classpath entry consisting simply of * expands to a list of all
the jar files in the current directory.
A class path entry that contains * will not match class files. To
match both classes and JAR files in a single directory foo, use either
foo;foo/* or foo/*;foo. The order chosen determines whether the
classes and resources in foo are loaded before JAR files in foo, or
vice versa.
Subdirectories are not searched recursively. For example, foo/* looks
for JAR files only in foo, not in foo/bar, foo/baz, etc.
The order in which the JAR files in a directory are enumerated in the
expanded class path is not specified and may vary from platform to
platform and even from moment to moment on the same machine. A
well-constructed application should not depend upon any particular
order. If a specific order is required then the JAR files can be
enumerated explicitly in the class path.
Expansion of wildcards is done early, prior to the invocation of a
program's main method, rather than late, during the class-loading
process itself. Each element of the input class path containing a
wildcard is replaced by the (possibly empty) sequence of elements
generated by enumerating the JAR files in the named directory. For
example, if the directory foo contains a.jar, b.jar, and c.jar, then
the class path foo/* is expanded into foo/a.jar;foo/b.jar;foo/c.jar,
and that string would be the value of the system property
java.class.path.
The CLASSPATH environment variable is not treated any differently from
the -classpath (or -cp) command-line option. That is, wildcards are
honored in all these cases. However, class path wildcards are not
honored in the Class-Path jar-manifest header.
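The rules quoted above can be imitated in a few lines of shell, which makes it easier to predict what a given entry will turn into. This is only a rough emulation written for illustration (expand_entry is a hypothetical name): the real expansion happens inside the java launcher, and the order of the resulting jars is unspecified there, whereas this sketch simply lists .jar files before .JAR files.

```sh
# Rough emulation of how a single classpath entry is expanded.
# $1 = entry, $2 = separator (defaults to ":")
expand_entry() {
  entry=$1
  sep=${2:-:}
  out=
  case "$entry" in
    '*' | */'*') prefix=${entry%?} ;;          # bare * or dir/*: drop the trailing *
    *) printf '%s\n' "$entry"; return 0 ;;     # anything else is left unchanged
  esac
  # Only files ending in .jar or .JAR count; subdirectories are not searched.
  for f in "$prefix"*.jar "$prefix"*.JAR; do
    [ -f "$f" ] || continue
    out=${out:+$out$sep}$f
  done
  printf '%s\n' "$out"
}
```

Note that an entry such as foo/*.jar falls into the "left unchanged" branch, which mirrors the point that only a lone * after the directory name is treated as a wildcard.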
Note: due to a known bug in Java 8, the Windows examples must use a backslash preceding entries with a trailing asterisk: https://bugs.openjdk.java.net/browse/JDK-8131329
Under Windows this works:
java -cp "Test.jar;lib/*" my.package.MainClass
and this does not work:
java -cp "Test.jar;lib/*.jar" my.package.MainClass
Notice the *.jar, so the * wildcard should be used alone.
On Linux, the following works:
java -cp "Test.jar:lib/*" my.package.MainClass
The separators are colons instead of semicolons.
We get around this problem by deploying a main jar file myapp.jar which contains a manifest (META-INF/MANIFEST.MF) file specifying a classpath with the other required jars, which are then deployed alongside it. In this case, you only need to run java -jar myapp.jar when running the code.
So if you deploy the main jar into some directory, and then put the dependent jars into a lib folder beneath that, the manifest looks like:
Manifest-Version: 1.0
Implementation-Title: myapp
Implementation-Version: 1.0.1
Class-Path: lib/dep1.jar lib/dep2.jar
NB: this is platform-independent - we can use the same jars to launch on a UNIX server or on a Windows PC.
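Because wildcards are not honored in the manifest, the Class-Path line has to name every jar. A sketch of generating that line from the lib folder follows; class_path_header is a hypothetical helper, and it assumes it is run from the directory where the main jar will be deployed so that the relative paths come out right. Very long lists may need to be wrapped according to the manifest format's line-length rules, which this sketch does not handle.

```sh
# Hypothetical helper: print a manifest Class-Path header listing every
# *.jar in a directory, space-separated, with paths kept relative.
# $1 = directory of dependency jars, relative to the main jar's location
class_path_header() {
  dir=$1
  list=
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue        # nothing matched: leave the list empty
    list=${list:+$list }$jar
  done
  printf 'Class-Path: %s\n' "$list"
}

# Example use (not run here): class_path_header lib >> manifest.txt
```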
My solution on Ubuntu 10.04 using java-sun 1.6.0_24 having all jars in "lib" directory:
java -cp .:lib/* my.main.Class
If this fails, the following command should work (prints out all *.jars in lib directory to the classpath param)
java -cp $(for i in lib/*.jar ; do echo -n $i: ; done). my.main.Class
Short answer: java -classpath "lib/*:." my.package.Program
Oracle provides documentation on using wildcards in classpaths here for Java 6 and here for Java 7, under the section heading Understanding class path wildcards. (As I write this, the two pages contain the same information.) Here's a summary of the highlights:
In general, to include all of the JARs in a given directory, you can use the wildcard * (not *.jar).
The wildcard only matches JARs, not class files; to get all classes in a directory, just end the classpath entry at the directory name.
The above two options can be combined to include all JAR and class files in a directory, and the usual classpath precedence rules apply. E.g. -cp /classes;/jars/*
The wildcard will not search for JARs in subdirectories.
The above bullet points are true if you use the CLASSPATH environment variable or the -cp or -classpath command line flags. However, if you use the Class-Path JAR manifest header (as you might do with an ant build file), wildcards will not be honored.
Yes, my first link is the same one provided in the top-scoring answer (which I have no hope of overtaking), but that answer doesn't provide much explanation beyond the link. Since that sort of behavior is discouraged on Stack Overflow these days, I thought I'd expand on it.
Windows:
java -cp file.jar;dir/* my.app.ClassName
Linux:
java -cp file.jar:dir/* my.app.ClassName
Remember:
- The Windows path separator is ;
- The Linux path separator is :
- On Windows, if the cp argument does not contain white space, the quotes are optional
For me this works on Windows:
java -cp "/lib/*;" sample
For Linux:
java -cp "/lib/*:" sample
I am using Java 6.
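You can try java -Djava.ext.dirs=jarDirectory (this relies on the extension mechanism, which was removed in Java 9, so it only applies to Java 8 and earlier)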
http://docs.oracle.com/javase/6/docs/technotes/guides/extensions/spec.html
Directory for external jars when running java
Correct:
java -classpath "lib/*:." my.package.Program
Incorrect:
java -classpath "lib/a*.jar:." my.package.Program
java -classpath "lib/a*:." my.package.Program
java -classpath "lib/*.jar:." my.package.Program
java -classpath lib/*:. my.package.Program
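The first three incorrect forms can be caught mechanically before java is ever started. The checker below is a sketch with a made-up name (check_classpath); it flags any entry that contains a * other than as a lone final path component. It cannot catch the last incorrect form, because the missing quotes there are handled by the shell before any program sees the argument.

```sh
# Hypothetical checker: report classpath entries that use * in a way
# the java launcher does not treat as a wildcard.
# $1 = classpath, $2 = separator (defaults to ":")
check_classpath() {
  sep=${2:-:}
  rc=0
  old_ifs=$IFS
  IFS=$sep
  set -f                          # stop the shell from globbing while we split
  for entry in $1; do
    case "$entry" in
      '*' | */'*' | *\\'*') ;;    # a bare * or a trailing dir/* (or dir\*) is valid
      *'*'*) printf 'suspicious entry: %s\n' "$entry"; rc=1 ;;
    esac
  done
  set +f
  IFS=$old_ifs
  return "$rc"
}

# Example use (not run here):
#   check_classpath "lib/*.jar:." || echo "fix the classpath first"
```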
If you are using Java 6, then you can use wildcards in the classpath.
Now it is possible to use wildcards in classpath definition:
javac -cp "libs/*" -verbose -encoding UTF-8 src/mypackage/*.java -d build/classes
Ref: http://www.rekk.de/bloggy/2008/add-all-jars-in-a-directory-to-classpath-with-java-se-6-using-wildcards/
If you really need to specify all the .jar files dynamically you could use shell scripts, or Apache Ant. There's a commons project called Commons Launcher which basically lets you specify your startup script as an ant build file (if you see what I mean).
Then, you can specify something like:
<path id="base.class.path">
<pathelement path="${resources.dir}"/>
<fileset dir="${extensions.dir}" includes="*.jar" />
<fileset dir="${lib.dir}" includes="*.jar"/>
</path>
In your launch build file, which will launch your application with the correct classpath.
Please note that wildcard expansion is broken for Java 7 on Windows.
Check out this StackOverflow issue for more information.
The workaround is to put a semicolon right after the wildcard. java -cp "somewhere/*;"
To whom it may concern,
I found this strange behaviour on Windows under an MSYS/MinGW shell.
Works:
$ javac -cp '.;c:\Programs\COMSOL44\plugins\*' Reclaim.java
Doesn't work:
$ javac -cp 'c:\Programs\COMSOL44\plugins\*' Reclaim.java
javac: invalid flag: c:\Programs\COMSOL44\plugins\com.comsol.aco_1.0.0.jar
Usage: javac <options> <source files>
use -help for a list of possible options
I am quite sure that the wildcard is not expanded by the shell, because e.g.
$ echo './*'
./*
(Tried it with another program too, rather than the built-in echo, with the same result.)
I believe that it's javac which is trying to expand it, and it behaves differently whether there is a semicolon in the argument or not. First, it may be trying to expand all arguments that look like paths. And only then it would parse them, with -cp taking only the following token. (Note that com.comsol.aco_1.0.0.jar is the second JAR in that directory.) That's all a guess.
This is
$ javac -version
javac 1.7.0
All the above solutions work great if you develop and run the Java application outside any IDE like Eclipse or Netbeans.
If you are on Windows 7 and used Eclipse IDE for Development in Java, you might run into issues if using Command Prompt to run the class files built inside Eclipse.
E.g. Your source code in Eclipse is having the following package hierarchy:
edu.sjsu.myapp.Main.java
You have json.jar as an external dependency for the Main.java
When you try running Main.java from within Eclipse, it will run without any issues.
But when you try running it from the Command Prompt after compiling Main.java in Eclipse, it fails with errors such as NoClassDefFoundError.
I assume you are in the working directory of your source code.
Use the following syntax to run it from command prompt:
javac -cp ".;json.jar" Main.java
java -cp ".;json.jar" edu.sjsu.myapp.Main
[Don't miss the . above]
This is because you have placed Main.java inside the package edu.sjsu.myapp, and java.exe looks for the class under exactly that package name.
Hope it helps !!
macOS, current folder
For Java 13 on macOS Mojave…
If all your .jar files are in the same folder, use cd to make that your current working directory. Verify with pwd.
For the -classpath you must first list the JAR file for your app. Using a colon character : as a delimiter, append an asterisk * to get all other JAR files within the same folder. Lastly, pass the full package name of the class with your main method.
For example, for an app in a JAR file named my_app.jar with a main method in a class named App in a package named com.example, alongside some needed jars in the same folder:
java -classpath my_app.jar:* com.example.App
For windows quotes are required and ; should be used as separator. e.g.:
java -cp "target\\*;target\\dependency\\*" my.package.Main
Short Form: If your main is within a jar, you'll probably need an additional '-jar pathTo/yourJar/YourJarsName.jar ' explicitly declared to get it working (even though 'YourJarsName.jar' was on the classpath)
(or, expressed to answer the original question that was asked 5 years ago: you don't need to redeclare each jar explicitly, but does seem, even with java6 you need to redeclare your own jar ...)
Long Form:
(I've made this explicit to the point that I hope even interlopers to java can make use of this)
Like many here I'm using eclipse to export jars: (File->Export-->'Runnable JAR File'). There are three options on 'Library handling' eclipse (Juno) offers:
opt1: "Extract required libraries into generated JAR"
opt2: "Package required libraries into generated JAR"
opt3: "Copy required libraries into a sub-folder next to the generated JAR"
Typically I'd use opt2 (and opt1 was definitely breaking), however native code in one of the jars I'm using I discovered breaks with the handy "jarinjar" trick that eclipse leverages when you choose that option. Even after realizing I needed opt3, and then finding this StackOverflow entry, it still took me some time to figure it out how to launch my main outside of eclipse, so here's what worked for me, as it's useful for others...
If you named your jar: "fooBarTheJarFile.jar"
and all is set to export to the dir: "/theFully/qualifiedPath/toYourChosenDir".
(meaning the 'Export destination' field will read: '/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile.jar' )
After you hit finish, you'll find eclipse then puts all the libraries into a folder named 'fooBarTheJarFile_lib' within that export directory, giving you something like:
/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile.jar
/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile_lib/SomeOtherJar01.jar
/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile_lib/SomeOtherJar02.jar
/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile_lib/SomeOtherJar03.jar
/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile_lib/SomeOtherJar04.jar
You can then launch from anywhere on your system with:
java -classpath "/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile_lib/*" -jar /theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile.jar package.path_to.the_class_with.your_main.TheClassWithYourMain
(For Java Newbies: 'package.path_to.the_class_with.your_main' is the declared package-path that you'll find at the top of the 'TheClassWithYourMain.java' file that contains the 'main(String[] args){...}' that you wish to run from outside java)
The pitfall to notice: is that having 'fooBarTheJarFile.jar' within the list of jars on your declared classpath is not enough. You need to explicitly declare '-jar', and redeclare the location of that jar.
e.g. this breaks:
java -classpath "/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile.jar;/theFully/qualifiedPath/toYourChosenDir/fooBarTheJarFile_lib/*" somepackages.inside.yourJar.leadingToTheMain.TheClassWithYourMain
restated with relative paths:
cd /theFully/qualifiedPath/toYourChosenDir/;
BREAKS: java -cp "fooBarTheJarFile_lib/*" package.path_to.the_class_with.your_main.TheClassWithYourMain
BREAKS: java -cp ".;fooBarTheJarFile_lib/*" package.path_to.the_class_with.your_main.TheClassWithYourMain
BREAKS: java -cp ".;fooBarTheJarFile_lib/*" -jar package.path_to.the_class_with.your_main.TheClassWithYourMain
WORKS: java -cp ".;fooBarTheJarFile_lib/*" -jar fooBarTheJarFile.jar package.path_to.the_class_with.your_main.TheClassWithYourMain
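(using java version "1.6.0_27"; via OpenJDK 64-Bit Server VM on ubuntu 12.04)
Two things are worth knowing when reading the results above: on Linux the classpath separator is :, not ;, and when -jar is given the java launcher ignores -cp and takes the class path from the JAR's manifest instead.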
You need to add them all separately. Alternatively, if you really need to just specify a directory, you can unjar everything into one dir and add that to your classpath. I don't recommend this approach, however, as you risk bizarre problems in classpath versioning and unmanageability.
The only way I know how is to do it individually, for example:
setenv CLASSPATH /User/username/newfolder/jarfile.jar:jarfile2.jar:jarfile3.jar:.
Hope that helps!
class from webapp:
> mvn clean install
> java -cp "webapp/target/webapp-1.17.0-SNAPSHOT/WEB-INF/lib/tool-jar-1.17.0-SNAPSHOT.jar;webapp/target/webapp-1.17.0-SNAPSHOT/WEB-INF/lib/*" com.xx.xx.util.EncryptorUtils param1 param2
Think of a jar file as the root of a directory structure. Yes, you need to add them all separately.
Not a direct solution to being able to set /* to -cp but I hope you could use the following script to ease the situation a bit for dynamic class-paths and lib directories.
libDir2Scan4jars="../test"
cp=""
for j in `ls ${libDir2Scan4jars}/*.jar`; do
  if [ "$j" != "" ]; then cp=$cp:$j; fi
done
echo $cp | cut -c2-${#cp} > .tmpCP.tmp
export tmpCLASSPATH=`cat .tmpCP.tmp`
if [ "$tmpCLASSPATH" != "" ]; then
  echo .
  echo "classpath set, you can now use ~> java -cp \$tmpCLASSPATH"
  echo .
else
  echo .
  echo "Error please check libDir2Scan4jars path"
  echo .
fi
Scripted for Linux, could have a similar one for windows too. If proper directory is provided as input to the "libDir2Scan4jars"; the script will scan all the jars and create a classpath string and export it to a env variable "tmpCLASSPATH".
Set the classpath in a way that covers multiple jars as well as the class files in the current directory.
CLASSPATH=${ORACLE_HOME}/jdbc/lib/ojdbc6.jar:${ORACLE_HOME}/jdbc/lib/ojdbc14.jar:${ORACLE_HOME}/jdbc/lib/nls_charset12.jar;
CLASSPATH=$CLASSPATH:/export/home/gs806e/tops/jconn2.jar:.;
export CLASSPATH
I have multiple jars in a folder. The command below worked for me in JDK 1.8 to include all jars present in the folder. Note that the classpath must be enclosed in quotes if it contains a space.
Windows
Compiling: javac -classpath "C:\My Jars\sdk\lib\*" c:\programs\MyProgram.java
Running: java -classpath "C:\My Jars\sdk\lib\*;c:\programs" MyProgram
Linux
Compiling: javac -classpath "/home/guestuser/My Jars/sdk/lib/*" MyProgram.java
Running: java -classpath "/home/guestuser/My Jars/sdk/lib/*:/home/guestuser/programs" MyProgram
Order of arguments to java command is also important:
c:\projects\CloudMirror>java Javaside -cp "jna-5.6.0.jar;.\"
Error: Unable to initialize main class Javaside
Caused by: java.lang.NoClassDefFoundError: com/sun/jna/Callback
versus
c:\projects\CloudMirror>java -cp "jna-5.6.0.jar;.\" Javaside
Exception in thread "main" java.lang.UnsatisfiedLinkError: Unable

Compile Hadoop 2.2.0 job?

It seems that all of the examples are constructed with older versions in mind.
How do I compile my java program on Ubuntu such that it will refer to hadoop-2.2.0 libraries?
Where are the jar files that I am supposed to include?
What is the command?
Is it like -
javac -classpath libraries wordcount.java
Thank you.
The simplest solution for Linux machines would be:
javac -classpath `yarn classpath` -d . WordCount.java
Or:
export CLASSPATH=`yarn classpath`
javac -classpath $CLASSPATH -d . WordCount.java
I found the following:
javac -classpath $HADOOP_HOME/share/hadoop/common/hadoop-common-2.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar -d wordcount_classes myWordCount.java
This allowed me to compile the Wordcount example (or in this case a copy of mine called myWordCount).
Hadoop has a command "hadoop classpath" that supplies you with the necessary classpath.
ie
hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*
So if you wanna compile you can use it this way..
javac -classpath $(hadoop classpath) -d . WordCount.java
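Since some installs expose hadoop classpath and others yarn classpath, a build script can try both. The wrapper below is a sketch (hadoop_classpath is a made-up name) that assumes one of the two commands is on the PATH. Be aware that yarn can also be the name of an unrelated JavaScript tool on a developer machine, so hadoop is tried first.

```sh
# Hypothetical helper: print the Hadoop classpath using whichever of the
# two commands mentioned above is available.
hadoop_classpath() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop classpath
  elif command -v yarn >/dev/null 2>&1; then
    yarn classpath                 # may be an unrelated tool with the same name
  else
    echo "neither 'hadoop' nor 'yarn' found on PATH" >&2
    return 1
  fi
}

# Example use (not run here):
#   javac -classpath "$(hadoop_classpath)" -d . WordCount.java
```

Quoting the command substitution keeps the * characters in the returned classpath from being touched by the shell.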
You have to install Cygwin; there you can run your Hadoop example, and you can also configure Hadoop with Eclipse.
Run the command: "yarn classpath" to see a list of directories. When I use this list as my -classpath option for javac, my Java program compiles.
I am running HortonWorks v2.0, Apache Hadoop 2.2.0.
I'm having a bumpy ride with the Hadoop example jars too. The information in many videos, tutorials, and blogs is based on older versions.
When we compile these examples or write a MapReduce program of our own in an IDE, we add the Hadoop jars to the project (i.e. import the jars in the IDE / add references to external jars, akin to adding a reference to a .dll in MS Visual Studio), and the IDE takes care of calling javac correctly for each class.
To compile a class such as WordCount.java manually, we need to tell javac which jars the class depends on. The outdated videos I followed did share one useful piece of information: set a variable in .bashrc that references all the Hadoop-related jar files, and then use it as javac -classpath $VARIABLE filename.java.
E.g. I'm using the name $HADOOP_CLASSPATH with the value shown here (I'm on Mac OS X):
/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/yarn/lib/*
With this variable, I could compile the class successfully:
javac -classpath "$HADOOP_CLASSPATH" WordCount.java
