How do I run a JAR file on an EC2 instance? - java

I wrote a basic MapReduce program on my MacBook utilizing Apache's resource here:
https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
After I finished, I exported a jar of my project and transferred it to my EC2 instance over SSH.
After that, I ran this command through the terminal of my EC2 instance:
/usr/local/hadoop/bin/hadoop jar test.jar com.map.reduce games.tar.gz output
Here, /usr/local/hadoop/bin/hadoop is where Hadoop is installed on the EC2 instance, test.jar is the transferred jar file, and com.map.reduce is the package that contains all of my classes. games.tar.gz is the input I will be working with, and output is where I want to see my results.
But I am getting the exception:
Exception in thread "main" java.lang.ClassNotFoundException:
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:566)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:374)
at org.apache.hadoop.util.RunJar.run(RunJar.java:311)
at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
I am wondering if this is an issue with the JARs I am using locally. Any help is appreciated.
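One thing worth checking here: the stack trace shows RunJar calling Class.forName, which expects a fully qualified class name (for example a hypothetical com.map.reduce.MyDriver), not a package name such as com.map.reduce. The following minimal Java sketch illustrates the difference using JDK names only; the class and method names in it are made up for illustration.

```java
public class ForNameDemo {

    // Returns true if the given name resolves to a loadable class.
    static boolean isLoadableClass(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A package name is not a class, so the lookup fails.
        System.out.println(isLoadableClass("java.util"));           // prints false
        // A fully qualified class name resolves.
        System.out.println(isLoadableClass("java.util.ArrayList")); // prints true
    }
}
```

If that is the cause, passing the name of the class that contains main (package plus class name) instead of the bare package would be the thing to try.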

Related

Run co-occurrence algorithm in hadoop

I found the following project on github https://github.com/fbukevin/hadoop-cooccurrence which uses a co-occurrence algorithm in hadoop.
I’m using a virtualized Ubuntu 14.04 and managed to install Hadoop as a single-node cluster by following these instructions: http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php. I'm new to Hadoop and these are my first attempts to run a program with YARN.
I can execute the yarn command on the command line, but I don’t know how to run the co-occurrence algorithm with it. The description says that the program can be used with the following command:
$ yarn jar <hadoop>.jar [pairs | stripes] <input_file>
So I tried this:
$ yarn jar /home/vmiller/Downloads/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar pairs pg100.txt
Exception in thread "main" java.lang.ClassNotFoundException: pairs
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
This is definitely not correct, but I don't know how to run the command correctly. Somehow I have to tell yarn to use Cooccurrence.java, located at hadoop-cooccurrence/src/main/java/cooc/Cooccurrence.java, because this file seems to be the one that executes the co-occurrence algorithm. But how do I tell yarn to use this file with the pairs and stripes arguments on the input file?
You should give yarn jar the path to the jar that contains the Cooccurrence class.
That jar is in the target folder (cooc-1.0-SNAPSHOT.jar).
You don't need to specify the class name, because it is set in the manifest file.
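To see why no class name is needed, it can help to check what a jar's manifest declares: RunJar uses the manifest's Main-Class attribute when it is present and only otherwise takes the class name from the command line. The sketch below is an illustration under assumptions, not part of the project: it builds a throwaway jar whose manifest names cooc.Cooccurrence (guessed from the source path in the question) and reads the attribute back. Pass a real jar path as the first argument to inspect that jar instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ManifestMainClass {

    // Returns the Main-Class attribute of the jar's manifest, or null if absent.
    static String mainClassOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    // Writes a jar that contains only a manifest declaring the given main class (demo only).
    static Path writeJarWithMainClass(String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise the attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        Path jar = Files.createTempFile("demo", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out, manifest)) {
            // No further entries are needed for this demonstration.
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path jar = args.length > 0
                ? Paths.get(args[0])
                : writeJarWithMainClass("cooc.Cooccurrence"); // hypothetical class name
        System.out.println("Main-Class: " + mainClassOf(jar));
    }
}
```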
I actually managed to run the program. My approach wasn't that wrong; as tokiloutok mentioned, I had to use the right jar file.
Before I could execute the command I had to import the pg100.txt into HDFS.
So I had to deactivate the safe mode of the name node with
hdfs dfsadmin -safemode leave
and import the file with
hdfs dfs -put /home/vmiller/workspace/hadoop-cooccurrence/pg100.txt /user/hadoop/
so that I could finally run
yarn jar target/cooc-1.0-SNAPSHOT.jar pairs pg100.txt
without getting any errors.

ClassNotFoundException when running hadoop jar

I'm attempting to run a MapReduce job from a jar file and keep getting a ClassNotFoundException error. I'm running Hadoop 1.2.1 on a CentOS 6 virtual machine.
First I compiled the file exercise.java (and class) into a jar file exercise.jar using the following shell script compile.sh:
#!/bin/bash
javac -classpath /pathto/hadoop-common-1.2.1.jar:\
/pathto/hadoop-core-1.2.1.jar /pathto/exercise.java
jar cvf exercise.jar /pathto/*.class
This runs fine and the jar completes successfully. I then attempt to run the actual MapReduce job using shell script exec.sh:
#!/bin/bash
export CLASSPATH=$CLASSPATH:/pathto/hadoop-common-1.2.1.jar:\
/pathto/hadoop-core-1.2.1.jar:/pathto/exercise.class
hadoop jar exercise.jar exercise /data/input/inputfile.txt /data/output
This throws the ClassNotFoundException error:
Exception in thread "main" java.lang.ClassNotFoundException: exercise
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I realize the explicit path names might not be necessary but I've been a little desperate to double check everything. I've confirmed that in my exercise.java file exercise.class is in the job configuration via job.setJarByClass(exercise.class); and confirmed exercise.class is contained in exercise.jar. Can't seem to figure it out.
UPDATE
The exec.sh script with the full path of exercise.class. It's stored in my Eclipse project directory:
#!/bin/bash
export CLASSPATH=$CLASSPATH:/pathto/hadoop-common-1.2.1.jar:\
/pathto/hadoop-core-1.2.1.jar:/home/username/workspace/MVN_Hadoop/src/main/java.com.amend.hadoop.MapReduce/*
hadoop jar \
exercise.jar \
/home/username/workspace/MVN_Hadoop/src/main/java.com.amend.hadoop.MapReduce/exercise \
/data/input/inputfile.txt \
/data/output
When I actually try and run the exec.sh script using the explicitly written out path names, I also get a completely different set of errors:
Exception in thread "main" java.lang.ClassNotFoundException: /home/hdadmin/workspace/MVN_Hadoop/src/main/java/come/amend/hadoop/MapReduce/exercise
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I can see these possible errors.
In the command hadoop jar exercise.jar exercise /data/input/inputfile.txt /data/output, specify the fully qualified name of the exercise class, i.e. org.name.package.exercise if the class is in a package. To cross-check, open the jar file and look at where exercise.class is located.
Also, Hadoop doesn't expect jars to be nested inside the jar, since the Hadoop classpath is set globally.
NEW:
The following path looks wrong: "/home/hdadmin/workspace/MVN_Hadoop/src/main/java/come/amend/hadoop/MapReduce/exercise"
If you are running from your jar, the class cannot be given as a filesystem path like this. It has to be named relative to the root of the jar, i.e. only the "come/amend/hadoop/MapReduce/exercise" part, written with dots as come.amend.hadoop.MapReduce.exercise.
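The cross-check suggested in this answer (open the jar and see where exercise.class sits) can be sketched in Java. This is an illustration under assumptions, not the asker's code: the demo jar is built on the fly, and the entry com/amend/hadoop/MapReduce/exercise.class is hypothetical, loosely based on the directory names in the question. The name to pass to hadoop jar is the entry path with / replaced by . and the .class suffix dropped.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarClassNames {

    // Converts each *.class entry of the jar into the fully qualified
    // class name that Class.forName expects.
    static List<String> classNames(Path jar) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String entry = entries.nextElement().getName();
                if (entry.endsWith(".class")) {
                    String withoutSuffix = entry.substring(0, entry.length() - ".class".length());
                    names.add(withoutSuffix.replace('/', '.'));
                }
            }
        }
        return names;
    }

    // Builds a small jar containing empty placeholder entries (demo only).
    static Path writeJar(String... entryNames) throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            for (String name : entryNames) {
                jos.putNextEntry(new JarEntry(name));
                jos.closeEntry();
            }
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path jar = args.length > 0
                ? Paths.get(args[0])
                : writeJar("com/amend/hadoop/MapReduce/exercise.class"); // hypothetical entry
        classNames(jar).forEach(System.out::println);
    }
}
```

Run without arguments, it prints com.amend.hadoop.MapReduce.exercise. Run against a real jar, an entry that shows up with an unexpected prefix (for example a leading pathto.) would suggest the jar was built from the wrong directory.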

EMR-4.2.0 Error during running of custom jar (command-runner)

I am running a Sqoop installation script on AWS EMR 4.2.0, following this documentation.
After creating the cluster, I submitted (under Steps) my Sqoop script as an argument and s3://elasticmapreduce/libs/script-runner/script-runner.jar/ command-runner.jar as the jar file, but I am getting the error below. Can you please help me find the cause of the problem?
Error:
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "s3://bmsgcm/spark/install-sqoop.sh" (in directory "."): error=2, No such file or directory
at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:139)
at com.amazonaws.emr.command.runner.CommandRunner.main(CommandRunner.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Cannot run program "s3://bmsgcm/spark/install-sqoop.sh" (in directory "."): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:92)
... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:187)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 8 more
command-runner.jar can only read local files. You can add a bootstrap script to copy files from S3 to local file system.
Piggybox is correct. Unlike script-runner.jar that was used on the 2.x and 3.x EMR AMIs, command-runner.jar can only run local commands. A bootstrap script is the best way to do this.
For instance, if you have a few spark drivers on S3, and you have a shell script (also on S3) to copy them to the master node for later use in a job flow step with spark-submit, then you might have had a step like this:
Steps=[
    {
        'Name': 'Install My Spark Drivers',
        'ActionOnFailure': 'TERMINATE_JOB_FLOW',
        'HadoopJarStep': {
            'Jar': 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',
            'Args': [
                's3://my-bucket/spark-driver-install.sh',
            ]
        }
    },
    ...other steps...
]
As you've experienced, this will fail on EMR 4.x if you just swap command-runner.jar in for script-runner.jar there.
Instead, make a bootstrap action to call the script, like:
BootstrapActions=[
    {
        'Name': 'Install My Spark Drivers',
        'ScriptBootstrapAction': {
            'Path': 's3://my-bucket/spark-driver-install.sh',
            'Args': []
        }
    }
]
The above example is expressed as boto3 run_job_flow kwargs. It's not immediately obvious to me how to accomplish the same thing in the web console, though.
To execute a script you can use script-runner. I was facing the same issue: my script had ^M characters, which were causing it. Removing them fixed it.
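The ^M characters mentioned here are carriage returns left over from Windows (CRLF) line endings. A minimal Java sketch for stripping them from a script before uploading it follows. The class name and the sample script text are made up for illustration, the file is assumed to be UTF-8 text, and a tool such as dos2unix does the same job.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StripCarriageReturns {

    // Converts Windows (CRLF) line endings to Unix (LF) line endings.
    static String toUnixLineEndings(String text) {
        return text.replace("\r\n", "\n");
    }

    // Rewrites the file in place; assumes the script is UTF-8 text.
    static void convertFile(Path script) throws IOException {
        String text = new String(Files.readAllBytes(script), StandardCharsets.UTF_8);
        Files.write(script, toUnixLineEndings(text).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Convert the script given on the command line.
            convertFile(Paths.get(args[0]));
        } else {
            // Demo on an in-memory sample with CRLF endings.
            String sample = "#!/bin/bash\r\necho hello\r\n";
            System.out.println(toUnixLineEndings(sample).contains("\r")); // prints false
        }
    }
}
```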

Exception during wordcount in Hadoop

I have installed Hadoop successfully and now I want to run Wordcount.jar. As shown below, my source address is /user/amir/dft/pg5000.txt and destination address to save results is /user/amir/dft/output.txt.
I have downloaded the .jar file from this url.
Now I'm facing this error message when I run the below command. I followed the instructions found at this url and now my problem is on "Run the MapReduce job" step. How can I overcome it?
amir@amir-Aspire-5820TG:/usr/local/hadoop$ bin/hadoop jar /usr/local/hadoop/wordcount.jar wordcount /user/amir/dft/pg5000.txt /user/amir/dft/output.txt
Exception in thread "main" java.lang.ClassNotFoundException: wordcount
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
amir@amir-Aspire-5820TG:/usr/local/hadoop$
It means you have a typo or something else wrong with the main class you're specifying. Do you mean org.apache.hadoop.examples.WordCount instead of wordcount?
You don't need to download a new .jar file. A wordcount jar is already there in the examples of hadoop. Just use the command:
bin/hadoop jar hadoop*examples*.jar wordcount /user/amir/dft /user/amir/dft-output
The input and output paths should be directories on HDFS, not files. This will run the wordcount program on all the files that are uploaded on HDFS under the /user/amir/dft/ path, (including your pg5000.txt file).
EDIT: If you want to run the specific jar that you have downloaded, though, follow @samthebest's answer (keeping in mind that the input and output paths are directories).
EDIT2: Following the comments of this answer, it seems that the hadoop version used is newer than the one described in the tutorial. So the .jar for the wordcount program is located at the path hadoop_root/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar, as mentioned in this post.

Jar file with Jaudiotagger, NoClassDefFoundError

I made a java class using the library Jaudiotagger, and it runs without problems. I then made a jar out of it and I got NoClassDefFoundError. Here's how it went:
I put the main class id3tag.java and the library jaudiotagger-2.0.3.jar in a folder and compiled using the command line. The program ran smoothly without problems.
javac -cp .;jaudiotagger-2.0.3.jar id3tag.java
java -cp .;jaudiotagger-2.0.3.jar id3tag
I then created the manifest and the jar file.
echo Main-Class: id3tag >manifest.txt
jar cvfm id3tag.jar manifest.txt id3tag.class jaudiotagger-2.0.3.jar
I got the following output:
added manifest
adding: id3tag.class(in = 5952) (out= 2997)(deflated 49%)
adding: jaudiotagger-2.0.3.jar(in = 811441) (out= 740599)(deflated 8%)
I then ran the jar file, and got "A Java Exception has occurred.". I also tried:
java -jar id3tag.jar
And I got the output:
Exception in thread "main" java.lang.NoClassDefFoundError: org/jaudiotagger/tag/FieldDataInvalidException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2451)
at java.lang.Class.getMethod0(Class.java:2694)
at java.lang.Class.getMethod(Class.java:1622)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.jaudiotagger.tag.FieldDataInvalidException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 6 more
I then replaced all (both) exception classes used from Jaudiotagger with Exception and recreated the jar. Now the same thing happens as with seemingly all other jar files when I run them (with Java(TM) Platform SE binary, or by typing id3tag.jar at the command prompt): nothing. java -jar, however, works, and gave me a runtime error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/jaudiotagger/audio/AudioFileIO
at id3tag.tagSong(id3tag.java:112)
at id3tag.tagAlbum(id3tag.java:82)
at id3tag.tagArtist(id3tag.java:40)
at id3tag.main(id3tag.java:170)
Caused by: java.lang.ClassNotFoundException: org.jaudiotagger.audio.AudioFileIO
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 4 more
It seems like the entire library just doesn't do shit for me. How do I fix this, and how do I get jar files to run without java -jar? (Also, how do I get the full pile'o'errors in case someone needs it, rather than just having it say "x more"?)
I'm not very savvy with this kinda shit so the more specific the answer, the better. Thanks.
I'm on Windows 8 with the latest Java (1.7.0_17).
This is telling you that you have no main class. As you may or may not already know, programs in almost all non-web-based programming languages need an entry point to start from. In Java it looks like this:
public static void main(String[] args) {
}
The message is telling you that this class can no longer be found; this could be a compiler error or something entirely different.
Please try downloading the library again, and downloading the Eclipse IDE for Java EE Developers.
This IDE has a built-in compiler that sometimes works better than the command line.
I hope this helps.
I worked around it by including the org folder from jaudiotagger in my .jar instead of the actual .jar file, and I then used a .bat file to run it instead of running the .jar directly. Feel free to add your own answer if you find a better solution, and I'll check back.
I extracted jaudiotagger-2.0.3.jar with winrar so that the org folder was in the same folder as my main class. Then I could compile and run the main class simply by:
javac id3tag.java
java id3tag
I then created the manifest file the same way as before, and created the jar:
jar cvfm id3tag.jar manifest.txt id3tag.class org
The jar file worked with java -jar id3tag.jar, but simply writing id3tag.jar still did nothing. Turns out it's because jar files are by default run by the javaw.exe file, so you have to right click the jar file -> open with -> choose default program and navigate to the java.exe file in your java folder (just search the folder). Running the .jar then gets you "Error: Could not find or load main class path\id3tag.jar". I worked around this by using a .bat file. I entered
java -jar path\id3tag.jar
into Notepad and saved as whatevername.bat (save as type: all files).
I created a new question for getting the .jar file to work properly. See .jar error - could not find or load main class.
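For context on why the first attempt failed: the standard java launcher does not load classes from a jar nested inside another jar, so packing jaudiotagger-2.0.3.jar into id3tag.jar leaves its classes invisible. Besides unpacking the library as described above, a commonly used alternative is to keep the library jar next to the application jar and list it in the manifest's Class-Path attribute. The sketch below only builds and prints such a manifest. It is an illustration, not what the poster did, and it assumes both jars sit in the same directory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestWithClassPath {

    // Builds a manifest that names the main class and lists library jars.
    // Class-Path entries are space separated and relative to the application jar.
    static Manifest build(String mainClass, String classPath) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise nothing is written out.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        attrs.put(Attributes.Name.CLASS_PATH, classPath);
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        Manifest manifest = build("id3tag", "jaudiotagger-2.0.3.jar");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        manifest.write(out);
        // Shows the manifest text that would go into META-INF/MANIFEST.MF.
        System.out.print(out.toString("UTF-8"));
    }
}
```

With the jar tool, the same effect should be achievable by adding a line Class-Path: jaudiotagger-2.0.3.jar to manifest.txt and leaving the library jar out of the jar cvfm file list.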
