Running wordcount.jar on hadoop in windows using command line - java

I am trying to run a simple wordcount program on hadoop, but facing an error as below.
Exception in thread "main" java.io.IOException: Error opening job jar: /user/asiapac/bmohanty6/wordcount/wordcount.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.io.FileNotFoundException: \user\asiapac\bmohanty6\wordcount\wordcount.jar (The system cannot find the path specified)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(Unknown Source)
at java.util.zip.ZipFile.<init>(Unknown Source)
at java.util.jar.JarFile.<init>(Unknown Source)
at java.util.jar.JarFile.<init>(Unknown Source)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
I am using below command.
$ bin/hadoop jar /user/asiapac/bmohanty6/wordcount/wordcount.jar WordCount /user/asiapac/bmohanty6/wo
rdcount/input /user/asiapac/bmohanty6/wordcount/output
I am using Cygwin, hadoop-0.20.2 with pseudo node set up. I have also uploaded the wordcount.jar to my DFS. See below my DFS screenshot
I am able to run the same wordcount program in eclipse successfully. I have created the wordcount.jar file via eclipse as per this tutorial. I searched a lot in web but could not understand how to solve this. Please help me.

You need to add / before user:
bin/hadoop jar /user/asiapac/bmohanty6/wordcount/wordcount.jar WordCount /user/asiapac/bmohanty6/wordcount/input /user/asiapac/bmohanty6/wordcount/output
This makes them fully-qualified paths. If you omit / before user, Hadoop will search from the current directory.

Related

Run co-occurrence algorithm in hadoop

I found the following project on github https://github.com/fbukevin/hadoop-cooccurrence which uses a co-occurrence algorithm in hadoop.
I’m using a virtualized Ubuntu 14.04 and managed to install hadoop as a single node cluster with this instruction http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php. I'm new to hadoop and this are my first attempts to run a program with yarn.
I can execute the command yarn in command line, but I don’t know how to run the co-occurrence algorithm in yarn. In the description it says that the program can be used with the following command
$ yarn jar <hadoop>.jar [pairs | stripes] <input_file>
So I tried this:
$ yarn jar /home/vmiller/Downloads/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar pairs pg100.txt
Exception in thread "main" java.lang.ClassNotFoundException: pairs
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
This is definitely not correct but I don't know how to run the command correctly. Somehow I have to tell yarn to use the Cooccurrence.java located in hadoop-cooccurrence/src/main/java/cooc/Cooccurrence.java because this file seems to be the one that executes the co-occurrence algorithm. But how do I tell yarn to use this file with the pairs and stripesarguments on the input file?
You should give jar the path to the jar including the Cooccurrence class.
Jar is in target folder (cooc-1.0-SNAPSHOT.jar).
You don't need to indicate the class name as it is set up in the Manifest file
I actually managed to run the programm. My approach wasn't that wrong, as tokiloutok mentioned I had to include the right jar file.
Before I could execute the command I had to import the pg100.txt into HDFS.
So I had to deactivate the safe mode of the name node with
hdfs dfsadmin -safemode leave
and import the file with
hdfs dfs -put /home/vmiller/workspace/hadoop-cooccurrence/pg100.txt /user/hadoop/
so that I could finally run
yarn jar target/cooc-1.0-SNAPSHOT.jar pairs pg100.txt
without getting any errors.

ClassNotFoundException when running hadoop jar

I'm attempting to run a MapReduce job from a jar file and keep getting a ClassNotFoundException error. I'm running Hadoop 1.2.1 on a Centos 6 virtual machine.
First I compiled the file exercise.java (and class) into a jar file exercise.jar using the following shell script compile.sh :
#!/bin/bash
javac -classpath /pathto/hadoop-common-1.2.1.jar:\
/pathto/hadoop-core-1.2.1.jar /pathto/exercise.java
jar cvf exercise.jar /pathto/*.class
This runs fine and the jar completes successfully. I then attempt to run the actual MapReduce job using shell script exec.sh:
#!/bin/bash
export CLASSPATH=$CLASSPATH:/pathto/hadoop-common-1.2.1.jar:\
/pathto/hadoop-core-1.2.1.jar:/pathto/exercise.class
hadoop jar exercise.jar exercise /data/input/inputfile.txt /data/output
This trows the ClassNotFoundException error :
Exception in thread "main" java.lang.ClassNotFoundException: exercise
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I realize the explicit path names might not be necessary but I've been a little desperate to double check everything. I've confirmed that in my exercise.java file exercise.class is in the job configuration via job.setJarByClass(exercise.class); and confirmed exercise.class is contained in exercise.jar. Can't seem to figure it out.
UPDATE
The exec.sh script with the full path of exercise.class. It's stored in my Eclipse project directory:
#!/bin/bash
export CLASSPATH=$CLASSPATH:/pathto/hadoop-common-1.2.1.jar:\
/pathto/hadoop-core-1.2.1.jar:/home/username/workspace/MVN_Hadoop/src/main/java.com.amend.hadoop.MapReduce/*
hadoop jar \
exercise.jar \
/home/username/workspace/MVN_Hadoop/src/main/java.com.amend.hadoop.MapReduce/exercise \
/data/input/inputfile.txt \
/data/output
When I actually try and run the exec.sh script using the explicitly written out path names, I also get a completely different set of errors:
Exception in thread "main" java.lang.ClassNotFoundException: /home/hdadmin/workspace/MVN_Hadoop/src/main/java/come/amend/hadoop/MapReduce/exercise
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I could see this possible errors.
From the Hadoop jar exercise.jar exercise /data/input/inputfile.txt /data/output please specify the full path of the exercise class. I.e org.name.package.exercise if exists. To cross check open the jar file and check the location of exercise.class location.
To continue, Hadoop doesn't expect jars to be included within the jars, since the path of Hadoop is set globally.
NEW:
See, the following path is some thing weird. "/home/hdadmin/workspace/MVN_Hadoop/src/main/java/come/amend/hadoop/MapReduce/exercise"
If you are running using your jar, how could a class path be so specific, instead of jar path. It could only be "come/amend/hadoop/MapReduce/exercise" this.

How do I use mysqlimport?

Here is the command I am running using Runtime.getRuntime().exec() in Java:
mysqlimport --fields-terminated-by=, --lines-terminated-by="|" --local
--user=u --password=p DatabaseName
txtpath
Here is the error I get:
java.io.IOException: Cannot run program "mysqlimport": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at databaseCommunication.UploadThread.run(UploadThread.java:66)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(Unknown Source)
at java.lang.ProcessImpl.start(Unknown Source)
... 5 more
It seems like the problem should be that mysqlimport.exe is not installed or is not installed in the correct place, but I have tried downloading the mySQL utilities from http://dev.mysql.com/downloads/windows/installer/5.6.html and from https://dev.mysql.com/downloads/utilities/.
In order to make sure the problem was not that it could not find the file at "txtpath," I typed the full path into the command prompt, and the correct file was opened, so the error is definitely referring to mysqlimport.exe.
Googling my problem, the only threads I've been able to find refer to something called "Sqoop" which I am not familiar with, and they usually recommend downloading the mysql utilities.
For more context, I have been using BCP to upload data from a txt file to a sql server database, but now I need to do the same thing with mysql. If there is any way to use BCP (I'm pretty sure there isn't) or something else to bulk upload data from a local file I would be open to hearing that as well.
EDIT:
I am using Windows 8 on a remote desktop. I have manually added mysqlimport.exe to the PATH environment variable and it still gives the same error.
If you are on Windows, then try putting mysqlimport.exe as the command rather than just mysqlimport. If that still does not solve your problem, make sure that 'mysqlimport.exe' is in your PATH environment variable.

Exception during wordcount in Hadoop

I have installed Hadoop successfully and now I want to run Wordcount.jar. As shown below, my source address is /user/amir/dft/pg5000.txt and destination address to save results is /user/amir/dft/output.txt.
I have downloaded the .jar file from this url.
Now I'm facing this error message when I run the below command. I followed the instructions found at this url and now my problem is on "Run the MapReduce job" step. How can I overcome it?
amir#amir-Aspire-5820TG:/usr/local/hadoop$ bin/hadoop jar /usr/local/hadoop/wordcount.jar wordcount /user/amir/dft/pg5000.txt /user/amir/dft/output.txt
Exception in thread "main" java.lang.ClassNotFoundException: wordcount
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
amir#amir-Aspire-5820TG:/usr/local/hadoop$
It means u have a typo or somthing wrong with the main class your specifying. Do u mean org.apache.hadoop.examples.WordCount instead of wordcount.
You don't need to download a new .jar file. A wordcount jar is already there in the examples of hadoop. Just use the command:
bin/hadoop jar hadoop*examples*.jar wordcount /user/amir/dft /user/amir/dft-output
The input and output paths should be directories on HDFS, not files. This will run the wordcount program on all the files that are uploaded on HDFS under the /user/amir/dft/ path, (including your pg5000.txt file).
EDIT: If you want to run this specific jar that you have downloaded, though, follow #samthebest's answer (keeping in mind that the input&output paths are directories).
EDIT2: Following the comments of this answer, it seems that the hadoop version used is newer than the one described in the tutorial. So the .jar for the wordcount program is located at the path hadoop_root/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar, as mentioned in this post.

Error while install hadoop on mac

I was trying to install hadoob on mac. I got the following error. What could be the issue?
hadoop-0.20.203.0 administrator$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-*-examples.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:127)
at java.util.jar.JarFile.<init>(JarFile.java:135)
at java.util.jar.JarFile.<init>(JarFile.java:72)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
-Anish-
I had a similar issue with an example from the book Hadoop In Action. It turns out either the book had it wrong or the example jars now have a different naming scheme. In any case, your command should now begin with:
bin/hadoop jar hadoop-examples-*.jar

Categories