How to add OracleDriver to the Hadoop classpath while executing the job?

I'm using Hadoop to process a large amount of data from a database. I'm using Oracle's JDBC driver to connect to the Oracle DB and do the processing. But when I try to execute the Hadoop job via bin/hadoop with the packaged JAR file, it reports that the OracleDriver class was not found. How do I fix this?
$ bin/hadoop jar ~/hadoop1.jar name.hadoop.Hadoop1 ~/output
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:575)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1051)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1043)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1323)
at name.shahalpk.poc.hadoop.Hadoop1.main(Hadoop1.java:73)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 20 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:271)
... 25 more
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.apache.hadoop.mapred.lib.db.DBConfiguration.getConnection(DBConfiguration.java:123)
at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:266)
... 25 more

Add ojdbc5.jar to the classpath by copying it into the JRE extensions directory:
${JRE_HOME}\lib\ext
Note:
${JRE_HOME} is the directory where the JRE (Java Runtime Environment) is installed, for example:
${JRE_HOME}=C:\Program Files\Java\jre6\

You can use the -libjars generic option to achieve this more easily, and it handles distribution of your jar to the cluster nodes too:
$ bin/hadoop jar ~/hadoop1.jar name.hadoop.Hadoop1 -libjars ojdbc5.jar ~/output
This does assume that your main class (name.hadoop.Hadoop1) is using the ToolRunner.run() method to launch your job:
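import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Hadoop1 extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options such as -libjars before calling run()
        System.exit(ToolRunner.run(new Hadoop1(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf());
        // rest of your job init code...
        // runJob() submits the job and blocks until it completes
        RunningJob rj = JobClient.runJob(job);
        return rj.isSuccessful() ? 0 : 1;
    }
}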
(Hand-typed code, apologies for any typos or compile errors)
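
The stack trace in the question comes from thread "main", so the driver lookup fails in the JVM that submits the job. As a quick diagnostic, a small plain-JDK check can tell you whether a class is visible to the JVM it runs in. This is only a sketch: ClasspathCheck and isLoadable are hypothetical names, not part of Hadoop, and the check says nothing about the classpath of the task JVMs on the cluster.

```java
// Sketch only: ClasspathCheck is a hypothetical helper, not a Hadoop class.
final class ClasspathCheck {

    /** Returns true if the named class can be located by this JVM's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the driver class named in the stack trace above
        String name = args.length > 0 ? args[0] : "oracle.jdbc.driver.OracleDriver";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running it with the same classpath as the job client should report whether oracle.jdbc.driver.OracleDriver can be found there.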

Related

ClassNotFoundException while running a jar though it contains all the classes

I have created a jar using Eclipse for the MapReduce jobs. If you extract the jar, you can see all the classes present there. When you run the jar in Hadoop using the hadoop command, it shows the error below.
It fails to recognize only one class, Test_project$TwoDArrayWritables. Test_project is the main class and TwoDArrayWritables is a class nested within Test_project. TwoDArrayWritables extends Hadoop's built-in TwoDArrayWritable class.
Error:
16/04/05 15:48:28 INFO mapred.JobClient: Task Id : attempt_201604051120_0002_m_000000_1, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: mapreduce.Test_project$TwoDArrayWritables
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:889)
at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:747)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:966)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: mapreduce.Test_project$TwoDArrayWritables
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
... 9 more
Caused by: java.lang.ClassNotFoundException: mapreduce.Test_project$TwoDArrayWritables
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
... 10 more
16/04/05 15:48:34 INFO mapred.JobClient: Task Id : attempt_201604051120_0002_m_000000_2, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: mapreduce.Test_project$TwoDArrayWritables
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:889)
at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:747)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:966)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: mapreduce.Test_project$TwoDArrayWritables
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
... 9 more
Caused by: java.lang.ClassNotFoundException: mapreduce.Test_project$TwoDArrayWritables
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
... 10 more
Googled many solutions but nothing worked. Please help!
Try this: bin/hadoop jar /path/to/jarfile/newproj.jar Test_project.TwoDArrayWritables /user/hduser/input /user/hduser/output1
After struggling for weeks, something clicked.
I was using two reducers in my job, so I defined two JobConf objects, one for each.
My earlier (wrong) code:
JobConf conf = new JobConf(getConf(), Test_project.class);
JobConf conf2 = new JobConf(getConf());
I assumed the configuration was already defined, so I didn't pass Test_project.class to conf2.
My present (correct) code:
JobConf conf = new JobConf(getConf(), Test_project.class);
JobConf conf2 = new JobConf(getConf(), Test_project.class);
The error was thrown because conf2 was created without Test_project.class, so no job jar was set for it and the tasks could not locate Test_project$TwoDArrayWritables.
Now it works fine.
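
The JobConf(Configuration, Class) constructor uses the class you pass to work out which jar to ship with the job. The plain-JDK sketch below illustrates the two ideas involved: a nested class gets the binary name Outer$Inner (the form seen in the stack trace), and a loaded class can report where it came from. JarLocatorDemo, Inner, and locationOf are hypothetical names for illustration, and this is not Hadoop's own lookup code.

```java
import java.security.CodeSource;

// Sketch only: JarLocatorDemo and Inner are hypothetical names for illustration.
final class JarLocatorDemo {

    /** Stand-in for a nested class such as Test_project.TwoDArrayWritables. */
    static final class Inner { }

    /** Returns the jar or directory a class was loaded from, or "unknown" if not reported. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "unknown"
                : source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Nested classes are compiled to Outer$Inner.class and loaded under that name
        System.out.println(Inner.class.getName());
        // Where the outer class (and therefore its nested classes) was loaded from
        System.out.println(locationOf(JarLocatorDemo.class));
    }
}
```

If a job configuration is never told which class (and therefore which jar) the job belongs to, the task JVMs have nothing to load Outer$Inner from, which matches the behaviour described above.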

mahout kmeans class not found exception

I have configured Hadoop in pseudo-distributed mode. I have successfully created sequence files and tf-idf vectors (using seq2sparse) and am trying to run mahout kmeans from the command line as follows:
mahout kmeans -i /user/mahout/vectors/tfidf-vectors \
  -c /user/mahout/cluster-centroids -o /user/mahout/kmeans \
  -dm org.apache.mahout.commom.distance.CosineDistanceMeasure -x 10 -k 20 \
  -ow --clustering -cl
I am getting the following error:
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/apache-mahout-distribution-0.10.1/mahout-examples-0.10.1-job.jar
15/07/01 11:14:15 INFO AbstractJob: Command line arguments: {--clustering=null, --clusters=[/user/mahout/cluster-centroids], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.commom.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/user/mahout/vectors/tfidf-vectors], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/user/mahout/kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
15/07/01 11:14:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: org.apache.mahout.commom.distance.CosineDistanceMeasure
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:91)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.commom.distance.CosineDistanceMeasure
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
... 17 more
I have tried setting HADOOP_CLASSPATH to the required jar files.

Kafka Utils wrong classpath: org.apache.kafka.common.utils.Utils

I'm attempting to make a very simple Kafka producer and am currently following the producer example, except that my producer does not have a partitioner class.
After exporting the required files into a jar, I transfer it to my Linux image and try to run it.
I get this exception:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/common/utils/Utils
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:103)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:44)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:44)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:102)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:60)
at kafka.javaapi.producer.Producer.<init>(Producer.scala:26)
at producers.HelloWorldProducer.main(HelloWorldProducer.java:20)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.utils.Utils
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 19 more
After looking at the Kafka jar, I see that utils is its own package now and is not located within common.
What would be the best way to solve this issue?
The answer turned out to be really silly: I needed to use kafka-clients-0.8.2.0.jar instead.
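
When a class such as org.apache.kafka.common.utils.Utils is reported missing, it can help to ask the JVM which jar or directory on the classpath supplies it, if any. The sketch below does that with plain JDK calls. WhichJar and locate are hypothetical names, and the lookup only covers the system classpath, so it may not see classes loaded by a custom loader such as Eclipse's jar-in-jar loader shown in the trace.

```java
import java.net.URL;
import java.util.Optional;

// Sketch only: WhichJar is a hypothetical helper for inspecting the system classpath.
final class WhichJar {

    /** Returns the URL of the .class resource for the given class name, if it is on the classpath. */
    static Optional<URL> locate(String className) {
        // Class files are stored as resources, e.g. org/apache/kafka/common/utils/Utils.class
        String resource = className.replace('.', '/') + ".class";
        return Optional.ofNullable(ClassLoader.getSystemResource(resource));
    }

    public static void main(String[] args) {
        // Defaults to the class named in the NoClassDefFoundError above
        String name = args.length > 0 ? args[0] : "org.apache.kafka.common.utils.Utils";
        System.out.println(locate(name)
                .map(url -> name + " found at " + url)
                .orElse(name + " is not on the classpath"));
    }
}
```

An empty result for the Kafka class would be consistent with the fix above: the jar that contains it is not among the jars being used.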

Hadoop: ClassNotFoundException $Reduce

I've got a word count example class called by a Jetty server's handle method. The job is submitted with:
JobClient.runJob(conf);
Sadly, the JobTracker sees nothing: the job is not submitted to it, yet it finishes and creates output. I found out that I have to inject my configuration with:
conf.addResource(new Path("/usr/local/hadoop/conf/mapred-site.xml"));
After adding this line, the JobTracker gets the job, but it fails with the following exception:
INFO: Task Id : attempt_201312071601_0005_m_000000_1, Status : FAILED
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: WC2$Reduce
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:889)
at org.apache.hadoop.mapred.JobConf.getCombinerClass(JobConf.java:1049)
at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1385)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:986)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: WC2$Reduce
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
... 10 more
Caused by: java.lang.ClassNotFoundException: WC2$Reduce
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
... 11 more
My combiner class is the same as the reducer class. I've downloaded the Hadoop WordCount example and use it without modifications, apart from the addResource call mentioned above.

Not able to use third party jar in hadoop java.lang.NoClassDefFoundError

I am trying to join two datasets, one of which is JSON.
I am relying on the json-simple library to parse that JSON.
I am trying to use -libjars. So far, for simple data processing, this approach has worked, but now I am getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1024)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1041)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.select.Driver.run(Driver.java:130)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.select.Driver.main(Driver.java:139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
I think I have implemented ToolRunner correctly. This is how I run the job:
hadoop jar domain_gold.jar org.select.Driver \
-libjars json-simple-1.1.1.jar $INPUT1 $INPUT2 $OUTPUT
The code
http://pastebin.com/7XnyVnkv
Place your third-party JAR file in one of Hadoop's default lib folders, for example $HADOOP_HOME/share/hadoop/common/lib.
