ClassNotFoundException error while interacting with database - java

I was trying to run a mapreduce program which runs on top of an underlying database. When I installed a distribution of hadoop which is available in hadoop downloads. The programs worked fine for this distribution. But when I compiled my own distribution of hadoop and tried to run the same programs I am getting the below error. I followed the procedures like putting the mysql connector jar in the hadoop/lib directory and putting one in the distributed cache. While these procedures worked for the distribution which was available under hadoop downloads but they did not work for the distribution which I created.
Can anyone kindly tell what might have gone wrong ? I tried all other ways like updating classpath and HADOOP_CLASSPATH variable but none worked.
hduser#ramanujan:~$ hadoop jar SimpleConn.jar
13/04/15 13:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/04/15 13:50:17 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/04/15 13:50:17 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/04/15 13:50:17 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hduser/.staging/job_1366013851608_0001
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:169)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:70)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:470)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:490)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1236)
at DBCountPageView.run(DBCountPageView.java:227)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at DBCountPageView.main(DBCountPageView.java:236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:195)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:163)
... 21 more
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:188)
at org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:148)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:189)
... 22 more

Be sure to add any dependencies to both the HADOOP_CLASSPATH and -libjars upon submitting a job like in the following examples:
Use the following to add all the jar dependencies from current and lib directories:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`echo *.jar`:`echo lib/*.jar | sed 's/ /:/g'`
Bear in mind that when starting a job through hadoop jar you'll need to also pass it the jars of any dependencies through use of -libjars. I like to use:
hadoop jar <jar> <class> -libjars `echo ./lib/*.jar | sed 's/ /,/g'` [args...]
NOTE: The sed commands require a different delimiter character; the HADOOP_CLASSPATH is : separated and the -libjars need to be , separated.

Related

How to solve java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer

I am using tinkerpop + Janus Graph + Spark
build.gradle
compile group: 'org.apache.tinkerpop', name: 'spark-gremlin', version: '3.1.0-incubating'
below is some critical configuration that we have
spark.serializer: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
In the logs corresponding long entry which refer the jar containing the above class is loaded
{"#timestamp":"2020-02-18T07:24:21.720+00:00","#version":1,"message":"Added JAR /opt/data/janusgraph/applib2/spark-gremlin-827a65ae26.jar at spark://gdp-identity-stage.target.com:38876/jars/spark-gremlin-827a65ae26.jar with timestamp 1582010661720","logger_name":"o.a.s.SparkContext","thread_name":"SparkGraphComputer-boss","level":"INFO","level_value":20000}
but my spark job submitted by SparkGraphComputer is failed, when we see executor logs, we saw
Caused by: java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
Why this exception is coming even though the corresponding jar is loaded?
Anyone, please suggest on this.
As I mention seeing this exception in spark executor when I opened one of the worker logs below complete exception
Spark Executor Command: "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.el7_6.x86_64/bin/java" "-cp" "/opt/spark/spark-2.4.0/conf/:/opt/spark/spark-2.4.0/jars/*:/opt/hadoop/hadoop-3_1_1/etc/hadoop/" "-Xmx56320M" "-Dspark.driver.port=43137" "-XX:+UseG1GC" "-XX:+PrintGCDetails" "-XX:+PrintGCTimeStamps" "-Xloggc:/opt/spark/gc.log" "-Dtinkerpop.gremlin.io.kryoShimService=org.apache.tinkerpop.gremlin.hadoop.structure.io.HadoopPoolShimService" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler#gdp-identity-stage.target.com:43137" "--executor-id" "43392" "--hostname" "192.168.192.10" "--cores" "6" "--app-id" "app-20200220094335-0001" "--worker-url" "spark://Worker#192.168.192.10:36845"
========================================
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:259)
at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283)
at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:200)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:221)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 4 more
when I am setting the spark. jars property on graph, am passing this jar location also
Jar which we created from the application is of fat jar type means it contains the actual code and all the required dependencies also, please see below screenshots .
If you look at the logs, you see this
java" "-cp" "/opt/spark/spark-2.4.0/conf/:/opt/spark/spark-2.4.0/jars/*:/opt/hadoop/hadoop-3_1_1/etc/hadoop/"
Unless you have the gremlin JARs in your /opt/spark/spark-2.4.0/jars/* folder on each Spark worker, then the class you're using doesn't exist.
The recommended way to include it for your specific application would be the Gradle Shadow plugin rather than --packages or spark.jars

hadoop NoClassDefFoundError despite DistributedCache setting

I am trying to get rid of some NoClassDefFoundError due to some jars not found at run time. So I put in my hdfs system some lib and I call and I put this
String lib = "/path/to/lib";
Path hdfsJar = new Path(lib);
DistributedCache.addFileToClassPath(hdfsJar, conf);
Now, I am still getting the error. However, if I set the jars in the $HADOOP_CLASSPATH. Am I doing wrong with the DistributedCache call ?
edit :
java.lang.RuntimeException: java.lang.NoClassDefFoundError: gov/nih/nlm/nls/metamap/MetaMapApi
at org.apache.hadoop.mapreduce.lib.chain.Chain.joinAllThreads(Chain.java:526)
at org.apache.hadoop.mapreduce.lib.chain.ChainMapper.run(ChainMapper.java:169)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NoClassDefFoundError: gov/nih/nlm/nls/metamap/MetaMapApi
at org.avrosation.metamap.ChainMetaProcess$TokenizerMapper.map(ChainMetaProcess.java:25)
at org.avrosation.metamap.ChainMetaProcess$TokenizerMapper.map(ChainMetaProcess.java:16)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapreduce.lib.chain.Chain$MapRunner.run(Chain.java:321)
Caused by: java.lang.ClassNotFoundException: gov.nih.nlm.nls.metamap.MetaMapApi
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Try providing fully qualified HDFS path. try below code:
Make sure you upload jar to HDFS (any location on hdfs, i am assuming /tmp).
hadoop fs -copyFromLocal my.jar /tmp
Then edit your java code like :
String lib = "hdfs://localhost:9000/tmp/my.jar";
Path hdfsJar = new Path(lib);
DistributedCache.addFileToClassPath(hdfsJar, conf);
This doc detail about distributed cache : https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html
In fact, I suspected from the beginning some problem with my ide IntelliJ Idea 14 because I had to deal with a major refactoring of the code which then made the code reveal the issue. To begin I tried to clean the building with no success, then I simply created an other project and copy-pasted the classes and libraries' import and that made the trick !

GridGain Error : java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/JobContext

I have configured gridgain-hadoop-os-6.6.2.zip, and followed steps as mentioned in docs/hadoop_readme.pdf . started gridgain using bin/ggstart.sh command, now am running a simple wordcount code in gridgain with hadoop-2.2.0. using command
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/*-mapreduce-examples-*.jar wordcount /input /output
Steps I tried:
Step 1:
Extracted hadoop-2.2.0 and gridgain-hadoop-os-6.6.2.zip file in usr/local folder and changed name for gridgain folder as "gridgain".
Step 2:
Set the path for export GRIDGAIN_HOME=/usr/local/gridgain.. and path for hadoop-2.2.0 with JAVA_HOME as
# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop-2.2.0
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export HADOOP_MAPRED_HOME=/usr/local/hadoop-2.2.0
export HADOOP_COMMON_HOME=/usr/local/hadoop-2.2.0
export HADOOP_HDFS_HOME=/usr/local/hadoop-2.2.0
export YARN_HOME=/usr/local/hadoop-2.2.0
export HADOOP_CONF_DIR=/usr/local/hadoop-2.2.0/etc/hadoop
export GRIDGAIN_HADOOP_CLASSPATH='/usr/local/hadoop-2.2.0/lib/*:/usr/local/hadoop-2.2.0/lib/*:/usr/local/hadoop-2.2.0/lib/*'
Step 3:
now i run command as bin/setup-hadoop.sh ... answer Y to every prompt.
Step 4:
started gridgain using command
bin/ggstart.sh
Step 5:
now i created dir and uploaded file using :
hadoop fs -mkdir /input
hadoop fs -copyFromLocal $HADOOP_HOME/README.txt /input/WORD_COUNT_ME.
txt
Step 6:
Running this command gives me error:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/*-mapreduce-examples-*.
jar wordcount /input /output
Getting following error:
15/02/22 12:49:13 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/02/22 12:49:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_091ebfbd-2993-475f-a506-28280dbbf891_0002
15/02/22 12:49:13 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hduser/.staging/job_091ebfbd-2993-475f-a506-28280dbbf891_0002
java.lang.NullPointerException
at org.gridgain.client.hadoop.GridHadoopClientProtocol.processStatus(GridHadoopClientProtocol.java:329)
at org.gridgain.client.hadoop.GridHadoopClientProtocol.submitJob(GridHadoopClientProtocol.java:115)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
and gridgain console error as:
sLdrId=a0b8610bb41-091ebfbd-2993-475f-a506-28280dbbf891, userVer=0, loc=true, sampleClsName=java.lang.String, pendingUndeploy=false, undeployed=false, usage=0]], taskClsName=o.g.g.kernal.processors.hadoop.proto.GridHadoopProtocolSubmitJobTask, sesId=e129610bb41-091ebfbd-2993-475f-a506-28280dbbf891, startTime=1424589553332, endTime=9223372036854775807, taskNodeId=091ebfbd-2993-475f-a506-28280dbbf891, clsLdr=sun.misc.Launcher$AppClassLoader#1bdcbb2, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, subjId=091ebfbd-2993-475f-a506-28280dbbf891], jobId=f129610bb41-091ebfbd-2993-475f-a506-28280dbbf891]]
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/JobContext
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2585)
at java.lang.Class.getConstructor0(Class.java:2885)
at java.lang.Class.getConstructor(Class.java:1723)
at org.gridgain.grid.hadoop.GridHadoopDefaultJobInfo.createJob(GridHadoopDefaultJobInfo.java:107)
at org.gridgain.grid.kernal.processors.hadoop.jobtracker.GridHadoopJobTracker.job(GridHadoopJobTracker.java:959)
at org.gridgain.grid.kernal.processors.hadoop.jobtracker.GridHadoopJobTracker.submit(GridHadoopJobTracker.java:222)
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopProcessor.submit(GridHadoopProcessor.java:188)
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopImpl.submit(GridHadoopImpl.java:73)
at org.gridgain.grid.kernal.processors.hadoop.proto.GridHadoopProtocolSubmitJobTask.run(GridHadoopProtocolSubmitJobTask.java:54)
at org.gridgain.grid.kernal.processors.hadoop.proto.GridHadoopProtocolSubmitJobTask.run(GridHadoopProtocolSubmitJobTask.java:37)
at org.gridgain.grid.kernal.processors.hadoop.proto.GridHadoopProtocolTaskAdapter$Job.execute(GridHadoopProtocolTaskAdapter.java:95)
at org.gridgain.grid.kernal.processors.job.GridJobWorker$2.call(GridJobWorker.java:484)
at org.gridgain.grid.util.GridUtils.wrapThreadLoader(GridUtils.java:6136)
at org.gridgain.grid.kernal.processors.job.GridJobWorker.execute0(GridJobWorker.java:478)
at org.gridgain.grid.kernal.processors.job.GridJobWorker.body(GridJobWorker.java:429)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Failed to load class: org.apache.hadoop.mapreduce.JobContext
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopClassLoader.loadClass(GridHadoopClassLoader.java:125)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 20 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.JobContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopClassLoader.loadClassExplicitly(GridHadoopClassLoader.java:196)
at org.gridgain.grid.kernal.processors.hadoop.GridHadoopClassLoader.loadClass(GridHadoopClassLoader.java:106)
... 21 more
^[[B
Help here....
Edited Here :
raj#ubuntu:~$ hadoop classpath
/usr/local/hadoop-2.2.0/etc/hadoop:/usr/local/hadoop-2.2.0/share/hadoop/common/lib/*:/usr/local/hadoop-2.2.0/share/hadoop/common/*:/usr/local/hadoop-2.2.0/share/hadoop/hdfs:/usr/local/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.2.0/share/hadoop/hdfs/*:/usr/local/hadoop-2.2.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.2.0/share/hadoop/yarn/*:/usr/local/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.2.0/share/hadoop/mapreduce/*:/usr/local/hadoop-2.2.0/contrib/capacity-scheduler/*.jar
raj#ubuntu:~$ jps
3529 GridCommandLineStartup
3646 Jps
raj#ubuntu:~$ echo $GRIDGAIN_HOME
/usr/local/gridgain
raj#ubuntu:~$ echo $HADOOP_HOME
/usr/local/hadoop-2.2.0
raj#ubuntu:~$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
raj#ubuntu:~$ cd /usr/local/hadoop-2.2.0/share/hadoop/mapreduce
raj#ubuntu:/usr/local/hadoop-2.2.0/share/hadoop/mapreduce$ ls
hadoop-mapreduce-client-app-2.2.0.jar hadoop-mapreduce-client-hs-2.2.0.jar hadoop-mapreduce-client-jobclient-2.2.0-tests.jar lib
hadoop-mapreduce-client-common-2.2.0.jar hadoop-mapreduce-client-hs-plugins-2.2.0.jar hadoop-mapreduce-client-shuffle-2.2.0.jar lib-examples
hadoop-mapreduce-client-core-2.2.0.jar hadoop-mapreduce-client-jobclient-2.2.0.jar hadoop-mapreduce-examples-2.2.0.jar sources
raj#ubuntu:/usr/local/hadoop-2.2.0/share/hadoop/mapreduce$
I configured exactly the versions you mentioned (gridgain-hadoop-os-6.6.2.zip + hadoop-2.2.0) -- the "wordcount" sample works fine.
[UPD after question's author log analysis:]
Raju, thanks for the detailed logs.
The cause of the problem are incorrectly set env variables
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
You explicitly set all these variables to ${HADOOP_HOME} value, what is wrong. This causes GG to compose incorrect hadoop classpath, as seen from the below GG Node log:
+++ HADOOP_PREFIX=/usr/local/hadoop-2.2.0
+++ [[ -z /usr/local/hadoop-2.2.0 ]]
+++ '[' -z /usr/local/hadoop-2.2.0 ']'
+++ HADOOP_COMMON_HOME=/usr/local/hadoop-2.2.0
+++ HADOOP_HDFS_HOME=/usr/local/hadoop-2.2.0
+++ HADOOP_MAPRED_HOME=/usr/local/hadoop-2.2.0
+++ GRIDGAIN_HADOOP_CLASSPATH='/usr/local/hadoop-2.2.0/lib/*:/usr/local/hadoop-2.2.0/lib/*:/usr/local/hadoop-2.2.0/lib/*'
So, to fix the issue please don't set unnecessary env variables. JAVA_HOME and HADOOP_HOME is quite enough, nothing else is needed.
many thnaks to Ivan, thanks for your help and support, the solution you gave was good to get me out of the problem.
The issue was not to set other hadoop related environment variables. this is enough to set.
JAVA_HOME , HADOOP_HOME and GRIDGAIN_HOME

My cdh5.2 cluster get FileNotFoundException when running hbase MR jobs

My cdh5.2 cluster has a problem to run hbase MR jobs.
For example, I added the hbase classpath into the hadoop classpath:
vi /etc/hadoop/conf/hadoop-env.sh
add the line:
export HADOOP_CLASSPATH="/usr/lib/hbase/bin/hbase classpath:$HADOOP_CLASSPATH"
And when I am running:
hadoop jar /usr/lib/hbase/hbase-server-0.98.6-cdh5.2.1.jar rowcounter "mytable"
I get the following exception:
14/12/09 03:44:02 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://clusterName/usr/lib/hbase/lib/hbase-client-0.98.6-cdh5.2.1.jar
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://clusterName/usr/lib/hbase/lib/hbase-client-0.98.6-cdh5.2.1.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1083)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
at org.apache.hadoop.hbase.mapreduce.RowCounter.main(RowCounter.java:191)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
Even I have the same problem with CDH 5.2.0. as a work around,
I manually copied the jar file in to hdfs, then exceptions are not coming
So, The problem was environment issue:
When I added the below jars into /usr/lib/hadoop/lib. all worked fine
hbase-client-0.98.6-cdh5.2.1.jar
hbase-common-0.98.6-cdh5.2.1.jar
hbase-protocol-0.98.6-cdh5.2.1.jar
hbase-server-0.98.6-cdh5.2.1.jar
hbase-prefix-tree-0.98.6-cdh5.2.1.jar
hadoop-core-2.5.0-mr1-cdh5.2.1.jar
htrace-core-2.04.jar
My machine has the following rpms in it:
>> rpm -qa | grep cdh
zookeeper-3.4.5+cdh5.2.1+84-1.cdh5.2.1.p0.13.el6.x86_64
hadoop-2.5.0+cdh5.2.1+578-1.cdh5.2.1.p0.14.el6.x86_64
hadoop-0.20-mapreduce-2.5.0+cdh5.2.1+578-1.cdh5.2.1.p0.14.el6.x86_64
hbase-regionserver-0.98.6+cdh5.2.1+64-1.cdh5.2.1.p0.9.el6.x86_64
cloudera-cdh-5-0.x86_64
bigtop-utils-0.7.0+cdh5.2.1+0-1.cdh5.2.1.p0.13.el6.noarch
bigtop-jsvc-0.6.0+cdh5.2.1+578-1.cdh5.2.1.p0.13.el6.x86_64
parquet-1.5.0+cdh5.2.1+38-1.cdh5.2.1.p0.12.el6.noarch
hadoop-hdfs-2.5.0+cdh5.2.1+578-1.cdh5.2.1.p0.14.el6.x86_64
hadoop-mapreduce-2.5.0+cdh5.2.1+578-1.cdh5.2.1.p0.14.el6.x86_64
hadoop-0.20-mapreduce-tasktracker-2.5.0+cdh5.2.1+578-1.cdh5.2.1.p0.14.el6.x86_64
hbase-0.98.6+cdh5.2.1+64-1.cdh5.2.1.p0.9.el6.x86_64
avro-libs-1.7.6+cdh5.2.1+69-1.cdh5.2.1.p0.13.el6.noarch
parquet-format-2.1.0+cdh5.2.1+6-1.cdh5.2.1.p0.14.el6.noarch
hadoop-yarn-2.5.0+cdh5.2.1+578-1.cdh5.2.1.p0.14.el6.x86_64
hadoop-hdfs-datanode-2.5.0+cdh5.2.1+578-1.cdh5.2.1.p0.14.el6.x86_64
I still wonder which rpm is missing.
Instead of adding the file manually to HDFS, you can add the hbase library path to the .bashrc file. Add the lib folder in hbase to the CLASSPATH.
Also, add the classpath of hbase to HADOOP_CLASSPATH.
Your .bashrc file should contain the following:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`${HBASE_HOME}/bin/hbase classpath`
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`${HBASE_HOME}/bin/hbase mapredcp`
export CLASSPATH=${HBASE_HOME}/lib/*
Note: The CLASSPATH should point to the lib folder of your hbase installation folder. Use the following to compile and run your java code.
javac Example.java
java -classpath $CLASSPATH:. Example

Pentaho RowListener ClassNotFoundException

I am trying to perform map reduce on Pentaho 5. For Pentaho 5, the Pentaho applications come pre-configured for Apache Hadoop 0.20.2 and it says no further configuration is required for this version. I installed Hadoop 0.20.2 on windows using cygwin and every thing works fine. I run a simple job on Pentaho, which copies file in HDFS which finished successfully and the file system was copied into HDFS. But as soon as I run map reduce job, it says the job was finished on pentaho but the map reduce task was failed and on the output directory on HDFS the result is missing and the log file says:
Error: java.lang.ClassNotFoundException: org.pentaho.di.trans.step.RowListener
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:790)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170
Please Help me out.
Maybe a little bit old but I thought it might help someone.
This can be caused by:
For Hadoop version > 0.20: check if you set up your environment correctly, see Pentaho Support : Creating a New Hadoop Configuration
Check on HDFS if you have an /opt/pentaho/mapreduce/ and check folder permissions on HDFS for /opt/pentaho (Did you find the kettle-*.jar files in lib folder?)
Check classpath separator ("," in Windows, ":" in Linux). In order to change it, edit spoon.sh (or spoon.bat) and modify OPT variable like this: OPT="$OPT -Dhadoop.cluster.path.separator=,"

Categories