mahout kmeans class not found exception - java

I have configured Hadoop in pseudo-distributed mode. I have successfully created sequence files and tf-idf vectors (using seq2sparse) and am trying to run mahout kmeans from the command line as follows:
mahout kmeans -i /user/mahout/vectors/tfidf-vectors \
-c /user/mahout/cluster-centroids -o /user/mahout/kmeans \
-dm org.apache.mahout.commom.distance.CosineDistanceMeasure \
-x 10 -k 20 -ow --clustering -cl
I am getting the following error:
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/apache-mahout-distribution-0.10.1/mahout-examples-0.10.1-job.jar
15/07/01 11:14:15 INFO AbstractJob: Command line arguments: {--clustering=null, --clusters=[/user/mahout/cluster-centroids], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.commom.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/user/mahout/vectors/tfidf-vectors], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/user/mahout/kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
15/07/01 11:14:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: org.apache.mahout.commom.distance.CosineDistanceMeasure
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:91)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.commom.distance.CosineDistanceMeasure
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
... 17 more
I have tried setting HADOOP_CLASSPATH to include the required jar files, but the error persists.
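One thing worth checking before the classpath: the stack trace frames spell the package org.apache.mahout.common (ClassUtils lives there), while the -dm argument spells it org.apache.mahout.commom. The trace also shows that the name is resolved with Class.forName, so a misspelled package fails no matter which jars are on HADOOP_CLASSPATH. Below is a minimal sketch of that kind of check. ClassNameCheck and isLoadable are hypothetical names made up for this illustration, and the second class name assumes the Mahout jars are on the classpath when you run it.

```java
class ClassNameCheck {
    /** Returns true when the fully qualified name resolves on the current classpath. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClassNameCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            // spelled as in the command from the question
            "org.apache.mahout.commom.distance.CosineDistanceMeasure",
            // package spelled "common", as in the stack trace frames
            "org.apache.mahout.common.distance.CosineDistanceMeasure"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "NOT found"));
        }
    }
}
```

If the second name resolves and the first does not, the fix is most likely just correcting the spelling in the -dm argument.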

Related

Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.SparkHadoopUtil

I am trying to run a Java jar with the hadoop command, as shown below:
hadoop jar <jar>
but I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError:org/apache/spark/deploy/SparkHadoopUtil
......
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.SparkHadoopUtil
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
What do I need to add to the command to get past this error?

VS Java hadoop WordCount error: Exception in thread "main" java.lang.UnsatisfiedLinkError

I compiled my program, ran "jar cf WordCount.jar WordCount*.class" on the class files generated in the bin folder, then ran "hadoop jar ./WordCount.jar WordCount \Users\blast\Desktop\input \Users\blast\Desktop\output\output1". This is the output I got. Thanks.
2022-02-28 21:58:00,538 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:608)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:934)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:848)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getOwner(RawLocalFileSystem.java:824)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:137)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:148)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1571)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1568)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1568)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1589)
at WordCount.main(WordCount.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

Java 7 to 8 migration compatibility issue

After migrating the jobs from Java 7 to Java 8, I am getting the following error even though compilation succeeds.
The same version of Java (Java 8) is used at compile time and at runtime, so there should not be any Java version mismatch. The Scala versions are the same as well.
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
org.apache.oozie.action.hadoop.JavaMainException: java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:59)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:51)
at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:35)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:242)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at com.twitter.scalding.RichXHandler$.<init>(XHandler.scala:38)
at com.twitter.scalding.RichXHandler$.<clinit>(XHandler.scala)
at com.twitter.scalding.Tool$.main(Tool.scala:152)
at com.twitter.scalding.Tool.main(Tool.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:56)
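A NoSuchMethodError on a scala.Predef$ method usually points at a different scala-library jar being loaded at runtime than the one the code was compiled against. One way to check is to print where a class was actually loaded from inside the failing environment. The following is a minimal sketch. WhichJar and locationOf are hypothetical names made up for this illustration, and passing scala.Predef$ as the argument assumes the job's classpath is in effect when you run it.

```java
import java.security.CodeSource;

class WhichJar {
    /** Returns the jar or directory a class was loaded from, or a note when the JVM reports none. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source reported, e.g. a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a class name such as scala.Predef$ as the first argument;
        // with no argument, the sketch reports on itself.
        String name = args.length > 0 ? args[0] : WhichJar.class.getName();
        System.out.println(name + " <- " + locationOf(Class.forName(name)));
    }
}
```

If the reported jar is not the scala-library version the job was built with, the launcher's own libraries are a likely place to look for the conflicting copy.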

Kafka Utils wrong classpath: org.apache.kafka.common.utils.Utils

I'm attempting to make a very simple Kafka Producer and am currently following the producer example except my producer does not have a partitioner class.
After exporting required files into a jar I transfer them to my Linux image and try to run it.
I get this exception:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/common/utils/Utils
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:103)
at kafka.client.ClientUtils$$anonfun$parseBrokerList$1.apply(ClientUtils.scala:102)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:44)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:44)
at kafka.client.ClientUtils$.parseBrokerList(ClientUtils.scala:102)
at kafka.producer.BrokerPartitionInfo.<init>(BrokerPartitionInfo.scala:32)
at kafka.producer.async.DefaultEventHandler.<init>(DefaultEventHandler.scala:41)
at kafka.producer.Producer.<init>(Producer.scala:60)
at kafka.javaapi.producer.Producer.<init>(Producer.scala:26)
at producers.HelloWorldProducer.main(HelloWorldProducer.java:20)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.utils.Utils
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 19 more
After looking at the kafka jar I see that the utils is its own package now and not located within common.
What would be the best way to solve this issue?
The answer ended up being really silly: I needed to use kafka-clients-0.8.2.0.jar instead.

Not able to use third party jar in hadoop java.lang.NoClassDefFoundError

I am trying to join two datasets, one of which is JSON.
I am relying on the json-simple library to parse that JSON.
I am passing the library with -libjars. So far, for simple data processing, this approach has worked, but now I am getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1024)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1041)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.select.Driver.run(Driver.java:130)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.select.Driver.main(Driver.java:139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
I think I have implemented ToolRunner correctly. This is how I run the job:
hadoop jar domain_gold.jar org.select.Driver \
-libjars json-simple-1.1.1.jar $INPUT1 $INPUT2 $OUTPUT
The code: http://pastebin.com/7XnyVnkv
Place your third-party JAR file in one of Hadoop's default lib folders, for example $HADOOP_HOME/share/hadoop/common/lib.
