GraphBuilder with Hadoop - Java

I am running GraphBuilder with Hadoop (single-node). I have followed the tutorial at https://01.org/graphbuilder/documentation/how-run-demo-application. However, when I run the command
bin/hadoop jar /usr/local/graphbuilder/target/graphbuilder-0.0.1-SNAPSHOT-hadoop-job.jar com.intel.hadoop.graphbuilder.demoapps.wikipedia.docwordgraph.TFIDFGraphEnd2End 1 /user/hduser/wiki-input /user/hduser/en-wiki-articles-output ingressCode*
I am getting this error:
INFO docwordgraph.CreateWordCountGraph: ========== Creating Graph ================
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.filecache.DistributedCache.addFileToClassPath(Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/fs/FileSystem;)V
at com.intel.hadoop.graphbuilder.util.FsUtil.distributedTempClassToClassPath(FsUtil.java:46)
at com.intel.hadoop.graphbuilder.job.AbstractPreprocessJob.run(AbstractPreprocessJob.java:111)
at com.intel.hadoop.graphbuilder.demoapps.wikipedia.docwordgraph.CreateWordCountGraph.main(CreateWordCountGraph.java:77)
at com.intel.hadoop.graphbuilder.demoapps.wikipedia.docwordgraph.TFIDFGraphEnd2End.main(TFIDFGraphEnd2End.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Can you please guide me on how to solve this issue?
Thanks

Looks like you're using a back-level version of Hadoop: the runtime is missing the three-argument DistributedCache.addFileToClassPath overload that GraphBuilder was compiled against. Check which Hadoop version your version of GraphBuilder needs and make sure that's the version you're running.
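If you want to confirm the mismatch before swapping Hadoop versions, a small reflection probe can tell you whether the three-argument overload exists on your classpath. This is just a sketch; the class and method names come from the stack trace above, and the DistributedCacheCheck class name is mine. Run it with the same classpath as the failing job (e.g. via hadoop jar) so it sees the same Hadoop jars.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DistributedCacheCheck {
    public static void main(String[] args) {
        try {
            // Look up the exact overload named in the NoSuchMethodError above
            Class<?> dc = Class.forName("org.apache.hadoop.filecache.DistributedCache");
            dc.getMethod("addFileToClassPath", Path.class, Configuration.class, FileSystem.class);
            System.out.println("3-arg addFileToClassPath found; this Hadoop should satisfy GraphBuilder");
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            // The Hadoop jars on the classpath predate the overload GraphBuilder needs
            System.out.println("Overload missing: " + e);
        }
    }
}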

Related

Griffin Error: java.lang.NoSuchMethodError

I need help. I'm trying to run the job given in the documentation.
I have the following versions installed on my system:
Hadoop: 3.2.0
Scala: 2.13.1
Spark: 3.0.0
Java: 1.8.0
Hive: 3.1.2
I'm submitting the Spark job to run data-quality checks using Griffin. I'm getting the exception below; can you please help me?
2020-04-04 08:37:35,456 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at org.apache.griffin.measure.datasource.connector.batch.HiveBatchDataConnector.<init>(HiveBatchDataConnector.scala:47)
at org.apache.griffin.measure.datasource.connector.DataConnectorFactory$$anonfun$getDataConnector$1.apply(DataConnectorFactory.scala:63)
at org.apache.griffin.measure.datasource.connector.DataConnectorFactory$$anonfun$getDataConnector$1.apply(DataConnectorFactory.scala:62)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.griffin.measure.datasource.connector.DataConnectorFactory$.getDataConnector(DataConnectorFactory.scala:61)
at org.apache.griffin.measure.datasource.DataSourceFactory$$anonfun$1.apply(DataSourceFactory.scala:58)
at org.apache.griffin.measure.datasource.DataSourceFactory$$anonfun$1.apply(DataSourceFactory.scala:57)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.griffin.measure.datasource.DataSourceFactory$.org$apache$griffin$measure$datasource$DataSourceFactory$$getDataSource(DataSourceFactory.scala:57)
at org.apache.griffin.measure.datasource.DataSourceFactory$$anonfun$getDataSources$1.apply(DataSourceFactory.scala:40)
at org.apache.griffin.measure.datasource.DataSourceFactory$$anonfun$getDataSources$1.apply(DataSourceFactory.scala:38)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.griffin.measure.datasource.DataSourceFactory$.getDataSources(DataSourceFactory.scala:38)
at org.apache.griffin.measure.launch.batch.BatchDQApp$$anonfun$run$1.apply$mcZ$sp(BatchDQApp.scala:75)
at org.apache.griffin.measure.launch.batch.BatchDQApp$$anonfun$run$1.apply(BatchDQApp.scala:67)
at org.apache.griffin.measure.launch.batch.BatchDQApp$$anonfun$run$1.apply(BatchDQApp.scala:67)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.griffin.measure.launch.batch.BatchDQApp.run(BatchDQApp.scala:67)
at org.apache.griffin.measure.Application$.main(Application.scala:88)
at org.apache.griffin.measure.Application.main(Application.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
It definitely means that you have a Scala version conflict somewhere in your application.
There is a very similar question here; try following the answer provided there (downgrade the Scala version to 2.11.x).
I don't think any Spark version has been compiled/released for Scala 2.13.
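A quick way to see which scala-library jar your spark-submit classpath actually loads is to print its version from Java. This is a sketch; it assumes scala-library is on the classpath, which it always is under spark-submit.

public class ScalaVersionCheck {
    public static void main(String[] args) {
        // scala.util.Properties is a Scala object; its static forwarder is callable from Java
        // and reports the version of the scala-library jar that was actually loaded
        System.out.println("Scala library in use: " + scala.util.Properties.versionNumberString());
    }
}

Whatever it prints must match the Scala suffix (_2.11, _2.12, ...) of every Griffin and Spark artifact you deploy.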

Filepath on Mac for local Parquet files in Spark program in Java

I have a small Spark program in Java that reads Parquet files from a local directory on a Mac.
I have tried to do this multiple ways, but nothing seems to work.
Dataset<Row> dsuomcategoryconvfactor = spark.read().parquet(path + "file:///usr/local⁩/ParquetData/data1.parquet");
I think I am specifying the path in a way Spark cannot resolve, and it throws the error below.
20/01/06 10:58:29 INFO SharedState: Warehouse path is 'file:/usr/local/Cellar/apache-spark/2.4.4/libexec/work/driver-20200106105812-0006/spark-warehouse'.
20/01/06 10:58:29 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: file:/usr/local⁩/ParquetData/data1.parquet;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:558)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:644)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:628)
This works fine when run from an IDE, but when I submit the job from the shell using spark-submit, this error is thrown.
Any help would be appreciated.
Thanks!
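For reference, a minimal sketch of reading a local Parquet file with a complete file:// URI and nothing concatenated in front of it. The path here is hypothetical, and local[*] is assumed so the file only has to exist on one machine.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LocalParquetRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("local-parquet-read")
                .master("local[*]") // local mode: the file only needs to exist on this machine
                .getOrCreate();
        // Pass the whole URI in one piece; prepending another path fragment breaks the scheme
        Dataset<Row> ds = spark.read().parquet("file:///usr/local/ParquetData/data1.parquet");
        ds.printSchema();
        spark.stop();
    }
}

Note that when a job is submitted to a non-local master, a file:// path must exist on every node that runs the driver or executors, which is one common reason a read that works in the IDE fails under spark-submit.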

How to resolve java.lang.NoSuchMethodError org.apache.spark.ml.util.SchemaUtils$.checkColumnType

I am trying to run the CountVectorizerDemo program provided here:
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaCountVectorizerExample.java
I'm getting the following error and don't know what the problem is.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.ml.util.SchemaUtils$.checkColumnType$default$4()Ljava/lang/String;
at org.apache.spark.ml.feature.CountVectorizerParams$class.validateAndTransformSchema(CountVectorizer.scala:71)
at org.apache.spark.ml.feature.CountVectorizer.validateAndTransformSchema(CountVectorizer.scala:107)
at org.apache.spark.ml.feature.CountVectorizer.transformSchema(CountVectorizer.scala:168)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:59)
at org.apache.spark.ml.feature.CountVectorizer.fit(CountVectorizer.scala:130)
at com.bah.ossem.spark.topic.CountVectorizerDemo.main(CountVectorizerDemo.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The problem was that my cluster was running Spark core 1.4 while my application was built against Spark core 1.5.1 and MLlib 1.5.1. I updated my AWS cluster to Spark 1.5.1 and that fixed the problem.
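If you suspect a similar mismatch, one way to confirm what the cluster actually runs is to submit a tiny job that prints the runtime version. A sketch against the Spark 1.x Java API; the app name is arbitrary.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkVersionCheck {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("spark-version-check"));
        // Reports the version of the Spark runtime this driver is actually linked against
        System.out.println("Runtime Spark version: " + sc.version());
        sc.stop();
    }
}

Submit it with the same spark-submit invocation as the failing job and compare the output against the Spark/MLlib versions in your build file.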

Java JPG codec won't work

I have a problem with my Tomcat application: after changing the server and installing the latest version of Tomcat 7, my application won't read/load JPG files.
I installed ImageIO and JAI on the server and tried changing the Java version, but I get the same error every time.
Does anybody have an idea?
Error: One factory fails for the operation "jpeg"
Occurs in: javax.media.jai.ThreadSafeOperationRegistry
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.media.jai.FactoryCache.invoke(FactoryCache.java:122)
at javax.media.jai.OperationRegistry.invokeFactory(OperationRegistry.java:1674)
at javax.media.jai.ThreadSafeOperationRegistry.invokeFactory(ThreadSafeOperationRegistry.java:473)
at javax.media.jai.registry.RIFRegistry.create(RIFRegistry.java:332)
at com.sun.media.jai.opimage.StreamRIF.create(StreamRIF.java:102)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.media.jai.FactoryCache.invoke(FactoryCache.java:122)
at javax.media.jai.OperationRegistry.invokeFactory(OperationRegistry.java:1674)
at javax.media.jai.ThreadSafeOperationRegistry.invokeFactory(ThreadSafeOperationRegistry.java:473)
at javax.media.jai.registry.RIFRegistry.create(RIFRegistry.java:332)
at javax.media.jai.RenderedOp.createInstance(RenderedOp.java:819)
at javax.media.jai.RenderedOp.createRendering(RenderedOp.java:867)
at javax.media.jai.RenderedOp.getWidth(RenderedOp.java:2179)
The whole error log can be found at http://paste.ubuntu.com/7653452/.
Update: the problem is related to a Grails plugin called ImageTools.
If you look at the code for JPEGImageDecoder, you'll see that it depends on com.sun.image.codec.jpeg.ImageFormatException in its imports.
However, com.sun.image.codec.jpeg was removed from Java 7 onwards.
So the problem is likely that JAI is simply out of date, and you would have to use a Java 6 runtime to use it.
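If you can change the decoding code rather than the runtime, the JDK's own javax.imageio API (shipped since Java 1.4 and still present on Java 7+) avoids the removed com.sun.image.codec.jpeg package entirely. A minimal sketch with a hypothetical file path:

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class JpegReadCheck {
    public static void main(String[] args) throws IOException {
        // ImageIO picks a registered reader by format; no com.sun.* classes involved
        BufferedImage img = ImageIO.read(new File("/tmp/sample.jpg")); // hypothetical path
        if (img == null) {
            System.err.println("No registered ImageReader could decode this file");
        } else {
            System.out.println("Decoded " + img.getWidth() + "x" + img.getHeight() + " JPG");
        }
    }
}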

Error running hadoop job from Eclipse

I am facing some issues while trying to launch a MapReduce job on our Hadoop cluster from Eclipse. I have added a folder named "conf" as a class folder, and under that folder I have imported "core-site.xml", "hdfs-site.xml", "mapred-site.xml" and "hbase-site.xml". My Hadoop cluster runs Hadoop 0.20.205.0 and HBase 0.94.1. We are able to successfully submit jobs to the cluster using the "hadoop jar" command. Since this is very cumbersome, I want to set up Eclipse so that I can submit Hadoop jobs to the cluster by just running the program.
After I added the required dependencies to the project, I get the following exception when I run the "PiEstimator.java" example (from Hadoop 0.20.205.0).
Number of Maps = 4
Samples per Map = 4
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.protocol.ClientProtocol.create(java.lang.String, org.apache.hadoop.fs.permission.FsPermission, java.lang.String, boolean, boolean, short, long)
at java.lang.Class.getMethod(Class.java:1632)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy1.create(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at com.sun.proxy.$Proxy1.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3245)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:182)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:892)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:393)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:284)
at com.amazon.seo.mapreduce.examples.PiEstimator.estimate(PiEstimator.java:265)
at com.amazon.seo.mapreduce.examples.PiEstimator.run(PiEstimator.java:325)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.amazon.seo.mapreduce.examples.PiEstimator.main(PiEstimator.java:333)
Can you please help me understand what part of my setup is wrong and how to fix it?
Were you able to resolve this issue?
I've resolved a similar error with class definitions as follows:
Create the jar as a runnable JAR file (File >> Export >> Runnable JAR file), then point HADOOP_CLASSPATH at it:
export HADOOP_CLASSPATH =
This makes Hadoop pick up the correct classes from your jar file.
Unfortunately, I believe you will have to upgrade your Hadoop version to at least 2.5.0.
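A related client-side trick (not from the answer above, and the jar path is hypothetical) is to tell the job explicitly which jar to ship, so the cluster doesn't depend on how Eclipse assembled the classpath. A sketch against the Hadoop 2.x API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitFromEclipse {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml / mapred-site.xml from the "conf" class folder
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "pi-estimator");
        // Ship the exported runnable jar with the job instead of relying on the IDE classpath
        job.setJar("/home/user/piestimator.jar"); // hypothetical path to the exported jar
        // ... set mapper, reducer, input and output paths as usual, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}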
