When I run HiveRead.java from the IntelliJ IDE, it runs successfully and I get results. Then I created a jar file (it's a Maven project), but when I tried to run the jar, it gave me:
ClassLoaderResolver for class "" gave error on creation : {1}
Then I looked at SO answers and found I had to add the DataNucleus jars, so I did something like this:
java -jar /home/saurab/sparkProjects/spark_hive/target/myJar-jar-with-dependencies.jar --jars jars/datanucleus-api-jdo-3.2.6.jar,jars/datanucleus-core-3.2.10.jar,jars/datanucleus-rdbms-3.2.9.jar,/home/saurab/hadoopec/hive/lib/mysql-connector-java-5.1.38.jar
Then I got this error
org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
Somewhere I found that I should use spark-submit, so I ran this:
./bin/spark-submit --class HiveRead --master yarn --jars jars/datanucleus-api-jdo-3.2.6.jar,jars/datanucleus-core-3.2.10.jar,jars/datanucleus-rdbms-3.2.9.jar,/home/saurab/hadoopec/hive/lib/mysql-connector-java-5.1.38.jar --files /home/saurab/hadoopec/spark/conf/hive-site.xml /home/saurab/sparkProjects/spark_hive/target/myJar-jar-with-dependencies.jar
Now I get a new type of error:
Table or view not found: `bigmart`.`o_sales`;
HELP ME !! :)
I have copied my hive-site.xml to /spark/conf and started the Hive metastore service (hiveserver2 --service metastore).
Here is the HiveRead.java code, if anyone is interested.
The Spark session is not able to read the Hive metastore.
Provide the hive-site.xml file path with the spark-submit command, as below.
For Hortonworks the file path is /usr/hdp/current/spark2-client/conf/hive-site.xml.
Pass it as --files /usr/hdp/current/spark2-client/conf/hive-site.xml in the spark-submit command.
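For example, combining the jar paths already used in the question with the Hortonworks conf location (adjust both to match your own installation), the full command would look roughly like this:

# paths below are the ones from this question/answer; change them to match your cluster
./bin/spark-submit --class HiveRead --master yarn \
  --jars jars/datanucleus-api-jdo-3.2.6.jar,jars/datanucleus-core-3.2.10.jar,jars/datanucleus-rdbms-3.2.9.jar,/home/saurab/hadoopec/hive/lib/mysql-connector-java-5.1.38.jar \
  --files /usr/hdp/current/spark2-client/conf/hive-site.xml \
  /home/saurab/sparkProjects/spark_hive/target/myJar-jar-with-dependencies.jar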
I am very new to Spark and I am following this document to submit Spark jobs via Livy: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-livy-rest-interface
Here's my command:
curl -k --user "username:password!" -v -H "Content-Type: application/json" -X POST -d '{ "file":"/test4spark/test4sparkhaha.jar", "className":"helloworld4spark.test" }' "https://xxx.azurehdinsight.net/livy/batches" -H "X-Requested-By: username"
The file test4sparkhaha.jar is a super simple Java application; it contains only one class, with a single main method that prints "hahaha", nothing else...
I exported the project in Eclipse to a runnable jar and tried running it on my Spark cluster using java -jar and spark-submit. Both worked fine.
Then I started trying to submit the job via Livy, and it always failed. I found the errors below in the YARN logs:
19/11/06 14:36:06 ERROR ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:510)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Here's the spark-submit command I tried:
spark-submit --class helloworld4spark.test test4sparkhaha.jar
It works perfectly...
Can you guys please help me understand why Livy gives this error while spark-submit works fine?
I guess you're trying to submit a local .jar file with Livy.
This works for spark-submit (submitting a job to YARN supports it), but it doesn't work for the Livy server.
To make it work you need to upload your jar to an HDFS/WASB/ADLS/HTTP-accessible location (and be sure that your Spark is configured to access that location).
Please refer to the first paragraph of this guide.
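A rough sketch, assuming the HDInsight cluster's default WASB storage (the target path here is just an example): upload the jar to storage the cluster can read, then point the Livy request at that path instead of the local one.

# upload the jar into the cluster's default storage container (the path is an example)
hadoop fs -put ./test4sparkhaha.jar wasbs:///test4spark/test4sparkhaha.jar

# then reference that storage location in the Livy batch request
curl -k --user "username:password!" -H "Content-Type: application/json" -H "X-Requested-By: username" \
  -X POST -d '{ "file":"wasbs:///test4spark/test4sparkhaha.jar", "className":"helloworld4spark.test" }' \
  "https://xxx.azurehdinsight.net/livy/batches"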
I was trying to configure WebLogic 12.2.1.3 on a Linux account.
After going through the Oracle documentation, I understood that we set up WebLogic by running the config.sh script located in the /opt/weblogic12213/wlserver_12.2.1.3/installation/oracle_common/common/bin folder.
But this script is giving an error:
Error: Could not find or load main class
com.oracle.cie.wizard.WizardController
Below is the last command the setup script executes, and it is the one giving the error:
/usr/java/jdk1.8.0_192/bin/java -Dpython.cachedir=/tmp/cachedir
-Xms32m -Xmx1024m -Dweblogic.alternateTypesDirectory=/opt/weblogic12213/wlserver_12.2.1.3/installation/wlserver/../oracle_common/modules/oracle.oamprovider,/opt/weblogic12213/wlserver_12.2.1.3/installation/wlserver/../oracle_common/modules/oracle.jps
com.oracle.cie.wizard.WizardController -target=config-oneclick
The scripts provided by WebLogic don't set the CLASSPATH correctly.
com.oracle.cie.wizard.WizardController lives in com.oracle.cie.wizard_7.8.2.0.jar.
Steps done:
1. Copied the config_internal.sh script to a local path (because I didn't have root permission).
2. Appended this jar to the CLASSPATH variable like this:
CLASSPATH="${FMWCONFIG_CLASSPATH}${CLASSPATHSEP}${DERBY_CLASSPATH}:/opt/weblogic12213/wlserver_12.2.1.3/installation/oracle_common/modules/com.oracle.cie.wizard_7.8.2.0.jar"
I am running my Spark job in YARN client mode, and I need the Guava jar on both the driver and executor classpaths.
So I am running the spark-submit command below:
spark-submit --class "com.bk.App" --master yarn --deploy-mode client --executor-cores 2 --driver-memory 1G --driver-cores 1 --driver-class-path /home/my_account/spark-jars/guava-19.0.jar --conf spark.executor.extraClassPath=/home/my_account/spark-jars/guava-19.0.jar maprfs:///user/my_account/jobs/spark-jobs.jar parma1 parma2
which gives me the exception below:
Downloading maprfs:///user/my_account/jobs/spark-jobs.jar to /tmp/tmp732578642370806645/user/my_account/jobs/spark-jobs.jar.
2018-10-29 19:37:52,2025 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:566 Thread: 6832 Client initialization failed due to mismatch of libraries. Please make sure that the java library version matches the native build version 5.0.0.32987.GA and native patch version $Id: mapr-version: 5.0.0.32987.GA 40889:3056362e419b $
Exception in thread "main" java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:593)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:654)
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:1310)
at com.mapr.fs.MapRFileSystem.getFileStatus(MapRFileSystem.java:942)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:345)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:297)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2066)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2035)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2011)
at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:874)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I even tried putting my Guava jar at an HDFS location and adding the extra classpath using hdfs:// and even maprfs:// in my spark-submit. I tried local:// too. All attempts end in the same exception above.
Note: the job works absolutely fine if no extra driver and executor classpath jars are given.
Any suggestions? Am I using the classpath params in the wrong way?
I am trying a simple movie recommendation machine learning program in Spark.
Spark version: 2.1.1
Java version: Java 8
Scala version: Scala code runner version 2.11.7
Env: Windows 7
I am running these commands to start the master and worker:
//start master
spark-class org.apache.spark.deploy.master.Master
//start worker
spark-class org.apache.spark.deploy.worker.Worker spark://valid ip:7077
I am following a very simple movie recommendation example from here: http://blogs.quovantis.com/recommendation-engine-using-apache-spark/
I have updated the code to:
SparkConf conf = new SparkConf().setAppName("Collaborative Filtering Example").setMaster("spark://valid ip:7077");
conf.setJars(new String[] {"C:\\Spark2.1.1\\spark-2.1.1-bin-hadoop2.7\\jars\\spark-mllib_2.11-2.1.1.jar"});
I cannot run this through IntelliJ.
Running mvn clean install and copying the jar to the folder does not work.
The command I used to run it:
bin\spark-submit --verbose –-jars jars\spark-mllib_2.11-2.1.1.jar –-class “com.abc.enterprise.RecommendationEngine” –-master spark://valid ip:7077 C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\spark-poc-1.0-SNAPSHOT.jar C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\ratings.csv C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\movies.csv 10
The error I see is:
C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7>bin\spark-submit --verbose --class "com.sandc.enterprise.RecommendationEngine" --master spark://10.64.98.101:7077 C:\Spark2.1.1\spark-2.1.1-
bin-hadoop2.7\spark-mllib-example\spark-poc-1.0-SNAPSHOT.jar C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\ratings.csv C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-m
llib-example\movies.csv 10
Using properties file: C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\bin\..\conf\spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.executor.extraJavaOptions=-XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.driver.memory=5g
Adding default property: spark.master=spark://valid ip:7077
Error: Cannot load main class from JAR file:/C:/Spark2.1.1/spark-2.1.1-bin-hadoop2.7/û-class
Run with --help for usage help or --verbose for debug output
If I add the --jars option, it gives this error:
Error: Cannot load main class from JAR file:/C:/Spark2.1.1/spark-2.1.1-bin-hadoop2.7/û-jars
Any ideas how I can submit this job to Spark?
Is your jar built correctly?
Also, you don't need to add double quotes around the --class option value.
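For what it's worth, a sketch of the same command with the quotes dropped; note too that every option prefix should be the plain ASCII double hyphen "--" (the "û-class" / "û-jars" in the error output suggests a non-ASCII dash crept into the pasted command). The class name below is the one from the command shown earlier.

bin\spark-submit --verbose --jars jars\spark-mllib_2.11-2.1.1.jar --class com.abc.enterprise.RecommendationEngine --master spark://valid ip:7077 C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\spark-poc-1.0-SNAPSHOT.jar C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\ratings.csv C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\movies.csv 10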
I added KafkaLog4JAppender functionality to my MR job.
Locally, the job runs and sends the formatted logs to my Kafka cluster.
When I try to run it from the YARN server, using:
jar [jar-name].jar [DriverClass].class [job-params] -Dlog4j.configuration=log4j.xml -libjars
I get the following exception:
log4j:ERROR Could not create an Appender. Reported error follows.
java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender
The KafkaLog4jAppender class is in the jar; running
jar tvf [my-jar].jar | grep KafkaLog4J
finds the class.
I'm kind of lost and would appreciate any helpful input.
Thanks in advance!
If it works in local mode but not in YARN/distributed mode, it could be a problem of the jar not being distributed properly. You might want to check "Using third party jars and files in your MapReduce application (Distributed cache)" for details on how to distribute the jar containing KafkaLog4jAppender.class.
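A rough sketch of what that usually looks like on the command line, assuming the driver parses generic options (for example via ToolRunner); the jar path given to -libjars is a placeholder, and the bracketed names are the same placeholders used in the question:

# generic options (-D..., -libjars) must come before the job's own arguments,
# and -libjars needs the jar(s) to ship to the cluster as its argument
hadoop jar [jar-name].jar [DriverClass] \
  -Dlog4j.configuration=log4j.xml \
  -libjars /path/to/kafka-log4j-appender-dependency.jar \
  [job-params]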