Hadoop Kerberos failure on Spark local - java

I'm getting the following error when running spark-submit in GitLab CI.
The job works locally, but not on GitLab.
21/11/03 05:34:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.security.KerberosAuthException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
at jdk.security.auth/com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:67)
Why would this happen?
pipenv run spark-submit --packages org.apache.spark:spark-avro_2.12:3.2.0 --master local[1] --conf spark.jars.ivy=/tmp/.ivy main.py
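One likely cause, though the thread doesn't confirm it: Hadoop's UnixLoginModule asks the OS for the current user name, and many GitLab CI containers run the job under a UID that has no /etc/passwd entry, so the lookup returns null and UnixPrincipal throws exactly this NullPointerException. A minimal workaround sketch, assuming a job image where the runner user can append to /etc/passwd (the user name sparkci is made up for illustration):

# Give the current UID a passwd entry so the JVM can resolve a user name
whoami || echo "sparkci:x:$(id -u):$(id -g)::/tmp:/bin/bash" >> /etc/passwd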

Related

SparkContext: Error initializing SparkContext on MapR Sandbox

I tried running this sample project, which uses MapR.
I executed the class ml.Flight in the sandbox, and from the line below,
val spark: SparkSession = SparkSession.builder().appName("churn").getOrCreate()
I got this error:
[user01@maprdemo ~]$ spark-submit --class ml.Flight --master local[2] spark-ml-flightdelay-1.0.jar
Warning: Unable to determine $DRILL_HOME
18/12/19 05:39:09 WARN Utils: Your hostname, maprdemo.local resolves to a loopback address: 127.0.0.1; using 10.0.3.1 instead (on interface enp0s3)
18/12/19 05:39:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/12/19 05:39:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/12/19 05:39:28 ERROR SparkContext: Error initializing SparkContext.
java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:656)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:709)
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:1419)
at com.mapr.fs.MapRFileSystem.getFileStatus(MapRFileSystem.java:1093)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:933)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:924)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:924)
at ml.Flight$.main(Flight.scala:37)
at ml.Flight.main(Flight.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:899)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Could not create FileClient
at com.mapr.fs.MapRClientImpl.<init>(MapRClientImpl.java:137)
at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:650)
... 22 more
I'm new to Scala/Spark and any help is welcome. Thanks in advance.
I think you are using or exporting a different spark-submit (for example, a pip-installed one) instead of the MapR build.
Try the full path, for example:
/opt/mapr/spark/spark-2.3.1/bin/spark-submit
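To confirm which binary you are actually running (a generic diagnostic; the MapR path below is taken from the answer above and may differ on your sandbox):

# Which spark-submit is first on the PATH, and what version is it?
which spark-submit
spark-submit --version
# Invoke the MapR build explicitly
/opt/mapr/spark/spark-2.3.1/bin/spark-submit --class ml.Flight --master local[2] spark-ml-flightdelay-1.0.jar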

Hadoop installation: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Java HotSpot(TM) Client VM warning: You have loaded library /home/happyhadoop/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
17/04/30 21:30:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
happyhadoop@localhost's password:
localhost: namenode running as process 13997. Stop it first.
happyhadoop@localhost's password:
localhost: datanode running as process 14153. Stop it first.
Starting secondary namenodes [0.0.0.0]
happyhadoop@0.0.0.0's password:
0.0.0.0: secondarynamenode running as process 14432. Stop it first.
Java HotSpot(TM) Client VM warning: You have loaded library /home/happyhadoop/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
17/04/30 21:30:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Can someone please help me with this warning?
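Both lines are warnings, not errors: Hadoop simply falls back to its pure-Java implementations. The 'Client VM' in the message suggests a 32-bit JVM, which cannot load the 64-bit native libraries Hadoop ships with; that mismatch is a common cause of the NativeCodeLoader warning. If you just want to silence the stack-guard warning, a commonly suggested fix (assuming execstack is installed and the path matches your install) is:

# Clear the executable-stack flag on the native library
execstack -c /home/happyhadoop/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0
# And point the JVM at the native directory, e.g. in hadoop-env.sh
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/home/happyhadoop/hadoop-2.7.3/lib/native"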

Spark-submit fails without an error

I used the following command to run the Spark Java word count example:
time spark-submit --deploy-mode cluster --master spark://192.168.0.7:7077 --class org.apache.spark.examples.JavaWordCount /home/pi/Desktop/example/new/target/javaword.jar /books_500.txt
I have copied the same jar file to the same location on all nodes. (Copying it into HDFS didn't work for me.) When I run it, this is the output:
Running Spark using the REST application submission protocol.
16/07/14 16:32:18 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://192.168.0.7:7077.
16/07/14 16:32:30 WARN rest.RestSubmissionClient: Unable to connect to server spark://192.168.0.7:7077.
Warning: Master endpoint spark://192.168.0.7:7077 was not a REST server. Falling back to legacy submission gateway instead.
16/07/14 16:32:30 WARN util.Utils: Your hostname, master02 resolves to a loopback address: 127.0.1.1; using 192.168.0.7 instead (on interface wlan0)
16/07/14 16:32:30 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/07/14 16:32:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It just stops there, quits the job, and returns to the terminal prompt. I don't understand this failure, since there is no error message. Help needed, please!
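One guess, since the thread gives no answer: with --deploy-mode cluster the driver runs on one of the workers, so its stdout/stderr (including any exception) never reaches your terminal, and the submission returning silently is normal. A quick way to see what is actually failing is the same command with only the deploy mode changed:

# Run the driver locally so its output and any stack trace appear in this terminal
time spark-submit --deploy-mode client --master spark://192.168.0.7:7077 --class org.apache.spark.examples.JavaWordCount /home/pi/Desktop/example/new/target/javaword.jar /books_500.txt

Otherwise, the driver's state and logs should be visible in the standalone master's web UI (port 8080 by default; an assumption, the thread doesn't show it).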

Hadoop on archlinux | dfs cannot start | ssh port 22 connection refused

I just can't find any answers for this problem:
[hadoop@evghost ~]$ start-dfs.sh
15/10/21 21:59:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
evghost: ssh: connect to host evghost port 22: Connection refused
evghost: ssh: connect to host evghost port 22: Connection refused
Starting secondary namenodes [0.0.0.0]
Error: Please specify one of --hosts or --hostnames options and not both.
15/10/21 21:59:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Does somebody know any solution?
Solved it myself: I had to enable the sshd daemon so the start scripts can connect, and add
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"
to .bashrc.
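The remaining line, "namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured", usually means fs.defaultFS is missing from core-site.xml. A minimal single-node sketch, assuming the Hadoop install lives under /usr/local/hadoop as in the snippet above (adjust paths and hostname to your setup):

# Write a minimal core-site.xml pointing HDFS clients at the local namenode
cat > /usr/local/hadoop/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
# On Arch, the port-22 refusals disappear once sshd is actually running
sudo systemctl enable --now sshd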

Spark on yarn jar upload problems

I am trying to run a simple Map/Reduce Java program using Spark on YARN (Cloudera Hadoop 5.2 on CentOS). I have tried this in two different ways. The first way is the following:
YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/;
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --jars /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar simplemr.jar
This method gives the following error:
diagnostics: Application application_1434177111261_0007 failed 2 times
due to AM Container for appattempt_1434177111261_0007_000002 exited
with exitCode: -1000 due to: Resource
hdfs://kc1ltcld29:9000/user/myuser/.sparkStaging/application_1434177111261_0007/spark-assembly-1.4.0-hadoop2.4.0.jar
changed on src filesystem (expected 1434549639128, was 1434549642191)
Then I tried without the --jars:
YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/;
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster simplemr.jar
diagnostics: Application application_1434177111261_0008 failed 2 times
due to AM Container for appattempt_1434177111261_0008_000002 exited
with exitCode: -1000 due to: File does not exist:
hdfs://kc1ltcld29:9000/user/myuser/.sparkStaging/application_1434177111261_0008/spark-assembly-1.4.0-hadoop2.4.0.jar
.Failing this attempt.. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.myuser
start time: 1434549879649
final status: FAILED
tracking URL: http://kc1ltcld29:8088/cluster/app/application_1434177111261_0008
user: myuser Exception in thread "main" org.apache.spark.SparkException: Application
application_1434177111261_0008 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:841)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:867)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 15/06/17 10:04:57 INFO util.Utils: Shutdown hook called 15/06/17
10:04:57 INFO util.Utils: Deleting directory
/tmp/spark-2aca3f35-abf1-4e21-a10e-4778a039d0f4
I tried deleting all the .jars from hdfs://users//.sparkStaging and resubmitting, but that didn't help.
The problem was solved by copying spark-assembly.jar into a directory on HDFS and then passing it to spark-submit via the --conf spark.yarn.jar parameter. The commands are listed below:
hdfs dfs -copyFromLocal /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar /user/spark/spark-assembly.jar
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar simplemr.jar
If you are getting this error, it means you are uploading the assembly jar with the --jars option or manually copying it to HDFS on each node.
I have followed the spark.yarn.jar approach above and it works for me.
In yarn-cluster mode, spark-submit automatically uploads the assembly jar to a distributed cache that all executor containers read from, so there is no need to copy it to every node manually (or to pass it through --jars).
It seems there are two versions of the same jar in your HDFS.
Try removing all old jars from your .sparkStaging directory and resubmitting; it should work.
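A sketch of that cleanup, reusing the user and namenode from the logs above (run it only while no applications are live, since running jobs stage files there too):

# Remove stale staged jars left over from failed submissions
hdfs dfs -rm -r hdfs://kc1ltcld29:9000/user/myuser/.sparkStaging/*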
