Spark on yarn jar upload problems - java

I am trying to run a simple Map/Reduce Java program using Spark over YARN (Cloudera Hadoop 5.2 on CentOS). I have tried this in two different ways. The first way is the following:
YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/;
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --jars /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar simplemr.jar
This method gives the following error:
diagnostics: Application application_1434177111261_0007 failed 2 times
due to AM Container for appattempt_1434177111261_0007_000002 exited
with exitCode: -1000 due to: Resource
hdfs://kc1ltcld29:9000/user/myuser/.sparkStaging/application_1434177111261_0007/spark-assembly-1.4.0-hadoop2.4.0.jar
changed on src filesystem (expected 1434549639128, was 1434549642191)
Then I tried without the --jars:
YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/;
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster simplemr.jar
diagnostics: Application application_1434177111261_0008 failed 2 times
due to AM Container for appattempt_1434177111261_0008_000002 exited
with exitCode: -1000 due to: File does not exist:
hdfs://kc1ltcld29:9000/user/myuser/.sparkStaging/application_1434177111261_0008/spark-assembly-1.4.0-hadoop2.4.0.jar
.Failing this attempt.. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.myuser
start time: 1434549879649
final status: FAILED
tracking URL: http://kc1ltcld29:8088/cluster/app/application_1434177111261_0008
user: myuser
Exception in thread "main" org.apache.spark.SparkException: Application
application_1434177111261_0008 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:841)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:867)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/06/17 10:04:57 INFO util.Utils: Shutdown hook called
15/06/17 10:04:57 INFO util.Utils: Deleting directory /tmp/spark-2aca3f35-abf1-4e21-a10e-4778a039d0f4
I tried deleting all the jars from hdfs://users//.sparkStaging and resubmitting, but that didn't help.

The problem was solved by copying spark-assembly.jar into a directory on HDFS that every node can read, and then passing that location to spark-submit via --conf spark.yarn.jar. The commands are listed below:
hdfs dfs -copyFromLocal /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar /user/spark/spark-assembly.jar
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar simplemr.jar
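To avoid passing --conf on every submission, the same setting can also be made permanent; a minimal sketch, assuming the HDFS path used above and the default conf/spark-defaults.conf file under the Spark installation:
spark.yarn.jar hdfs:///user/spark/spark-assembly.jar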

If you are getting this error, it means you are uploading assembly jars with the --jars option or copying them to HDFS manually on each node.
I have followed the approach below and it works for me.
In yarn-cluster mode, Spark submit automatically uploads the assembly jar to a distributed cache that all executor containers read from, so there is no need to manually copy the assembly jar to all nodes (or pass it through --jars).
It seems there are two versions of the same jar in your HDFS.
Try removing all the old jars from your .sparkStaging directory and submitting again; it should work.
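For example, assuming the staging directory shown in the logs above (/user/myuser/.sparkStaging), something like this clears the stale copies before resubmitting (the directory is recreated on the next spark-submit):
hdfs dfs -rm -r /user/myuser/.sparkStaging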

Related

CBORFactory NoClassDefFoundError exception in spark 3.0.0

I have a Spark Streaming application that uses Kinesis and runs on EMR 6.0.0.
It runs fine locally, but when deployed to AWS EMR it keeps failing with a
NoClassDefFoundError exception:
20/11/17 15:26:56 INFO Client:
client token: N/A
diagnostics: User class threw exception: java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/cbor/CBORFactory
at com.amazonaws.protocol.json.SdkJsonProtocolFactory.getSdkFactory(SdkJsonProtocolFactory.java:123)
at com.amazonaws.protocol.json.SdkJsonProtocolFactory.createGenerator(SdkJsonProtocolFactory.java:54)
at com.amazonaws.protocol.json.SdkJsonProtocolFactory.createGenerator(SdkJsonProtocolFactory.java:74)
at com.amazonaws.protocol.json.SdkJsonProtocolFactory.createProtocolMarshaller(SdkJsonProtocolFactory.java:64)
at com.amazonaws.services.kinesis.model.transform.DescribeStreamRequestProtocolMarshaller.marshall(DescribeStreamRequestProtocolMarshaller.java:52)
at com.amazonaws.services.kinesis.AmazonKinesisClient.executeDescribeStream(AmazonKinesisClient.java:861)
at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:846)
at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:887)
at com.gartner.tn.datafeed.application.PositionStreamApplicationV4.getJavaDStream(PositionStreamApplicationV4.java:240)
I had the exact same issue and solved it by disabling CBOR in the Kinesis client. I am not sure if that is an option for you, but it worked for me.
There are a few ways to do this. When running in local mode, I put the following code at the beginning of the main class of my Spark Streaming application:
// requires: import com.amazonaws.SDKGlobalConfiguration;
System.setProperty(SDKGlobalConfiguration.AWS_CBOR_DISABLE_SYSTEM_PROPERTY, "true");
When running in cluster mode, start your spark-submit as follows:
spark-submit --deploy-mode cluster \
--conf spark.driver.extraJavaOptions='-Dcom.amazonaws.sdk.disableCbor=true' \
--conf spark.executor.extraJavaOptions='-Dcom.amazonaws.sdk.disableCbor=true'
When running in client mode on the cluster, start it like this:
spark-submit --deploy-mode client \
--driver-java-options '-Dcom.amazonaws.sdk.disableCbor=true' \
--conf spark.executor.extraJavaOptions='-Dcom.amazonaws.sdk.disableCbor=true'
This question led me to the answer: Getting an AmazonKinesisException Status Code: 502 when using LocalStack from Java

How can I deploy cloudfoundry-uaa as a docker image based on tomcat?

We were using cf-uaa's Gradle tasks to create a Docker image, but those have been removed in the latest version. I've loaded the war in a recent version, but the service does not seem to start correctly.
I've been building the war from the v74 tag, adding it to tomcat:8.5.45-jdk12-openjdk-oracle or tomcat:9.0.24-jdk12-openjdk-oracle, and setting the various env vars that we were passing in to the previous image. I'm not seeing any log entries after the initial tomcat output stating that my war has been deployed and the server startup time.
The Dockerfile is basically just an adaptation of what was being passed in the previous image:
FROM tomcat:8.5.45-jdk12-openjdk-oracle
#FROM tomcat:9.0.24-jdk12-openjdk-oracle
ENV LOGIN_CONFIG_URL WEB-INF/classes/required_configuration.yml
ENV UAA_CONFIG_PATH /uaa
RUN bash -c "rm -r /usr/local/tomcat/webapps/ROOT"
RUN bash -c "rm -r /usr/local/tomcat/webapps/host-manager"
RUN bash -c "rm -r /usr/local/tomcat/webapps/manager"
RUN bash -c "rm -r /usr/local/tomcat/webapps/examples"
RUN bash -c "rm -r /usr/local/tomcat/webapps/docs"
ADD *.war /usr/local/tomcat/webapps/uaa.war
RUN bash -c "echo $LOGIN_CONFIG_URL"
EXPOSE 8080
I would expect to see the service responding to my requests, or some errors in the log indicating that the war failed to deploy. I am not currently getting any log output generated from the application code. When I send a request to the service, the response is a 500 with an error header from the service.
X-Cf-Uaa-Error:Server failed to start. Possible configuration error.
Update: I've located the UAA logs at .../tomcat/logs/uaa.log. I'm not seeing anything indicating that the service failed to deploy, but I am also not seeing anything to indicate that it is picking up the env vars I have set in the container. I recreated the service using the war from the original setup, which started successfully with the uaa.yml that I mounted as a volume. Comparing the logs, the original setup's first log entry is YamlProcessor, which does not show up in the v75 logs at all. In fact, no debug entries show up at all, which suggests that my LOG_LEVEL env var is not propagating either.
Update 2: We reverted the image base to FROM tomcat:8.5-jre8 and started seeing Flyway errors in uaa.log. Our previous datasource URL format was url: jdbc:postgresql://${POSTGRES_NAME}:5432/${DB}?currentSchema=uaa, which caused a Flyway exception. After removing the schema reference, it created the tables in the public schema. By creating the uaa schema manually before starting the service, it was able to run with the original URL format. The Flyway version has been updated, so perhaps there is something new that needs to be set.
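For reference, a minimal sketch of that manual schema-creation step; the user name is a placeholder, and $POSTGRES_NAME/$DB stand in for whatever your datasource URL points at:
psql -h "$POSTGRES_NAME" -U uaa -d "$DB" -c 'CREATE SCHEMA IF NOT EXISTS uaa;'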
The application seems to be running, but when I try to get a token at /uaa/oauth/token I get a 500 with this error in the logs: Caused by: java.lang.NoSuchMethodError: java.nio.CharBuffer.limit(I)Ljava/nio/CharBuffer;
Since January 2021, UAA server Docker images are available in the cloudfoundry/uaa Docker Hub repository.
docker pull cloudfoundry/uaa:75.0.0
See its Dockerfile for more details.
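For a quick local test, a hedged sketch of running it; the published port and the config mount point are assumptions modeled on the question's UAA_CONFIG_PATH, not details taken from the image itself:
docker run -d -p 8080:8080 -e UAA_CONFIG_PATH=/uaa -v "$(pwd)/uaa.yml:/uaa/uaa.yml" cloudfoundry/uaa:75.0.0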
Can you try the following?
https://github.com/hortonworks/docker-cloudbreak-uaa
It works very well.

Launch spark master windows7

Using Windows 7 64-bit, JDK 8, Spark 1.6.2.
I have Spark running, winutils, HADOOP_HOME, etc.
The documentation says: "Note: The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand." But it does not say how.
How do I launch the Spark master on Windows?
I tried running sh start-master.sh through Git Bash: failed to launch org.apache.spark.deploy.master.Master, even though it prints out Master --ip Sam-Toshiba --port 7077 --webui-port 8080, so I don't know what all this means.
But when I try spark-submit --class " " --master spark://Sam-Toshiba:7077 target/ .jar -
I get errors:
WARN AbstractLifeCycle: FAILED SelectChannelConnector#0.0.0.0:
4040: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 14:44:29 WARN AppClient$ClientEndpoint: Failed to connect to master Sam-Toshiba:7077
java.io.IOException: Failed to connect to Sam-Toshiba/192.168.137.1:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
Also tried spark://localhost:7077 - same errors
On Windows you can launch the master using the command below. Open a command prompt, go to the Spark bin folder, and execute:
spark-class.cmd org.apache.spark.deploy.master.Master
The command above will print something like Starting Spark master at spark://192.168.99.1:7077 in the console, using your machine's IP. You can check the UI at http://192.168.99.1:8080/
If you want to launch a worker once your master is up, you can use the command below. This will use all the available cores of your machine.
spark-class.cmd org.apache.spark.deploy.worker.Worker spark://192.168.99.1:7077
If you want to use only 2 of your machine's 4 cores, then use:
spark-class.cmd org.apache.spark.deploy.worker.Worker -c 2 spark://192.168.99.1:7077
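Once the master and a worker are up, you can sanity-check the cluster by submitting the bundled SparkPi example against that master URL; the examples jar name below is an assumption for a Spark 1.6.2 prebuilt-for-Hadoop-2.6 download, so adjust it to whatever sits in your lib folder:
spark-submit.cmd --master spark://192.168.99.1:7077 --class org.apache.spark.examples.SparkPi %SPARK_HOME%\lib\spark-examples-1.6.2-hadoop2.6.0.jar 10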

Spark-submit fails without an error

I used the following command to run the Spark Java word count example:
time spark-submit --deploy-mode cluster --master spark://192.168.0.7:7077 --class org.apache.spark.examples.JavaWordCount /home/pi/Desktop/example/new/target/javaword.jar /books_500.txt
I have copied the same jar file to the same location on all nodes. (Copying it into HDFS didn't work for me.) When I run it, the following is the output:
Running Spark using the REST application submission protocol.
16/07/14 16:32:18 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://192.168.0.7:7077.
16/07/14 16:32:30 WARN rest.RestSubmissionClient: Unable to connect to server spark://192.168.0.7:7077.
Warning: Master endpoint spark://192.168.0.7:7077 was not a REST server. Falling back to legacy submission gateway instead.
16/07/14 16:32:30 WARN util.Utils: Your hostname, master02 resolves to a loopback address: 127.0.1.1; using 192.168.0.7 instead (on interface wlan0)
16/07/14 16:32:30 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/07/14 16:32:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It just stops there, quits the job, and waits for the next command at the terminal. There is no error message, so I don't understand what is going wrong. Help needed, please!

Timeout when submitting jobs to EC2 cluster

I've been trying to make this work with no luck thus far. I launch a cluster with
./spark-ec2 -k keyname -i ~/.keys/key.pem --region=us-east-1 -s 5 launch "my test cluster"
Then I submit a job with
bin/spark-submit --verbose --class com.company.jobs.AggregateCostDataWorkflow --master spark://ec2-54-157-122-49.compute-1.amazonaws.com:7077 --deploy-mode cluster --conf spark.executor.memory=5g /Users/my.name/scala-proj/target/scala-2.10/scala-proj-0.1.0.jar --outputPath,s3n://my-bucket/my-name/ec2-spark-test/
Where outputPath is an argument to the main method. After a bit and some status output, I see an exception that looks like this:
15/06/05 16:09:33 INFO StandaloneRestClient: Submitting a request to launch an application in spark://ec2-74-141-162-19.compute-1.amazonaws.com:7077.
Exception in thread "main" java.net.ConnectException: Operation timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at [java socket stuff elided for brevity] org.apache.spark.deploy.rest.StandaloneRestClient.postJson(StandaloneRestClient.scala:150)
at org.apache.spark.deploy.rest.StandaloneRestClient.createSubmission(StandaloneRestClient.scala:70)
at org.apache.spark.deploy.rest.StandaloneRestClient$.run(StandaloneRestClient.scala:317)
at org.apache.spark.deploy.rest.StandaloneRestClient$.main(StandaloneRestClient.scala:329)
at org.apache.spark.deploy.rest.StandaloneRestClient.main(StandaloneRestClient.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is Spark 1.3.1 (on my local machine). I can access the UI on the master machine and verify that the Spark processes are in fact up. I can also ssh into the master.
Any tips?
If you want to access ports on your EC2 Spark cluster from outside, you need to open them by editing the security group policies. spark_ec2.py doesn't open ports 7077 and 6066 on the master for access from outside the cluster.
I use the other way: connect to the master machine of your Spark cluster with the command
./spark_ec2.py -k keyname -i ~/.keys/key.pem login "my test cluster"
Upload your job file (with scp, using the same key) and submit the job from there. This ensures that your driver has access to the cluster master and slaves.
See the "Running Applications" section of the Running Spark on EC2 documentation.
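For illustration, a hedged sketch of that upload-and-submit flow, reusing the key, jar path, and master hostname from the question (spark-ec2 AMIs typically log in as root and install Spark under /root/spark; adjust if yours differs):
scp -i ~/.keys/key.pem /Users/my.name/scala-proj/target/scala-2.10/scala-proj-0.1.0.jar root@ec2-54-157-122-49.compute-1.amazonaws.com:~/
Then, from the master:
./spark/bin/spark-submit --class com.company.jobs.AggregateCostDataWorkflow --master spark://ec2-54-157-122-49.compute-1.amazonaws.com:7077 --conf spark.executor.memory=5g ~/scala-proj-0.1.0.jar --outputPath s3n://my-bucket/my-name/ec2-spark-test/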
