Kill running spark driver program - java

I am running Spark 1.6.0 on a small computing cluster and wish to kill a driver program. I've submitted a custom implementation of the out-of-the-box Spark Pi calculation example with the following options:
spark-submit --class JavaSparkPi --master spark://clusterIP:portNum --deploy-mode cluster /path/to/jarfile/JavaSparkPi.jar 10
Note: 10 is a command line argument and is irrelevant for this question.
I've tried several methods of killing the driver program that was started on the cluster:
1. ./bin/spark-class org.apache.spark.deploy.Client kill
2. spark-submit --master spark://node-1:6066 --kill $driverid
3. Issuing the kill command from the Spark administrative interface (web UI): http://my-cluster-url:8080
Number 2 yields a success JSON response:
{
"action" : "KillSubmissionResponse",
"message" : "Kill request for driver-xxxxxxxxxxxxxx-xxxx submitted",
"serverSparkVersion" : "1.6.0",
"submissionId" : "driver-xxxxxxxxxxxxxx-xxxx",
"success" : true
}
Where 'driver-xxxxxxxxxxxxxx-xxxx' is the actual driver id.
But the web UI http://my-cluster-url:8080/ still shows the driver program as running.
Is there anything else I can try?
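One way to see what the master itself thinks is happening, independently of the web UI, is to query the status endpoint of the same REST API the kill request went through. A minimal Java sketch, assuming the standalone REST server is reachable at node-1:6066 and using a placeholder driver id, might look like this:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class DriverStatusCheck {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute your master host and the driver id reported by spark-submit
        String statusUrl = "http://node-1:6066/v1/submissions/status/driver-xxxxxxxxxxxxxx-xxxx";

        HttpURLConnection conn = (HttpURLConnection) new URL(statusUrl).openConnection();
        conn.setRequestMethod("GET");

        // Print the JSON response; it contains a driverState field (e.g. RUNNING, KILLED, FINISHED)
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        conn.disconnect();
    }
}
If driverState there still reports RUNNING, the driver genuinely was not killed and the web UI is accurate rather than stale.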

Related

CBORFactory NoClassDefFoundError exception in spark 3.0.0

I have a Spark Streaming application using Kinesis, running on EMR 6.0.0. It runs fine locally, but when deployed to AWS EMR it keeps failing with a NoClassDefFoundError exception:
20/11/17 15:26:56 INFO Client:
client token: N/A
diagnostics: User class threw exception: java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/cbor/CBORFactory
at com.amazonaws.protocol.json.SdkJsonProtocolFactory.getSdkFactory(SdkJsonProtocolFactory.java:123)
at com.amazonaws.protocol.json.SdkJsonProtocolFactory.createGenerator(SdkJsonProtocolFactory.java:54)
at com.amazonaws.protocol.json.SdkJsonProtocolFactory.createGenerator(SdkJsonProtocolFactory.java:74)
at com.amazonaws.protocol.json.SdkJsonProtocolFactory.createProtocolMarshaller(SdkJsonProtocolFactory.java:64)
at com.amazonaws.services.kinesis.model.transform.DescribeStreamRequestProtocolMarshaller.marshall(DescribeStreamRequestProtocolMarshaller.java:52)
at com.amazonaws.services.kinesis.AmazonKinesisClient.executeDescribeStream(AmazonKinesisClient.java:861)
at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:846)
at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:887)
at com.gartner.tn.datafeed.application.PositionStreamApplicationV4.getJavaDStream(PositionStreamApplicationV4.java:240)
I had the exact same issue and solved it by removing the dependency on CBOR from Kinesis. I am not sure if that is an option for you, but it worked for me.
There are a few ways to do this. When running in local mode, I put the following code at the beginning of the main class in my Spark Streaming application:
System.setProperty(SDKGlobalConfiguration.AWS_CBOR_DISABLE_SYSTEM_PROPERTY, "true");
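For context, here is a minimal sketch of where that line sits; the class name is hypothetical, and the point is simply that the property is set before any Kinesis client gets created:
import com.amazonaws.SDKGlobalConfiguration;

public class StreamingApp {
    public static void main(String[] args) {
        // Disable CBOR before the AWS SDK / Kinesis client is instantiated (hypothetical placement)
        System.setProperty(SDKGlobalConfiguration.AWS_CBOR_DISABLE_SYSTEM_PROPERTY, "true");

        // ... build the StreamingContext and the Kinesis DStream, then start the application
    }
}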
When running in cluster mode, start your spark-submit as follows:
spark-submit --deploy-mode cluster \
--conf spark.driver.extraJavaOptions='-Dcom.amazonaws.sdk.disableCbor=true' \
--conf spark.executor.extraJavaOptions='-Dcom.amazonaws.sdk.disableCbor=true'
When running in client mode on the cluster, start it like this:
spark-submit --deploy-mode client \
--driver-java-options '-Dcom.amazonaws.sdk.disableCbor=true' \
--conf spark.executor.extraJavaOptions='-Dcom.amazonaws.sdk.disableCbor=true'
This question led me to the answer: Getting an AmazonKinesisException Status Code: 502 when using LocalStack from Java

How to stop running Spark application?

I wrote a few Spark jobs in Java and then submitted the jars with the submit script.
bin/spark-submit --class "com.company.spark.jobName.SparkMain" --master local[*] /tmp/spark-job-1.0.jar
There will be a service running on the same server, and it should stop the job when it receives a stop command.
I have this information about the job in the service:
SparkHome
AppName
AppResource
Master uri
app-id
status
Is there any way to stop a running Spark job from Java code?
Have you reviewed the REST server and the ability to use /submissions/kill/[submissionId]? That seems like it would work for your needs.
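As a sketch of that route: assuming the job is submitted to a standalone master with the REST server enabled (port 6066 by default) rather than local[*], killing it from Java can be a plain HTTP POST to that endpoint. The host and driver id below are placeholders:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SparkJobKiller {
    // Sends a kill request to the standalone master's REST server and returns the JSON response
    public static String kill(String masterHost, String submissionId) throws Exception {
        URL url = new URL("http://" + masterHost + ":6066/v1/submissions/kill/" + submissionId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.getOutputStream().close(); // empty POST body

        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line);
            }
        }
        conn.disconnect();
        return response.toString(); // a KillSubmissionResponse JSON with a "success" field
    }

    public static void main(String[] args) throws Exception {
        // Placeholders: your master host and the submission id of the driver to stop
        System.out.println(kill("master-host", "driver-xxxxxxxxxxxxxx-xxxx"));
    }
}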

Cannot start a job from Java code in Spark; Initial job has not accepted any resources

Hello, my Spark configuration in Java is:
ss=SparkSession.builder()
.config("spark.driver.host", "192.168.0.103")
.config("spark.driver.port", "4040")
.config("spark.dynamicAllocation.enabled", "false")
.config("spark.cores.max","1")
.config("spark.executor.memory","471859200")
.config("spark.executor.cores","1")
//.master("local[*]")
.master("spark://kousik-pc:7077")
.appName("abc")
.getOrCreate();
Now when I submit any job from inside the code (not by submitting a jar), I get the warning:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
The Spark UI screenshot is omitted here. The worker shown in the screenshot was started with the command:
~/spark/sbin/start-slave.sh
All four jobs in the waiting stage were submitted from Java code. I have tried every solution I could find. Any ideas, please?
As per my understanding, you want to run a Spark job using only one executor core, so you don't have to specify spark.executor.cores. spark.cores.max should handle assigning only one core to each job, since its value is 1.
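For the in-code route used in the question, a sketch of the builder with spark.executor.cores removed might look like this; the other settings are copied from the question, and the class name is only for illustration:
import org.apache.spark.sql.SparkSession;

public class OneCoreJob {
    public static void main(String[] args) {
        // Same settings as in the question, minus spark.executor.cores:
        // spark.cores.max=1 already caps this application at one core.
        SparkSession ss = SparkSession.builder()
                .config("spark.driver.host", "192.168.0.103")
                .config("spark.driver.port", "4040")
                .config("spark.dynamicAllocation.enabled", "false")
                .config("spark.cores.max", "1")
                .config("spark.executor.memory", "471859200")
                .master("spark://kousik-pc:7077")
                .appName("abc")
                .getOrCreate();

        // ... run the job, then release the resources
        ss.stop();
    }
}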
It is always good practice to provide configuration details such as the master and executor memory/cores in the spark-submit command, like below:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://xxx.xxx.xxx.xxx:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
If you want to explicitly cap the cores given to each job, use --total-executor-cores in your spark-submit command.
Check the Spark documentation for details.

Launch Spark master on Windows 7

Using Win7-64, JDK 8, Spark 1.6.2.
I have Spark running, winutils, HADOOP_HOME, etc.
Per the documentation: "Note: The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand." But it does not say how.
How do I launch the Spark master on Windows?
I tried running sh start-master.sh through Git Bash: failed to launch org.apache.spark.deploy.master.Master, even though it prints out Master --ip Sam-Toshiba --port 7077 --webui-port 8080, so I don't know what all this means.
But when I try spark-submit --class " " --master spark://Sam-Toshiba:7077 target/ .jar, I get these errors:
WARN AbstractLifeCycle: FAILED SelectChannelConnector#0.0.0.0:4040: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 14:44:29 WARN AppClient$ClientEndpoint: Failed to connect to master Sam-Toshiba:7077
java.io.IOException: Failed to connect to Sam-Toshiba/192.168.137.1:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
I also tried spark://localhost:7077, with the same errors.
On Windows you can launch the master using the command below. Open a command prompt, go to the Spark bin folder, and execute:
spark-class.cmd org.apache.spark.deploy.master.Master
The above command will print something like Master: Starting Spark master at spark://192.168.99.1:7077 in the console, with the IP of your machine. You can check the UI at http://192.168.99.1:8080/
If you want to launch a worker once your master is up, you can use the command below. This will use all the available cores of your machine.
spark-class.cmd org.apache.spark.deploy.worker.Worker spark://192.168.99.1:7077
If you want to use only 2 of your machine's 4 cores, then use:
spark-class.cmd org.apache.spark.deploy.worker.Worker -c 2 spark://192.168.99.1:7077

Timeout when submitting jobs to EC2 cluster

I've been trying to make this work with no luck thus far. I launch a cluster with
./spark-ec2 -k keyname -i ~/.keys/key.pem --region=us-east-1 -s 5 launch "my test cluster"
Then I submit a job with
bin/spark-submit --verbose --class com.company.jobs.AggregateCostDataWorkflow --master spark://ec2-54-157-122-49.compute-1.amazonaws.com:7077 --deploy-mode cluster --conf spark.executor.memory=5g /Users/my.name/scala-proj/target/scala-2.10/scala-proj-0.1.0.jar --outputPath,s3n://my-bucket/my-name/ec2-spark-test/
Where outputPath is an argument to the main method. After a bit and some status output, I see an exception that looks like:
15/06/05 16:09:33 INFO StandaloneRestClient: Submitting a request to launch an application in spark://ec2-74-141-162-19.compute-1.amazonaws.com:7077.
Exception in thread "main" java.net.ConnectException: Operation timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at [java socket stuff elided for brevity] org.apache.spark.deploy.rest.StandaloneRestClient.postJson(StandaloneRestClient.scala:150)
at org.apache.spark.deploy.rest.StandaloneRestClient.createSubmission(StandaloneRestClient.scala:70)
at org.apache.spark.deploy.rest.StandaloneRestClient$.run(StandaloneRestClient.scala:317)
at org.apache.spark.deploy.rest.StandaloneRestClient$.main(StandaloneRestClient.scala:329)
at org.apache.spark.deploy.rest.StandaloneRestClient.main(StandaloneRestClient.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is Spark 1.3.1 (on my local machine). I can access the UI on the master machine and verify that the Spark processes are in fact up. I can also ssh into the master.
Any tips?
You need to open ports by editing the security policies if you want to access ports on your EC2 Spark cluster. spark_ec2.py doesn't open ports 7077 and 6066 on the master for access from outside the cluster.
I use the other way: connect to the master machine of your Spark cluster with the command
./spark_ec2.py -k keyname -i ~/.keys/key.pem login "my test cluster"
Upload your job file (with scp, using the same key) and submit the job from there. This ensures that your driver has access to the cluster master and slaves.
See the "Running Applications" section of the Running Spark on EC2 documentation.
