Timeout when submitting jobs to EC2 cluster

Timeout when submitting jobs to EC2 cluster - java

I've been trying to make this work with no luck thus far. I launch a cluster with
./spark-ec2 -k keyname -i ~/.keys/key.pem --region=us-east-1 -s 5 launch "my test cluster"
Then I submit a job with
bin/spark-submit --verbose --class com.company.jobs.AggregateCostDataWorkflow --master spark://ec2-54-157-122-49.compute-1.amazonaws.com:7077 --deploy-mode cluster --conf spark.executor.memory=5g /Users/my.name/scala-proj/target/scala-2.10/scala-proj-0.1.0.jar --outputPath,s3n://my-bucket/my-name/ec2-spark-test/
Where outPutPath is an argument to the main method. After a bit and some status output, I see an exception that looks like
15/06/05 16:09:33 INFO StandaloneRestClient: Submitting a request to launch an application in spark://ec2-74-141-162-19.compute-1.amazonaws.com:7077.
Exception in thread "main" java.net.ConnectException: Operation timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at [java socket stuff elided for brevity] org.apache.spark.deploy.rest.StandaloneRestClient.postJson(StandaloneRestClient.scala:150)
at org.apache.spark.deploy.rest.StandaloneRestClient.createSubmission(StandaloneRestClient.scala:70)
at org.apache.spark.deploy.rest.StandaloneRestClient$.run(StandaloneRestClient.scala:317)
at org.apache.spark.deploy.rest.StandaloneRestClient$.main(StandaloneRestClient.scala:329)
at org.apache.spark.deploy.rest.StandaloneRestClient.main(StandaloneRestClient.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is spark 1.3.1 (on my local machine) I can access the UI on the master machine and verify that the Spark processes are in fact up. I can also ssh into the master.
Any tips?

You need to open ports by editing security policies, if you want to access ports on your EC2 spark cluster. spark_ec2.py doesn't open ports 7077 and 6066 on master to be accessed from outside the cluster.
I use the other way - connect to master machine of your spark cluster with the command
./spark_ec2.py -k keyname -i ~/.keys/key.pem login "my test cluster"
Upload the your job file (with scp using same key) and submit job from there. This would ensure that your driver has access to a cluster master and slaves.
See "Running Applications" section of Running Spark on EC2 documentation

Related

How to stop running Spark application?

I wrote a few Spark job in Java then submitted the jars with submit script.
bin/spark-submit --class "com.company.spark.jobName.SparkMain" --master local[*] /tmp/spark-job-1.0.jar
There will be a service and will run in same server. The service should stop the job when receive the stop command.
I have these information about job in service:
SparkHome
AppName
AppResource
Master uri
app-id
status
Is there any way to stop running spark job in java code.

Have you reviewed the REST server and the ability to use /submissions/kill/[submissionId]? That seems like it would work for your need.

Launch spark master windows7

Using win7-64, jdk8, sparks1.6.2.
I have spark running, winutils, HADOOP_HOME, etc
Per documentation Note: The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand. But does not say how?
How do I launch spark master on windows?
Tried running sh start-master.sh thru git bash : failed to launch org.apache.spark.deploy.master.Master: Even though it prints out Master --ip Sam-Toshiba --port 7077 --webui-port 8080 - So I don't know what all this means.
But when I try spark-submit --class " " --master spark://Sam-Toshiba:7077 target/ .jar -
I get errors:
WARN AbstractLifeCycle: FAILED SelectChannelConnector#0.0.0.0:
4040: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 14:44:29 WARN AppClient$ClientEndpoint: Failed to connect to master Sam-Toshiba:7077
java.io.IOException: Failed to connect to Sam-Toshiba/192.168.137.1:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
Also tried spark://localhost:7077 - same errors

On Windows you can launch Master using below command. Open command prompt and go to Spark bin folder and execute
spark-class.cmd org.apache.spark.deploy.master.Master
Above command will print like Master: Starting Spark master at spark://192.168.99.1:7077 in console as per IP of your machine. You can check the UI at http://192.168.99.1:8080/
If you want to launch worker once your master is up you can use below command. This will use all the available cores of your machine.
spark-class.cmd org.apache.spark.deploy.worker.Worker spark://192.168.99.1:7077
If you want to utilize 2 cores of your 4 cores of machine then use
spark-class.cmd org.apache.spark.deploy.worker.Worker -c 2 spark://192.168.99.1:7077

Can't connect Java program to MySQL using Docker

I'm learning docker and trying to put my Java web application using Tomcat to container. I followed some basic tutorial but i found no solution to work properly to me. If i run my database and java containers i get the error:
SEVERE: Unable to create initial connections of pool.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:981)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:339)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2253)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2286)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2085)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:795)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:44)
MySQL Dockerfile
FROM mysql:latest
ENV MYSQL_DATABASE=db_name #name of db that is required by Java program
Run by:
docker run --name db_name -e MYSQL_ROOT_PASSWORD=root -d db_name
Java Dockerfile
FROM tomcat:7.0.70-jre8
ADD deploy /usr/local/tomcat/webapps #extracted .war
ADD jdbc /usr/local/tomcat/lib #MySQL jdbc drivers
ADD context /usr/local/tomcat/conf #context.xml
Run by:
docker run --name app_name --link db_name:db_name -p 8080:8080 -d app_name
Whole configuration was running properly when i was running it locally in Eclipse.

Because you do not provide the full stack tace, which would show what connection string tomcat is using, I have to guess that you do not provide the proper connection string to your tomcat conainer. You have to supply an connectionstring like:
jdbc:mysql://database_container_name:3306/database_name
into your tomcat config.
BTW:
You should rearrange you lines in the Tomcat Dockerfile to
FROM tomcat:7.0.70-jre8
ADD jdbc /usr/local/tomcat/lib #MySQL jdbc drivers
ADD context /usr/local/tomcat/conf #context.xml
ADD deploy /usr/local/tomcat/webapps #extracted .war
Because docker can cache build layers. With the old Order your war is the first layer of the image and changes every time you make a change to your application causing the folliwing layers to be rebuild every time, even if they did not change. The new order uses the docker chache better. With this order the never changing MySQL driver is always cached and the config also does not alter as fast as the war.
In this example the effect may be minimal, but if you build bigger images with more layers and lengthy build steps (like apt-get install smth) the chache can significantly speed up your build.

spark standalone cluster slave unable to connect slave to master

i have the bin for spark-1.6.0-bin-hadoop2.6 im having issue trying to connect the slave to the master
so far i have tried(on ubuntu 14.04 live usb):
apt-get purge and install openssh-client and server on both systems
I have stated explicitly the ip address of the master in the spark url for the worker
spark://< master ip>:7077 and also tried changing the SPARK_MASTER_IP in /conf/spark-env.sh the worker executes but the log has the following error
im assuming there must be some ssh setup involved but i have tried ssh-keygen and ssh-copy-id # it also doesnt give any reuslts
16/02/22 07:49:16 INFO Worker: Connecting to master 192.168.0.208:7077...
16/02/22 07:49:16 WARN Worker: Failed to connect to master 192.168.0.208:7077
java.io.IOException: Failed to connect to /192.168.0.208:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.0.208:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:740)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/02/22 07:49:27 INFO Worker: Retrying connection to master (attempt # 2)
i am however able to open the master webUI by typing :8080 on my browser . i am also able to access the webUI of the slave from the master . im almost at the point of giving in so please helllppp.

Make sure that each master and worker has a firewall exception to allow connections for all other workers and masters.
Here's a simplified example from one of our master machines (master0):
$iptables -L
...
ACCEPT all -- worker0.company.com master0.company.com
ACCEPT all -- worker1.company.com master0.company.com
ACCEPT all -- master1.company.com master0.company.com
...
Of course you can also use IPs instead of hostnames.

Spark on yarn jar upload problems

I am trying to run a simple Map/Reduce java program using spark over yarn (Cloudera Hadoop 5.2 on CentOS). I have tried this 2 different ways. The first way is the following:
YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/;
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --jars /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar simplemr.jar
This method gives the following error:
diagnostics: Application application_1434177111261_0007 failed 2 times
due to AM Container for appattempt_1434177111261_0007_000002 exited
with exitCode: -1000 due to: Resource
hdfs://kc1ltcld29:9000/user/myuser/.sparkStaging/application_1434177111261_0007/spark-assembly-1.4.0-hadoop2.4.0.jar
changed on src filesystem (expected 1434549639128, was 1434549642191
Then I tried without the --jars:
YARN_CONF_DIR=/usr/lib/hadoop-yarn/etc/hadoop/;
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster simplemr.jar
diagnostics: Application application_1434177111261_0008 failed 2 times
due to AM Container for appattempt_1434177111261_0008_000002 exited
with exitCode: -1000 due to: File does not exist:
hdfs://kc1ltcld29:9000/user/myuser/.sparkStaging/application_1434177111261_0008/spark-assembly-1.4.0-hadoop2.4.0.jar
.Failing this attempt.. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.myuser
start time: 1434549879649
final status: FAILED
tracking URL: http://kc1ltcld29:8088/cluster/app/application_1434177111261_0008
user: myuser Exception in thread "main" org.apache.spark.SparkException: Application
application_1434177111261_0008 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:841)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:867)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 15/06/17 10:04:57 INFO util.Utils: Shutdown hook called 15/06/17
10:04:57 INFO util.Utils: Deleting directory
/tmp/spark-2aca3f35-abf1-4e21-a10e-4778a039d0f4
I tried deleting all the .jars from hdfs://users//.sparkStaging and resubmitting but that didn't help.

The problem was solved by copying spark-assembly.jar into a directory on the hdfs for each node and then passing it to spark-submit --conf spark.yarn.jar as a parameter. Commands are listed below:
hdfs dfs -copyFromLocal /var/tmp/spark/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar /user/spark/spark-assembly.jar
/var/tmp/spark/spark-1.4.0-bin-hadoop2.4/bin/spark-submit --class MRContainer --master yarn-cluster --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar simplemr.jar

If you are getting this error it means you are uploading assembly jars using --jars option or manually copying to hdfs in each node.
I have followed this approach and it works for me.
In yarn-cluster mode, Spark submit automatically uploads the assembly jar to a distributed cache that all executor containers read from, so there is no need to manually copy the assembly jar to all nodes (or pass it through --jars).
It seems there are two versions of the same jar in your HDFS.
Try removing all old jars from your .sparkStaging directory and try again, it should work.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Timeout when submitting jobs to EC2 cluster - java

Related

How to stop running Spark application?

Launch spark master windows7

Can't connect Java program to MySQL using Docker

spark standalone cluster slave unable to connect slave to master

Spark on yarn jar upload problems

Categories

Resources