Running a Job on Spark 0.9.0 throws error - java

I have an Apache Spark 0.9.0 cluster installed on which I am trying to deploy code that reads a file from HDFS. The code throws a warning and eventually the job fails. Here is the code:
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Running the code fails with the warning:
 * Initial job has not accepted any resources; check your cluster UI to ensure that
 * workers are registered and have sufficient memory
 */
object Main extends App {
  val sconf = new SparkConf()
    .setMaster("spark://labscs1:7077")
    .setAppName("spark scala")
  val sctx = new SparkContext(sconf)
  sctx.parallelize(1 to 100).count
}
Below is the warning message:
Initial job has not accepted any resources; check your cluster UI to
ensure that workers are registered and have sufficient memory
How do I get rid of this? Or am I missing some configuration?

You get this when either the number of cores or the amount of RAM (per node) you request via spark.cores.max and spark.executor.memory, respectively, exceeds what is available. So even if no one else is using the cluster, if you specify that you want, say, 100 GB of RAM per node but your nodes can only support 90 GB, you will get this error message.
To be fair, the message is vague in this situation; it would be more helpful if it said you are exceeding the maximum.
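As a minimal sketch (the class name and values are placeholders; pick numbers at or below what the master UI reports as free), the request can be kept within the cluster's limits like this:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ResourceLimitedApp {
    public static void main(String[] args) {
        // Ask for no more than the cluster actually offers (check the master UI first).
        SparkConf conf = new SparkConf()
                .setAppName("resource-limited app")
                .setMaster("spark://labscs1:7077")    // master URL from the question
                .set("spark.cores.max", "4")          // at most the free cores shown in the UI
                .set("spark.executor.memory", "2g");  // at most the per-worker memory shown in the UI
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println(sc.parallelize(Arrays.asList(1, 2, 3)).count());
        sc.stop();
    }
}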

Looks like the Spark master can't assign any workers to this task. Either the workers aren't started or they are all busy.
Check the Spark UI on the master node (the port is specified by SPARK_MASTER_WEBUI_PORT in spark-env.sh, 8080 by default). It lists the registered workers along with their state, cores and memory.
For the cluster to function properly:
There must be some workers with the state "Alive"
There must be some cores available (for example, if all cores are busy with a frozen task, the cluster won't accept new tasks)
There must be sufficient memory available

Also make sure your spark workers can communicate both ways with the driver. Check for firewalls, etc.
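For a quick sanity check of two-way reachability (host names and ports here are placeholders; the driver's RPC port is random unless you pin spark.driver.port), something like netcat can be used:

# From a worker node: can it reach the driver?
nc -zv <driver-host> <spark.driver.port>
# From the driver: can it reach the master?
nc -zv labscs1 7077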

I had this exact issue. I had a simple 1-node Spark cluster and was getting this error when trying to run my Spark app.
I ran through some of the suggestions above, and it was when I tried to run the Spark shell against the cluster and couldn't see it in the UI that I became suspicious that my cluster was not working correctly.
In my hosts file I had an entry, let's say SparkNode, that referenced the correct IP address.
I had inadvertently put the wrong IP address in the conf/spark-env.sh file for the SPARK_MASTER_IP variable. I changed this to SparkNode and also changed SPARK_LOCAL_IP to point to SparkNode (see the spark-env.sh sketch at the end of this answer).
To test this I opened the UI using SparkNode:7077 in the browser and could see an instance of Spark running.
I then used Wildfire's suggestion of running the Spark shell, as follows:
MASTER=spark://SparkNode:7077 bin/spark-shell
Going back to the UI I could now see the Spark shell application running, which I couldn't before.
So I exited the Spark shell and ran my app using spark-submit, and it now works correctly.
It is definitely worth checking all of your IP and host entries; this was the root cause of my problem.
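For reference, the relevant conf/spark-env.sh lines ended up looking roughly like this (SparkNode is just the hostname from my hosts file; substitute your own):

export SPARK_MASTER_IP=SparkNode
export SPARK_LOCAL_IP=SparkNode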

You need to specify the right SPARK_HOME and your driver program's IP address, in case Spark is not able to locate your Netty jar server. Be aware that your Spark master should listen on the correct IP address, the one you intend to use. This can be done by setting SPARK_MASTER_IP=yourIP in the spark-env.sh file.
val conf = new SparkConf()
  .setAppName("test")
  .setMaster("spark://yourSparkMaster:7077")
  .setSparkHome("YourSparkHomeDir")
  .set("spark.driver.host", "YourIPAddr")

Check for errors regarding the hostname, IP address and loopback. Make sure to set SPARK_LOCAL_IP and SPARK_MASTER_IP.

I had a similar issue ("Initial job has not accepted any resources") and fixed it by specifying the correct Spark download URL in spark-env.sh, or by installing Spark on all slaves.
export SPARK_EXECUTOR_URI=http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz

Related

Weird way servers are getting added in baseline topology

I have two development machines, both running Ignite in server mode on the same network. I started the server on the first machine and then started the second machine. When the second machine starts, it automatically gets added to the first one's topology.
Note: before starting, I removed the work folder on both machines.
In the config, I never mentioned any IPs of other machines.
Can anyone tell me what's wrong here? My intention is that each machine should have a separate topology.
As described in the discovery documentation, Apache Ignite uses multicast to find all nodes in the local network and form a cluster. This is the default mode of operation.
Please note that we don't really recommend using this mode for either development or production deployment; use static discovery instead (see the same documentation and the sketch below).
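For illustration, a minimal sketch of static discovery in Java (the address list is a placeholder; list only the nodes that should form a cluster together):

import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class StaticDiscoveryExample {
    public static void main(String[] args) {
        // List only this machine's own address range so the node
        // does not discover and join servers on other machines.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47509"));

        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
        discoverySpi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoverySpi);

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Nodes in topology: " + ignite.cluster().nodes().size());
    }
}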

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I'm trying to run the Spark examples from Eclipse and am getting this generic error: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
The version I have is spark-1.6.2-bin-hadoop2.6. I started Spark using the ./sbin/start-master.sh command from a shell, and set my SparkConf like this:
SparkConf conf = new SparkConf().setAppName("Simple Application");
conf.setMaster("spark://My-Mac-mini.local:7077");
I'm not including any other code here because this error pops up with any of the examples I run. The machine is a Mac running OS X and I'm pretty sure it has enough resources to run the simplest of examples.
What am I missing?
I had the same problem, and it was because the workers could not communicate with the driver.
You need to set spark.driver.port (and open said port on your driver), spark.driver.host and spark.driver.bindAddress in your spark-submit from the driver.
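As a hedged sketch of what that can look like on the command line (the host, port and jar names are placeholders, and spark.driver.bindAddress only exists in newer Spark releases):

./bin/spark-submit \
  --master spark://My-Mac-mini.local:7077 \
  --conf spark.driver.host=<ip-of-driver-reachable-from-workers> \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --conf spark.driver.port=7078 \
  --class SimpleApp simple-app.jar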
The error indicates that your cluster has insufficient resources for the current job. Since you have not started the slaves (i.e. the workers), the cluster has no resources to allocate to your job. Starting the slaves will fix it:
`start-slave.sh spark://master-ip:7077`
Solution
Reason
The Spark master doesn't have any resources allocated to execute the job, i.e. no worker (slave) node is registered.
Fix
You have to start a slave node and connect it to the master node, like this: /SPARK_HOME/sbin> ./start-slave.sh spark://localhost:7077 (if your master is on your local node).
Conclusion
Start your master node and also a slave node before spark-submit, so that enough resources are allocated to execute the job.
Alternate way
You can instead make the necessary changes in the spark-env.sh file, which is not recommended.
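For example (a sketch only; the exact variables depend on what you need to change), the resources each worker offers can be set in conf/spark-env.sh:

export SPARK_WORKER_CORES=2    # cores each worker offers to applications
export SPARK_WORKER_MEMORY=2g  # memory each worker offers to applications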
If you run your application from an IDE, and you have free resources on your workers, you need to do this:
1) Before anything else, configure the worker and master Spark nodes.
2) Specify the driver (PC) configuration so that results can be returned from the workers.
SparkConf conf = new SparkConf()
        .setAppName("Test spark")
        .setMaster("spark://ip of your master node:port of your master node")
        .set("spark.blockManager.port", "10025")
        .set("spark.driver.blockManager.port", "10026")
        // Make all communication ports static. This is not necessary if you disabled
        // firewalls or if your nodes are in a local network; otherwise you must open
        // these ports in your firewall settings.
        .set("spark.driver.port", "10027")
        .set("spark.cores.max", "12")
        .set("spark.executor.memory", "2g")
        .set("spark.driver.host", "ip of your driver (PC)"); // necessary
I had a stand-alone cluster set up on my local Mac machine with 1 master and 1 worker. The worker was connected to the master and everything seemed to be OK. However, to save memory I thought I would start the worker with only 500M of memory, and I had this problem. I restarted the worker with 1G of memory and it worked:
./start-slave.sh spark://{master_url}:{master_port} -c 2 -m 1G
I've encountered the same issue while setting up a Spark cluster on EC2. The issue was that the worker instance was unable to reach the driver because the security group rules didn't open up the port.
I confirmed this to be the issue by temporarily opening up all inbound ports for the driver, and then it worked.
The Spark documentation says that spark.driver.port is randomly assigned. So, to make this port a fixed one <driverPort>, I specified this port configuration when I created the SparkSession (I'm using PySpark).
That way, I only need to open up <driverPort> in the driver instance's security group, which is a better practice.
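The answer above used PySpark; for illustration, a roughly equivalent sketch in Java (the master URL and port are placeholders, and SparkSession requires Spark 2.x):

import org.apache.spark.sql.SparkSession;

public class FixedDriverPortApp {
    public static void main(String[] args) {
        // Pin the driver port so that only this one port needs to be opened
        // in the driver instance's security group.
        SparkSession spark = SparkSession.builder()
                .appName("fixed driver port")
                .master("spark://<master-host>:7077")
                .config("spark.driver.port", "7078")
                .getOrCreate();
        spark.stop();
    }
}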
Try using "spark://127.0.0.1:7077" as a master address instead of *.local name. Sometime java is not able to resolve .local addresses - for reasons I don't understand.

What is the limitation with one-node Cassandra cluster?

I am experimenting with Cassandra and OpsCenter. In the opscenterd log file, I found this line:
2015-07-29 16:10:16+0000 [] ERROR: Problem while calling CreateClusterConfController (SingleNodeProvisioningError): Due to a limitation with one-node clusters, OpsCenter will not be able to communicate with the Datastax Agent unless listen_address/broadcast_address in cassandra.yaml are set to 172.17.42.1. Please ensure these match before continuing.
Because I deployed Cassandra and Opscenter in different Docker containers, I must set listen_address to the container's internal IP (because Cassandra sitting in a container knows nothing about its host) and broadcast_address to the corresponding host's bridge IP. This is the normal setup if you deploy Cassandra on machines behind separate gateways (like AWS EC2 where each instance has a private and a public IP).
Question 1: What exactly is the limitation with one-node cluster?
Question 2: How should I workaround the problem in this case?
Thanks
Question 1: What exactly is the limitation with one-node cluster?
OpsCenter (via underlying python driver) is reading cluster information from Cassandra’s system tables (namely, system.peers and system.local), with most of the information coming from system.peers, including broadcast interfaces for each of the nodes.
However, that table does not contain information about the node itself, only about its peers. When there are no peers, there is no way to get broadcast address from Cassandra itself, and that’s what OpsCenter uses to tie actual Cassandra instances to the internal representation. In this case OpsCenter uses whatever address you specified as a seed (172.17.42.1 here), and when agents report with a different IP (they’re getting Cassandra’s broadcast address via JMX), OpsCenter would discard those messages.
Question 2: How should I workaround the problem in this case?
Try setting local_address in address.yaml to 172.17.42.1; this should do the trick.
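For reference, a minimal sketch of the relevant line in the agent's address.yaml (using the address from the error above; adjust to your own setup):

# address.yaml on the node running the DataStax agent
local_address: 172.17.42.1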

How to know status of Kafka broker in java?

I am working on Apache Storm, which has a topology main class. This topology contains a KafkaSpout which listens to a Kafka topic on a Kafka broker. Now, before I submit this topology, I want to check the status of the Kafka broker that hosts the topic, but I haven't found any way to do it. How can a Kafka broker's status be determined from the Storm topology class? Please help.
If you simply want a quick way to know if it is running or not you can just run the start command again with the same config:
bin/kafka-server-start.sh config/server.properties
If it's running then you should get an exception about the port already being in use.
Not foolproof, so a better option would be to use ZooKeeper as mentioned above.
Personally I use IntelliJ, which has a Zookeeper plugin that lets you browse the brokers/topics registered in it. There is probably something similar for Eclipse or other IDEs.
(IntelliJ)
Go to File > Settings > type zookeeper in the search, then install the plugin and click OK (you may need to restart).
Go to File > Settings > type zookeeper in the search. Click enable, then put in the address where your ZooKeeper server is running and apply the changes. (Note: you may need to check that the port is correct too.)
You should now see your ZooKeeper server as a tab on the left side of the IDE.
This should show you your brokers, topics, consumers, etc.
Hope that helps!
If you have configured the Storm UI, it should give you brief information about the running cluster, including things such as currently running topologies, available free slots, supervisor info, etc.
Programmatically, you can write a Thrift client to retrieve that information from the Storm cluster. You can choose almost any language to develop your own client.
Check out this article for further reference.
Depending on what kind of status you want to have, for most cases you would actually retrieve this from Zookeeper. In Zookeeper you can see registered brokers, topics and other useful things which might be what you're looking for.
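For illustration, a rough sketch in Java of that ZooKeeper check (the connection string and session timeout are placeholders; each live broker registers an ephemeral znode under /brokers/ids):

import java.util.List;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class KafkaBrokerStatusCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble that Kafka registers its brokers with.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                // No-op watcher; we only need a one-off read.
            }
        });
        try {
            List<String> brokerIds = zk.getChildren("/brokers/ids", false);
            if (brokerIds.isEmpty()) {
                System.out.println("No Kafka brokers are currently registered.");
            } else {
                System.out.println("Live broker ids: " + brokerIds);
            }
        } finally {
            zk.close();
        }
    }
}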
Another solution would be to deploy a small regular consumer which would be able to perform those checks for you.

Redis/Jedis no single point of failure and automated failover

In a simple setup with 3 servers, 1 master and 2 slaves, and no sharding: is there a proven solution with Java and Jedis that has no single point of failure and will automatically deal with a single server going down, be that master or slave (automated failover), e.g. promoting a slave to master and resetting after the failure without any lost data?
It seems to me like it should be a solved problem, but I can't find any code for it, just high-level descriptions of possible ways to do it.
Who actually has this covered and working in production?
You may want to give Redis Sentinel a try to achieve that:
Redis Sentinel is a system designed to help manage Redis instances. It performs the following three tasks:
Monitoring. Sentinel constantly checks if your master and slave instances are working as expected.
Notification. Sentinel can notify the system administrator, or another computer program, via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover. If a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
... or to use an external solution like Zookeeper and Jedis_failover:
JedisPool pool = new JedisPoolBuilder()
    .withFailoverConfiguration(
        "localhost:2838",      // ZooKeeper cluster URL
        Arrays.asList(         // List of Redis servers
            new HostConfiguration("localhost", 7000),
            new HostConfiguration("localhost", 7001)))
    .build();

pool.withJedis(new JedisFunction() {
    @Override
    public void execute(final JedisActions jedis) throws Exception {
        jedis.ping();
    }
});
See this presentation of Zookeeper + Redis.
[Update] ... or a pure Java solution with Jedis + Sentinel is to use a wrapper that handles Redis Sentinel events; see SentinelBasedJedisPoolWrapper.
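For illustration, a minimal sketch using Jedis's built-in Sentinel support (the master name, sentinel addresses and error handling are placeholders; the master name must match your sentinel.conf):

import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelFailoverExample {
    public static void main(String[] args) {
        // Addresses of the Sentinel processes, not of the Redis master/slaves themselves.
        Set<String> sentinels = new HashSet<String>();
        sentinels.add("localhost:26379");
        sentinels.add("localhost:26380");

        // "mymaster" must match the master name configured in sentinel.conf.
        JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
        Jedis jedis = pool.getResource();
        try {
            jedis.ping();
            // After a failover, the pool hands out connections to the newly promoted master.
        } finally {
            pool.returnResource(jedis);
        }
        pool.destroy();
    }
}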
Currently using Jedis 2.4.2 (from git), I didn't find a way to do a failover based only on Redis or Sentinel. I hope there will be a way. I am thinking of exploring the ZooKeeper option right now. Redis Cluster works well in terms of performance and even stability, but it is still in beta.
If anyone has better insight let us know.
