Get Hadoop Cluster Details - java

I have created a pseudo-distributed cluster on a VM and I am trying to get my cluster information, such as the number of nodes, live nodes, dead nodes, etc., from a Java program using Hadoop's API.
Unfortunately, I am not able to use any of the methods of the FSNamesystem class. It is as if I have to do cluster discovery from a Java client, to get the same information we see on HTTP port 50070.
If the following statement works, i.e. if I can create the object (it should not be null), then I can get all the details about my cluster:
FSNamesystem f = FSNamesystem.getFSNamesystem();
Also, how can I inject the dependency for the NameNode?
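For reference, the kind of information shown on port 50070 can also be reached through the client-side DistributedFileSystem API instead of FSNamesystem. A minimal sketch, assuming a pseudo-distributed setup where the NameNode listens on hdfs://localhost:9000 (adjust the URI and property name to your Hadoop version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ClusterDetails {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: adjust to your NameNode URI (fs.default.name on Hadoop 1.x).
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

        // All datanodes currently known to the NameNode.
        DatanodeInfo[] dataNodes = dfs.getDataNodeStats();
        System.out.println("Number of datanodes: " + dataNodes.length);
        for (DatanodeInfo dn : dataNodes) {
            // Capacity, usage, last contact, state -- roughly what the web UI shows.
            System.out.println(dn.getDatanodeReport());
        }
    }
}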

Related

How to connect multiple Java applications to same Ignite cluster?

I have three Java applications that will connect to the same Ignite node (running on a particular VM) to access the same cache store.
Is there a step-by-step procedure on how to run a node outside a Java application (from the command prompt, maybe) and connect my Java apps to it?
Your Java applications should join as client nodes in your cluster. More information about client/server mode can be found in the documentation. Server node(s) can be started from the command line, as described there as well, along with how to run them with a custom configuration. You also need to set up discovery for the whole thing to work, and it should be configured on every node (including client nodes). I'd recommend using the static IP finder in the configuration; a sketch follows below.
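A minimal sketch of that setup from the Java side, assuming the server node runs on a VM reachable as 192.168.0.10 with the default discovery port range (both values are assumptions to adjust):

import java.util.Arrays;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class IgniteClientApp {
    public static void main(String[] args) {
        // Static IP finder pointing at the VM that runs the server node.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList("192.168.0.10:47500..47509"));

        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);          // this application joins as a client node
        cfg.setDiscoverySpi(discovery);

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Nodes in cluster: " + ignite.cluster().nodes().size());
    }
}

Each of the three applications would start its own client node like this; the cache data itself stays on the server node(s).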

How to connect to k8s api server within a pod using k8s java client

Context
I have a java application built as a docker image.
The image is deployed in a k8s cluster.
In the java application, I want to connect to the api server and save something in Secrets.
How can I do that with k8s java client?
Current Attempts
The k8s official document says:
From within a pod the recommended ways to connect to API are:
run kubectl proxy in a sidecar container in the pod, or as a background process within the container. This proxies the Kubernetes API to the localhost interface of the pod, so that other processes in any container of the pod can access it.
use the Go client library, and create a client using the rest.InClusterConfig() and kubernetes.NewForConfig() functions. They handle locating and authenticating to the apiserver.
But I can't find similar functions or similar examples in the Java client.
With the assumption that your Pod has a serviceAccount automounted -- which is the default unless you have specified otherwise -- the ClientBuilder.cluster() method reads the API URL from the environment, reads the cluster CA from the well known location, and similarly the ServiceAccount token from that same location.
Then, while not exactly "create a Secret," this PatchExample performs a mutation operation which one could generalize into "create or update a Secret."
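A minimal sketch of both steps with the official kubernetes-client/java library, assuming the pod's service account is allowed to create Secrets; the secret name, namespace and data are hypothetical, and the exact parameter list of createNamespacedSecret differs between client versions:

import java.util.Collections;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.openapi.Configuration;
import io.kubernetes.client.openapi.apis.CoreV1Api;
import io.kubernetes.client.openapi.models.V1ObjectMeta;
import io.kubernetes.client.openapi.models.V1Secret;
import io.kubernetes.client.util.ClientBuilder;

public class InClusterSecretExample {
    public static void main(String[] args) throws Exception {
        // The Java analogue of rest.InClusterConfig(): reads the API server URL,
        // cluster CA and service account token mounted into the pod.
        ApiClient client = ClientBuilder.cluster().build();
        Configuration.setDefaultApiClient(client);

        CoreV1Api api = new CoreV1Api(client);

        V1Secret secret = new V1Secret()
                .metadata(new V1ObjectMeta().name("my-secret")) // hypothetical name
                .type("Opaque")
                .stringData(Collections.singletonMap("key", "value"));

        // The trailing optional arguments (pretty, dryRun, fieldManager, ...)
        // vary between releases of the client; match them to your CoreV1Api.
        api.createNamespacedSecret("default", secret, null, null, null);
    }
}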

Using redis 'standalone' in java application

I am following a Redis tutorial online that shows how to connect to Redis from a Java application.
I understand that there are many Java clients available, and the tutorial I was following was using Jedis. My question is: can a Java client (like Jedis) be used without actually installing the Redis server itself? The tutorial shows a simple call to:
Jedis jedis = new Jedis("localhost");
followed by set/get operations, but I don't believe they installed Redis. I am new to Redis, but I picture installing the Redis server as the equivalent of installing something like Oracle, and then using a Java API to talk to that Oracle instance.
How is the Jedis API used without an actual Redis instance present? If the Jedis client were initialized without the host parameter, would it then expect to find an actual Redis server/instance running on port 6379?
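For what it's worth, a minimal sketch of the usual setup: Jedis is only a client library, so the code below assumes a Redis server is already installed and running locally (when no host or port is given, Jedis defaults to localhost:6379):

import redis.clients.jedis.Jedis;

public class JedisExample {
    public static void main(String[] args) {
        // This fails with a connection error unless a Redis server is actually
        // listening on localhost:6379.
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("greeting", "hello");
            System.out.println(jedis.get("greeting"));
        }
    }
}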

What is the limitation with one-node Cassandra cluster?

I am experimenting with Cassandra and OpsCenter. In the opscenterd log file, I found this line:
2015-07-29 16:10:16+0000 [] ERROR: Problem while calling CreateClusterConfController (SingleNodeProvisioningError): Due to a limitation with one-node clusters, OpsCenter will not be able to communicate with the Datastax Agent unless listen_address/broadcast_address in cassandra.yaml are set to 172.17.42.1. Please ensure these match before continuing.
Because I deployed Cassandra and OpsCenter in different Docker containers, I have to set listen_address to the container's internal IP (because Cassandra, sitting in a container, knows nothing about its host) and broadcast_address to the corresponding host's bridge IP. This is the normal setup if you deploy Cassandra on machines behind separate gateways (like AWS EC2, where each instance has a private and a public IP).
Question 1: What exactly is the limitation with one-node cluster?
Question 2: How should I workaround the problem in this case?
Thanks
Question 1: What exactly is the limitation with one-node cluster?
OpsCenter (via underlying python driver) is reading cluster information from Cassandra’s system tables (namely, system.peers and system.local), with most of the information coming from system.peers, including broadcast interfaces for each of the nodes.
However, that table does not contain information about the node itself, only about its peers. When there are no peers, there is no way to get broadcast address from Cassandra itself, and that’s what OpsCenter uses to tie actual Cassandra instances to the internal representation. In this case OpsCenter uses whatever address you specified as a seed (172.17.42.1 here), and when agents report with a different IP (they’re getting Cassandra’s broadcast address via JMX), OpsCenter would discard those messages.
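You can see exactly what is (and isn't) available by querying those tables yourself. A small sketch with the DataStax Java driver, assuming the contact point is the bridge address from the log:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SystemTablesExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("172.17.42.1").build();
             Session session = cluster.connect()) {

            // system.local describes only the node you are connected to...
            Row local = session.execute(
                "SELECT listen_address, broadcast_address, rpc_address FROM system.local").one();
            System.out.println("local: " + local);

            // ...while system.peers lists only the *other* nodes, so on a
            // one-node cluster this comes back empty.
            System.out.println("peers: "
                + session.execute("SELECT peer, rpc_address FROM system.peers").all());
        }
    }
}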
Question 2: How should I workaround the problem in this case?
Try setting local_address in address.yaml to 172.17.42.1; this should do the trick.

Running a Job on Spark 0.9.0 throws error

I have an Apache Spark 0.9.0 cluster installed, and I am trying to deploy a job that reads a file from HDFS. This piece of code throws a warning and eventually the job fails. Here is the code:
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Running the code fails with the warning:
 * "Initial job has not accepted any resources; check your cluster UI to ensure that
 *  workers are registered and have sufficient memory"
 */
object Main extends App {
  val sconf = new SparkConf()
    .setMaster("spark://labscs1:7077")
    .setAppName("spark scala")
  val sctx = new SparkContext(sconf)
  sctx.parallelize(1 to 100).count
}
Below is the warning message:
Initial job has not accepted any resources; check your cluster UI to
ensure that workers are registered and have sufficient memory
How do I get rid of this? Or am I missing some configuration?
You get this when either the number of cores or the amount of RAM (per node) you request via spark.cores.max and spark.executor.memory, respectively, exceeds what is available. So even if no one else is using the cluster, if you ask for, say, 100GB of RAM per node but your nodes can only provide 90GB, you will get this error message.
To be fair, the message is vague in this situation; it would be more helpful if it said you were exceeding the maximum.
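As an illustration with the Java API (the numbers here are hypothetical), the fix is to keep those two settings at or below what each worker advertises in the master's web UI:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ResourceCapExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setMaster("spark://labscs1:7077")
            .setAppName("spark scala")
            // Hypothetical caps: keep them within what the workers actually offer.
            .set("spark.cores.max", "4")
            .set("spark.executor.memory", "2g");
        JavaSparkContext sctx = new JavaSparkContext(conf);
        System.out.println(sctx.parallelize(Arrays.asList(1, 2, 3)).count());
        sctx.stop();
    }
}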
Looks like the Spark master can't assign any workers to this task: either the workers aren't started or they are all busy.
Check the Spark UI on the master node (the port is set by SPARK_MASTER_WEBUI_PORT in spark-env.sh, 8080 by default).
For the cluster to function properly:
There must be some workers with state "Alive"
There must be some cores available (for example, if all cores are busy with the frozen task, the cluster won't accept new tasks)
There must be sufficient memory available
Also make sure your spark workers can communicate both ways with the driver. Check for firewalls, etc.
I had this exact issue. I had a simple 1-node Spark cluster and was getting this error when trying to run my Spark app.
I ran through some of the suggestions above, and it was when I tried to run the Spark shell against the cluster and couldn't see it in the UI that I became suspicious my cluster was not working correctly.
In my hosts file I had an entry, let's say SparkNode, that referenced the correct IP Address.
I had inadvertently put the wrong IP address in the conf/spark-env.sh file for the SPARK_MASTER_IP variable. I changed this to SparkNode, and I also changed SPARK_LOCAL_IP to point to SparkNode.
To test this I opened up the UI using SparkNode:7077 in the browser and I could see an instance of Spark running.
I then used Wildfire's suggestion of running the Spark shell, as follows:
MASTER=spark://SparkNode:7077 bin/spark-shell
Going back to the UI I could now see the Spark shell application running, which I couldn't before.
So I exited the Spark shell and ran my app using Spark Submit and it now works correctly.
It is definitely worth checking all of your IP and host entries; this was the root cause of my problem.
You need to specify the right SPARK_HOME and your driver program's IP address, in case Spark is not able to locate your Netty jar server. Be aware that your Spark master should listen on the correct IP address, the one you intend to use. This can be done by setting SPARK_MASTER_IP=yourIP in the file spark-env.sh.
val conf = new SparkConf()
  .setAppName("test")
  .setMaster("spark://yourSparkMaster:7077")
  .setSparkHome("YourSparkHomeDir")
  .set("spark.driver.host", "YourIPAddr")
Check for errors regarding the hostname, IP address and loopback. Make sure to set SPARK_LOCAL_IP and SPARK_MASTER_IP.
I had a similar issue ("Initial job has not accepted any resources") and fixed it by specifying the correct Spark download URL in spark-env.sh, or by installing Spark on all the slaves.
export SPARK_EXECUTOR_URI=http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
