Amazon EMR Clusters stuck on starting state - java

I'm creating clusters on AWS EMR (with the Console and with the SDK), but these clusters always remain in the "Starting" state and never finish starting. Why can this happen, and how can I solve it? Thanks.
bootstrap-actions log:
INFO i-062fab1a95f485684: new instance started
ERROR i-062fab1a95f485684: failed to start. bootstrap action 1 failed with non-zero exit code.
My code:
val emr = AmazonElasticMapReduceClientBuilder.standard()
  .withCredentials(new AWSStaticCredentialsProvider(awsCred))
  .withRegion(Regions.EU_WEST_1)
  .build()

val stepFactory = new StepFactory()
val enabledebugging = new StepConfig()
  .withName("Enable debugging")
  .withActionOnFailure("TERMINATE_JOB_FLOW")
  .withHadoopJarStep(stepFactory.newEnableDebuggingStep())

val spark = new Application().withName("Spark")
val hive = new Application().withName("Hive")
val ganglia = new Application().withName("Ganglia")
val zeppelin = new Application().withName("Zeppelin")

val request = new RunJobFlowRequest()
  .withName("Spark Cluster")
  .withReleaseLabel("emr-5.20.0")
  .withSteps(enabledebugging)
  .withApplications(spark)
  .withLogUri("s3://my-logs")
  .withServiceRole("EMR_DefaultRole")
  .withJobFlowRole("EMR_EC2_DefaultRole")
  .withInstances(new JobFlowInstancesConfig()
    .withEc2SubnetId("subnet-xxxxx")
    .withEc2KeyName("ec2test")
    .withInstanceCount(3)
    .withKeepJobFlowAliveWhenNoSteps(true)
    .withMasterInstanceType("m5.xlarge")
    .withSlaveInstanceType("m5.xlarge")
  )

val result = emr.runJobFlow(request)
System.out.println("The cluster ID is " + result.toString())

Related

Pod size returned zero post Kubernetes Job creation

We are creating a Kubernetes Job using the Java Kubernetes client API (v5.12.2) as shown below.
I am stuck at two places. Could someone please help with this?
podList.getItems().size() in the code snippet below sometimes returns zero, even though I can see the pod get created along with the other existing jobs.
How do I specify a particular label for the Job's pod?
KubernetesClient kubernetesClient = new DefaultKubernetesClient();
String namespace = System.getenv(POD_NAMESPACE);
String jobName = TextUtils.concatenateToString("flatten" + Constants.HYPHEN + flattenId);
Job jobRequest = createJob(flattenId, authValue);
var jobResult = kubernetesClient.batch().v1().jobs().inNamespace(namespace)
        .create(jobRequest);
PodList podList = kubernetesClient.pods().inNamespace(namespace)
        .withLabel("job-name", jobName).list();
// Wait for pod to complete
var pods = podList.getItems().size();
var terminalPodStatus = List.of("succeeded", "failed");
_LOGGER.info("pods created size:" + pods);
if (pods > 0) {
    // returns zero some times.
    var k8sPod = podList.getItems().get(0);
    var podName = k8sPod.getMetadata().getName();
    kubernetesClient.pods().inNamespace(namespace).withName(podName)
            .waitUntilCondition(pod -> {
                var podPhase = pod.getStatus().getPhase();
                // some logic
                return terminalPodStatus.contains(podPhase.toLowerCase());
            }, JOB_TIMEOUT, TimeUnit.MINUTES);
    kubernetesClient.close();
}
private Job createJob(String flattenId, String authValue) {
    return new JobBuilder()
            .withApiVersion(API_VERSION)
            .withNewMetadata().withName(jobName)
            .withLabels(labels)
            .endMetadata()
            .withNewSpec()
            .withTtlSecondsAfterFinished(300)
            .withBackoffLimit(0)
            .withNewTemplate()
            .withNewMetadata().withAnnotations(LINKERD_INJECT_ANNOTATIONS)
            .endMetadata()
            .withNewSpec()
            .withServiceAccount(Constants.TEST_SERVICEACCOUNT)
            .addNewContainer()
            .addAllToEnv(envVars)
            .withImage(System.getenv(BUILD_JOB_IMAGE))
            .withName("test")
            .withCommand("/bin/bash", "-c", "java -jar test.jar")
            .endContainer()
            .withRestartPolicy(RESTART_POLICY_NEVER)
            .endSpec()
            .endTemplate()
            .endSpec()
            .build();
}
Pods are not instantly created as a consequence of creating a Job: the Job controller needs to become active and create the Pods accordingly. Depending on the load on your control plane and the number of Job instances, you may need to wait more or less time.
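A minimal sketch of that waiting step, assuming the same fabric8 client and the job-name label used in the question (the timeout and poll interval are arbitrary):

import io.fabric8.kubernetes.client.KubernetesClient

// Sketch: poll until the Job controller has actually created a pod for this Job
def waitForJobPod(client: KubernetesClient, namespace: String, jobName: String,
                  timeoutMillis: Long = 60000L, pollMillis: Long = 1000L): Option[String] = {
  val deadline = System.currentTimeMillis() + timeoutMillis
  while (System.currentTimeMillis() < deadline) {
    val items = client.pods().inNamespace(namespace)
      .withLabel("job-name", jobName).list().getItems
    if (!items.isEmpty)
      return Some(items.get(0).getMetadata.getName) // pod exists, return its name
    Thread.sleep(pollMillis)                        // not created yet, retry
  }
  None // timed out without seeing a pod
}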

Flink does not checkpoint, and BucketingSink leaves files in pending state, when source is generated from Collection

I'm trying to generate some test data from a collection and write that data to S3. Flink doesn't seem to do any checkpointing at all when I do this, but it does checkpoint when the source comes from S3.
For example, this DOES checkpoint and leaves output files in a completed state:
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.setMaxParallelism(128)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.enableCheckpointing(2000L)
env.setStateBackend(new RocksDBStateBackend("s3a://my_bucket/simple_job/rocksdb_checkpoints"))

val lines: DataStream[String] = {
  val path = "s3a://my_bucket/simple_job/in"
  env.readFile(
    inputFormat = new TextInputFormat(new Path(path)),
    filePath = path,
    watchType = FileProcessingMode.PROCESS_CONTINUOUSLY,
    interval = 5000L
  )
}

val sinkFunction: BucketingSink[String] =
  new BucketingSink[String]("s3a://my_bucket/simple_job/out")
    .setBucketer(new DateTimeBucketer("yyyy-MM-dd--HHmm"))

lines.addSink(sinkFunction)
env.execute()
Meanwhile, this DOES NOT checkpoint, and leaves files in a .pending state even after the job has finished:
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.setMaxParallelism(128)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.enableCheckpointing(2000L)
env.setStateBackend(new RocksDBStateBackend("s3a://my_bucket/simple_job/rocksdb_checkpoints"))

val lines: DataStream[String] = env.fromCollection((1 to 100).map(_.toString))

val sinkFunction: BucketingSink[String] =
  new BucketingSink[String]("s3a://my_bucket/simple_job/out")
    .setBucketer(new DateTimeBucketer("yyyy-MM-dd--HHmm"))

lines.addSink(sinkFunction)
env.execute()
It turns out that this is because of this ticket: https://issues.apache.org/jira/browse/FLINK-2646. It simply comes about because the stream built from a collection finishes before the app ever has time to make a single checkpoint.
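One workaround (not from the linked ticket, just a common pattern) is to keep the source alive after it has emitted the collection, so the job runs long enough for at least one checkpoint to complete. A rough sketch:

import org.apache.flink.streaming.api.functions.source.SourceFunction

// Sketch: emit the test data, then stay alive so checkpoints can fire
class KeepAliveCollectionSource(elements: Seq[String]) extends SourceFunction[String] {
  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    // emit all elements once, under the checkpoint lock
    ctx.getCheckpointLock.synchronized {
      elements.foreach(e => ctx.collect(e))
    }
    while (running) Thread.sleep(100L) // keep the source (and the job) running
  }

  override def cancel(): Unit = running = false
}

// used in place of env.fromCollection(...):
// val lines: DataStream[String] = env.addSource(new KeepAliveCollectionSource((1 to 100).map(_.toString)))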

Is it possible to get YARN application status from the AWS EMR Java SDK?

Context
I run Spark applications on an Amazon EMR cluster.
These applications are orchestrated by YARN.
From the AWS Console, I am able to get the YARN application status using the Application History tab of the cluster's detail page (cf. View Application History).
Expectation / Question
I would like to get the same information (application status), but from a Java or Scala program.
So, is it possible to get the YARN application status from the AWS EMR Java SDK?
In my application, I manage some EMR object instances such as:
AmazonElasticMapReduceClient
Cluster
Thanks in advance.
I came upon this because I was looking for a way to get the job status via EMR's "steps" API, but if you're looking to get it via YARN directly, here is some sample code:
object DataLoad {

  private def getJsonField(json: JValue, key: String): Option[String] = {
    val value = (json \ key)
    value match {
      case jval: JValue => Some(jval.values.toString)
      case _ => None
    }
  }

  def load(logger: Logger, hiveDatabase: String, hiveTable: String, dw_table_name: String): Unit = {
    val conf = ConfigFactory.load
    val yarnResourceManager = conf.getString("app.yarnResourceManager")
    val sparkExecutors = conf.getString("app.sparkExecutors")
    val sparkHome = conf.getString("app.sparkHome")
    val sparkAppJar = conf.getString("app.sparkAppJar")
    val sparkMainClass = conf.getString("app.sparkMainClass")
    val sparkMaster = conf.getString("app.sparkMaster")
    val sparkDriverMemory = conf.getString("app.sparkDriverMemory")
    val sparkExecutorMemory = conf.getString("app.sparkExecutorMemory")

    var destination = ""
    if (dw_table_name.contains("s3a://")) {
      destination = "s3"
    } else {
      destination = "sql"
    }

    val spark = new SparkLauncher()
      .setSparkHome(sparkHome)
      .setAppResource(sparkAppJar)
      .setMainClass(sparkMainClass)
      .setMaster(sparkMaster)
      .addAppArgs(hiveDatabase)
      .addAppArgs(hiveTable)
      .addAppArgs(destination)
      .setVerbose(false)
      .setConf("spark.driver.memory", sparkDriverMemory)
      .setConf("spark.executor.memory", sparkExecutorMemory)
      .setConf("spark.executor.cores", sparkExecutors)
      .setConf("spark.executor.instances", sparkExecutors)
      .setConf("spark.driver.maxResultSize", "5g")
      .setConf("spark.sql.broadcastTimeout", "144000")
      .setConf("spark.network.timeout", "144000")
      .startApplication()

    var unknownCounter = 0
    while (!spark.getState.isFinal) {
      println(spark.getState.toString)
      Thread.sleep(10000)
      if (unknownCounter > 3000) {
        throw new IllegalStateException("Spark Job Failed, timeout expired 8 hours")
      }
      unknownCounter += 1
    }
    println(spark.getState.toString)

    val appId: String = spark.getAppId
    println(s"appId: $appId")

    var finalState = ""
    var i = 0
    while (i < 5) {
      val response = Http(s"http://$yarnResourceManager/ws/v1/cluster/apps/$appId/").asString
      if (response.code.toString.startsWith("2")) {
        val json = parse(response.body)
        finalState = getJsonField(json \ "app", "finalStatus").getOrElse("")
        i = 55
      } else {
        i = i + 1
      }
    }

    if (finalState.equalsIgnoreCase("SUCCEEDED")) {
      println("SPARK JOB SUCCEEDED")
    } else {
      throw new IllegalStateException("Spark Job Failed")
    }
  }
}
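For the EMR-level view mentioned at the top of this answer, a rough sketch of reading step states through the SDK's steps API (assuming an AmazonElasticMapReduce client and a known cluster ID) could look like this:

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce
import com.amazonaws.services.elasticmapreduce.model.ListStepsRequest
import scala.collection.JavaConverters._

// Sketch: list the steps of a cluster and print each step's current state
def printStepStates(emr: AmazonElasticMapReduce, clusterId: String): Unit = {
  val steps = emr.listSteps(new ListStepsRequest().withClusterId(clusterId)).getSteps.asScala
  steps.foreach(s => println(s"${s.getName}: ${s.getStatus.getState}")) // e.g. PENDING, RUNNING, COMPLETED, FAILED
}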

Cannot connect to Elasticsearch with Java API (in Scala)

So I am trying to connect to an ElasticSearch cluster, following these examples:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/transport-client.html
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-index.html
This is my code:
def saveToES(message: String) {
  println("start saving")
  // val client = TransportClient.builder().settings(settings).build()
  //   .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("xxx.com"), 9200))

  // prepare Adresses
  val c = InetAddress.getByName("xxx.com")
  println("inetadress done")
  val d = new InetSocketTransportAddress(c, 9200)
  println("socketadress done")

  val settings = Settings.settingsBuilder().put("cluster.name", "CLUSTER").build()
  println("settings created")

  val a = TransportClient.builder()
  println("Builder created")
  val aa = a.settings(settings)
  println("settings set")
  val b = a.build
  println("client built")
  val client = b.addTransportAddress(d)
  println("adress added to client")

  val response = client.prepareIndex("performance_logs", "performance").setSource(message).get()
  println(response.toString)

  // on shutdown
  // client.close();
}
And the program gets stuck on building the client (val b = a.build), so my last print is "settings set". No error or anything, just stuck. For firewall reasons I have to deploy the whole thing as a jar and execute it on a server, so I can't really debug with the IDE.
Any idea what the problem is?
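Not an answer from the original thread, but one detail worth double-checking while debugging this: the transport client talks Elasticsearch's native transport protocol, which listens on port 9300 by default, whereas 9200 is the HTTP port. A condensed sketch of the same setup pointed at the transport port (host and cluster name are the placeholders from the question):

// Sketch: same TransportClient setup as above, but using the transport port (9300 by default)
val settings = Settings.settingsBuilder().put("cluster.name", "CLUSTER").build()
val client = TransportClient.builder()
  .settings(settings)
  .build()
  .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("xxx.com"), 9300))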

Where is my EMR cluster?

I am trying to create an EMR cluster in Java, but I can neither find it in the EMR cluster list nor see the requested instances in EC2.
EMR roles do exist:
sqlInjection#VirtualBox:~$ aws iam list-roles | grep EMR
"RoleName": "EMR_DefaultRole",
"Arn": "arn:aws:iam::removed:role/EMR_DefaultRole"
"RoleName": "EMR_EC2_DefaultRole",
"Arn": "arn:aws:iam::removed:role/EMR_EC2_DefaultRole"
And now my Java code:
AWSCredentials awsCredentials = new BasicAWSCredentials(awsKey, awsKeySecret);
AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(awsCredentials);

StepFactory stepFactory = new StepFactory();
StepConfig enabledebugging = new StepConfig()
        .withName("Enable debugging")
        .withActionOnFailure("TERMINATE_JOB_FLOW")
        .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

HadoopJarStepConfig hadoopConfig1 = new HadoopJarStepConfig()
        .withJar("s3://foo.bucket/hadoop_jar/2015-01-12/foo.jar")
        .withMainClass("com.strackoverflow.DriverFoo") // optional main class, this can be omitted if the jar above has a manifest
        .withArgs("--input=s3://foo.bucket/logs/,s3://foo.bucket/morelogs/", "--output=s3://foo.bucket/myEMROutput", "--inputType=text"); // I have custom Java code to handle the --input, --output and --inputType parameters
StepConfig customStep = new StepConfig("Step1", hadoopConfig1);

Collection<StepConfig> steps = new ArrayList<StepConfig>();
steps.add(enabledebugging);
steps.add(customStep);

JobFlowInstancesConfig instancesConfig = new JobFlowInstancesConfig()
        .withEc2KeyName("fookey") // not fookey.pem
        .withInstanceCount(2)
        .withKeepJobFlowAliveWhenNoSteps(false) // in the AWS example this is set to true
        .withMasterInstanceType("m1.medium")
        .withSlaveInstanceType("m1.medium");

RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("java programatic request")
        .withAmiVersion("3.3.1")
        .withSteps(steps) // in the Amazon example debug and Hive are launched, here it is debug and a jar
        .withLogUri("s3://devel.rui/emr_clusters/pr01/")
        .withInstances(instancesConfig)
        .withVisibleToAllUsers(true);

RunJobFlowResult result = emr.runJobFlow(request);
System.out.println("toString " + result.toString());
System.out.println("getJobFlowId " + result.getJobFlowId());
System.out.println("hashCode " + result.hashCode());
Where is my cluster? I cannot see it in the cluster list, the output folder is not created, the logs folder stays empty, and no instances are visible in EC2.
But the program outputs this:
toString {JobFlowId: j-2xxxxxxU}
getJobFlowId j-2xxxxxU
hashCode -1xxxxx4
I followed the instructions from here to create the cluster:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/calling-emr-with-java-sdk.html
And this to create the Java job:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-common-programming-sample.html
In the Amazon example, the region is not configured.
After configuring the region, the cluster is launched properly:
AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(awsCredentials);
emr.setRegion(Region.getRegion(Regions.EU_WEST_1));
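With the newer client builder (as in the first question on this page), the equivalent fix is to pass the region when building the client; a brief sketch, with the credential wiring assumed from the question:

// Sketch: set the region via the client builder instead of setRegion on the client
val emr = AmazonElasticMapReduceClientBuilder.standard()
  .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials(awsKey, awsKeySecret)))
  .withRegion(Regions.EU_WEST_1)
  .build()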
