Manually triggering a Kubernetes CronJob from within the cluster - java

I'm trying to trigger a CronJob manually (not on its schedule) using the fabric8 library,
but I'm getting the following error:
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.20.0.1:443/apis/batch/v1/namespaces/engineering/jobs.
Message: Job.batch "app-chat-manual-947171" is invalid: spec.template.spec.containers[0].name: Required value.
Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].name, message=Required value, reason=FieldValueRequired, additionalProperties={})], group=batch, kind=Job, name=app-chat-manual-947171, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Job.batch "app-chat-manual-947171" is invalid: spec.template.spec.containers[0].name: Required value, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
My code runs inside the cluster.
Maven dependency:
<dependency>
    <groupId>io.fabric8</groupId>
    <artifactId>kubernetes-client</artifactId>
    <version>6.3.1</version>
</dependency>
Java code:
public static void triggerCronjob(String cronjobName, String applicableNamespace) {
    KubernetesClient kubernetesClient = new KubernetesClientBuilder().build();
    final String podName = String.format("%s-manual-%s",
            cronjobName.length() > 38 ? cronjobName.substring(0, 38) : cronjobName,
            new Random().nextInt(999999));
    System.out.println("triggerCronjob method invoked, applicableNamespace: " + applicableNamespace
            + ", cronjobName: " + cronjobName + ", podName: " + podName);

    Job job = new JobBuilder()
            .withApiVersion("batch/v1")
            .withNewMetadata()
                .withName(podName)
            .endMetadata()
            .withNewSpec()
                .withBackoffLimit(4)
                .withNewTemplate()
                    .withNewSpec()
                        .addNewContainer()
                            .withName(podName)
                            .withImage("perl")
                            .withCommand("perl", "-Mbignum=bpi", "-wle", "print bpi(2000)")
                        .endContainer()
                        .withRestartPolicy("Never")
                    .endSpec()
                .endTemplate()
            .endSpec()
            .build();

    kubernetesClient.batch().v1().jobs().inNamespace(applicableNamespace).createOrReplace(job);
    kubernetesClient.close();
    System.out.println("CronJob triggered: applicableNamespace: " + applicableNamespace
            + ", cronjob name: " + cronjobName);
}
The code runs inside the Kubernetes cluster, but not from the application itself; it's an external program running in the cluster.
My goal is to trigger a given CronJob in a given namespace.

If you want to trigger an already existing CronJob, you need to create a Job that carries an ownerReference to that CronJob and reuses its job template:
// Get the already existing CronJob
CronJob cronJob = kubernetesClient.batch().v1()
        .cronjobs()
        .inNamespace(namespace)
        .withName(cronJobName)
        .get();

// Create a new Job object referencing the CronJob
Job newJobToCreate = new JobBuilder()
        .withNewMetadata()
            .withName(jobName)
            .addNewOwnerReference()
                .withApiVersion("batch/v1")
                .withKind("CronJob")
                .withName(cronJob.getMetadata().getName())
                .withUid(cronJob.getMetadata().getUid())
            .endOwnerReference()
            .addToAnnotations("cronjob.kubernetes.io/instantiate", "manual")
        .endMetadata()
        .withSpec(cronJob.getSpec().getJobTemplate().getSpec())
        .build();

// Apply the Job object to the Kubernetes cluster
kubernetesClient.batch().v1()
        .jobs()
        .inNamespace(namespace)
        .resource(newJobToCreate)
        .create();
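For completeness, here is one way the asker's triggerCronjob method might be adapted to this approach. This is only a minimal sketch, assuming fabric8 6.3.1 and that the CronJob already exists in the namespace; the jobName suffix is purely illustrative.
import io.fabric8.kubernetes.api.model.batch.v1.CronJob;
import io.fabric8.kubernetes.api.model.batch.v1.Job;
import io.fabric8.kubernetes.api.model.batch.v1.JobBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public static void triggerCronjob(String cronjobName, String applicableNamespace) {
    // try-with-resources closes the client even if the create call throws
    try (KubernetesClient client = new KubernetesClientBuilder().build()) {
        CronJob cronJob = client.batch().v1().cronjobs()
                .inNamespace(applicableNamespace)
                .withName(cronjobName)
                .get();

        // Derive a Job name from the CronJob name; the suffix here is illustrative
        String jobName = cronjobName + "-manual-" + System.currentTimeMillis();

        Job job = new JobBuilder()
                .withNewMetadata()
                    .withName(jobName)
                    .addNewOwnerReference()
                        .withApiVersion("batch/v1")
                        .withKind("CronJob")
                        .withName(cronJob.getMetadata().getName())
                        .withUid(cronJob.getMetadata().getUid())
                    .endOwnerReference()
                    .addToAnnotations("cronjob.kubernetes.io/instantiate", "manual")
                .endMetadata()
                // Reuse the pod template from the CronJob, so container names etc. are already valid
                .withSpec(cronJob.getSpec().getJobTemplate().getSpec())
                .build();

        client.batch().v1().jobs()
                .inNamespace(applicableNamespace)
                .resource(job)
                .create();
    }
}
Reusing the CronJob's job template also avoids hand-building the container spec, which is where the original error (a missing container name) came from.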

Related

Pod list size returned zero after Kubernetes Job creation

We are creating a Kubernetes Job using the Java Kubernetes client API (v5.12.2) as shown below.
I am stuck in two places. Could someone please help with these?
podList.getItems().size() in the snippet below sometimes returns zero even though I can see the pod has been created, alongside other existing jobs.
How do I specify a particular label on the Job's pod?
KubernetesClient kubernetesClient = new DefaultKubernetesClient();
String namespace = System.getenv(POD_NAMESPACE);
String jobName = TextUtils.concatenateToString("flatten" + Constants.HYPHEN + flattenId);
Job jobRequest = createJob(flattenId, authValue);
var jobResult = kubernetesClient.batch().v1().jobs().inNamespace(namespace)
        .create(jobRequest);
PodList podList = kubernetesClient.pods().inNamespace(namespace)
        .withLabel("job-name", jobName).list();

// Wait for pod to complete
var pods = podList.getItems().size();
var terminalPodStatus = List.of("succeeded", "failed");
_LOGGER.info("pods created size:" + pods);
if (pods > 0) {
    // returns zero some times.
    var k8sPod = podList.getItems().get(0);
    var podName = k8sPod.getMetadata().getName();
    kubernetesClient.pods().inNamespace(namespace).withName(podName)
            .waitUntilCondition(pod -> {
                var podPhase = pod.getStatus().getPhase();
                // some logic
                return terminalPodStatus.contains(podPhase.toLowerCase());
            }, JOB_TIMEOUT, TimeUnit.MINUTES);
    kubernetesClient.close();
}
private Job createJob(String flattenId, String authValue) {
    return new JobBuilder()
            .withApiVersion(API_VERSION)
            .withNewMetadata()
                .withName(jobName)
                .withLabels(labels)
            .endMetadata()
            .withNewSpec()
                .withTtlSecondsAfterFinished(300)
                .withBackoffLimit(0)
                .withNewTemplate()
                    .withNewMetadata()
                        .withAnnotations(LINKERD_INJECT_ANNOTATIONS)
                    .endMetadata()
                    .withNewSpec()
                        .withServiceAccount(Constants.TEST_SERVICEACCOUNT)
                        .addNewContainer()
                            .addAllToEnv(envVars)
                            .withImage(System.getenv(BUILD_JOB_IMAGE))
                            .withName("test")
                            .withCommand("/bin/bash", "-c", "java -jar test.jar")
                        .endContainer()
                        .withRestartPolicy(RESTART_POLICY_NEVER)
                    .endSpec()
                .endTemplate()
            .endSpec()
            .build();
}
Pods are not created instantly as a consequence of creating a Job: the Job controller has to pick up the new Job and create the Pods for it. Depending on the load on your control plane and the number of Job instances, you may need to wait more or less time before the Pod appears.
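As an illustration only (not from the original answer), one way to handle this is to poll for the Pod instead of calling list() once right after creating the Job. A minimal sketch, reusing the same fabric8 client and the job-name label that the Job controller puts on the Pods it creates:
// Poll until the Job controller has created at least one Pod for this Job,
// instead of relying on a single list() call immediately after create().
PodList podList = null;
for (int attempt = 0; attempt < 30; attempt++) {
    podList = kubernetesClient.pods().inNamespace(namespace)
            .withLabel("job-name", jobName)
            .list();
    if (!podList.getItems().isEmpty()) {
        break;
    }
    try {
        Thread.sleep(2000); // wait 2 seconds between attempts
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
}
if (podList == null || podList.getItems().isEmpty()) {
    throw new IllegalStateException("No pod created for job " + jobName + " within the polling window");
}
The attempt count and sleep interval are arbitrary; the point is simply to give the Job controller time to act before reading the pod list.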

Send flow from Java to an Apache NiFi processor

Good Morning everyone
So I have this Java code that parses a Swagger documentation file (a JSON file) and splits it:
Swagger swagger = new SwaggerParser().read("C:/Users/admin/Desktop/testdownload.txt");
Map<String, Path> paths = swagger.getPaths();
for (Map.Entry<String, Path> p : paths.entrySet()) {
    Path path = p.getValue();
    Map<HttpMethod, Operation> operations = path.getOperationMap();
    for (java.util.Map.Entry<HttpMethod, Operation> o : operations.entrySet()) {
        System.out.println("===");
        System.out.println("PATH:" + p.getKey());
        System.out.println("Http method:" + o.getKey());
        System.out.println("Summary:" + o.getValue().getSummary());
        System.out.println("Parameters number: " + o.getValue().getParameters().size());
        for (Parameter parameter : o.getValue().getParameters()) {
            System.out.println(" - " + parameter.getName());
        }
        System.out.println("Responses:");
        for (Map.Entry<String, Response> r : o.getValue().getResponses().entrySet()) {
            System.out.println(" - " + r.getKey() + ": " + r.getValue().getDescription());
        }
        System.out.println("");
    }
}
And here is the input:
And the output is:
What I want to ask is: is it possible to send this output, path by path, to Apache NiFi?
Is there any solution where NiFi extracts those outputs and puts each one of them in a dependent processor?
You could start an HTTP listener service in NiFi. Use the HandleHttpRequest processor.
Some time ago I did something like this, sending data from my Java application to HandleHttpRequest. This processor is designed to be used in conjunction with the HandleHttpResponse processor in order to create a web service.
You just have to POST your data to this web service; the web service consumes it, and you already have your data in NiFi. From then on, you can manipulate and control your data as you please.
You can also look into ListenHTTP.
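As a rough illustration of the POST side only: the URL and port below are placeholders for wherever your HandleHttpRequest or ListenHTTP processor is listening, and this assumes Java 11+ for java.net.http (on older Java, HttpURLConnection would do the same job).
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Send one parsed path entry to NiFi as the body of an HTTP POST.
// http://nifi-host:8081/contentListener is a placeholder endpoint.
static void sendToNifi(String pathSummary) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://nifi-host:8081/contentListener"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(pathSummary))
            .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println("NiFi responded with status " + response.statusCode());
}
Inside the loop over paths above, each entry could be serialized (for example as a small JSON object) and handed to sendToNifi(...), so NiFi receives one FlowFile per path.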

NoSuchMethodError while using the DistCp Java API

I am trying to use the DistCp Java API to copy data from one Hadoop cluster to another.
However, I am getting the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.util.StringUtils.toLowerCase(Ljava/lang/String;)Ljava/lang/String;
at org.apache.hadoop.tools.util.DistCpUtils.getStrategy(DistCpUtils.java:126)
at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:235)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:174)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
at com.monitor.BackupUtil.doBackup(BackupUtil.java:72)
at com.monitor.BackupUtil.main(BackupUtil.java:45)
I am using the following code:
public void doBackup() throws Exception {
    System.out.println("Beginning Distcp");
    DistCpOptions options = new DistCpOptions(
            new Path(prop.getProperty("sourceClusterDirectory") + "/" + prop.getProperty("tablename")
                    + "/distcp.txt"),
            new Path(prop.getProperty("targetCluster") + prop.getProperty("targetClusterDirectory")));
    System.out.println("Disctp between--->" + prop.getProperty("sourceClusterDirectory") + "/distcp.txt"
            + "AND" + prop.getProperty("targetCluster") + prop.getProperty("targetClusterDirectory"));
    DistCp distcp = new DistCp(new Configuration(), options);
    Job job = distcp.execute();
    job.waitForCompletion(true);
    System.out.println("DistCp Completed Successfully");
}
I am using Hadoop 2.7.1, and the DistCp dependency is this:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-distcp</artifactId>
    <version>2.7.1</version>
</dependency>
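For what it's worth, org.apache.hadoop.util.StringUtils.toLowerCase(String) does not exist in older hadoop-common releases, so a NoSuchMethodError like this usually points to a mismatched hadoop-common jar on the runtime classpath rather than a problem with hadoop-distcp itself. A hedged sketch of keeping the Hadoop client jars on the same 2.7.1 line (the exact artifact list is an assumption about your build):
<!-- Keep hadoop-common (and the other Hadoop client jars) on the same version
     as hadoop-distcp, so DistCpUtils finds the StringUtils method it was
     compiled against. Artifact names beyond these two are an assumption. -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-distcp</artifactId>
    <version>2.7.1</version>
</dependency>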

EMR cluster bootstrap failure (timeout) occurs most of the time when I initialize a cluster

I'm writing an app that consists of 4 chained MapReduce jobs, which run on Amazon EMR. I'm using the JobFlow interface to chain the jobs. Each job is contained in its own class and has its own main method. All of these are packed into a .jar stored in S3, and the cluster is initialized from a small local app on my laptop, which configures the JobFlowRequest and submits it to EMR.
Most of my attempts to start the cluster fail with the error message Terminated with errors On the master instance (i-<cluster number>), bootstrap action 1 timed out executing. I looked up info on this issue, and all I could find is that this exception is thrown if the combined bootstrap time of the cluster exceeds 45 minutes. However, this occurs only ~15 minutes after the request is submitted to EMR, regardless of the requested cluster size, be it 4 EC2 instances, 10 or even 20. This makes no sense to me at all; what am I missing?
Some tech specs:
- The project is compiled with Java 1.7.79
- The requested EMR image is 4.6.0, which uses Hadoop 2.7.2
- I'm using the AWS SDK for Java v1.10.64
This is my local main method, which sets up and submits the JobFlowRequest:
import com.amazonaws.AmazonClientException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.ec2.model.InstanceType;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.*;
public class ExtractRelatedPairs {

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: ExtractRelatedPairs: <k>");
            System.exit(1);
        }
        int outputSize = Integer.parseInt(args[0]);
        if (outputSize < 0) {
            System.err.println("k should be positive");
            System.exit(1);
        }

        AWSCredentials credentials = null;
        try {
            credentials = new ProfileCredentialsProvider().getCredentials();
        } catch (Exception e) {
            throw new AmazonClientException(
                    "Cannot load the credentials from the credential profiles file. " +
                    "Please make sure that your credentials file is at the correct " +
                    "location (~/.aws/credentials), and is in valid format.",
                    e);
        }

        AmazonElasticMapReduce mapReduce = new AmazonElasticMapReduceClient(credentials);

        HadoopJarStepConfig jarStep1 = new HadoopJarStepConfig()
                .withJar("s3n://dsps162assignment2benasaf/jars/ExtractRelatedPairs.jar")
                .withMainClass("Phase1")
                .withArgs("s3://datasets.elasticmapreduce/ngrams/books/20090715/eng-gb-all/5gram/data/", "hdfs:///output1/");
        StepConfig step1Config = new StepConfig()
                .withName("Phase 1")
                .withHadoopJarStep(jarStep1)
                .withActionOnFailure("TERMINATE_JOB_FLOW");

        HadoopJarStepConfig jarStep2 = new HadoopJarStepConfig()
                .withJar("s3n://dsps162assignment2benasaf/jars/ExtractRelatedPairs.jar")
                .withMainClass("Phase2")
                .withArgs("hdfs:///output1/", "hdfs:///output2/");
        StepConfig step2Config = new StepConfig()
                .withName("Phase 2")
                .withHadoopJarStep(jarStep2)
                .withActionOnFailure("TERMINATE_JOB_FLOW");

        HadoopJarStepConfig jarStep3 = new HadoopJarStepConfig()
                .withJar("s3n://dsps162assignment2benasaf/jars/ExtractRelatedPairs.jar")
                .withMainClass("Phase3")
                .withArgs("hdfs:///output2/", "hdfs:///output3/", args[0]);
        StepConfig step3Config = new StepConfig()
                .withName("Phase 3")
                .withHadoopJarStep(jarStep3)
                .withActionOnFailure("TERMINATE_JOB_FLOW");

        HadoopJarStepConfig jarStep4 = new HadoopJarStepConfig()
                .withJar("s3n://dsps162assignment2benasaf/jars/ExtractRelatedPairs.jar")
                .withMainClass("Phase4")
                .withArgs("hdfs:///output3/", "s3n://dsps162assignment2benasaf/output4");
        StepConfig step4Config = new StepConfig()
                .withName("Phase 4")
                .withHadoopJarStep(jarStep4)
                .withActionOnFailure("TERMINATE_JOB_FLOW");

        JobFlowInstancesConfig instances = new JobFlowInstancesConfig()
                .withInstanceCount(10)
                .withMasterInstanceType(InstanceType.M1Small.toString())
                .withSlaveInstanceType(InstanceType.M1Small.toString())
                .withHadoopVersion("2.7.2")
                .withEc2KeyName("AWS")
                .withKeepJobFlowAliveWhenNoSteps(false)
                .withPlacement(new PlacementType("us-east-1a"));

        RunJobFlowRequest runFlowRequest = new RunJobFlowRequest()
                .withName("extract-related-word-pairs")
                .withInstances(instances)
                .withSteps(step1Config, step2Config, step3Config, step4Config)
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withServiceRole("EMR_DefaultRole")
                .withReleaseLabel("emr-4.6.0")
                .withLogUri("s3n://dsps162assignment2benasaf/logs/");

        System.out.println("Submitting the JobFlow Request to Amazon EMR and running it...");
        RunJobFlowResult runJobFlowResult = mapReduce.runJobFlow(runFlowRequest);
        String jobFlowId = runJobFlowResult.getJobFlowId();
        System.out.println("Ran job flow with id: " + jobFlowId);
    }
}
A while back I encountered a similar issue, where even a vanilla EMR 4.6.0 cluster was failing to get past startup and so threw a timeout error on the bootstrap step.
I ended up just creating a cluster in a different/new VPC in a different region and it worked fine, which led me to believe there may be a problem with either the original VPC itself or the software in 4.6.0.
Also, regarding the VPC: it was specifically having an issue setting and resolving DNS names for the newly created cluster nodes, even though older versions of EMR did not have this problem.
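Purely as an illustration of that suggestion (the subnet ID and region below are placeholders, not values from the question): with the AWS SDK for Java v1 you can point the job flow at a subnet in another VPC and create the client against another region, for example:
// Requires com.amazonaws.regions.Region and com.amazonaws.regions.Regions imports.
// "subnet-0123456789abcdef0" and US_WEST_2 are placeholder values.
AmazonElasticMapReduce mapReduce = new AmazonElasticMapReduceClient(credentials);
mapReduce.setRegion(Region.getRegion(Regions.US_WEST_2));

JobFlowInstancesConfig instances = new JobFlowInstancesConfig()
        .withInstanceCount(10)
        .withMasterInstanceType(InstanceType.M1Small.toString())
        .withSlaveInstanceType(InstanceType.M1Small.toString())
        .withEc2KeyName("AWS")
        .withKeepJobFlowAliveWhenNoSteps(false)
        // Placing the cluster in a specific subnet also places it in that subnet's VPC,
        // so withPlacement(...) is replaced by the subnet choice here.
        .withEc2SubnetId("subnet-0123456789abcdef0");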

Where is my EMR cluster?

I am trying to create an EMR cluster from Java, but I can neither find it in the EMR cluster list nor see the requested instances in EC2.
EMR roles do exist:
sqlInjection#VirtualBox:~$ aws iam list-roles | grep EMR
"RoleName": "EMR_DefaultRole",
"Arn": "arn:aws:iam::removed:role/EMR_DefaultRole"
"RoleName": "EMR_EC2_DefaultRole",
"Arn": "arn:aws:iam::removed:role/EMR_EC2_DefaultRole"
And now my Java code:
AWSCredentials awsCredentials = new BasicAWSCredentials(awsKey, awsKeySecret);
AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(awsCredentials);

StepFactory stepFactory = new StepFactory();
StepConfig enabledebugging = new StepConfig()
        .withName("Enable debugging")
        .withActionOnFailure("TERMINATE_JOB_FLOW")
        .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

HadoopJarStepConfig hadoopConfig1 = new HadoopJarStepConfig()
        .withJar("s3://foo.bucket/hadoop_jar/2015-01-12/foo.jar")
        .withMainClass("com.strackoverflow.DriverFoo") // optional main class, can be omitted if the jar above has a manifest
        .withArgs("--input=s3://foo.bucket/logs/,s3://foo.bucket/morelogs/", "--output=s3://foo.bucket/myEMROutput", "--inputType=text"); // I have custom Java code to handle the --input, --output and --inputType parameters
StepConfig customStep = new StepConfig("Step1", hadoopConfig1);

Collection<StepConfig> steps = new ArrayList<StepConfig>();
steps.add(enabledebugging);
steps.add(customStep);

JobFlowInstancesConfig instancesConfig = new JobFlowInstancesConfig()
        .withEc2KeyName("fookey") // not fookey.pem
        .withInstanceCount(2)
        .withKeepJobFlowAliveWhenNoSteps(false) // in the AWS example this is set to true
        .withMasterInstanceType("m1.medium")
        .withSlaveInstanceType("m1.medium");

RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("java programatic request")
        .withAmiVersion("3.3.1")
        .withSteps(steps) // the Amazon example launches debugging and Hive; here it's debugging and a jar
        .withLogUri("s3://devel.rui/emr_clusters/pr01/")
        .withInstances(instancesConfig)
        .withVisibleToAllUsers(true);

RunJobFlowResult result = emr.runJobFlow(request);
System.out.println("toString " + result.toString());
System.out.println("getJobFlowId " + result.getJobFlowId());
System.out.println("hashCode " + result.hashCode());
Where is my cluster? I cannot see it in the cluster list, the output folder is not created, the logs folder stays empty, and no instances are visible on EC2.
Yet the program outputs this:
toString {JobFlowId: j-2xxxxxxU}
getJobFlowId j-2xxxxxU
hashCode -1xxxxx4
I followed the instructions from here to create the cluster:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/calling-emr-with-java-sdk.html
And this to create the Java job:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-common-programming-sample.html
In the Amazon example the region is not configured, so the cluster is created in the SDK's default region rather than the one selected in the console, which is why it does not show up in the cluster list.
After configuring the region, the cluster is launched where expected.
AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(awsCredentials);
emr.setRegion(Region.getRegion(Regions.EU_WEST_1));
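As a small follow-up sketch (not part of the original answer), you can confirm where the cluster ended up by describing it with the same client after setting the region; the cluster ID is the job flow ID returned by runJobFlow:
import com.amazonaws.services.elasticmapreduce.model.DescribeClusterRequest;
import com.amazonaws.services.elasticmapreduce.model.DescribeClusterResult;

// After emr.setRegion(...) and emr.runJobFlow(request):
DescribeClusterResult described = emr.describeCluster(
        new DescribeClusterRequest().withClusterId(result.getJobFlowId()));
System.out.println("Cluster name:  " + described.getCluster().getName());
System.out.println("Cluster state: " + described.getCluster().getStatus().getState());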
