Check status of AWS Batch job - Java

When I upload a file to an S3 bucket, an event is triggered and an AWS Batch job is started. Is there any way to check the status of the AWS Batch job in my Java code? I have to perform some operation when the status of the AWS Batch job is SUCCEEDED.

You have the choice of using the ListJobs / DescribeJobs APIs to poll for status.
ListJobsResult listJobs(ListJobsRequest listJobsRequest)
Returns a list of AWS Batch jobs.
You must specify only one of the following items:
- A job queue ID to return a list of jobs in that job queue
- A multi-node parallel job ID to return a list of that job's nodes
- An array job ID to return a list of that job's children
You can filter the results by job status with the jobStatus parameter.
If you don't specify a status, only RUNNING jobs are returned.
Or you can listen for the CloudWatch Events which are emitted as jobs transition from one state to another if you prefer an event-driven architecture.
ListJobsRequest
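For example, a minimal polling sketch with the AWS SDK for Java v1 (the job ID, poll interval, and follow-up step are placeholders):

import com.amazonaws.services.batch.AWSBatch;
import com.amazonaws.services.batch.AWSBatchClientBuilder;
import com.amazonaws.services.batch.model.DescribeJobsRequest;
import com.amazonaws.services.batch.model.DescribeJobsResult;

public class BatchStatusPoller {
    public static void main(String[] args) throws InterruptedException {
        AWSBatch batch = AWSBatchClientBuilder.defaultClient();
        String jobId = "my-batch-job-id"; // placeholder: the ID returned by SubmitJob

        String status;
        do {
            Thread.sleep(5000); // simple fixed delay between polls
            DescribeJobsResult result =
                    batch.describeJobs(new DescribeJobsRequest().withJobs(jobId));
            status = result.getJobs().get(0).getStatus();
            System.out.println("Current status: " + status);
        } while (!"SUCCEEDED".equals(status) && !"FAILED".equals(status));

        if ("SUCCEEDED".equals(status)) {
            // perform the follow-up operation here
        }
    }
}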

To solve this problem, I created a separate Callable task that loops until the status of the job is SUCCEEDED or FAILED. The job status is extracted with the DescribeJobs API, passing the job ID.
class ReturnJobStatus implements Callable<String>
{
    @Override
    public String call() throws Exception
    {
        while (!(jobStatus.equals("SUCCEEDED") || jobStatus.equals("FAILED")))
        {
            // extract the job status with the describeJobs API, passing the jobId
            Thread.sleep(2000);
        }
        return jobStatus;
    }
}
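A sketch of how this Callable might be driven from the calling code (the single-thread executor and the follow-up step are assumptions, using java.util.concurrent):

ExecutorService executor = Executors.newSingleThreadExecutor();
Future<String> statusFuture = executor.submit(new ReturnJobStatus());

String finalStatus = statusFuture.get(); // blocks until the poller sees SUCCEEDED or FAILED
if ("SUCCEEDED".equals(finalStatus)) {
    // perform the follow-up operation here
}
executor.shutdown();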

Related

How to use updateMetadata(request) in Cognos Analytics in parallel threads?

I am trying to update existing code that sends mdprovider requests to the metadata service to update or publish the metadata of an unpublished model, using parallel threads. My model has 1000 query subjects, and initially we validated them sequentially, which takes almost 4 hours to complete. Now I am trying to run the validation in 3 parallel threads, with the aim of bringing the time down.
I have used an ExecutorService, created a fixed thread pool of 3, and submitted the tasks:
ExecutorService exec = Executors.newFixedThreadPool(thread);
exec.submit(task);
Inside the run method I connect to Cognos, log on, and call updateMetadata():
MetadataService_PortType mdService;

public void run() {
    cognosConnect();
    if (namespace.length() > 0) {
        login(namespace, userName, password);
    }
    // xml = the transaction XML is built here
    // Calls the validation method
    boolean testdblResult = validateQS(xml);
}

Boolean validateQS(String actionXml) {
    // actionXml : transaction XML to test a query subject
    // Cognos SDK method
    result = mdService.updateMetadata(actionXml);
}
This executes successfully. The problem is that although the 3 threads send requests to the Cognos SDK method mdService.updateMetadata() in parallel, the responses come back sequentially. For example, say at the 10th second the threads send requests for 3 query subject validations in parallel; the responses for those 3 query subjects come back at the 15th, 20th, and 24th second, one after another.
Is this the expected behaviour of Cognos? Does mdService.updateMetadata(actionXml) internally execute the requests sequentially, or is there another way to achieve parallelism here? I couldn't find much information in the SDK documentation.
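For what it's worth, one way to make the timing visible is to wrap the SDK call with timestamps; a rough sketch following the question's own method names (exception handling and the real return value mapping are omitted):

Boolean validateQS(String actionXml) throws Exception {
    long start = System.currentTimeMillis();
    mdService.updateMetadata(actionXml); // Cognos SDK call from the code above
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(Thread.currentThread().getName()
            + " waited " + elapsed + " ms for updateMetadata");
    return Boolean.TRUE; // placeholder: map the real SDK result to pass/fail here
}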

Locking Mechanism if pod crashes while processing MongoDB record

We have a Java/Spring application which runs on EKS pods, and we have records stored in a MongoDB collection.
STATUS: READY, STARTED, COMPLETED
The application needs to pick records that are in READY status and update the status to STARTED. Once the processing of a record is completed, the status is updated to COMPLETED.
Once a record is STARTED, it may take a few hours to complete; until then, other pods (other instances of the same app) should not pick this record. If an exception occurs, the app changes the status back to READY so that other pods (or the same pod) can pick the READY record up for processing.
Requirement: if the pod crashes while the record is being processed (STARTED), before the status is changed to READY/COMPLETED, another pod should be able to pick this record up and start processing it again.
We have some solutions in mind, but we are trying to find the best one. Please suggest the best approaches.
You can use a shutdown hook from Spring:

@Component
public class Bean1 {
    @PreDestroy
    public void destroy() {
        // handle the database change here
        System.out.println("Status changed to ready");
    }
}
Beyond that, this kind of job could run better in a messaging architecture, using SQS for example. Instead of using the status in the database to handle and orchestrate the task, you can use SQS: publish the messages that need to be consumed (the records that were in READY state) and have a pool of workers consuming messages from the queue. If something crashes or the pod running a worker is reclaimed, the message goes back to SQS and can be consumed by another pod.
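A rough sketch of that worker loop with the AWS SDK for Java v1 (the queue URL and the processing step are placeholders; the point is that the message is only deleted after processing succeeds, so a crashed pod's message becomes visible to other pods again once the visibility timeout expires):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;

public class RecordWorker {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/records-queue"; // placeholder

        while (true) {
            for (Message msg : sqs.receiveMessage(queueUrl).getMessages()) {
                // may take hours; extend the visibility timeout with
                // sqs.changeMessageVisibility(queueUrl, msg.getReceiptHandle(), seconds) if needed
                processRecord(msg.getBody());
                sqs.deleteMessage(queueUrl, msg.getReceiptHandle()); // acknowledge only after success
            }
        }
    }

    static void processRecord(String body) {
        // placeholder for the actual record processing
    }
}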

Dataproc Job Submit Via API

I have a streaming job running that will run forever and will execute a query on a Kafka topic. I am going through the Dataproc documentation for submitting a job via Java; here is the link:
// Submit an asynchronous request to execute the job.
OperationFuture<Job, JobMetadata> submitJobAsOperationAsyncRequest =
jobControllerClient.submitJobAsOperationAsync(projectId, region, job);
Job response = submitJobAsOperationAsyncRequest.get();
For the above lines of code I am not able to get the response; the code keeps on running. Is it because it's a streaming job and it's running forever?
How can I get a response, so that I can give the end user some job information, like a URL where they can see their jobs, or some monitoring dashboard?
The OperationFuture<Job, JobMetadata> class has a getMetadata() method which returns a com.google.api.core.ApiFuture<JobMetadata>. You can get the job metadata before the job finishes by calling jobMetadataApiFuture.get().
See more details about the OperationFuture.getMetadata() method and the ApiFuture class.
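Putting that together, a short sketch (getJobId() comes from the JobMetadata fields; the console URL format is an assumption for illustration):

OperationFuture<Job, JobMetadata> operation =
        jobControllerClient.submitJobAsOperationAsync(projectId, region, job);

// Resolves once the job has been submitted, without waiting for the streaming job to end.
JobMetadata metadata = operation.getMetadata().get();
String jobId = metadata.getJobId();

String consoleUrl = String.format(
        "https://console.cloud.google.com/dataproc/jobs/%s?region=%s&project=%s",
        jobId, region, projectId);
System.out.println("Submitted job " + jobId + ", monitor it at: " + consoleUrl);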

Stop submitting jobs when there is no data

I use Spark Streaming to get data from a queue in MQ via a custom receiver.
The JavaStreamingContext batch duration is 10 seconds.
And there is one task defined for the input from the queue.
In the event timeline in the Spark UI, I see a job getting submitted in each 10s interval even when there is no data from the receiver.
Is this normal behavior, or how can I stop jobs from getting submitted when there is no data?
JavaDStream<String> customReceiverStream = ssc.receiverStream(new JavaCustomReceiver(host, port));
JavaDStream<String> words = customReceiverStream.flatMap(new FlatMapFunction<String, String>() { ... });
words.print();
ssc.start();
ssc.awaitTermination();
As a workaround, you can use Livy to submit the Spark jobs (using Java code instead of CLI commands).
The Livy job would constantly check a database that has an indicator of whether the data is flowing in or not. As soon as the data flow stops, change the indicator in the database, and this will result in the Spark job being killed through Livy. (Use Livy sessions.)
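A bare-bones sketch of the killing side, using Livy's REST API directly (the Livy URL, session id, and database check are placeholders):

import java.net.HttpURLConnection;
import java.net.URL;

public class LivyJobKiller {
    public static void main(String[] args) throws Exception {
        String livyUrl = "http://livy-host:8998"; // placeholder Livy endpoint
        int sessionId = 42;                       // id of the session running the streaming job

        if (!hasIncomingData()) {                 // placeholder: read the indicator from the database
            HttpURLConnection conn = (HttpURLConnection)
                    new URL(livyUrl + "/sessions/" + sessionId).openConnection();
            conn.setRequestMethod("DELETE");      // DELETE /sessions/{id} stops the session
            System.out.println("Livy responded with HTTP " + conn.getResponseCode());
            conn.disconnect();
        }
    }

    static boolean hasIncomingData() {
        return false; // placeholder for the database indicator check
    }
}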

Google Cloud Platform blocking BatchRequest request - Java

Is it possible to wait until a batch job (BatchRequest object) in GCP is completed?
E.g. you can do it with a normal Job:
final Job job = createJob(jobId, projectId, datasetId, tableId, destinationBucket);
service.jobs().insert(projectId, job).execute();
final Get request = service.jobs().get(projectId, jobId);
JobStatus response;
while (true) {
    Thread.sleep(500); // improve this sleep policy
    response = request.execute().getStatus();
    if (response.getState().equals("DONE") || response.getState().equals("FAILED"))
        break;
}
Something like the above code works fine. The problem with BatchRequest is that the batchRequest.execute() method does not return a Response object.
When you execute it, the batch request returns after it has initialised all the jobs specified in its queue, but it does not wait until all of them are really finished. Indeed, execute() returns, but you can still have failing jobs later on (e.g. errors due to quota issues, schema issues, etc.), and I can't notify the client in time with the right information.
You can check the status of all the created jobs in the web UI with the job history button in the BigQuery view, but you can't return an error message to a client.
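For reference, the manual fallback would be to repeat the polling loop from the first snippet for every job queued in the batch, something like the sketch below (jobIds is assumed to hold the IDs assigned when the batch was built; a failed job still reaches state DONE, with the error reported by getErrorResult()):

for (String id : jobIds) {
    JobStatus status;
    do {
        Thread.sleep(500);
        status = service.jobs().get(projectId, id).execute().getStatus();
    } while (!"DONE".equals(status.getState()));
    if (status.getErrorResult() != null) {
        System.err.println("Job " + id + " failed: " + status.getErrorResult().getMessage());
    }
}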
Any ideas on how to handle that?
