Dataproc Job Submit Via API - java

I have a streaming job that runs forever and executes a query on a Kafka topic. I am following the Dataproc documentation for submitting a job from Java, which gives this code:
// Submit an asynchronous request to execute the job.
OperationFuture<Job, JobMetadata> submitJobAsOperationAsyncRequest =
jobControllerClient.submitJobAsOperationAsync(projectId, region, job);
Job response = submitJobAsOperationAsyncRequest.get();
For the last line above I never get a response; the call keeps running. Is that because it is a streaming job that runs forever?
How can I get a response, so that I can give the end user some job information, such as a URL where they can see their jobs or a monitoring dashboard?

The OperationFuture<Job, JobMetadata> class has a getMetadata() method which returns a com.google.api.core.ApiFuture<JobMetadata>. You can get the job metadata before the job finishes by calling jobMetadataApiFuture.get().
See more details about the OperationFuture.getMetadata() method and the ApiFuture class.
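To illustrate why the two calls behave differently, here is a minimal sketch that uses plain java.util.concurrent.CompletableFuture objects as stand-ins for the Dataproc futures. It does not call the Dataproc client at all; the names resultFuture and metadataFuture and the job ID value are hypothetical. The result future of a streaming job never completes, so an unbounded get() on it blocks, while the metadata future is available as soon as the job has been accepted. A bounded get(timeout, unit), which every java.util.concurrent.Future offers, is one way to avoid blocking indefinitely.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FutureStandIn {
    // Stand-in for the job result: a streaming job never finishes, so this never completes.
    static final CompletableFuture<String> resultFuture = new CompletableFuture<>();

    // Stand-in for the metadata future: available once the job has been accepted.
    static final CompletableFuture<String> metadataFuture =
            CompletableFuture.completedFuture("job-1234"); // hypothetical job ID

    /** Returns the job result, or null if it is not ready within the timeout. */
    static String resultOrNull(long timeoutMillis)
            throws InterruptedException, ExecutionException {
        try {
            return resultFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // still running, which is expected for a streaming job
        }
    }

    public static void main(String[] args) throws Exception {
        // The metadata is readable immediately.
        System.out.println("metadata: " + metadataFuture.get());
        // The result is not, so wait only for a bounded time.
        System.out.println("result after 100 ms: " + resultOrNull(100));
    }
}
```

With the real client, the same idea would apply to the OperationFuture returned by submitJobAsOperationAsync: read the metadata for the job details to show the user, and do not wait on the final result of a job that is meant to run forever.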

Related

Check status of AWS job

When I upload a file to an S3 bucket, an event is triggered and an AWS Batch job is started. Is there any way to check the status of the AWS Batch job in my Java code? I have to perform some operation when the status of the job is SUCCEEDED.
You have the choice of using the ListJobs / DescribeJobs APIs to poll for status.
ListJobsResult listJobs(ListJobsRequest listJobsRequest): Returns a list of AWS Batch jobs.
You must specify only one of the following items:
A job queue ID to return a list of jobs in that job queue
A multi-node parallel job ID to return a list of that job's nodes
An array job ID to return a list of that job's children
You can filter the results by job status with the jobStatus parameter.
If you don't specify a status, only RUNNING jobs are returned.
Or you can listen for the CloudWatch Events which are emitted as jobs transition from one state to another if you prefer an event-driven architecture.
To solve this problem, I created a separate Callable that loops until the status of the job is SUCCEEDED or FAILED. It extracts the job status from the job ID using the DescribeJobs API.
class ReturnJobStatus implements Callable<String>
{
    // Last status seen; updated inside the loop.
    private String jobStatus = "";

    @Override
    public String call() throws InterruptedException
    {
        while (!(jobStatus.equals("SUCCEEDED") || jobStatus.equals("FAILED")))
        {
            // extract the job status with the DescribeJobs API, passing the jobId,
            // and assign it to jobStatus
            Thread.sleep(2000);
        }
        return jobStatus;
    }
}
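The loop above never gives up if the job stays in a non-terminal state. As a hedged variation, the sketch below bounds the number of polls. It is not tied to the AWS SDK: the Supplier<String> is a hypothetical stand-in for whatever code calls DescribeJobs and returns the status string, and the class name, method name, and parameters are made up for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class JobStatusPoller {
    /**
     * Polls statusSource until it reports SUCCEEDED or FAILED, or until
     * maxAttempts polls have been made. Returns the last status seen.
     */
    static String waitForTerminalStatus(Supplier<String> statusSource,
                                        int maxAttempts,
                                        long delayMillis) throws InterruptedException {
        String status = "UNKNOWN";
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = statusSource.get();
            if (status.equals("SUCCEEDED") || status.equals("FAILED")) {
                return status; // terminal state reached
            }
            Thread.sleep(delayMillis);
        }
        return status; // gave up; caller decides how to treat a non-terminal status
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical sequence standing in for successive DescribeJobs responses.
        Iterator<String> fake = Arrays.asList("SUBMITTED", "RUNNING", "SUCCEEDED").iterator();
        System.out.println(waitForTerminalStatus(fake::next, 10, 10));
    }
}
```

Returning the last non-terminal status instead of throwing is a design choice made here for brevity; a caller could equally treat it as a timeout.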

Spring Batch: execute the last step even after an exception

I want to run a Spring Batch job that has a set of steps and, at the end, send a notification to Redis containing the status of the job execution. If all the steps are executed, I should send "Pass". If there was any exception or error, I want to send "Fail". So my last step will be a notification to Redis that updates the status regardless of whether the job finished fine or threw an exception.
My questions are:
Can I achieve this in Spring Batch?
Can I use a notification function as the last step, or should I use a specific method for this?
How can I get the status of the job?
I know I can get the job status like :
JobExecution execution = jobLauncher.run(job, params);
System.out.println("Exit Status : " + execution.getStatus());
But I launch the job from the command line with java -jar app.jar --spring.batch.job.names=myjobnamehere, so I never have a JobExecution object in my code.
You can use a JobExecutionListener for that. In the afterJob method, you have a reference to the JobExecution from which you can get the status of the job and send the notification as required.
You can find an example in the getting started guide (See JobCompletionNotificationListener).

Stop submitting jobs when there is no data

I use Spark Streaming to get data from a queue in MQ via a custom receiver.
The JavaStreamingContext batch duration is 10 seconds.
There is one task defined for the input from the queue.
In the event timeline in the Spark UI, I see a job being submitted in every 10 s interval, even when there is no data from the receiver.
Is this normal behavior, and if so, how can I stop jobs from being submitted when there is no data?
JavaDStream<String> customReceiverStream = ssc.receiverStream(new JavaCustomReceiver(host, port));
JavaDStream<String> words = customReceiverStream.flatMap(new FlatMapFunction<String, String>() { ... });
words.print();
ssc.start();
ssc.awaitTermination();
As a workaround, you can use Livy to submit the Spark jobs (from Java code instead of CLI commands).
The Livy job would constantly check a database that holds an indicator of whether data is flowing in or not. As soon as the data flow stops, change the indicator in the database, and Livy will then kill the Spark job. (Use Livy sessions.)

Stop specific running Kettle Job in java

How would it be possible to stop a specific running job in Kettle?
I'm using the following code:
KettleEnvironment.init();
JobMeta jobmeta = new JobMeta("C://Users//Admin//DBTOOL//EDW_Testing_Tool - 1.8(VersionUpgraded)//data-integration//Regress_bug//Start_Validation.kjb",
        null);
Job job = new Job(null, jobmeta);
job.initializeVariablesFrom(null);
job.setVariable("Internal.Job.Filename.Directory", Constants.JOB_EXECUTION_KJB_FILE_PATH);
job.setVariable("jobId", jobId.toString());
job.getJobMeta().setInternalKettleVariables(job);
job.stopAll();
How can I make sure that the job I want to stop is actually stopped and is not executed after the flag is set?
I'm using a REST API to stop the job, and I'm not able to get the Job object.
If I use CarteSingleton and store the object in a map, I'm not able to execute the job; it fails with a driver error saying it could not connect to the database (e.g. jTDS), as if the URL were not working.

Google Cloud Platform blocking BatchRequest request - Java

Is it possible to wait until a batch job (a BatchRequest object) in GCP is completed?
For example, you can do it with a normal Job:
final Job job = createJob(jobId, projectId, datasetId, tableId, destinationBucket);
service.jobs().insert(projectId, job).execute();
final Get request = service.jobs().get(projectId, jobId);
JobStatus response;
while (true) {
Thread.sleep(500); // improve this sleep policy
response = request.execute().getStatus();
if (response.getState().equals("DONE") || response.getState().equals("FAILED"))
break;
}
Something like the code above works fine. The problem with BatchRequest is that its execute() method does not return a response object.
When you execute it, the batch request returns after it has initialised all the jobs in its queue, but it does not wait until all of them have really finished. So execute() returns, but jobs can still fail later on (e.g. because of quota issues, schema issues, etc.), and I can't notify the client in time with the right information.
You can only check the status of the created jobs in the web UI, via the job history button in the BigQuery view; you can't return an error message to a client.
Any ideas?
