Google Cloud Platform blocking BatchRequest request - Java - java

Is it possible to wait until a batch job (a BatchRequest object) in GCP has completed?
For example, you can do it with a normal job:
final Job job = createJob(jobId, projectId, datasetId, tableId, destinationBucket);
service.jobs().insert(projectId, job).execute();
final Get request = service.jobs().get(projectId, jobId);
JobStatus response;
while (true) {
Thread.sleep(500); // improve this sleep policy
response = request.execute().getStatus();
if (response.getState().equals("DONE") || response.getState().equals("FAILED"))
break;
}
Something like the above code works fine. The problem with BatchRequest is that the jobRequest.execute() method does not return a response object.
When you execute it, the batch request returns once it has initialised all the jobs in its queue; it does not wait until all of them have actually finished. So execute() returns, but jobs can still fail later on (e.g. because of quota or schema issues), and I can't notify the client in time with the right information.
You can only check the status of the created jobs in the web UI, via the job history button in the BigQuery view; you can't return an error message to a client.
Any ideas?
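One way to block on a whole batch is to collect the job IDs while the batch is being queued and then poll them all until each reaches a terminal state. As far as I know, each request queued on a BatchRequest takes a JsonBatchCallback whose onSuccess receives the created Job, which is where the IDs could be gathered. The sketch below is a minimal, JDK-only illustration of the waiting part. BatchJobWaiter, awaitAll, stateLookup and the "UNFINISHED" marker are hypothetical names; stateLookup stands in for a call such as service.jobs().get(projectId, jobId).execute().getStatus().getState(), and the set of terminal states is passed in rather than assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class BatchJobWaiter {

    /**
     * Polls every job ID until each one reports a terminal state or the
     * timeout expires. stateLookup is a hypothetical hook that returns the
     * current state string for a job ID.
     */
    static Map<String, String> awaitAll(List<String> jobIds,
                                        Function<String, String> stateLookup,
                                        Set<String> terminalStates,
                                        long initialDelayMs,
                                        long maxDelayMs,
                                        long timeoutMs) {
        Map<String, String> finalStates = new LinkedHashMap<>();
        List<String> pending = new ArrayList<>(jobIds);
        long deadline = System.currentTimeMillis() + timeoutMs;
        long delay = initialDelayMs;

        while (!pending.isEmpty() && System.currentTimeMillis() < deadline) {
            List<String> stillPending = new ArrayList<>();
            for (String id : pending) {
                String state = stateLookup.apply(id);
                if (terminalStates.contains(state)) {
                    finalStates.put(id, state);
                } else {
                    stillPending.add(id);
                }
            }
            pending = stillPending;
            if (!pending.isEmpty()) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    break;
                }
                // Exponential backoff, capped at maxDelayMs.
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
        // Anything still pending did not finish within the timeout.
        for (String id : pending) {
            finalStates.put(id, "UNFINISHED");
        }
        return finalStates;
    }
}
```

The returned map can then be inspected per job, so the client gets one combined answer. With BigQuery it is worth checking the job's error result as well as its state when building that answer.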

Related

Dataproc Job Submit Via API

I have a streaming job that runs forever and executes a query on a Kafka topic. I am going through the Dataproc documentation for submitting a job via Java, and this is the code from it:
// Submit an asynchronous request to execute the job.
OperationFuture<Job, JobMetadata> submitJobAsOperationAsyncRequest =
jobControllerClient.submitJobAsOperationAsync(projectId, region, job);
Job response = submitJobAsOperationAsyncRequest.get();
With the above line of code I am not able to get a response; the call keeps running. Is that because it's a streaming job that runs forever?
How can I get a response, so that I can give the end user some job information, such as a URL where they can see their jobs, or a monitoring dashboard?
The OperationFuture<Job, JobMetadata> class has a getMetadata() method which returns a com.google.api.core.ApiFuture<JobMetadata>. You can get the job metadata before the job finishes by calling jobMetadataApiFuture.get().
See more details about the OperationFuture.getMetadata() method and the ApiFuture class.

Vertx - wait till data available in redis

I am new to Vert.x and was exploring request-reply using the event bus.
I want to implement the flow below:
User requests for a data
controller sends a message on event bus to a redis-processor verticle
redis-processor will wait for n seconds till value is available in redis (there will be a background process which will keep on refreshing cache, hence the wait)
redis-processor will send reply back to controller
controller responds to user
In short I want to do something like this:
Now I want to implement this in Vertx since vertx can run asynchronously. Using event bus I can isolate controller from processor. So controller can accept multiple user request and stay responsive under load.
(I hope I am right with this!)
I have implemented this in a very crude fashion in Java with Vert.x. I am stuck on the part below.
//receive request from controller
vertx.eventBus().consumer(REQUEST_PROCESSOR, evtHandler -> {
String txnId = evtHandler.body().toString();
LOGGER.info("Received message:: {}", txnId);
this.redisAPI.get(txnId, result -> { // <=====
String value = result.result().toString();
LOGGER.info("Value in redis : {}", value);
evtHandler.reply(value); // reply to controller
});
});
Please see the line marked by the arrow. How can I wait for x seconds without blocking the event loop?
Please help.
That's actually very simple: you need a timer. Please see the docs for details, but you will need more or less something like this:
vertx.setTimer(1000, id -> {
this.redisAPI.get(txnId, result -> {
String value = result.result().toString();
LOGGER.info("Value in redis : {}", value);
evtHandler.reply(value); // reply to controller
});
});
You might want to store the timer IDs somewhere so that you can cancel them, or at least know that something is still running and delay shutdown when a shutdown request comes in for your verticle. But this all depends on your needs.
As @mohamnag said, you could use a Vert.x timer.
Here is another example of how to use a timer.
Note that the timer value is in ms.
As an improvement to the answer above, I recommend checking that the callback has succeeded before attempting to get the value from redisAPI. This is done using the succeeded() method.
In an asynchronous environment, getting that result could fail for several reasons (network errors etc.).
vertx.setTimer(n * 1000, id -> {
this.redisAPI.get(txnId, result -> {
if(result.succeeded()){ // the callback succeeded to get a value from redis
String value = result.result().toString();
LOGGER.info("Value in redis : {}", value);
evtHandler.reply(value); // reply to controller
} else {
LOGGER.error("Value could not be gotten from redis : {}", result.cause());
evtHandler.fail(someIntegerCode, result.cause()); // reply with failure related info
}
});
});
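Both answers delay a single lookup. If the goal is to keep checking until the value shows up, giving up after n seconds, the timer can be made periodic and cancelled once a value arrives or the deadline passes. In Vert.x that would presumably be built on vertx.setPeriodic and vertx.cancelTimer. The sketch below shows the same shape with JDK classes only so that it runs on its own. PollUntilAvailable, poll and lookup are hypothetical names, and lookup stands in for the Redis read.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class PollUntilAvailable {

    /**
     * Checks lookup every intervalMs until it yields a value or timeoutMs
     * has elapsed. The calling thread is never blocked; the result is
     * delivered through the returned future.
     */
    static CompletableFuture<String> poll(ScheduledExecutorService scheduler,
                                          Supplier<Optional<String>> lookup,
                                          long intervalMs,
                                          long timeoutMs) {
        CompletableFuture<String> promise = new CompletableFuture<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);

        ScheduledFuture<?> task = scheduler.scheduleWithFixedDelay(() -> {
            try {
                Optional<String> value = lookup.get();
                if (value.isPresent()) {
                    promise.complete(value.get());
                } else if (System.nanoTime() - deadline >= 0) {
                    promise.completeExceptionally(
                            new TimeoutException("value not available in time"));
                }
            } catch (RuntimeException e) {
                // A failed lookup ends the polling with that failure.
                promise.completeExceptionally(e);
            }
        }, 0, intervalMs, TimeUnit.MILLISECONDS);

        // Stop polling as soon as the promise settles, whatever the outcome.
        promise.whenComplete((value, error) -> task.cancel(false));
        return promise;
    }
}
```

In the verticle, the success branch would call evtHandler.reply(value) and the failure branch evtHandler.fail(...), as in the answer above.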

Check status of AWS job

When I upload a file to an S3 bucket, an event is triggered and an AWS Batch job is started. Is there any way to check the status of the AWS Batch job in my Java code? I have to perform some operation when the status of the AWS Batch job is SUCCEEDED.
You have the choice of using the ListJobs / DescribeJobs APIs to poll for status.
ListJobsResult listJobs(ListJobsRequest listJobsRequest): Returns a list of AWS Batch jobs.
You must specify only one of the following items:
A job queue ID to return a list of jobs in that job queue
A multi-node parallel job ID to return a list of that job's nodes
An array job ID to return a list of that job's children
You can filter the results by job status with the jobStatus parameter.
If you don't specify a status, only RUNNING jobs are returned.
Or you can listen for the CloudWatch Events which are emitted as jobs transition from one state to another if you prefer an event-driven architecture.
ListJobsRequest
To solve this problem, I created a separate Callable that loops until the job status is SUCCEEDED or FAILED. The job status is fetched by job ID using the DescribeJobs API.
class ReturnJobStatus implements Callable<String>
{
// latest known status; must be refreshed inside the loop below
private String jobStatus = "";
@Override
public String call() throws InterruptedException
{
while (!(jobStatus.equals("SUCCEEDED") || jobStatus.equals("FAILED")))
{
// extracts job status using the DescribeJobs API after passing jobId
Thread.sleep(2000);
}
return jobStatus;
}
}

Spring and background thread execution

I have a Spring Boot 1.3.5 web application (running on Tomcat 8). One of its features is to contact a third-party API through REST and launch many lengthy jobs (from 1 to around maybe 30 depending on the user input, each one with its own REST call in a for loop). I have all this logic in a controller called using a POST with some parameters.
What I need is to launch a background task after each job has been acknowledged by the API. The task would be passed a parameter (the job ID) and would periodically (every ~30 s) poll another API to fetch the job output, then do some business logic based on the job's status (updating a DB record for now). These jobs may take from several seconds up to an hour, and fetching the output takes about 3-4 seconds plus parsing a long string.
However I'm not sure which, if any, TaskExecutor to use, or whether I should use Java's Future structures for this. I might benefit from a Thread pool which will only run X threads parallel and queue others to not overload the server. Is there an example I can take to learn and start off?
Sample of my existing code:
@RequestMapping(value={"/job/launch"}, method={RequestMethod.POST})
public ResponseEntity<String> runJob(HttpServletRequest req) {
for (int deployments=1; deployments <= deployments_required; deployments++) {
httpPost.setEntity((HttpEntity)new StringEntity(jsonInput));
CloseableHttpResponse response = httpclient.execute(httpPost);
HttpEntity entity = response.getEntity();
responseString = EntityUtils.toString(entity, "UTF-8");
JsonObject jsonObject = new JsonParser().parse(responseString).getAsJsonObject();
if (response.getStatusLine().getStatusCode() != 200) {
resultsNotOk.add(new ResponseEntity<String>(jsonObject.get("message").getAsString(), HttpStatus.INTERNAL_SERVER_ERROR));
continue;
}
String deploymentId;
deploymentId = jsonObject.get("id").getAsString();
// Start background task to keep checking the job every few seconds and find created instance IP addresses
start_checking_execution(deploymentId);
}
}
(Yes, this code may be better put in a Service but it was originally built as is so I haven't moved it yet. It may be a good time to do it now)
I would say this is a job for Spring Batch.
You can define a Reader, a Processor (to convert the objects read from the source into the objects written to the target) and a Writer to handle the logic.
You can use JobOperator to get the job state. See job status transitions.
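If Spring Batch feels heavy for this, the thread-pool idea from the question can also be sketched with a plain ScheduledExecutorService: a fixed number of threads, one re-scheduling check per deployment, and a future that completes when the job is finished. This is only an illustration under stated assumptions. DeploymentPoller, track, fetchStatus and isFinished are hypothetical names, and fetchStatus stands in for the REST call that polls the third-party API. In a Spring application the pool would more likely be a ThreadPoolTaskScheduler bean.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Predicate;

final class DeploymentPoller {

    // At most maxThreads checks run in parallel; the rest wait in the queue.
    private final ScheduledExecutorService pool;

    DeploymentPoller(int maxThreads) {
        this.pool = Executors.newScheduledThreadPool(maxThreads);
    }

    /** Starts polling one deployment; the future completes with its final status. */
    CompletableFuture<String> track(String deploymentId,
                                    Function<String, String> fetchStatus,
                                    Predicate<String> isFinished,
                                    long periodMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        scheduleCheck(deploymentId, fetchStatus, isFinished, periodMs, result);
        return result;
    }

    private void scheduleCheck(String deploymentId,
                               Function<String, String> fetchStatus,
                               Predicate<String> isFinished,
                               long periodMs,
                               CompletableFuture<String> result) {
        pool.schedule(() -> {
            try {
                String status = fetchStatus.apply(deploymentId);
                if (isFinished.test(status)) {
                    result.complete(status);
                } else {
                    // Not finished yet: queue the next check instead of sleeping.
                    scheduleCheck(deploymentId, fetchStatus, isFinished, periodMs, result);
                }
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        }, periodMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The controller would call track(deploymentId, ...) where it currently calls start_checking_execution(deploymentId) and attach the business logic (such as the DB update) with thenAccept on the returned future. Because each check re-schedules itself rather than sleeping, no thread is held while a job is merely waiting for its next poll.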

Akka and Ask Pattern. When Actor is abruptly stopped can i return Future?

I currently have code which dispatches a request using the Ask pattern. The dispatched request creates an Akka actor which sends an HTTP request and then returns the response. I'm using Akka's circuit breaker API to manage issues with the upstream web services I call.
If the circuit breaker is in an open state, all subsequent requests fail fast, which is the desired effect. However, when the actor fails fast it just throws a CircuitBreakerOpenException and stops the actor; control does not return to the code which made the initial request until an AskTimeoutException is generated.
This is the code which dispatches the request
Timeout timeout = new Timeout(Duration.create(10, SECONDS));
Future<Object> future = Patterns.ask(myActor, argMessage, timeout);
Response res = (Response ) Await.result(future, timeout.duration());
This is the circuitbreaker
getSender().tell(breaker.callWithSyncCircuitBreaker(new Callable<Obj>()
{
@Override
public Obj call() throws Exception {
return fetch(message);
}
}), getSelf()
);
getContext().stop(getSelf());
When this block of code is executed and the circuit is open, it fails fast by throwing an exception. However, I want to return control to the code which handles the future without having to wait for a timeout.
Is this possible?
When an actor fails out and is restarted, if it was processing a message, no response will be automatically sent to that sender. If you want to send that sender a message on that particular failure then catch that exception explicitly and respond back to that sender with a failed result, making sure to capture the sender first before you go into any future callbacks to avoid closing over this mutable state. You could also try to do this in the preRestart, but that's not very safe as by that time the sender might have changed if you are using futures inside the actor.
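In Akka terms this usually means wrapping the breaker call in a try/catch and replying to the captured sender with an akka.actor.Status.Failure, which fails the asker's future straight away. The principle can be shown without Akka. The sketch below is a JDK-only analogy, not Akka code: FailFastReply and ask are hypothetical names, the CompletableFuture plays the role of the ask future, and guardedCall stands in for breaker.callWithSyncCircuitBreaker(...).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

final class FailFastReply {

    /** Runs guardedCall on the worker and always settles the reply, success or failure. */
    static <T> CompletableFuture<T> ask(ExecutorService worker, Callable<T> guardedCall) {
        CompletableFuture<T> reply = new CompletableFuture<>();
        worker.execute(() -> {
            try {
                reply.complete(guardedCall.call());
            } catch (Exception e) {
                // Without this branch the caller would only ever see its own timeout.
                reply.completeExceptionally(e);
            }
        });
        return reply;
    }
}
```

The caller still waits with a timeout as a safety net, but an open circuit now surfaces as an immediate failure carrying the original exception.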
