How to use CompletableFuture with AWS Glue job status? - java

I have a requirement where I need to get the status of an AWS Glue crawler, which is an async request, and fire certain events once the job completes. The catch here is that I do not want to use polling. On looking further, the AWS docs suggest using a CompletableFuture object to deal with async requests in AWS. But when I try to use it, I am unable to form the CompletableFuture object, as it gives me a type mismatch. I have this code:
GetCrawlerMetricsRequest metricsRequest = new GetCrawlerMetricsRequest()
        .withCrawlerNameList(Arrays.asList("myJavaCrawler"));
GetCrawlerMetricsResult jsonOb = awsglueClient.getCrawlerMetrics(metricsRequest);
CompletableFuture<GetCrawlerMetricsResult> futureResponse =
        (CompletableFuture<GetCrawlerMetricsResult>) awsglueClient.getCrawlerMetricsAsync(metricsRequest);
But the futureResponse object shows an error stating that FutureTask cannot be cast to CompletableFuture.
I am following the approach given here
I am not sure how I can make this work. Based on this futureResponse object, I could then use the thenApply function to trigger the job I want to execute, such as pushing the above response onto a Kafka queue. Any ideas?

It seems like you are using AWS SDK v1, while the doc you mentioned shows how to do it using v2 (which has 'Developer Preview' status, so it's not recommended for production). Here is a doc showing how to make async calls in v1.
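In v1 the async client returns a plain java.util.concurrent.Future, which is why the cast to CompletableFuture fails at runtime. You can bridge it yourself via the AsyncHandler overload. A minimal sketch, assuming the v1 AWSGlueAsync client (your follow-up action, e.g. the Kafka push, goes in the thenAccept callback):

import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

import com.amazonaws.handlers.AsyncHandler;
import com.amazonaws.services.glue.AWSGlueAsync;
import com.amazonaws.services.glue.AWSGlueAsyncClientBuilder;
import com.amazonaws.services.glue.model.GetCrawlerMetricsRequest;
import com.amazonaws.services.glue.model.GetCrawlerMetricsResult;

public class GlueAsyncExample {
    public static void main(String[] args) {
        AWSGlueAsync awsglueClient = AWSGlueAsyncClientBuilder.defaultClient();

        GetCrawlerMetricsRequest metricsRequest = new GetCrawlerMetricsRequest()
                .withCrawlerNameList(Arrays.asList("myJavaCrawler"));

        // Bridge the v1 callback API into a CompletableFuture so further
        // work can be chained with thenApply/thenAccept.
        CompletableFuture<GetCrawlerMetricsResult> futureResponse = new CompletableFuture<>();
        awsglueClient.getCrawlerMetricsAsync(metricsRequest,
                new AsyncHandler<GetCrawlerMetricsRequest, GetCrawlerMetricsResult>() {
                    @Override
                    public void onSuccess(GetCrawlerMetricsRequest request,
                                          GetCrawlerMetricsResult result) {
                        futureResponse.complete(result);
                    }

                    @Override
                    public void onError(Exception e) {
                        futureResponse.completeExceptionally(e);
                    }
                });

        // E.g. push the metrics to Kafka here instead of printing them.
        futureResponse.thenAccept(result ->
                System.out.println(result.getCrawlerMetricsList()));
    }
}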
For your use case I would recommend another approach. Glue posts a few types of events, one of which is "Crawler Succeeded". So you can create a CloudWatch rule to catch these events and trigger a Lambda which makes a call to start the appropriate job.

Related

Lettuce StatefulRedisConnection async command execution order

I'm a bit confused about the order of Redis command execution when using the Lettuce driver.
Examples use code like
private val cacheConnection: StatefulRedisConnection<String, String>
// (...)
cacheConnection.async().getset(keyStr, json)
cacheConnection.async().expire(keyStr, expireAfterWrite)
https://github.com/lettuce-io/lettuce-core/issues/1627
https://www.baeldung.com/java-redis-lettuce
However, the documentation states
good example is the async API. Every invocation on the async API returns a Future (response handle) after the command is written to the netty pipeline. A write to the pipeline does not mean, the command is written to the underlying transport. Multiple commands can be written without awaiting the response. Invocations to the API (sync, async and starting with 4.0 also reactive API) can be performed by multiple threads.
(https://github.com/lettuce-io/lettuce-core/wiki/Pipelining-and-command-flushing)
This does not specify when the commands are put in the pipeline. Shouldn't I use thenAccept instead?
cacheConnection.async().getset(keyStr, json)
.thenAccept { expire(keyStr, expireAfterWrite) }
That would mean that all these examples are wrong, which is... improbable?
Can you please explain how this works? Is execution order preservation just a systematic coincidence (i.e. an implementation detail)?
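For reference, this is the explicitly chained variant in Java that I am considering; a sketch assuming lettuce-core's RedisAsyncCommands, using thenCompose so that expire is only issued after getset has completed:

import io.lettuce.core.RedisClient;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.async.RedisAsyncCommands;

public class ChainedExpire {
    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        StatefulRedisConnection<String, String> cacheConnection = client.connect();
        RedisAsyncCommands<String, String> async = cacheConnection.async();

        String keyStr = "some-key";   // placeholder values
        String json = "{\"a\": 1}";
        long expireAfterWrite = 60L;  // seconds

        // expire is written to the pipeline only after getset completes,
        // so the ordering no longer depends on invocation order.
        async.getset(keyStr, json)
             .thenCompose(previous -> async.expire(keyStr, expireAfterWrite))
             .whenComplete((ok, err) -> client.shutdown());
    }
}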

How to continuously receive and parse the JSON from REST API in spring boot

There is a remote server that keeps producing data in JSON format. It exposes a REST API at http://192.168.1.101:8000/v1/status, and I want to collect the data continuously in Spring Boot. Here is a possible JSON response from the REST API:
{
  "run-status": 0,
  "opr-mode": 0,
  "ready": false,
  "not-ready-reason": 1,
  "alarms": ["ps", "prm-switch"]
}
I want to keep collecting the data, or simply subscribe to the REST API so that whenever a JSON payload is available I can collect it.
There are two main approaches to achieving what you are looking for:

Polling - If this service already exists and you do not have control over the code, then this might be your only option. You constantly poll the given URL to check whether the data has changed. In Spring, you can use the @Scheduled annotation to execute and poll at any given frequency (using a cron expression or fixed delays); see the sketch after this list. https://www.baeldung.com/spring-scheduled-tasks provides details on how to create scheduled tasks.

Webhooks - If you have control over the server code, you can use webhooks to notify a subscriber about the availability of data. It is a callback mechanism where the caller receives a notification about data changes on the server, and the subscriber can then call the server to fetch the data immediately.
More about Polling and Webhooks can be found on this URL: https://dzone.com/articles/webhooks-vs-polling-youre-better-than-this-1
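A minimal sketch of the @Scheduled polling approach (StatusResponse is a hypothetical DTO mapping the JSON fields above; the application class is assumed to carry @EnableScheduling):

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class StatusPoller {

    private final RestTemplate restTemplate = new RestTemplate();

    // Poll the status endpoint every 5 seconds (fixedDelay is in milliseconds).
    @Scheduled(fixedDelay = 5000)
    public void pollStatus() {
        StatusResponse status = restTemplate.getForObject(
                "http://192.168.1.101:8000/v1/status", StatusResponse.class);
        if (status != null) {
            // Process the payload: store it, publish an event, etc.
        }
    }
}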
Make a "while" cycle what calls your function then goes to sleep (if needed) for the time you want.
Or just while (true) {}
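For completeness, that loop might look like this (fetchStatus is a hypothetical method that calls the REST endpoint and handles the JSON):

while (true) {
    fetchStatus();
    try {
        Thread.sleep(5_000); // wait five seconds between polls
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
}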

How to invoke Azure Function on User Defined Schedules

I've an HTTP-triggered Azure Function (Java) that performs a series of actions. In my application UI, I've a button which triggers this function to initiate the task.
Everything works as expected.
Now I have to perform this operation on user-defined schedules. That is, from the UI a user can specify the interval (say, every 3 hours) at which the function needs to be executed. As this schedule is custom and dynamic, I cannot rely on timer-triggered Azure Functions. Also, the same function needs to be executed at different intervals with different input parameters.
How can I dynamically create schedules and invoke the Azure Function at the scheduled time? Does Azure have an option to run the function on specific events, something like AWS CloudWatch rule + Lambda invocation?
EDIT: It's different from the suggested question, as that changes the schedule of an existing function, and I think configuring a new schedule would break previously configured schedules for the function. I want to run the same function on different schedules as per the user configuration, without breaking any of the schedules previously set for the function.
You can try modifying the function.json, changing the cron expression in it. Please refer to the steps below:
Use the Kudu API to change function.json: https://github.com/projectkudu/kudu/wiki/REST-API
PUT https://{functionAppName}.scm.azurewebsites.net/api/vfs/{pathToFunction.json}, Headers: If-Match: "*", Body: new function.json content.
Then send a request to apply the changes.
POST https://{functionAppName}.scm.azurewebsites.net/api/functions/synctriggers
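A sketch of those two calls using Java's built-in HttpClient (Java 11+); the app name, function path, and deployment credentials are placeholders you would substitute:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class UpdateFunctionSchedule {
    public static void main(String[] args) throws Exception {
        String app = "myFunctionApp"; // placeholder function app name
        String auth = "Basic " + Base64.getEncoder()
                .encodeToString("deployUser:deployPassword".getBytes()); // placeholder credentials
        String newFunctionJson = Files.readString(Path.of("function.json")); // updated content

        HttpClient http = HttpClient.newHttpClient();

        // 1. Overwrite function.json (If-Match: "*" skips the ETag check).
        HttpRequest put = HttpRequest.newBuilder()
                .uri(URI.create("https://" + app
                        + ".scm.azurewebsites.net/api/vfs/site/wwwroot/MyFunction/function.json"))
                .header("Authorization", auth)
                .header("If-Match", "*")
                .PUT(HttpRequest.BodyPublishers.ofString(newFunctionJson))
                .build();
        http.send(put, HttpResponse.BodyHandlers.ofString());

        // 2. Ask the runtime to re-read the trigger metadata.
        HttpRequest sync = HttpRequest.newBuilder()
                .uri(URI.create("https://" + app
                        + ".scm.azurewebsites.net/api/functions/synctriggers"))
                .header("Authorization", auth)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        http.send(sync, HttpResponse.BodyHandlers.ofString());
    }
}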
You can use a durable function for this, applying the Monitor Pattern (shamelessly copied from this MSDN documentation).
This orchestration function sets a dynamic timer trigger using context.CreateTimer.
Code is in C#, but hopefully there is something you can use here.
[FunctionName("MonitorJobStatus")]
public static async Task Run(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    int jobId = GetJobId();
    int pollingInterval = context.GetInput<int>();
    DateTime expiryTime = GetExpiryTime();

    while (context.CurrentUtcDateTime < expiryTime)
    {
        var jobStatus = await context.CallActivityAsync<string>("GetJobStatus", jobId);
        if (jobStatus == "Completed")
        {
            // Perform an action when a condition is met.
            await context.CallActivityAsync("SendAlert", jobId);
            break;
        }

        // Orchestration sleeps until this time.
        var nextCheck = context.CurrentUtcDateTime.AddSeconds(pollingInterval);
        await context.CreateTimer(nextCheck, CancellationToken.None);
    }

    // Perform more work here, or let the orchestration end.
}

AFNetworking 2.0 JSON queue

In AFNetworking 1.3 I enqueue some operations in this way:
Create many AFHTTPRequestOperations, each with a success and failure block.
Call enqueueBatchOfHTTPRequestOperations with all the operations, and when all are completed, call the completionBlock for the final step.
Unfortunately this doesn't work well with my web service. I've read around the internet, and everyone says to try AFNetworking 2.0, but how can I get the same behavior?
Simply, I want to add some operations to a queue (setting a success block for each one to do something), and call another method when ALL of them are completed.
How can I do that in AFNetworking 2.0? My response is in JSON format.

How do I make an async call to Hive in Java?

I would like to execute a Hive query on the server in an asynchronous manner. The Hive query will likely take a long time to complete, so I would prefer not to block on the call. I am currently using Thrift to make a blocking call (it blocks on client.execute()), but I have not seen an example of how to make a non-blocking call. Here is the blocking code:
TSocket transport = new TSocket("hive.example.com", 10000);
transport.setTimeout(999999999);
TBinaryProtocol protocol = new TBinaryProtocol(transport);
Client client = new ThriftHive.Client(protocol);
transport.open();
client.execute(hql); // Omitted HQL

List<String> rows;
while ((rows = client.fetchN(1000)) != null) {
    for (String row : rows) {
        // Do stuff with row
    }
}

transport.close();
The code above is missing try/catch blocks to keep it short.
Does anyone have any ideas on how to make an async call? Can Hive/Thrift support it? Is there a better way?
Thanks!
AFAIK, at the time of writing, Thrift does not generate asynchronous clients. The reason, as explained in this link here (search the text for "asynchronous"), is that Thrift was designed for the data centre, where latency is assumed to be low.
Unfortunately, as you know, the latency experienced between call and result is not always caused by the network, but by the logic being performed! We have this problem calling into the Cassandra database from a Java application server where we want to limit the total number of threads.
Summary: for now all you can do is make sure you have sufficient resources to handle the required number of blocked concurrent threads, and wait for a more efficient implementation.
It is now possible to make an asynchronous call in a Java Thrift client after this patch was put in:
https://issues.apache.org/jira/browse/THRIFT-768
Generate the async Java client using the new Thrift and initialize your client as follows:
TNonblockingTransport transport = new TNonblockingSocket("127.0.0.1", 9160);
TAsyncClientManager clientManager = new TAsyncClientManager();
TProtocolFactory protocolFactory = new TBinaryProtocol.Factory();
Hive.AsyncClient client = new Hive.AsyncClient(protocolFactory, clientManager, transport);
Now you can execute methods on this client as you would on a synchronous interface. The only change is that all methods take an additional parameter of a callback.
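For illustration, a call with a callback might then look like this. This is a sketch only: the generated types such as Hive.AsyncClient.execute_call depend on your IDL and Thrift version, and try/catch is omitted as in the snippets above.

import org.apache.thrift.async.AsyncMethodCallback;

// ...

client.execute(hql, new AsyncMethodCallback<Hive.AsyncClient.execute_call>() {
    @Override
    public void onComplete(Hive.AsyncClient.execute_call response) {
        // The query finished; fetch results or trigger follow-up work here.
    }

    @Override
    public void onError(Exception e) {
        // Handle the failure, e.g. log and retry.
    }
});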
I know nothing about Hive, but as a last resort, you can use Java's concurrency library:
// Create an executor to run the task on a background thread.
ExecutorService executorService = Executors.newSingleThreadExecutor();

Callable<SomeResult> c = new Callable<SomeResult>() {
    public SomeResult call() {
        // your Hive code here
    }
};
Future<SomeResult> result = executorService.submit(c);

// when you need the result, this will block
result.get();
Or, if you do not need to wait for the result, use Runnable instead of Callable.
After asking on the Hive mailing list: Hive does not support async calls using Thrift.
I don't know about Hive in particular, but any blocking call can be turned into an async call by spawning a new thread and using a callback. You could look at java.util.concurrent.FutureTask, which has been designed to allow easy handling of such asynchronous operations.
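A minimal sketch of that pattern, where runQuery is a hypothetical blocking helper wrapping the Thrift calls above (exception handling omitted for brevity):

import java.util.List;
import java.util.concurrent.FutureTask;

// ...

FutureTask<List<String>> task = new FutureTask<>(() -> runQuery(hql));
new Thread(task).start();

// Do other work while the query runs...

List<String> rows = task.get(); // blocks only if the result is not ready yet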
We fire off asynchronous calls to AWS Elastic MapReduce. AWS MapReduce can run Hadoop/Hive jobs on Amazon's cloud with a call to the AWS MapReduce web services.
You can also monitor the status of your jobs and grab the results off S3 once the job is completed.
Since the calls to the web services are asynchronous in nature, we never block our other operations. We continue to monitor the status of our jobs in a separate thread and grab the results when the job is complete.
