Java 8 concurrency in AWS lambda?

We have an AWS Lambda function that needs to perform a few checks by calling remote services. As soon as one of them returns false, the Lambda can return; otherwise, all the checks need to finish to make sure none returns false. Right now I am using a parallel stream to run the tasks, as they are independent of each other.
In a situation that may not be rare, the main thread returns while one of the tasks is still running on its thread, or that thread is blocked waiting for I/O, because short-circuiting has seen a false from another task. The AWS Lambda documentation says that all threads in a Lambda are frozen when the main thread returns, and that they thaw once the Lambda handles the next request. Will the busy or blocked thread keep working on the original task after being reactivated, or will it take on a new task for the current request?
I would really appreciate it if Lambda gurus could share some insights.

I hope I understood correctly. You want to perform parallel activities while waiting for them to finish.
I just read in StackOverflow a comment saying the following:
Streams is about data-parallelism; data parallel problems are CPU-bound, not IO-bound. It seems that you're simply looking to run a number of mostly unrelated IO-intensive tasks concurrently. Use a plain-old thread pool for that; your first example is an ideal candidate for ExecutorService.invokeAll()
Maybe ExecutorService can help.
I don't know how your code is being structured but I can propose something like this:
int processors = Runtime.getRuntime().availableProcessors();
ExecutorService executorService = Executors.newFixedThreadPool(processors);
List<Callable<Boolean>> services = getURLToCheck().stream() // building the task list is cheap, so a sequential stream is enough
.map(this::checkService)
.collect(Collectors.toList());
try {
List<Future<Boolean>> futures = executorService.invokeAll(services);
// do your validation with the concurrent tasks.
} catch (InterruptedException e) {
// Handle as you wish
}
Where also:
private List<URL> getURLToCheck() {
// Fetch your URL from wherever :)
}
private Callable<Boolean> checkService(URL url){
// Logic to check the service
}
The Future interface has two key methods that may be useful for you: isDone() and get().
The first indicates whether the task has finished; the second waits for it to finish and rethrows any exception thrown inside the task, wrapped in an ExecutionException. Maybe you can combine these methods to do the validation. Off the top of my head, I imagined a while loop that asks each future whether it has finished and, if so, reads the validation result and breaks out of the loop if it is false. But I don't like it, haha.
I hope I made myself clear, and I hope this helps. If not, I tried my best.
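To make the short-circuit idea above concrete, here is a minimal sketch that uses ExecutorCompletionService instead of polling isDone() in a loop. It returns false as soon as any check reports false and cancels whatever is still running before returning. The class and method names (ShortCircuitChecks, allPass) are made up for illustration, and treating a check that throws as a failed check is an assumption, not something the question specifies. Note that cancel(true) only interrupts the thread; blocking I/O that ignores interrupts will not stop early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShortCircuitChecks {

    /** Returns false as soon as any check yields false; true only if every check yields true. */
    static boolean allPass(List<Callable<Boolean>> checks) {
        if (checks.isEmpty()) {
            return true; // nothing to verify
        }
        ExecutorService pool = Executors.newFixedThreadPool(checks.size());
        CompletionService<Boolean> completed = new ExecutorCompletionService<>(pool);
        List<Future<Boolean>> futures = new ArrayList<>();
        try {
            for (Callable<Boolean> check : checks) {
                futures.add(completed.submit(check));
            }
            for (int i = 0; i < checks.size(); i++) {
                // take() hands back futures in completion order, not submission order
                if (!completed.take().get()) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false; // assumption: a check that throws counts as a failed check
        } finally {
            // Interrupt whatever is still running so no task is left mid-flight on return
            for (Future<Boolean> f : futures) {
                f.cancel(true);
            }
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Callable<Boolean> ok = () -> true;
        Callable<Boolean> slowOk = () -> { Thread.sleep(2_000); return true; };
        Callable<Boolean> bad = () -> false;
        System.out.println(allPass(Arrays.asList(ok, slowOk, bad))); // does not wait for slowOk
        System.out.println(allPass(Arrays.asList(ok, ok)));
    }
}
```

The first call prints false without waiting for the slow check, and the second prints true. Creating a pool per call keeps the sketch self-contained; in a real handler you may prefer to reuse one pool.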

Related

Stop UI-method until Async task is finished

Sorry for my poor English.
I am working on a Java class that needs to do UI work, but the UI work needs to wait for an async task. The async task retrieves a SOAP API response from the internet. Once the response is retrieved, it is stored in the global jsonResponseBody. The UI method then uses jsonResponseBody to do the UI work.
In my current code, I use a while loop to keep from moving on before jsonResponseBody is ready. Is a while loop the best idea here? I think the while loop might slow down the main thread, no?
//Pre-async task stuff is run
connectDbAsync(db,sqlQuery); //This will set jsonResponseBody sooner or later
while(jsonResponseBody == null){
//Do nothing, just wait
}
//Post-async task stuff which uses jsonResponseBody

When performing asynchronous tasks in Java, there are several ways to handle the output. One way, as you discovered, is to use a loop to block code execution until the task completes. I personally like to use a thread pool and Future objects to wait on my threads to complete. There are advantages and disadvantages to both approaches. In your case, the while loop should not slow your main thread beyond the wait itself, because it runs only for a finite period of time and starts immediately after you kick off the asynchronous task. Keep in mind, though, that an empty loop still occupies the thread (and a CPU core) until the task completes.
That said, the benefit of an asynchronous task is that it can do its thing while your code is doing something else. If you MUST wait on the asynchronous task to complete before continuing on in your method, then the task could be done synchronously instead and you wouldn't need the loop to pause execution.
Example blocking code that waits on network poll of multiple "sites" before continuing execution. This shows the benefit of asynchronous tasks/multithreading when it comes to doing multiple things at one time:
//Invoke run method on each site simultaneously, store results in a list
List<Future<Site>> futures = threadmaker.invokeAll(
active_sites.stream().map(TAG_SCANNER::new).collect(Collectors.toList()));
List<Site> alarm_sites = new ArrayList<>();
//Now fetch all the results serially
for(Future<Site> result: futures){
//SOUND THE ALARMS
alarm_sites.add(result.get());
}
// Continue synchronous method execution

You might have a look at java Future. You can use it to launch some code asynchronously but you get a handle to it and so you can check if it is finished (Future.isDone()) or block until it is finished: Future.get()
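As a rough illustration of waiting on a future instead of spinning on a null check, here is a small sketch with CompletableFuture. The fetchJson and render methods are hypothetical stand-ins for the network call and the UI work from the question. On a real UI thread you would normally hand the final step back to the UI framework rather than call join(); that part is framework-specific and left out here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AwaitWithoutSpinning {

    // Hypothetical stand-in for the network call that produces the JSON body
    static String fetchJson() {
        return "{\"ok\":true}";
    }

    // Hypothetical stand-in for the work that needs the response
    static String render(String json) {
        return "rendered: " + json;
    }

    static String loadAndRender() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> rendered = CompletableFuture
                    .supplyAsync(AwaitWithoutSpinning::fetchJson, pool) // runs off the calling thread
                    .thenApply(AwaitWithoutSpinning::render);           // runs once the body is available
            // join() parks the caller until the result is ready instead of burning CPU in a loop
            return rendered.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndRender());
    }
}
```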

using parallelStream for independent tasks?

I have a list of tasks. Each task is independent of each other (they do not use results from each other).
When having 1000 tasks and using a sequential stream to process these tasks..
tasks.forEach(task->{
// long running task
task.run();
System.out.println("Thread: " + Thread.currentThread().getName());
});
..then, the second task is running AFTER the first task and so forth. The loop is running in blocking and sequential mode (second task is only done after first task is finished).
What is the best way to process each task in parallel?
Is this the best way?
tasks.parallelStream().forEach(task->{
// long running task
task.run();
System.out.println("Thread: " + Thread.currentThread().getName());
});
According to Should I always use a parallel stream when possible?, it should be avoided to use parallel streams. As in my case, these tasks are independent of each other, I do not need the synchronization overhead which comes by using parallelStream(). However, there is no option to disable the synchronization overhead when using parallelStream(). Or?
Is there a better way for my use case than parallelStream()?

In Java 8, parallelStream() uses the common ForkJoinPool (ForkJoinPool.commonPool()), which is shared across the whole JVM, is sized by default from the number of available processors, and is better suited to work that follows the "divide and conquer" paradigm. In your case, since the tasks are all isolated, an ExecutorService may be a better fit.

A good solution for you can be to use CompletableFuture.allOf. Use it like this:
ExecutorService ex = //Whatever executor you want;
CompletableFuture<Void> all = CompletableFuture.allOf(tasks.stream()
.map(task -> CompletableFuture.runAsync(() -> { /* Do task */ }, ex))
.toArray(CompletableFuture[]::new));
This runs the tasks asynchronously without blocking the calling thread. Note that toArray() without an argument returns an Object[], and casting that to CompletableFuture<Void>[] fails at runtime with a ClassCastException, which is why the array constructor reference is passed instead.
ExecutorService.submit will fire off the task, but the only way to obtain the result is get(), which blocks until the result is ready. CompletableFuture lets you attach callbacks (for example thenRun or thenAccept) that run when the result is ready, so you do not have to block for it. This is useful when you want to act on some kind of result after all the parallel tasks finish.
Some more explanation can be found here.
Also, in your original question, you asked if it is a good idea to use parallelStream and my answer to that would be that it isn't a good idea because if there is a task that blocks the thread then you will have problems (assuming you have used parallelStream all over the place in your code).
Also, CompletableFuture can accept its own thread pool (which you can customize yourself) and run there. Notice the second argument to runAsync in the above code.
If you simply want to run the tasks and don't care about the results, ExecutorService.invokeAll is a good way to do it. Keep in mind that invokeAll blocks until every task has completed. You can use it like this:
executorService.invokeAll(tasks.stream().map(task -> new Callable<Void>() {
@Override
public Void call() throws Exception {
// run task;
return null;
}
})
.collect(Collectors.toList()));
But why do you want to use a CompletableFuture with your own ExecutorService in such a case?
One good reason is the fluent error handling. You can see some examples here and here
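To show what that error handling can look like, here is a minimal sketch that runs independent tasks on a dedicated pool, waits for all of them with allOf, and counts failures with exceptionally. The names AllOfSketch and runAll and the pool size of 4 are arbitrary choices for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class AllOfSketch {

    /** Runs every task on a dedicated pool and returns how many of them threw. */
    static int runAll(List<Runnable> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an arbitrary choice
        AtomicInteger failures = new AtomicInteger();
        try {
            CompletableFuture<?>[] futures = tasks.stream()
                    .map(task -> CompletableFuture.runAsync(task, pool)
                            .exceptionally(ex -> {
                                // One failing task is recorded here and does not affect the others
                                failures.incrementAndGet();
                                return null;
                            }))
                    .toArray(CompletableFuture[]::new);
            CompletableFuture.allOf(futures).join(); // wait until every task has finished
            return failures.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Runnable> tasks = Arrays.asList(
                () -> System.out.println("task ran on " + Thread.currentThread().getName()),
                () -> { throw new IllegalStateException("boom"); });
        System.out.println("failures: " + runAll(tasks));
    }
}
```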

Java Async is blocking?

After doing lots of searching on Java, I really am very confused over the following questions:
Why would I choose an asynchronous method over a multi-threaded method?
Java futures are supposed to be non-blocking. What does non-blocking mean? Why call it non-blocking when the method to extract information from a Future--i.e., get()--is blocking and will simply halt the entire thread till the method is done processing? Perhaps a callback method that rings the church bell of completion when processing is complete?
How do I make a method async? What is the method signature?
public List<T> databaseQuery(String Query, String[] args){
String preparedQuery = QueryBaker(Query, args);
List<int> listOfNumbers = DB_Exec(preparedQuery); // time taking task
return listOfNumbers;
}
How would this fictional method become a non blocking method? Or if you want please provide a simple synchronous method and an asynchronous method version of it.

Why would I choose an asynchronous method over a multi-threaded method?
Asynchronous methods allow you to reduce the number of threads. Instead of tying up a thread in a blocking call, you can issue an asynchronous call and then be notified later when it completes. This frees up the thread to do other processing in the meantime.
It can be more convoluted to write asynchronous code, but the benefit is improved performance and memory utilization.
Java futures are supposed to be non-blocking. What does non-blocking mean? Why call it non-blocking when the method to extract information from a Future--i.e., get()--is blocking and will simply halt the entire thread till the method is done processing ? Perhaps a callback method that rings the church bell of completion when processing is complete?
Check out CompletableFuture, which was added in Java 8. It is a much more useful interface than Future. For one, it lets you chain all kinds of callbacks and transformations to futures. You can set up code that will run once the future completes. This is much better than blocking in a get() call, as you surmise.
For instance, given asynchronous read and write methods like so:
CompletableFuture<ByteBuffer> read();
CompletableFuture<Integer> write(ByteBuffer bytes);
You could read from a file and write to a socket like so:
file.read()
.thenCompose(bytes -> socket.write(bytes))
.thenAccept(count -> log.write("Wrote {} bytes to socket.", count))
.exceptionally(exception -> {
log.error("Input/output error.", exception);
return null;
});
How do I make a method async? What is the method signature?
You would have it return a future.
public CompletableFuture<List<T>> databaseQuery(String Query, String[] args);
It's then the responsibility of the method to perform the work in some other thread and avoid blocking the current thread. Sometimes you will have worker threads ready to go. If not, you could use the ForkJoinPool, which makes background processing super easy.
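public CompletableFuture<List<T>> databaseQuery(String query, String[] args) {
CompletableFuture<List<T>> future = new CompletableFuture<>();
Executor executor = ForkJoinPool.commonPool();
executor.execute(() -> {
try {
String preparedQuery = QueryBaker(query, args);
List<T> list = DB_Exec(preparedQuery); // time taking task
future.complete(list);
} catch (Exception e) {
// Propagate failures to the caller instead of leaving the future incomplete forever
future.completeExceptionally(e);
}
});
return future; // returned immediately; completed later by the worker thread
}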
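For comparison, CompletableFuture.supplyAsync does the same wiring in one call, including capturing exceptions into the returned future. The sketch below is self-contained, so queryBlocking is a hypothetical stand-in for the blocking database call and the SQL string is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncQuerySketch {

    // Hypothetical blocking call standing in for the database round trip
    static List<Integer> queryBlocking(String sql) {
        return Arrays.asList(1, 2, 3);
    }

    // Asynchronous wrapper: returns immediately, completes on a common-pool thread
    static CompletableFuture<List<Integer>> queryAsync(String sql) {
        return CompletableFuture.supplyAsync(() -> queryBlocking(sql));
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> total = queryAsync("SELECT n FROM numbers")
                .thenApply(rows -> rows.stream().mapToInt(Integer::intValue).sum());
        System.out.println(total.join()); // join() is only here so the demo prints before exiting
    }
}
```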

why would I choose a Asynchronous method over a multi-threaded method
They sound like the same thing to me except asynchronous sounds like it will use one thread in the back ground.
Java futures is supposed to be non blocking ?
Non-blocking operations often use a Future, but the object itself is blocking, though only when you wait on it.
What does Non blocking mean?
The current thread doesn't wait/block.
Why call it non blocking when the method to extract information from a Future < some-object > i.e. get() is blocking
You called it non-blocking. Starting the operation in the background is non-blocking, but if you need the results, blocking is the easiest way to get this result.
and will simply halt the entire thread till the method is done processing ?
Correct, it will do that.
Perhaps a callback method that rings the church bell of completion when processing is complete ?
You can use a CompletableFuture, or you can just add to the task anything you want to do at the end. You only need to block on things which have to be done in the current thread.
You need to return a Future, and do something else while you wait, otherwise there is no point using a non-blocking operation, you may as well execute it in the current thread as it's simpler and more efficient.
You have the synchronous version already, the asynchronous version would look like
public Future<List<T>> databaseQuery(String Query, String[] args) {
return executor.submit(() -> {
String preparedQuery = QueryBaker(Query, args);
List<T> listOfNumbers = DB_Exec(preparedQuery); // time taking task
return listOfNumbers;
});
}

I'm not a guru on multithreading but I'm gonna try to answer these questions for my sake as well
why would I choose a Asynchronous method over a multi-threaded method ? (My problem: I believe I read too much and now I am myself confused)
Multi-threading is working with multiple threads; there isn't much else to it. One interesting concept is that on a single core, multiple threads cannot run truly in parallel, so the scheduler divides each thread's work into small time slices to give the illusion of working in parallel.
1. One example where multithreading would be useful is in real-time multiplayer games, where each thread corresponds to a user. User A would use thread A and User B would use thread B. Each thread could track its user's activity, and data could be shared between the threads.
2. Another example would be waiting for a long HTTP call. Say you're designing a mobile app and the user clicks on download for a file of 5 gigabytes. If you don't use multithreading, the user would be stuck on that page without being able to perform any action until the HTTP call completes.
It's important to note that as a developer multithreading is only a way of designing code. It adds complexity and doesn't always have to be done.
Now for Async vs Sync, Blocking vs Non-blocking
These are some definitions I found from http://doc.akka.io/docs/akka/2.4.2/general/terminology.html
Asynchronous vs. Synchronous
A method call is considered synchronous if the caller cannot make progress until the method returns a value or throws an exception. On the other hand, an asynchronous call allows the caller to progress after a finite number of steps, and the completion of the method may be signalled via some additional mechanism (it might be a registered callback, a Future, or a message).
A synchronous API may use blocking to implement synchrony, but this is not a necessity. A very CPU intensive task might give a similar behavior as blocking. In general, it is preferred to use asynchronous APIs, as they guarantee that the system is able to progress. Actors are asynchronous by nature: an actor can progress after a message send without waiting for the actual delivery to happen.
Non-blocking vs. Blocking
We talk about blocking if the delay of one thread can indefinitely delay some of the other threads. A good example is a resource which can be used exclusively by one thread using mutual exclusion. If a thread holds on to the resource indefinitely (for example accidentally running an infinite loop) other threads waiting on the resource can not progress. In contrast, non-blocking means that no thread is able to indefinitely delay others.
Non-blocking operations are preferred to blocking ones, as the overall progress of the system is not trivially guaranteed when it contains blocking operations.
I find that async vs sync refers more to the intent of the call whereas blocking vs non-blocking refers to the result of the call. However, it wouldn't be wrong to say usually asynchronous goes with non-blocking and synchronous goes with blocking.
2> Java futures is supposed to be non blocking ? What does Non blocking mean? Why call it non blocking when the method to extract information from a Future < some-object > i.e. get() is blocking and will simply halt the entire thread till the method is done processing ? Perhaps a callback method that rings the church bell of completion when processing is complete ?
Non-blocking means the call does not block the thread that calls the method.
Futures were introduced in Java to represent the result of a call, even though the call may not have completed yet. Going back to the HTTP file example, say you call a method like the following
Future<BigData> future = server.getBigFile(); // getBigFile would be an asynchronous method
System.out.println("This line prints immediately");
The method getBigFile would return immediately and proceed to the next line of code. You would later be able to retrieve the contents of the future (or be notified that the contents are ready). Libraries/Frameworks like Netty, AKKA, Play use Futures extensively.
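Here is a small runnable sketch of that behaviour with a plain ExecutorService. The latch is only there to make the order of events deterministic for the demo, and the event strings are made up; the point is that submit() returns right away and only get() waits.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ReturnsImmediately {

    static List<String> demo() {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch callerMovedOn = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(() -> {
                callerMovedOn.await();           // simulate slow work that outlasts the submit call
                events.add("download finished");
                return "big file";
            });
            events.add("caller continues");      // reached without waiting for the task
            callerMovedOn.countDown();
            events.add("got: " + future.get());  // the only step that blocks the caller
            return events;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The events come out as "caller continues", then "download finished", then "got: big file".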
How do I make a method Async? What is the method signature?
I would say it depends on what you want to do.
If you want to quickly build something, you would use high level functions like Futures, Actor models, etc. something which enables you to efficiently program in a multithreaded environment without making too many mistakes.
On the other hand if you just want to learn, I would say it's better to start with low level multithreading programming with mutexes, semaphores, etc.
Examples of codes like these are numerous in google if you just search java asynchronous example with any of the keywords I have written.
Let me know if you have any other questions!

How can I ensure an ExecutorService pool has completed, without shutting it down?

Currently, I'm making sure my tasks have finished before moving on like so:
ExecutorService pool = Executors.newFixedThreadPool(5);
public Set<Future> EnqueueWork(StreamWrapper stream) {
Set<Future> futureObjs = new HashSet<>();
util.setData(stream);
Callable callable = util;
Future future = pool.submit(callable);
futureObjs.add(future);
pool.shutdown();
try {
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
Node.sendTCP(Node.getNodeByHostname(StorageTopology.getNextPeer()), Coordinator.prepareForTransport(stream));
return futureObjs;
}
However, because of some other threading on my socket, it's possible that multiple calls are made to EnqueueWork - I'd like to make sure the calls to .submit have completed in the current thread, without shutting down the pool for subsequent threads coming in.
Is this possible?

You can check by invoking the isDone() method on all the Future objects in futureObjs. You need to call isDone() in a loop until every future reports completion. Calling get() on each Future object is another option: since get() is a blocking call, it returns only after the task has completed and the result is ready. But do you really want to keep the pool open after all the tasks are done?

I agree with one of the comments: it seems odd that your executor can be used by different threads. Usually an executor is private to an instance of some class, but anyhow.
What you can do, from the docs (these are methods of ThreadPoolExecutor), is to check:
getActiveCount() - Returns the approximate number of threads that are actively executing tasks.
NOTE: This is a blocking method, it will take out a lock on the workers of your threadpool and block until it has counted everything
And also check:
getQueue() - Returns the task queue used by this executor. Access to the
task queue is intended primarily for debugging and monitoring.
This queue may be in active use. Retrieving the task queue
does not prevent queued tasks from executing.
If your queue is empty and the activeCount is 0, all your tasks should have finished. I say should because getActiveCount says "approximate". Looking at the impl, this is most likely because the worker internally has a flag indicating that it is locked (in use). There is in theory a slight race between executing and the worker being done and marking itself so.
A better approach would in fact be to track the futures. You would have to check the queue and that all futures are done.
However I think what you really need is to reverse your logic. Instead of the current thread trying to work out if another thread has submitted work in the meantime, you should have the other thread call isShutdown() and simply not submit a new task in that case.

You are approaching this issue from the wrong direction. If you need to know whether or not your tasks are finished, that means you have a dependency of A->B. The executor is the wrong place to ensure that dependency, as much as you don't ask the engine of your car "are we there yet?".
Java offers several features to ensure that a certain state has been reached before starting a new execution path. One of them is the invokeAll method of the ExecutorService, that returns only when all tasks that have been submitted are completed.
pool.invokeAll(listOfAllMyCallables);
// if you reach this point all callables are completed
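A slightly fuller sketch of the same idea, assuming the pool is shared by several callers and must stay open: each caller passes its own batch to invokeAll and is blocked only until that batch is done. The class name, the pool size, and the use of daemon threads (so an idle shared pool does not keep the JVM alive) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedPoolBatch {

    // Shared by every caller and never shut down inside runBatch
    static final ExecutorService POOL = Executors.newFixedThreadPool(5, runnable -> {
        Thread t = new Thread(runnable);
        t.setDaemon(true); // assumption: an idle shared pool should not keep the JVM alive
        return t;
    });

    /** Blocks until this caller's tasks are done, then sums their results. */
    static int runBatch(List<Callable<Integer>> batch) {
        try {
            int sum = 0;
            // invokeAll returns only after every task in this batch has completed
            for (Future<Integer> done : POOL.invokeAll(batch)) {
                sum += done.get(); // already complete, so get() does not wait
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> batch = Arrays.asList(() -> 1, () -> 2, () -> 3);
        System.out.println(runBatch(batch));
        System.out.println(runBatch(batch)); // the pool is still usable for later calls
    }
}
```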

You have already added Future to the set. Just add below code block to get the status of each Future task by calling get() with time out period.
In my example, time out is 60 seconds. You can change it as per your requirement.
Sample code:
try{
for(Future future : futureObjs){
System.out.println("future.status = " + future.get(60000, TimeUnit.MILLISECONDS));
}
}catch(Exception err){
err.printStackTrace();
}
Other useful posts:
How to forcefully shutdown java ExecutorService
How to wait for completion of multiple tasks in Java?

Simple asynchronous I/O: many threads, one file

I have a scientific application which I usually run in parallel with xargs, but this scheme incurs repeated JVM start costs and neglects cached file I/O and the JIT compiler. I've already adapted the code to use a thread pool, but I'm stuck on how to save my output.
The program (i.e. one thread of the new program) reads two files, does some processing and then prints the result to standard output. Currently, I've dealt with output by having each thread add its result string to a BlockingQueue. Another thread takes from the queue and writes to a file, as long as a Boolean flag is true. Then I awaitTermination and set the flag to false, triggering the file to close and the program to exit.
My solution seems a little kludgey; what is the simplest and best way to accomplish this?
How should I write primary result data from many threads to a single file?
The answer doesn't need to be Java-specific if it is, for example, a broadly applicable method.
Update
I'm using "STOP" as the poison pill.
while (true) {
String line = queue.take();
if (line.equals("STOP")) {
break;
} else {
output.write(line);
}
}
output.close();
I manually start the queue-consuming thread, then add the jobs to the thread pool, wait for the jobs to finish and finally poison the queue and join the consumer thread.

That's really the way you want to do it, have the threads put their output to the queue and then have the writer exhaust it.
The only thing you might want to do to make things a little cleaner is rather than checking a flag, simply put an "all done" token on to the queue that the writer can use to know that it's finished. That way there's no out of band signaling necessary.
That's trivial to do: you can use a well-known string, an enum, or simply a shared object.
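As a sketch of the shared-object variant: the token below is a distinct String instance compared by identity, so no real result line can ever be mistaken for it, even one whose text happens to match. An in-memory list stands in for the output file, and the names (PoisonPillWriter, DONE, run) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PoisonPillWriter {

    // Unique instance compared by identity (!=), never by equals()
    private static final String DONE = new String("DONE");

    static List<String> run(int jobs) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>(); // stands in for the output file

        Thread writer = new Thread(() -> {
            try {
                for (String line = queue.take(); line != DONE; line = queue.take()) {
                    written.add(line);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < jobs; i++) {
            final int id = i;
            workers.execute(() -> queue.add("result " + id));
        }
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
            queue.add(DONE); // every real result is already queued ahead of the token
            writer.join();   // after join(), the writer's list is safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(run(10).size() + " lines written");
    }
}
```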

You could use an ExecutorService.
Submit a Callable that would perform the task and return the string after completion.
When Submitting the Callable you get hold of a Future, store these references e.g. in a List.
Then simply iterate through the Futures and get the Strings by calling Future#get.
This will block until the task is completed if it not yet is, otherwise return the value immediately.
Example:
ExecutorService exec = Executors.newFixedThreadPool(10);
List<Future<String>> tasks = new ArrayList<Future<String>>();
tasks.add(exec.submit(new Callable<String>() {
public String call() {
//do stuff
return <yourString>;
}
}));
//and so on for the other tasks
for (Future<String> task : tasks) {
String result = task.get();
//write to output
}

Many threads processing, one thread writing and a message queue between them is a good strategy. The issue that just needs to be solved, is knowing when all work is finished. One way to do that is to count how many worker threads you started, and then after that count how many responses you got. Something like this pseudo code:
int workers = 0
for each work item {
workers++
start the item's worker in a separate thread
}
while workers > 0 {
take worker's response from a queue
write response to file
workers--
}
This approach also works if the workers can find more work items while they are executing. Just include any additional not-yet-processed work in the worker responses, and then increment the workers count and start worker threads as usual.
If each of the workers returns just one message, you can use Java's ExecutorService to execute Callable instances which return the result. ExecutorService's methods give access to Future instances from which you can get the result when the Callable has finished its work.
So you would first submit all the tasks to the ExecutorService and then loop over all the Futures and get their responses. That way you would write the responses in the order in which you check the futures, which can be different from the order in which they finish their work. If latency is not important, that shouldn't be a problem. Otherwise, a message queue (as mentioned above) might be more suitable.

It's not clear if your output file has some defined order or if you just dump your data there. I assume it has no order.
I don't see why you need an extra thread for writing to the output. Just synchronize the method that writes to the file and call it at the end of each thread.

If you have many threads writing to the same file the simplest thing to do is to write to that file in the task.
final PrintWriter out =
ExecutorService es =
for(int i=0;i<tasks;i++)
es.submit(new Runnable() {
public void run() {
performCalculations();
// so only one thread can write to the file at a time.
synchronized(out) {
writeResults(out);
}
}
});
es.shutdown();
es.awaitTermination(1, TimeUnit.HOURS);
out.close();
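A complete, runnable variant of the outline above might look like the following. A StringWriter stands in for the real output file so the sketch is self-contained, and the "result N" strings are placeholders for the real calculation; both are assumptions for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SynchronizedOutput {

    static String run(int tasks) {
        StringWriter sink = new StringWriter(); // stands in for a FileWriter on the real output file
        PrintWriter out = new PrintWriter(sink);
        ExecutorService es = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            es.execute(() -> {
                String result = "result " + id; // placeholder for the real calculation
                synchronized (out) {            // one task at a time writes a complete record
                    out.println(result);
                }
            });
        }
        es.shutdown();
        try {
            es.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        out.close();
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(8));
    }
}
```

The calculation happens outside the synchronized block, so only the short write is serialized. The order of lines in the output is whatever order the tasks happen to finish in.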
