NIO.2 asynchronous channels coding guidelines - Java

For example, I want to read 100500 bytes into an array:
byte[] array = new byte[100500];
int offset = 0;
ByteBuffer buf = ByteBuffer.allocateDirect(4096);
channel.read(buf, null, new LambdaAdapter((count, exception, attachment) -> {
    buf.get(array, offset, count);
    offset += count; // will not compile: offset must be (effectively) final
    if (offset < 100500) {
        channel.read(buf, null, /* what here? It must be the lambda we are currently in, but we can't refer to it! */);
    }
    // here we have our 100500 bytes inside array
}));
LambdaAdapter here is a simple wrapper that converts a CompletionHandler into a functional interface with three arguments.
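For reference, such an adapter could be sketched roughly like this (my own hypothetical version; the real LambdaAdapter is not shown in the question):

// Hypothetical sketch: wraps a three-argument functional interface
// in a java.nio.channels.CompletionHandler.
interface TriHandler<V, A> {
    void handle(V result, Throwable exception, A attachment);
}

class LambdaAdapter<V, A> implements java.nio.channels.CompletionHandler<V, A> {
    private final TriHandler<V, A> handler;

    LambdaAdapter(TriHandler<V, A> handler) {
        this.handler = handler;
    }

    @Override
    public void completed(V result, A attachment) {
        handler.handle(result, null, attachment);
    }

    @Override
    public void failed(Throwable exc, A attachment) {
        handler.handle(null, exc, attachment);
    }
}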
Anyway. The offset can be put into the 'attachment' parameter, and the lambda can be declared beforehand and reused as a field. However, the resulting code is always ugly Ugly UGLY.
I was not able to write an acceptable solution even for such a simple task - what will it look like for a complex protocol where reads are interleaved with writes and wrapped in complex logic?
Does anyone know a suitable way to deal with this async API? If you think that Scala can save the world here, feel free to use it.

I know how to deal with async computations in general, and async IO in particular. An async program should be represented as a dataflow graph, as described in Dataflow_programming. Each node has a set of inputs; each input accepts messages or signals, and the node fires (goes to a thread pool) when all inputs are filled. For example, a node representing a socket channel has two inputs: one for ByteBuffers and one to indicate that the channel is free and can accept IO requests. So a node resembles a function in functional programming, but can be reused - no new objects are created for the next IO operation.
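As a rough illustration (my own hypothetical sketch, not any particular library's API), such a node can be modelled as an object that counts its unfilled inputs and submits itself to a thread pool once every input has arrived:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a dataflow node: it fires on a thread pool
// once all of its inputs have received a value, and can be reused for
// the next round by resetting the counter.
abstract class Node implements Runnable {
    private final ExecutorService executor;
    private final int inputCount;
    private final AtomicInteger missing;

    Node(ExecutorService executor, int inputCount) {
        this.executor = executor;
        this.inputCount = inputCount;
        this.missing = new AtomicInteger(inputCount);
    }

    // Called by each input when its message or signal arrives.
    protected void inputFilled() {
        if (missing.decrementAndGet() == 0) {
            executor.submit(this); // all inputs present: fire
        }
    }

    // Re-arm the node so it can accept the next set of inputs.
    protected void reset() {
        missing.set(inputCount);
    }

    // The node's action, executed on the thread pool.
    public abstract void run();
}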
Scala (and Akka) actors do not fit, as each actor has only one input. Look, however, at Scala Dataflow - I have not learned it yet, but the name is promising :).
I have developed a dataflow library for Java, but its IO part is somewhat outdated (I am thinking of a more elegant, or at least less ugly, API). Look at the echo server implementation in the examples subdirectory.

I found acceptable solutions.
First of all, the following is worth looking at: https://github.com/Netflix/RxJava
As for coding guidelines...
An async method is a method that returns before doing any useful work.
Async operation code should start with the creation of a new object; let's call it the context:
Void startMyAsync(String p1, int p2, Observer callback) {
    return new MyAsyncContext(p1, p2, callback).start();
}
The result of an async method is not used - let's return the Void type. It is useful since the compiler will check for you that every async method either calls another async method or explicitly returns null.
Async methods can throw exceptions.
Async callbacks should not throw exceptions - a callback error handler must be used instead.
Async callbacks should only contain a try...catch and a context method invocation.
Async callbacks should return Void as well.
The additional data provided by CompletionHandler is not needed - context fields should be used instead. If the async flow does not split, synchronization is not needed.
Example of async callback:
return myAsyncMethod(param1, param2, (result, exc, att) -> {
    try {
        if (exc != null) {
            return handleError(exc); // clean up resources and invoke parentHandler.complete(null, exc, null)
        } else {
            return handleResult(result);
        }
    } catch (Exception e) {
        return handleError(e); // clean up resources and invoke parentHandler.complete(null, e, null)
    }
});
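To make the 'context' concrete, here is a minimal sketch of what the MyAsyncContext from the startMyAsync example above could look like (Handler3 stands in for the parent handler/Observer type; it and the stub myAsyncMethod are hypothetical, only meant to show the shape):

// Minimal illustrative sketch (interface and method names are simplified
// placeholders, not a real API): all mutable state lives in context fields,
// so the callbacks themselves stay tiny.
interface Handler3<R> {
    Void complete(R result, Throwable exc, Object attachment);
}

class MyAsyncContext {
    private final String p1;
    private final int p2;
    private final Handler3<Object> parentHandler; // notified when the whole flow ends

    MyAsyncContext(String p1, int p2, Handler3<Object> parentHandler) {
        this.p1 = p1;
        this.p2 = p2;
        this.parentHandler = parentHandler;
    }

    Void start() {
        // first async step; the callback only delegates back to context methods
        return myAsyncMethod(p1, p2, (result, exc, att) -> {
            try {
                return exc != null ? handleError(exc) : handleResult(result);
            } catch (Exception e) {
                return handleError(e);
            }
        });
    }

    private Void handleResult(Object result) {
        // continue the flow here: issue the next async call, or finish
        return parentHandler.complete(result, null, null);
    }

    private Void handleError(Throwable exc) {
        // clean up resources, then propagate the failure
        return parentHandler.complete(null, exc, null);
    }

    private Void myAsyncMethod(String a, int b, Handler3<Object> callback) {
        // placeholder for the underlying asynchronous operation
        return callback.complete(a + b, null, null);
    }
}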

Related

RxJava - When to use Observable with create method

I was reading a tutorial:
http://code.tutsplus.com/tutorials/getting-started-with-reactivex-on-android--cms-24387
which concerns RxAndroid in particular, but it's pretty much the same as RxJava. I am not sure that I understood the concept completely.
Below I have written a method and then a sample usage.
My question is: is this the right way to implement my functions so that I can run them on other threads asynchronously? They will in fact only return a created Observable that runs the real code, handles errors and all that stuff.
Or is this wrong, then I'd like to know the correct way.
Observable<String> googleSomething(final String text) {
    return Observable.create(new Observable.OnSubscribe<String>() {
        @Override
        public void call(Subscriber<? super String> subscriber) {
            try {
                String data = fetchData(text); // some normal method
                subscriber.onNext(data);       // Emit the contents of the URL
                subscriber.onCompleted();      // Nothing more to emit
            } catch (Exception e) {
                subscriber.onError(e);         // In case there are network errors
            }
        }
    });
}
googleSomething("hello world").subscribeOn(Schedulers.io()).observeOn(Schedulers.immediate()).subscribe(...)
Also, is Schedulers.immediate() used in order to execute the subscriber code on the current thread? Its javadoc says "Creates and returns a Scheduler that executes work immediately on the current thread.", but I'm not sure.
Unless you are more experienced and need a custom operator, or want to bridge a legacy addListener/removeListener based API, you should not start with create. There are several questions on StackOverflow where create was used and turned out to be the source of trouble.
I'd prefer fromCallable, which lets you generate a single value or throw an Exception, so there is no need for those lengthy defer + just sources.
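For example (assuming an RxJava 1.x release recent enough to include fromCallable), the method from the question shrinks to:

// Requires java.util.concurrent.Callable and rx.Observable.
Observable<String> googleSomething(final String text) {
    // The work is deferred until subscription; a thrown exception is
    // delivered to onError automatically.
    return Observable.fromCallable(new Callable<String>() {
        @Override
        public String call() throws Exception {
            return fetchData(text); // same "normal method" as in the question
        }
    });
}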
Schedulers.immediate() executes its task immediately on the caller's thread, which is the io() thread in your example, not the main thread. Currently, there is no support for moving the computation back to the Java main thread, as that requires blocking trampolining and is usually a bad idea anyway.
You should almost never use create(), especially not as a beginner. There are easier ways to create observables, and create() is difficult to implement correctly.
Most of the time, you can easily get around create() by using defer(). E.g., in this case you'd do:
Observable<String> googleSomething(final String text) {
    return Observable.defer(new Func0<Observable<String>>() {
        @Override
        public Observable<String> call() {
            try {
                return Observable.just(fetchData(text));
            } catch (IOException e) {
                return Observable.error(e);
            }
        }
    });
}
If you're not using a checked exception, then you could even get rid of the try-catch. RxJava will automatically forward any RuntimeException to the onError() part of the subscriber.
You can create an Observable via the Observable.create(new OnSubscribe<T>() {...}) method, however:
Look at the defer() operator, which allows you to return, for example, Observable.just() or Observable.error(), so you don't need to touch the subscriber directly
Prefer SyncOnSubscribe/AsyncOnSubscribe to handle backpressure
Schedulers.immediate() will keep the Observable processing on the thread it is already on - so in your case it will be one of the Schedulers.io() threads
Your code looks good to me. If you are unsure whether it is running on another thread or not, you could print something immediately after you call .subscribe() and check the order of the outputs.
googleSomething("hello world").subscribeOn(Schedulers.io()).observeOn(Schedulers.immediate()).subscribe(...)
System.out.println("This should be printed first");
Try to simulate a long-running operation inside fetchData() and print something else immediately afterwards. As .subscribe() is non-blocking, "This should be printed first" is, in fact, going to be printed first.
Alternatively, you can print the current thread using:
Thread.currentThread().getName()
Use this inside and outside your observable and the outputs should differ.
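For instance, a quick sketch against the question's code (assuming RxJava 1.x; the actual thread names will vary):

System.out.println("before subscribe: " + Thread.currentThread().getName());

googleSomething("hello world")
        .subscribeOn(Schedulers.io())
        .subscribe(new Action1<String>() {
            @Override
            public void call(String result) {
                // runs on an io() scheduler thread, not on the caller's thread
                System.out.println("onNext on: " + Thread.currentThread().getName());
            }
        });

System.out.println("after subscribe: " + Thread.currentThread().getName());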

ParallelStreams in Java

I'm trying to use parallel streams to call an API endpoint to get some data back. I am using an ArrayList<String> and sending each String to a method that uses it to make a call to my API. I have set up parallel streams to call a method that will call the endpoint and marshal the data that comes back. The problem is that when viewing this in htop I see ALL the cores on the db server light up the second I hit this method... then, as the first group finishes, I see only 1 or 2 cores light up. My issue is that I think I am truly getting the result I want for the first set of calls only, and then, from monitoring, it looks like the rest of the calls get made one at a time.
I think it may have something to do with the recursion but I'm not 100% sure.
private void generateObjectMap(Integer count) {
    ArrayList<String> myList = getMyList();
    myList.parallelStream().forEach(f -> performApiRequest(f, count));
}

private void performApiRequest(String myString, Integer count) {
    if (count < 10) {
        TreeMap<Integer, TreeMap<Date, MyObj>> tempMap = new TreeMap<>();
        try {
            tempMap = myJson.getTempMap(myRestClient.executeGet(myString));
        } catch (SocketTimeoutException e) {
            count += 1;
            performApiRequest(myString, count);
        }
        ...
    } else {
        System.exit(1);
    }
}
This seems an unusual use for parallel streams. In general, the idea is that you are informing the JVM that the operations on the stream are truly independent and can run in any order, in one thread or multiple. The results will subsequently be reduced or collected as part of the stream. The important point to remember here is that side effects are undefined (which is why variables used inside the lambdas need to be final or effectively final) and you shouldn't be relying on how the JVM organises execution of the operations.
I can imagine the following being a reasonable usage:
list.parallelStream().map(item -> getDataUsingApi(item))
.collect(Collectors.toList());
Where the api returns data which is then handed to downstream operations with no side effects.
So, in conclusion, if you want tight control over how the API calls are executed, I would recommend not using parallel streams for this. Traditional Thread instances, possibly with a ThreadPoolExecutor, will serve you much better.
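For instance, a minimal sketch using a fixed-size pool from java.util.concurrent (the method names are the ones from the question; the pool size of 8 is arbitrary):

private void generateObjectMap(final Integer count) {
    ArrayList<String> myList = getMyList();
    // A fixed pool makes the degree of parallelism explicit and predictable.
    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<?>> futures = new ArrayList<>();
    for (final String s : myList) {
        futures.add(pool.submit(new Runnable() {
            @Override
            public void run() {
                performApiRequest(s, count);
            }
        }));
    }
    for (Future<?> f : futures) {
        try {
            f.get(); // wait for each request to finish
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
    }
    pool.shutdown();
}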

Way to prioritize specific API calls with multithreads or priority queue?

In my application, my servlet (running on Tomcat) takes in a doPost request, returns the initial value of an API call to the user for presentation, and then does a ton of data analysis in the background with a lot more API calls. The analysis results then go into my MongoDB. The problem arises when I want to start the process before the bulk API calls are finished. There are so many calls that I would need at least 20 seconds. I don't want the user to wait 20 seconds for their initial data display, so I want the data analysis to pause to let the new request make that initial API call for display.
Here's the general structure of my function after the doPost (it's async'd, so this is in a Runnable). It's a bit long, so I abbreviated it for easier reading:
private void postMatches(ServletRequest req, ServletResponse res) {
    ... getting the necessary fields from the req ...

    /* read in values for arrays */
    String rankQueue = generateQueryStringArray("RankedQueues=", "rankedQueues", info);
    String season = generateQueryStringArray("seasons=", "seasons", info);
    String champion = generateQueryStringArray("championIds=", "championIds", info);

    /* first api call, "return" this and then start analysis */
    JSONObject recentMatches = caller.callRiotMatchHistory(region, "" + playerId);
    try {
        PrintWriter out = res.getWriter();
        out.write((new Gson()).toJson(recentMatches));
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }

    /* use array values to send more api calls */
    JSONObject matchList = caller.callRiotMatchList(playerId, region, rankQueue, season, champion);

    ... do a ton more api calls with the matchList's id's ...
}
So one of my ideas is to
Have two threads per client.
That way, there would be one thread making that single API call, and the other thread would make the remaining 999 API calls. The single-API-call thread would just wait until another doPost from the same client comes in and then call the API immediately, while the bulk API calls that come with it would just be appended to the other thread. By doing this, the two threads will compute in parallel.
Have a priority queue, put the initial call on high priority
This way, every URL will be passed through the queue, and I can choose the compareTo of specific URLs to be greater (maybe wrap them in a bean). However, I'm not sure how the API caller will be able to distinguish which call is which, because once the URL is added to the queue it loses its identity. Is there any way to fix that? I know callbacks aren't available in Java, so it's kind of hard to do that.
Are either of these two ideas possible? No need for code, but it would be greatly appreciated!
PS: I'm using Jersey for API calls.
The best bet for you seems to be using the "two threads per client" solution. Or rather a variation of it.
I figure the API you're calling will have some rate limiting in place, so that large numbers of calls will get automatically blocked. That's problematic for you, since that limit can probably be reached trivially with just a few requests processed simultaneously.
Additionally, you may hit I/O limits sooner rather than later, depending on how time-intensive your calculations are. This means you should have an intrinsic limit for your background API calls; the initial request should be fine without any inherent limiting. As such, a fixed-size thread pool seems to be the perfect fit. Exposing it as a static ExecutorService in your service should be the simplest solution.
So I'd propose that you expose a static service that takes the matchList as a parameter and then "does its thing".
It could look something like this:
public class MatchProcessor {
    private static final ExecutorService service = Executors.newFixedThreadPool(THREADS);

    public static void processMatchList(final JSONObject matchList) {
        service.submit(() -> runAnalysis(matchList));
    }

    private static void runAnalysis(final JSONObject matchList) {
        // processing goes here
    }
}
Side note: this code uses Java 8; it should be simple to convert the submitted lambda into a Runnable if you're on Java 7.
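For example, processMatchList would then look like this (same class as above, just with an anonymous Runnable instead of the lambda):

public static void processMatchList(final JSONObject matchList) {
    service.submit(new Runnable() {
        @Override
        public void run() {
            // matchList must be final so the anonymous class can capture it
            runAnalysis(matchList);
        }
    });
}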

What is the meaning of the following statement from the Play! documentation?

From the Play documentation:
Whether the action code returns a Result or a Promise, both kinds of returned object are handled internally in the same way. There is a single kind of Action, which is asynchronous, and not two kinds (a synchronous one and an asynchronous one). Returning a Promise is a technique for writing non-blocking code.
Does this mean that there is no difference/advantage or disadvantage in returning a Promise<Result> rather than returning a Result? If the Play! framework encloses calls to public static Result function() in a Promise, is there any point in the developer explicitly returning a Promise<Result>?
No, there is no point in explicitly returning a Promise<Result>; if what your code does is synchronous, simply return a Result.
However, sometimes your code calls other code that returns a Promise because it performs a non-blocking asynchronous operation. In that case you should transform the promise to extract the information you need from it and then return it. By keeping it a Promise instead of unwrapping it, you're not forcing the thread to block, and you save the overhead of a context switch, which can be significant.
For example let's say you want to query a webservice:
WSRequestHolder holder = WS.url("someUrl");
Promise<JsonNode> json = holder.get();
Now, you can do the following:
JsonNode jsonResult = json.get(); // force the program to wait
return jsonResult; // return it
But this will force the thread to context-switch on a blocking IO operation. Instead, you can return the promise directly:
return json; // return the promise
This saves the overhead of context switching. You can also use .map if you need to manipulate the result first.
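For example, a minimal sketch against the Play 2 Java API (the exact types and imports depend on your Play version):

// Transform the Promise instead of blocking on it; Play completes the
// HTTP response once the promise is redeemed.
return json.map(new F.Function<JsonNode, Result>() {
    @Override
    public Result apply(JsonNode node) {
        return ok(node); // build the Result from the webservice payload
    }
});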

Java pattern for nested callbacks?

I'm looking for a Java pattern for making a nested sequence of non-blocking method calls. In my case, some client code needs to asynchronously invoke a service to perform some use case, and each step of that use case must itself be performed asynchronously (for reasons outside the scope of this question). Imagine I have existing interfaces as follows:
public interface Request {}

public interface Response {}

public interface Callback<R extends Response> {
    void onSuccess(R response);
    void onError(Exception e);
}
There are various paired implementations of the Request and Response interfaces, namely RequestA + ResponseA (given by the client), RequestB + ResponseB (used internally by the service), etc.
The processing flow is a chain of these request/response steps: in between the receipt of each response and the sending of the next request, some additional processing needs to happen (e.g. based on values in any of the previous requests or responses).
So far I've tried two approaches to coding this in Java:
anonymous classes: gets ugly quickly because of the required nesting
inner classes: neater than the above, but still hard for another developer to comprehend the flow of execution
Is there some pattern to make this code more readable? For example, could I express the service method as a list of self-contained operations that are executed in sequence by some framework class that takes care of the nesting?
Since the implementation (not only the interface) must not block, I like your list idea.
Set up a list of "operations" (perhaps Futures?), for which the setup should be pretty clear and readable. Then upon receiving each response, the next operation should be invoked.
With a little imagination, this sounds like the chain of responsibility. Here's some pseudocode for what I'm imagining:
public void setup() {
    this.operations.add(new Operation(new RequestA(), new CallbackA()));
    this.operations.add(new Operation(new RequestB(), new CallbackB()));
    this.operations.add(new Operation(new RequestC(), new CallbackC()));
    this.operations.add(new Operation(new RequestD(), new CallbackD()));
    startNextOperation();
}

private void startNextOperation() {
    if (this.operations.isEmpty()) {
        reportAllOperationsComplete();
        return;
    }
    Operation op = this.operations.remove(0);
    op.request.go(op.callback);
}

private class CallbackA implements Callback<ResponseA> {
    public void onSuccess(ResponseA response) {
        // store response? etc?
        startNextOperation();
    }

    public void onError(Exception e) {
        // abort, or retry the remaining operations
    }
}
...
In my opinion, the most natural way to model this kind of problem is with Future<V>.
So instead of using a callback, just return a "thunk": a Future<Response> that represents the response that will be available at some point in the future.
Then you can either model subsequent steps as things like Future<ResponseB> step2(Future<ResponseA>), or use ListenableFuture<V> from Guava. You can then use Futures.transform() or one of its overloads to chain your functions in a natural way, while still preserving the asynchronous nature.
If used in this way, Future<V> behaves like a monad (in fact, I think it may qualify as one, although I'm not sure off the top of my head), and so the whole process feels a bit like IO in Haskell as performed via the IO monad.
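A rough sketch with Guava (assuming a Guava version where Futures.transformAsync takes an explicit executor; sendRequestA, buildRequestB and sendRequestB are assumed helper methods that wrap the underlying asynchronous calls and return ListenableFutures):

// Chain the asynchronous steps without blocking: each stage starts when
// the previous one completes, and errors propagate to the callback.
ListeningExecutorService pool =
        MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(4));

ListenableFuture<ResponseA> responseA = sendRequestA(new RequestA());

ListenableFuture<ResponseB> responseB = Futures.transformAsync(
        responseA,
        a -> sendRequestB(buildRequestB(a)), // may use values from the previous response
        pool);

Futures.addCallback(responseB, new FutureCallback<ResponseB>() {
    @Override
    public void onSuccess(ResponseB b) {
        // final processing / next step
    }

    @Override
    public void onFailure(Throwable t) {
        // error handling
    }
}, pool);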
You can use the actor computing model. In your case, the client, the services, and callbacks [B-D] can all be represented as actors.
There are many actor libraries for Java. Most of them, however, are heavyweight, so I wrote a compact and extensible one: df4j. It treats the actor model as a specific case of the more general dataflow computing model and, as a result, allows the user to create new types of actors to optimally fit the user's requirements.
I am not sure if I got your question correctly. If you want to invoke a service and, on its completion, have the result passed to another object that can continue processing with it, you can look at using the Composite and Observer patterns to achieve this.
