I have to get data from an API, so naturally I have an endpoint handler that is accessed through a lambda which, I assume, spawns several threads to complete each API call I need. However, after all of the API calls are finished (all of the lambda threads complete), I need to organize my data. Currently, the Sort method that I have runs on the main thread, and therefore finishes before any of the API calls in the lambda finish. Here is a sample of what I have:
for (String data : dataArray) {
    APIEndpoint apiCall = new APIEndpoint("http://sampleAPI.org/route/" + data);
    apiCall.execute((response, success) -> {
        // Format and gather the info from the response
        apiDataArray.add(DataFromAPIObject);
    });
}
System.out.print(apiDataArray.size()); // Prints 0
sortData(); // Currently doesn't sort anything because the array is empty
Edit: Here is the endpoint executor I am working with:
https://github.com/orange-alliance/TOA-DataSync/blob/master/src/org/theorangealliance/datasync/util/FIRSTEndpoint.java
Using semaphores might be an option, but it will deadlock if for some reason there is no response for at least one of the data points. (To avoid the deadlock, you would need to release the semaphore on errors as well; see the timeout-safe sketch after the code.)
Semaphore semaphore = new Semaphore(dataArray.length);
for (String data : dataArray) {
    semaphore.acquire();
    APIEndpoint apiCall = new APIEndpoint("http://sampleAPI.org/route/" + data);
    apiCall.execute((response, success) -> {
        // Format and gather the info from the response
        apiDataArray.add(DataFromAPIObject);
        semaphore.release();
    });
}
semaphore.acquire(dataArray.length);
sortData();
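A minimal timeout-safe sketch of the same idea (my variant, not tested against FIRSTEndpoint; it assumes the callback's success flag reports failures, and InterruptedException handling is omitted for brevity):

Semaphore semaphore = new Semaphore(dataArray.length);
for (String data : dataArray) {
    semaphore.acquire();
    APIEndpoint apiCall = new APIEndpoint("http://sampleAPI.org/route/" + data);
    apiCall.execute((response, success) -> {
        if (success) {
            // Format and gather the info from the response
            apiDataArray.add(DataFromAPIObject);
        }
        // Release whether the call succeeded or not, so the final acquire can't hang on errors
        semaphore.release();
    });
}
// Wait at most 30 seconds instead of blocking forever if a callback never fires
if (!semaphore.tryAcquire(dataArray.length, 30, TimeUnit.SECONDS)) {
    // Timed out: some calls never responded; log or retry as appropriate
}
sortData();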
In the following code I have an asynchronous method that is executed over REST by an authenticated user. In this method I run a loop that periodically checks a cache for new data.
@Async
public CompletableFuture<List<Data>> pollData(Long previousMessageId, Long userId) throws InterruptedException {
    // check the db first; if there are new data, no need to go into the loop and wait
    List<Data> data = dataRepository.findByLastAndByUser(previousMessageId, userId);
    // data not found, so jump into the loop for some time
    if (data.size() == 0) {
        short c = 0;
        while (c < 100) {
            // check if some new data were added; if yes, break the loop
            if (cache.getIfPresent(userId) != null) {
                break;
            }
            c++;
            Thread.sleep(1000);
            System.out.println("SEQUENCE: " + c + " in " + Thread.currentThread().getName());
        }
        // check the database at the end of the loop or after breaking out of it
        data = dataRepository.findByLastAndByUser(previousMessageId, userId);
    }
    // clear data for that recipient and return the result
    cache.clear(userId);
    return CompletableFuture.completedFuture(data);
}
and executor bean:
@Bean
public Executor asyncExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(2);
    executor.setMaxPoolSize(2);
    executor.setQueueCapacity(500);
    executor.initialize();
    return executor;
}
I run this check in a separate thread for every request because the data are different for each user.
I need to optimize this code for many users (about 10k active users). In its current state it doesn't work well: when there are more requests, they wait for a free thread, and each subsequent request takes a very long time (5 minutes instead of 100 seconds, for example).
Can you help me improve it? Thanks in advance.
In case there are no other concurrent calls to the pollData method, it takes at most ~100s.
The maxPoolSize parameter defines the maximum number of threads that can run your @Async method concurrently.
So (number of users * execution time) / number of threads = 10K * 100 / 2 = 500K [s].
I haven't completely understood the goal you want to reach with this method, but I suggest you review the design of this functionality.
(For example, take a look at Spring cache, @CacheEvict, ...)
(Notice that, in case you have multiple @Async methods, you can bind a pool configuration to a particular method by adding the same name to the annotations: @Bean("Pool1") and @Async("Pool1").)
I don't fully understand what you want to do, but I think it's clear that this approach fills your thread pool quickly.
I think you should try a message broker or something similar.
Instead of trying to respond to a request by waiting for something new to happen, connect your clients via AMQP, WebSockets, webhooks, etc. On your server, when you detect new information, you notify the clients.
That way you don't need to occupy one thread per client.
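As a rough illustration of that push-based idea (my sketch, not part of the original answer), Spring MVC's SseEmitter can park a client connection without pinning a request thread. DataStreamController, the emitters map, and notifyUser are hypothetical names, and Data stands in for the entity from the question:

import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
public class DataStreamController {

    // one emitter per connected user (hypothetical storage for this sketch)
    private final Map<Long, SseEmitter> emitters = new ConcurrentHashMap<>();

    @GetMapping("/stream/{userId}")
    public SseEmitter stream(@PathVariable Long userId) {
        SseEmitter emitter = new SseEmitter(120_000L); // 2-minute timeout
        emitters.put(userId, emitter);
        emitter.onCompletion(() -> emitters.remove(userId));
        emitter.onTimeout(() -> emitters.remove(userId));
        return emitter; // the servlet thread is released immediately
    }

    // call this from wherever new data for a user is detected
    public void notifyUser(Long userId, List<Data> data) {
        SseEmitter emitter = emitters.get(userId);
        if (emitter != null) {
            try {
                emitter.send(data); // push instead of poll
            } catch (IOException ex) {
                emitters.remove(userId);
            }
        }
    }
}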
I have a method like this:
public List<SenderResponse> sendAllFiles(String folderName) {
    List<File> allFiles = getListOfFiles();
    List<SenderResponse> finalResponse = new ArrayList<SenderResponse>();
    for (File file : allFiles) {
        finalResponse.getResults().add(sendSingleFile(file));
    }
    return finalResponse;
}
which runs on a single thread. I want to run sendSingleFile(file) on multiple threads so I can reduce the total time taken to send the files.
How can I run sendSingleFile(file) on multiple threads for the various files and get the final response?
I found a few articles using ThreadPoolExecutor, but how do I handle the response from each sendSingleFile(file) call and add it to one final SenderResponse?
I am kind of new to multithreading. Please suggest the best way to process these files.
Define an executor service
ExecutorService executor = Executors.newFixedThreadPool(MAX_THREAD); // Define an integer value for MAX_THREAD
Then, for each job, you can do something like this:
Callable<SenderResponse> task = () -> {
    try {
        return sendSingleFile(file);
    } catch (InterruptedException e) {
        throw new IllegalStateException("Interrupted", e);
    }
};
Future<SenderResponse> future = executor.submit(task);
future.get(MAX_TIME_TO_WAIT, TimeUnit.SECONDS); // Blocking call. MAX_TIME_TO_WAIT is the maximum time the future will wait for the task to finish.
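As written, that waits on each future right after submitting it, which serializes the work again. To actually overlap the sends, submit every task first and then collect the futures; a minimal sketch reusing sendSingleFile, MAX_THREAD, and MAX_TIME_TO_WAIT from above (checked-exception handling omitted for brevity):

ExecutorService executor = Executors.newFixedThreadPool(MAX_THREAD);

// submit everything first so all files are in flight at once
List<Future<SenderResponse>> futures = new ArrayList<>();
for (File file : getListOfFiles()) {
    futures.add(executor.submit(() -> sendSingleFile(file)));
}

// then gather the responses in order
List<SenderResponse> finalResponse = new ArrayList<>();
for (Future<SenderResponse> f : futures) {
    // blocks until this task finishes or the timeout expires
    finalResponse.add(f.get(MAX_TIME_TO_WAIT, TimeUnit.SECONDS));
}
executor.shutdown();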
You start by writing code that works for the single-thread solution. The code you posted wouldn't even compile: finalResponse is declared as a List<SenderResponse>, yet you call getResults() on it as if it were some response wrapper!
When that stuff works, you continue with this:
You create an instance of ExecutorService, based on as many threads as you want.
You submit tasks into that service.
Each task knows about that result list object. The task does its work, and adds the result to that result list.
The one point to be careful about: make sure that add() is synchronized somehow - having multiple threads update an ordinary ArrayList is not safe. See the sketch below.
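A one-line way to get a safe list, for example (my sketch, not part of the original answer; the wrapper comes from java.util.Collections):

// every add() on this wrapper is internally synchronized
List<SenderResponse> results = Collections.synchronizedList(new ArrayList<SenderResponse>());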
For your situation, I would use a work-stealing pool (a ForkJoin executor service) and submit "jobs" to it. If you're using Guava, you can wrap the pool in a listeningDecorator, which allows you to add a listener to the futures it returns.
Example:
// create the executor service
ListeningExecutorService exec = MoreExecutors.listeningDecorator(Executors.newWorkStealingPool());

for (Foo foo : bar) {
    // submit can accept Runnable or Callable<T>
    final ListenableFuture<T> future = exec.submit(() -> doSomethingWith(foo));
    // Run something when it is complete.
    future.addListener(() -> doSomeStuff(future), exec);
}
Note that the listener will be called whether the future was successful or not.
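If you need to react differently to success and failure, Guava's Futures.addCallback does that split for you; a minimal sketch (handleResult is a hypothetical consumer of the result, and future/exec are the variables from the example above):

Futures.addCallback(future, new FutureCallback<T>() {
    @Override
    public void onSuccess(T result) {
        handleResult(result); // runs only if the task completed normally
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace(); // runs if the task threw or was cancelled
    }
}, exec);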
With RxJava, how to sequentially execute asynchronous methods on a list of data?
Using the Node.js Async module it is possible to call a function over an array of objects sequentially in the following fashion:
var dataArray = ['file1', 'file2', 'file3']; // array of arguments
var processor = function(filePath, callback) { // function to call on dataArray's arguments
    fs.access(filePath, function(err) { // perform an async operation
        // process next item in dataArray only after the following line is called
        callback(null, !err);
    });
};
async.every(dataArray, processor, function(err, result) {
    // process results
});
What's nice about this is that code executed within the processor function can be asynchronous and the callback can be run only once the async task is finished. This means that each object in dataArray will be processed one after another, not in parallel.
Now, in RxJava I can call a processing function over dataArray by calling:
String[] dataArray = new String[] {"file1", "file2", "file3"};
Observable.fromArray(dataArray).subscribe(new Consumer<String>() { // in RxJava 1 it's Action1
    @Override
    public void accept(@NonNull String filePath) throws Exception {
        // perform operation on filePath
    }
});
However, if I perform an asynchronous operation on filePath, how can I ensure sequential execution over the items of dataArray? What I'd be looking for is something along the lines of:
String[] dataArray = new String[] {"file1", "file2", "file3"};
Observable.fromArray(dataArray).subscribe(new Consumer<String>() {
    // process next item in dataArray only after the callback is called
    @Override
    public void accept(@NonNull String filePath, CallNext callback) throws Exception {
        // SomeDatabase will call callback once
        // the asynchronous someAsyncOperation finishes
        SomeDatabase.someAsyncOperation(filePath, callback);
    }
});
And furthermore, how do I call some code once all items of dataArray have been processed? Sort of a 'completion listener'?
Note: I realise I'm probably getting the Rx concept wrong. I'm asking because I can't find any guidelines on how to implement the Node.js Async pattern I mentioned above using RxJava. Also, I'm working on Android, hence no lambda functions.
After a couple of days of trial and error, the solution I used for this question is as follows. Any suggestions on improvement are most welcome!
private Observable<File> makeObservable(final String filePath) {
    return Observable.create(new ObservableOnSubscribe<File>() {
        @Override
        public void subscribe(final ObservableEmitter<File> e) throws Exception {
            // someAsyncOperation will call Callback.onResult after
            // having finished the asynchronous operation.
            // Callback<File> is an abstraction I put together for this example
            SomeDatabase.someAsyncOperation(filePath, new Callback<File>() {
                @Override
                public void onResult(Error error, File file) {
                    if (error != null) {
                        e.onError(error);
                    } else {
                        e.onNext(file);
                        e.onComplete();
                    }
                }
            });
        }
    });
}
private void processFilePaths(ArrayList<String> dataArray) {
    ArrayList<Observable<File>> observables = new ArrayList<>();
    for (String filePath : dataArray) {
        observables.add(makeObservable(filePath));
    }
    Observable.concat(observables)
        .toList()
        .subscribe(new Consumer<List<File>>() {
            @Override
            public void accept(@NonNull List<File> files) throws Exception {
                // process results.
                // Files are now sequentially processed using the asynchronous code.
            }
        });
}
In short, what's happening:
Turn dataArray into a list of Observables.
Each Observable performs the asynchronous operation at the time of subscription and feeds the data to onNext after the async operation finishes.
Use Observable.concat() on the list of Observables to ensure sequential execution.
Use .toList() to combine the results of the Observables into one List.
While this code does what I require, I'm not entirely satisfied with this solution, for a few reasons:
I'm not sure if executing the asynchronous code within Observable.create() is the right way to use Observables.
I've read that Observable.create should be used with caution (ref docs).
I'm not sure the behaviour I'm trying to achieve requires creating a new Observable for each item in dataArray. It seems more natural to me to have one observable that is capable of releasing data sequentially - but again, that may be just my old style of thinking.
I've read that concatMap should do the trick, but I could never figure out how to apply it to this example, and found concat doing the trick instead. Anyone care to explain the difference between the two? (See the concatMap sketch after this list.)
I also tried using the .zip function on the array of Observables before; while I managed to get the List of results at the end, the async operations were executed in parallel, not sequentially.
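For what it's worth, a concatMap version would look roughly like this (my sketch, reusing makeObservable from above). concatMap maps each item to an inner Observable and subscribes to the next one only after the previous completes, so it folds the "build a list of Observables, then concat them" steps into one operator, while concat does the same for an already-built collection:

Observable.fromIterable(dataArray)
        .concatMap(new Function<String, Observable<File>>() {
            @Override
            public Observable<File> apply(@NonNull String filePath) throws Exception {
                // subscribed only after the previous inner Observable completes
                return makeObservable(filePath);
            }
        })
        .toList()
        .subscribe(new Consumer<List<File>>() {
            @Override
            public void accept(@NonNull List<File> files) throws Exception {
                // process results, same as before
            }
        });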
One very simple and easy way to perform asynchronous operations (calculations, etc.) is to use the RxJava Async Utils library (not written by me, just used in our code).
The easiest way to run a method on a background thread is Async.start(this::methodToCall, Schedulers.io()), which returns an Observable<T> that produces the return value of the passed method once the call completes.
There are many other alternative methods that allow you to, e.g., report intermediate results via an Observer<T>.
If you want to use Observable.from() and concatMap() (or similar), remember to first move the processing to a background scheduler with .observeOn(Schedulers.io()) or any other scheduler you want.
To call a function over an array of objects sequentially, you don't need any async facilities; just make a for-loop over the array.
If you need that loop to execute asynchronously, then please describe what asynchrony you need: to start the loop after the array is asynchronously filled, to react asynchronously to the result of the computation, or both. In either case, the CompletableFuture class may help, as sketched below.
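For instance, a minimal sketch of sequential asynchronous processing over an array with java.util.concurrent.CompletableFuture (processAsync is a hypothetical method returning CompletableFuture<Boolean>, standing in for the fs.access example):

String[] dataArray = {"file1", "file2", "file3"};

// chain the items so each async step starts only after the previous one completes
CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
for (final String filePath : dataArray) {
    chain = chain.thenCompose(ignored -> processAsync(filePath))
                 .thenAccept(ok -> System.out.println(filePath + " -> " + ok));
}

// runs once every item has been processed - the 'completion listener'
chain.thenRun(() -> System.out.println("all done"));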
I'm trying to delete a batch of Couchbase documents in rapid fashion according to some constraint (or update the document if the constraint isn't satisfied). Each deletion is dubbed a "parcel" in my terminology.
When executing, I run into very strange behavior: the thread in charge of this task works as expected for a few iterations (at best). After this "grace period", Couchbase gets "stuck" and the Observable doesn't call any of its Subscriber's methods (onNext, onCompleted, onError) within the defined period of 30 seconds.
When the latch timeout occurs (see the implementation below), the method returns, but the Observable keeps executing (I noticed that it kept printing debug messages when stopped with a breakpoint outside the scope of this method).
I suspect Couchbase is stuck because, after a few seconds, many Observables are left in some kind of "ghost" state - alive and reporting to their Subscribers, which in turn have nothing to do because the method in which they were created has already returned - eventually leading to java.lang.OutOfMemoryError: GC overhead limit exceeded.
I don't know if what I claim here makes sense, but I can't think of another reason for this behavior.
How should I properly terminate an Observable upon timeout? Should I at all? Is there another way around this?
public List<InfoParcel> upsertParcels(final Collection<InfoParcel> parcels) throws InterruptedException {
    final CountDownLatch latch = new CountDownLatch(parcels.size());
    final List<JsonDocument> docRetList = new LinkedList<JsonDocument>();
    Observable<JsonDocument> obs = Observable
        .from(parcels)
        .flatMap(parcel ->
            Observable.defer(() -> bucket.async().get(parcel.key).firstOrDefault(null))
                .map(doc -> {
                    // In-memory manipulation of the document
                    return updateDocs(doc, parcel);
                })
                .flatMap(doc -> {
                    boolean shouldDelete = ... // Decide by inner logic
                    if (shouldDelete) {
                        if (doc.cas() == 0) {
                            return Observable.just(doc);
                        }
                        return bucket.async().remove(doc);
                    }
                    return (doc.cas() == 0 ? bucket.async().insert(doc) : bucket.async().replace(doc));
                })
        );
    obs.subscribe(new Subscriber<JsonDocument>() {
        @Override
        public void onNext(JsonDocument doc) {
            docRetList.add(doc);
            latch.countDown();
        }

        @Override
        public void onCompleted() {
            // Due to a bug in RxJava, onError() / retryWhen() do not intercept exceptions
            // thrown from within the map/flatMap methods. Therefore, we need to recalculate
            // the "conflicted" parcels and send them for update again.
            while (latch.getCount() > 0) {
                latch.countDown();
            }
        }

        @Override
        public void onError(Throwable e) {
            // Same reason as above
            while (latch.getCount() > 0) {
                latch.countDown();
            }
        }
    });
    latch.await(30, TimeUnit.SECONDS);
    // Recalculating remaining failed parcels and returning them for another cycle of this method (there's a loop outside)
}
I think this is indeed due to the fact that a countdown latch doesn't signal the source that the flow of data processing should stop.
You could use more of RxJava, by using toList().timeout(30, TimeUnit.SECONDS).toBlocking().single() instead of collecting into an (unsynchronized and thus unsafe) external list and using the CountDownLatch.
This will block until a List of your documents is returned.
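Applied to the method in the question, the tail would look roughly like this (my sketch, keeping obs exactly as built above):

// Collect every emitted document into one list, fail if it takes longer
// than 30 seconds, and block the calling thread until the list arrives
List<JsonDocument> docRetList = obs
        .toList()
        .timeout(30, TimeUnit.SECONDS)
        .toBlocking()
        .single();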
When you create your Couchbase environment in code, set computationPoolSize to something large. When the Couchbase client runs out of threads while using the async API, it just stops working and won't ever call the callback.
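For example, a sketch of that environment setup with the 2.x Java SDK (16 is an arbitrary example value; size it to your workload):

// A larger computation pool so async callbacks don't starve
CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
        .computationPoolSize(16)
        .build();
Cluster cluster = CouchbaseCluster.create(env, "127.0.0.1");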
I'm using Jetty HTTP Client to make about 50 HTTP calls asynchronously. The code looks something like this:
List<Address> addresses = getAddresses();
final List<String> done = Collections.synchronizedList(new LinkedList<String>());
List<ContentExchange> requests = new ArrayList<ContentExchange>();
for (Address address : addresses) {
    ContentExchange ce = new ContentExchange() {
        @Override
        protected void onResponseComplete() throws IOException {
            // handle response
            done.add("done");
        }
    };
    ce.setURL(createURL(address));
    requests.add(ce);
}
for (ContentExchange ce : requests) {
    httpClient.send(ce);
}
while (done.size() != addresses.size()) {
    Thread.yield();
}
System.out.println("All addresses processed");
It's calling a REST service that returns some data about the address. What I expect it to do is this:
Make 50 asynchronous (non-blocking) http calls.
The thread will wait until all 50 are finished.
However, it's not working. It works fine if I don't have the while loop, but I need to wait until all 50 are done. Is there some way to wait for all 50 to finish?
Also, I know about ExecutorService and multithreaded solutions, but I need a single-thread solution with non-blocking IO.
Use the java.util.concurrent.CountDownLatch to manage this.
Example from Eclipse Jetty 8.1.10.v20130312's Siege.java test class:
final CountDownLatch latch = new CountDownLatch(concurrent);

for (int i = 0; i < concurrent; i++)
{
    ConcurrentExchange ex = new ConcurrentExchange(client, latch, uris, repeats);
    if (!ex.next()) // this executes the client.send()
    {
        latch.countDown(); // count down if client.send() was in error
    }
}

latch.await(); // wait for all ConcurrentExchanges to complete (or error out)
Note: ConcurrentExchange is a private class within Siege.java.
Then, in your HttpExchange object, call CountDownLatch.countDown() in the following methods (Siege.java has an example of each):
onConnectionFailed(Throwable x)
onException(Throwable x)
onExpire()
onResponseComplete()
Note that all of the examples use an AtomicBoolean named counted to make sure each exchange is only counted once.
if (!counted.getAndSet(true)) // get the value, then set it to true
{
    // only get here if counted returned false (and that will only happen once)
    latch.countDown(); // count down this exchange as being done
}
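Applied to the code in the question, a minimal sketch would replace the busy-wait loop with latch.await() (my adaptation, reusing the ContentExchange subclass from the question; InterruptedException handling and the onException/onExpire callbacks are omitted for brevity):

final CountDownLatch latch = new CountDownLatch(addresses.size());
for (Address address : addresses) {
    ContentExchange ce = new ContentExchange() {
        private final AtomicBoolean counted = new AtomicBoolean(false);

        @Override
        protected void onResponseComplete() throws IOException {
            // handle response
            if (!counted.getAndSet(true)) {
                latch.countDown();
            }
        }

        @Override
        protected void onConnectionFailed(Throwable x) {
            if (!counted.getAndSet(true)) {
                latch.countDown(); // don't let a failed connection hang the await
            }
        }
    };
    ce.setURL(createURL(address));
    httpClient.send(ce);
}
latch.await(); // the single calling thread blocks here until all exchanges finish
System.out.println("All addresses processed");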