N+1 HTTP calls batching with simultaneous queue - java

Suppose I have a method which fetches data from an HTTP API:

public R getResource(String id) {
    // HTTP call
    return fetch("http://example.com/api/" + id);
}
But http://example.com/api/ supports multiple IDs at a time, say

http://example.com/api/id1,id2,id3

In a multi-threaded environment, I want to block until I have collected m IDs and then, in one shot, get the data from the API. Also, to avoid infinite/long blocks, there should be a wait timeout.
For m = 5: let's say 20 threads arrive concurrently to call this method; then 4 batches of requests should be sent to the HTTP API.
Any implementation suggestions or existing frameworks that support this batching?
Edit suggestions are welcome.

Use a BlockingQueue with a thread doing BlockingQueue::poll(long timeout, TimeUnit unit), with the timeout computed so that, e.g., the first request waits no longer than some fixed duration.
The polling thread gathers the IDs from the queue into its own collection until it either has m IDs or the maximum waiting duration is reached. There should be only one such thread.
The entries should contain both the ID and a CompletableFuture<R>, which gets completed with the result of the call. The future is what you give back to the caller. Instead of a list, you may want to use a Map<String, CompletableFuture<R>> so that on request completion you can complete the futures easily. The queue should contain the future too, so you can hand it back to the caller.
A rough sketch:
class ResourceMultigetter<R> {

    private final BlockingQueue<Map.Entry<String, CompletableFuture<R>>> newEntries =
            new LinkedBlockingQueue<>();
    private final Map<String, CompletableFuture<R>> collected = new HashMap<>(); // only touched by the polling thread
    private long millisOfFirstWaitingRequest;
    private volatile boolean stopped;

    class Processor implements Runnable {
        @Override
        public void run() { // run by the polling thread
            while (!stopped) {
                final Map.Entry<String, CompletableFuture<R>> e =
                        newEntries.poll(...); // timeout derived from millisOfFirstWaitingRequest
                if (e == null) {
                    if (!timeHasElapsed()) continue;
                } else {
                    if (collected.isEmpty()) {
                        millisOfFirstWaitingRequest = System.currentTimeMillis();
                    }
                    collected.put(e.getKey(), e.getValue());
                    if (collected.size() < m && !timeHasElapsed()) continue;
                }
                final List<String> processedIds = callTheServer(); // completes the futures
                processedIds.forEach(collected::remove);
            }
        }
    }

    public CompletableFuture<R> enqueue(String id) {
        final CompletableFuture<R> result = new CompletableFuture<>();
        newEntries.add(new AbstractMap.SimpleImmutableEntry<>(id, result));
        return result;
    }
}
You'd initialize it like

ResourceMultigetter<R> resourceMultigetter = new ResourceMultigetter<>();
new Thread(resourceMultigetter.new Processor()).start();

The client code would do something like

R r = resourceMultigetter.enqueue(id).join(); // this blocks
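In the sketch, callTheServer() is left abstract. A minimal sketch of what it might look like, assuming the endpoint accepts comma-separated IDs, and using hypothetical helpers fetch and parseResults that return the results keyed by ID:

// Hypothetical sketch: runs on the polling thread, completes the collected futures.
private List<String> callTheServer() {
    final List<String> ids = new ArrayList<>(collected.keySet());
    try {
        // e.g. GET http://example.com/api/id1,id2,id3 -- fetch/parseResults are stand-ins
        final Map<String, R> resultsById =
                parseResults(fetch("http://example.com/api/" + String.join(",", ids)));
        ids.forEach(id -> collected.get(id).complete(resultsById.get(id)));
    } catch (Exception ex) {
        // fail all waiting callers rather than leaving their futures hanging
        ids.forEach(id -> collected.get(id).completeExceptionally(ex));
    }
    return ids;
}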


Is this the correct way to extract counts from a Concurrent Hash Map without missing some or double counting?

Working on something where I'm trying to count the number of times something is happening. Instead of spamming the database with millions of calls, I'm trying to sum the updates in memory and then dump the results into the database once per second (e.g., turning ten +1s into a single +10).
I've noticed some strange inconsistency with the counts (like there should be exactly 1 million transactions but instead there are 1,000,016 or something).
I'm looking into other possible causes, but I wanted to check that this is the correct way of doing things. The use case is that it needs to be eventually correct, so it's okay as long as the counts aren't double counted or dropped.
Here is my sample implementation:
public class Aggregator {

    private Map<String, LongAdder> transactionsPerUser = new ConcurrentHashMap<>();
    private StatisticsDAO statisticsDAO;

    public Aggregator(StatisticsDAO statisticsDAO) {
        this.statisticsDAO = statisticsDAO;
    }

    public void incrementCount(String userId) {
        transactionsPerUser.computeIfAbsent(userId, k -> new LongAdder()).increment();
    }

    @Scheduled(every = "1s")
    public void sendAggregatedStatisticsToDatabase() {
        for (String userId : transactionsPerUser.keySet()) {
            long count = transactionsPerUser.remove(userId).sum();
            statisticsDAO.updateCount(userId, count);
        }
    }
}
You will have updates dropped in the following scenario:
Thread A calls incrementCount and finds an already existing LongAdder instance for the given userId; this instance is returned from computeIfAbsent.
Thread B is at the same time handling a sendAggregatedStatisticsToDatabase call, which removes that LongAdder instance from the map.
Thread B calls sum() on the LongAdder instance.
Thread A, still executing that same incrementCount invocation, now calls increment() on the LongAdder instance.
This update is now dropped. It will not be seen by the next invocation of sendAggregatedStatisticsToDatabase, because the increment() call happened on an instance that was removed from the map between the calls to computeIfAbsent() and increment() in the incrementCount method.
You might be better off reusing the LongAdder instances by doing something like this in sendAggregatedStatisticsToDatabase:

LongAdder longAdder = transactionsPerUser.get(userId);
long count = longAdder.sum();
longAdder.add(-count);
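Applied to the scheduled method, that reuse approach might look like this (a sketch; note the LongAdder instances now stay in the map forever, which is fine as long as the set of user IDs is bounded):

@Scheduled(every = "1s")
public void sendAggregatedStatisticsToDatabase() {
    for (Map.Entry<String, LongAdder> entry : transactionsPerUser.entrySet()) {
        LongAdder longAdder = entry.getValue();
        long count = longAdder.sum();   // snapshot what has accumulated so far
        if (count > 0) {
            longAdder.add(-count);      // subtract exactly what we are about to persist
            statisticsDAO.updateCount(entry.getKey(), count);
        }
    }
}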
I agree with the answer of @NorthernSky. My answer should be seen as an alternative solution to the problem, specifically addressing the comments on the accepted answer saying that a correct and performant solution would be more complex.
I would propose to use a producer/consumer pattern here, using an unbounded blocking queue. The producers call incrementCount(), which just adds a userId to the queue.
The consumer is scheduled to run every second, reads the queue into a HashMap, and then pushes the map's data to the DAO:
public class Aggregator {

    private final Queue<String> queue = new LinkedBlockingQueue<>();
    private final StatisticsDao statisticsDAO;

    public Aggregator(StatisticsDao statisticsDAO) {
        this.statisticsDAO = statisticsDAO;
    }

    public void incrementCount(String userId) {
        queue.add(userId);
    }

    @Scheduled(every = "1s")
    public void sendAggregatedStatisticsToDatabase() {
        int size = queue.size();
        HashMap<String, LongAdder> counts = new HashMap<>();
        for (int i = 0; i < size; i++) {
            counts.computeIfAbsent(queue.remove(), k -> new LongAdder()).increment();
        }
        counts.forEach((userId, adder) -> statisticsDAO.updateCount(userId, adder.sum()));
    }
}
Even better would be to not have a scheduled consumer, but one that keeps reading from the queue into a local HashMap until a timeout happens, a size threshold is reached, or the queue is empty.
Then it would process the current map, push it entirely to the DAO, clear the map, and start reading the queue again until the next time there's enough data to process; see the sketch below.
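A sketch of that variant, assuming a dedicated consumer thread and assuming the queue field is declared as BlockingQueue<String> so poll with a timeout is available (the one-second timeout and the 10,000-entry threshold are arbitrary choices):

public void consumeLoop() throws InterruptedException {
    Map<String, LongAdder> counts = new HashMap<>();
    while (!Thread.currentThread().isInterrupted()) {
        // Wait up to one second for the next ID; null means the timeout elapsed.
        String userId = queue.poll(1, TimeUnit.SECONDS);
        if (userId != null) {
            counts.computeIfAbsent(userId, k -> new LongAdder()).increment();
        }
        // Flush on timeout or once the batch is large enough.
        if (!counts.isEmpty() && (userId == null || counts.size() >= 10_000)) {
            counts.forEach((id, adder) -> statisticsDAO.updateCount(id, adder.sum()));
            counts.clear();
        }
    }
}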

Data structure to manage a maximum number of requests per minute for an api

I need to send data to an external API, but this API has a limit of requests per endpoint (i.e., 60 requests per minute).
The data comes from Kafka; every message goes to Redis (because I can send a request with 200 items). So I use a simple cache to help me, and I can guarantee that if my server goes down, I won't lose any message.
The problem is that there are moments when Kafka starts to send too many messages, and then Redis starts to grow (more than 1 million messages to send to the API), and we cannot make requests as fast as the messages come in. Then we have a big delay.
My first code was simple: ExecutorService executor = Executors.newFixedThreadPool(1);
This works very well when there are few messages, and the delay is minimal.
So the first thing that I did was change the executor to: ExecutorService executor = Executors.newCachedThreadPool();
So I can demand new threads as I need them to make the requests to the external API faster, but I have the problem with the limit of requests per minute.
There are endpoints where I can make 300 requests per minute, others 500, others 30, and so on.
The code that I did is not very good, and this is for the company that I work for, so I really need to make this better.
So, every time that I am going to request the external API, I call the makeRequest method. This method is synchronized; I know that I could use a synchronized list, but I think that a synchronized method is better in this situation.
// This is an inner class
private static class IntegrationType {

    final Queue<Long> requests; // stores the timestamp of each request
    final int maxRequestsPerMinute; // how many requests I can make per minute

    public IntegrationType(final int maxRequestsPerMinute) {
        this.maxRequestsPerMinute = maxRequestsPerMinute;
        this.requests = new LinkedList<>();
    }

    synchronized void makeRequest() {
        final long current = System.currentTimeMillis();
        requests.add(current);
        if (requests.size() >= maxRequestsPerMinute) {
            long first = requests.poll(); // gets the oldest request
            // The difference between the current request and the first request of the queue
            final int differenceInSeconds = (int) ((current - first) / 1000);
            // if the difference is less than the maximum allowed
            if (differenceInSeconds <= 60) {
                // seconds to sleep
                final int secondsToSleep = 60 - differenceInSeconds;
                sleep(secondsToSleep);
            }
        }
    }

    void sleep(int seconds) {
        try {
            Thread.sleep(seconds * 1000L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
So, is there a data structure that I could use?
What considerations should I take?
Thanks in advance.
If I understand your problem correctly, you can use a BlockingQueue with a ScheduledExecutorService as follows.
BlockingQueues have the method put, which will only add the given element to the queue if there is available space; otherwise the method call will wait (until there is free space). They also have the method take, which will only remove an element from the queue if there are any elements at all; otherwise the method call will wait (until there is at least one element to take).
Specifically, you can use a LinkedBlockingQueue or an ArrayBlockingQueue, which can be given a fixed number of elements to hold at any given time. This fixed size means that you can submit with put as many requests as you like, but you will only take requests and process them once every second or so (so as to make 60 requests per minute, for example).
To instantiate a LinkedBlockingQueue with fixed size, just use the corresponding constructor (which accepts the size as the argument). LinkedBlockingQueue will take elements in FIFO order according to its documentation.
To instantiate an ArrayBlockingQueue with fixed size, use the constructor which accepts the size but also the boolean flag named fair. If this flag is true, then the queue will take elements in FIFO order too.
Then you can have a ScheduledExecutorService (instead of waiting inside a loop), where you can submit a single Runnable which will take from the queue, communicate with the external API, and then wait for the required delay between communications.
A simple demonstration of the above follows:
import java.util.Objects;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {

    public static class RequestSubmitter implements Runnable {

        private final BlockingQueue<Request> q;

        public RequestSubmitter(final BlockingQueue<Request> q) {
            this.q = Objects.requireNonNull(q);
        }

        @Override
        public void run() {
            try {
                q.put(new Request()); //Will block until available capacity.
            }
            catch (final InterruptedException ix) {
                System.err.println("Interrupted!"); //Not expected to happen under normal use.
            }
        }
    }

    public static class Request {

        public void make() {
            try {
                //Let's simulate the communication with the external API:
                TimeUnit.MILLISECONDS.sleep((long) (Math.random() * 100));
            }
            catch (final InterruptedException ix) {
                //Let's say here we failed to communicate with the external API...
            }
        }
    }

    public static class RequestImplementor implements Runnable {

        private final BlockingQueue<Request> q;

        public RequestImplementor(final BlockingQueue<Request> q) {
            this.q = Objects.requireNonNull(q);
        }

        @Override
        public void run() {
            try {
                q.take().make(); //Will block until there is at least one element to take.
                System.out.println("Request made.");
            }
            catch (final InterruptedException ix) {
                //Here the 'taking' from the 'q' is interrupted.
            }
        }
    }

    public static void main(final String[] args) throws InterruptedException {

        /*The following initialization parameters specify that we
        can communicate with the external API 60 times per 1 minute.*/
        final int maxRequestsPerTime = 60;
        final TimeUnit timeUnit = TimeUnit.MINUTES;
        final long timeAmount = 1;

        final BlockingQueue<Request> q = new ArrayBlockingQueue<>(maxRequestsPerTime, true);
        //final BlockingQueue<Request> q = new LinkedBlockingQueue<>(maxRequestsPerTime);

        //Submit some RequestSubmitters to the pool...
        final ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 50_000; ++i)
            pool.submit(new RequestSubmitter(q));

        System.out.println("Serving...");

        //Find out the period between communications with the external API:
        final long delayMicroseconds = TimeUnit.MICROSECONDS.convert(timeAmount, timeUnit) / maxRequestsPerTime;
        //We could do the same with NANOSECONDS for more accuracy, but that would be overkill I think.

        //The most important line probably:
        Executors.newSingleThreadScheduledExecutor().scheduleWithFixedDelay(new RequestImplementor(q), 0L, delayMicroseconds, TimeUnit.MICROSECONDS);
    }
}
Note that I used scheduleWithFixedDelay and not scheduleAtFixedRate. You can see in their documentation that the first one will wait for the delay between the end of one run of the submitted Runnable and the start of the next one, while the second one will not wait and just resubmit the Runnable every period time units. But we don't know how long it takes to communicate with the external API, so what if, for example, we scheduleAtFixedRate with a period of once every minute, but the request takes more than a minute to complete?... Then a new request would be submitted while the first one is not yet finished. That is why I used scheduleWithFixedDelay instead of scheduleAtFixedRate. But there is more: I used a single-thread scheduled executor service. Does that mean that if the first call is not finished, a second one cannot start?... Well, it seems, if you take a look at the implementation of Executors#newSingleThreadScheduledExecutor(), that a second call may occur, because a core pool size of a single thread does not mean that the pool is of fixed size.
Another reason that I used scheduleWithFixedDelay is underflow of requests. For example, what about the queue being empty? Then the scheduling should also wait and not submit the Runnable again.
On the other hand, if we use scheduleWithFixedDelay, with say a delay of one second (1/60 of a minute) between executions, and more than 60 requests are submitted in a single minute, then this will surely make our throughput to the external API drop, because with scheduleWithFixedDelay we can guarantee that at most 60 requests will be made to the external API. It can be less than that, but we don't want it to be; we would like to reach the limit every single time. If that's not a concern to you, then you can use the above implementation already.
But let's say you do care to reach as close to the limit as possible every time. In that case, and as far as I know, you can do this with a custom scheduler, which would be a less clean solution than the first, but more time-accurate.
Bottom line: with the above implementation, you need to make sure that the communication with the external API to serve the requests is as fast as possible.
Finally, I should warn you that I couldn't find out what happens if the BlockingQueue implementations I suggested are not putting in FIFO order. I mean, what if 2 requests arrive at almost the same time while the queue is full? They will both wait, but will the first one which arrived be the first one put into the queue, or will the second one be put first? I don't know. If you don't care about some requests being made to the external API out of order, then don't worry and use the code up to this point. If you do care, however, and you are able to put, for example, a serial number on each request, then you can use a PriorityQueue after the BlockingQueue, or even experiment with PriorityBlockingQueue (which is unfortunately unbounded). That would complicate things even more, so I didn't post relevant code with the PriorityQueue. At least I did my best, and I hope I shed some good ideas. I am not saying this post is a complete solution to all your problems, but it is some considerations to start with.
I implemented something different from what @gthanop suggested.
Something that I discovered is that the limits may change, so I might need to grow or shrink the blocking list. Another reason is that it would not be so easy to adapt our current code to this. And one more: we might use more than one instance, so we would need a distributed lock.
So, I implemented something easier, but not as efficient as the answer of @gthanop.
Here is my code (adapted, since I cannot show the company code):
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.ScheduledExecutorService;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

public class Teste {

    private static enum ExternalApi {
        A, B, C;
    }

    private static class RequestManager {

        private long firstRequest; // First request in one minute
        // how many requests have we made
        private int requestsCount = 0;
        // A timer thread; it executes every minute to refresh the request count and the first request time
        private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);

        RequestManager() {
            final long initialDelay = 0L;
            final long fixedRate = 60;
            executor.scheduleAtFixedRate(() -> {
                System.out.println("Clearing the current count!");
                requestsCount = 0;
                firstRequest = System.currentTimeMillis();
            }, initialDelay, fixedRate, TimeUnit.SECONDS);
        }

        void incrementRequest() {
            requestsCount++;
        }

        long getFirstRequest() {
            return firstRequest;
        }

        boolean requestsExceeded(final int requestLimit) {
            return requestsCount >= requestLimit;
        }
    }

    public static class RequestHelper {

        private static final byte SECONDS_IN_MINUTE = 60;
        private static final short MILLISECONDS_IN_SECOND = 1000;
        private static final byte ZERO_SECONDS = 0;

        // Table to support the time and count of the requests
        private final Map<Integer, RequestManager> requests;
        // Table that contains the limits of each type of request
        private final Map<Integer, Integer> requestLimits;
        /**
         * We need an array of semaphores, because we might lock the requests for ONE, but not for TWO.
         */
        private final Semaphore[] semaphores;

        private RequestHelper() {
            // one semaphore per type
            semaphores = new Semaphore[ExternalApi.values().length];
            requests = new ConcurrentHashMap<>();
            requestLimits = new HashMap<>();
            for (final ExternalApi type : ExternalApi.values()) {
                // Binary semaphore, must be fair, because we are updating things.
                semaphores[type.ordinal()] = new Semaphore(1, true);
            }
        }

        /**
         * When my token expires, I must update this, because the limits might change.
         * @param limits the new api limits
         */
        protected void updateLimits(final Map<ExternalApi, Integer> limits) {
            limits.forEach((key, value) -> requestLimits.put(key.ordinal(), value));
        }

        /**
         * Increments the counter for the type of the request.
         * Using the mutual exclusion lock, we can handle and block other threads that are trying to
         * make a request to the api.
         * If the incoming requests are going to exceed the maximum, we will make the thread sleep for N seconds (60 - time since first request).
         * Since we are using a binary semaphore, it will block incoming requests until the thread that is sleeping wakes up and releases the semaphore lock.
         *
         * @param type of the integration, Supp, List, PET etc ...
         */
        protected final void addRequest(final ExternalApi type) {
            // the index of this request
            final int requestIndex = type.ordinal();
            // we get the semaphore for the type
            final Semaphore semaphore = semaphores[requestIndex];
            // Try to acquire a permit; if no permit is available, it will block until one is available.
            semaphore.acquireUninterruptibly();
            // gets the requestManager for the type
            final RequestManager requestManager = getRequest(requestIndex);
            // increments the number of requests
            requestManager.incrementRequest();
            if (requestManager.requestsExceeded(requestLimits.get(type.ordinal()))) {
                // the difference in seconds between a minute and the time we needed to reach the maximum of requests
                final int secondsToSleep = SECONDS_IN_MINUTE - (int) (System.currentTimeMillis() - requestManager.getFirstRequest()) / MILLISECONDS_IN_SECOND;
                // We reached the maximum in less than a minute
                if (secondsToSleep > ZERO_SECONDS) {
                    System.out.printf("We reached the maximum of: %d per minute by: %s. We must wait for: %d before making a new request!\n", requestLimits.get(type.ordinal()), type.name(), secondsToSleep);
                    sleep(secondsToSleep * MILLISECONDS_IN_SECOND);
                }
            }
            // releases the semaphore
            semaphore.release();
        }

        private final void sleep(final long time) {
            try {
                Thread.sleep(time);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        /**
         * Gets the RequestManager for the given index; if this is the first request, it creates the
         * RequestManager object.
         * @param index the index of the request type
         * @return a RequestManager instance
         */
        private RequestManager getRequest(final int index) {
            RequestManager request = requests.get(index);
            if (request == null) {
                request = new RequestManager();
                requests.put(index, request);
            }
            return request;
        }
    }

    public static void main(String[] args) {

        final RequestHelper requestHelper = new RequestHelper();
        final Map<ExternalApi, Integer> apiLimits = Map.of(ExternalApi.A, 30, ExternalApi.B, 60, ExternalApi.C, 90);
        // update the limits
        requestHelper.updateLimits(apiLimits);

        final ScheduledExecutorService executor = Executors.newScheduledThreadPool(3);
        executor.scheduleWithFixedDelay(() -> {
            System.out.println("A new request is going to happen");
            requestHelper.addRequest(ExternalApi.A);
            sleep(65);
        }, 0, 100, TimeUnit.MILLISECONDS);

        executor.scheduleWithFixedDelay(() -> {
            System.out.println("B new request is going to happen");
            requestHelper.addRequest(ExternalApi.B);
            sleep(50);
        }, 0, 200, TimeUnit.MILLISECONDS);

        executor.scheduleWithFixedDelay(() -> {
            System.out.println("C new request is going to happen");
            requestHelper.addRequest(ExternalApi.C);
            sleep(30);
        }, 0, 300, TimeUnit.MILLISECONDS);
    }

    private static final void sleep(final long time) {
        try {
            Thread.sleep(time);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

How to prioritise waiting CompletableFutures by access time instead of creation time?

TL;DR: When several CompletableFutures are waiting to get executed, how can I prioritize those whose values I'm interested in?
I have a list of 10,000 CompletableFutures (which calculate the data rows for an internal report over the product database):
List<Product> products = ...;
List<CompletableFuture<DataRow>> dataRows = products
        .stream()
        .map(p -> CompletableFuture.supplyAsync(() -> calculateDataRowForProduct(p), singleThreadedExecutor))
        .collect(Collectors.toList());
Each takes around 50ms to complete, so the entire thing finishes in 500sec (they all share the same DB connection, so they cannot run in parallel).
Let's say I want to access the data row of the 9000th product:

dataRows.get(9000).join()

The problem is, all these CompletableFutures are executed in the order they were created, not in the order they are accessed. This means I have to wait 450sec for it to calculate stuff that I don't care about at the moment, before finally getting to the data row I want.
Question:
Is there any way to change this behaviour, so that the Futures I try to access get priority over those I don't care about at the moment?
First thoughts:
I noticed that a ThreadPoolExecutor uses a BlockingQueue<Runnable> to queue up entries waiting for an available thread.
So I thought about using a PriorityBlockingQueue, to change the priority of the Runnable when I access its CompletableFuture, but:
PriorityBlockingQueue does not have a method to reprioritize an existing element, and
I need to figure out a way to get from the CompletableFuture to the corresponding Runnable entry in the queue.
Before I go further down this road: do you think this sounds like the correct approach? Have others ever had this kind of requirement? I tried to search for it, but found exactly nothing. Maybe CompletableFuture is not the correct way of doing this?
Background:
We have an internal report which displays 100 products per page. Initially we precalculated all DataRows for the report, which took way too long if someone had that many products.
So the first optimization was to wrap the calculation in a memoized supplier:
List<Supplier<DataRow>> dataRows = products
        .stream()
        .map(p -> Suppliers.memoize(() -> calculateDataRowForProduct(p)))
        .collect(Collectors.toList());
This means that the initial display of the first 100 entries now takes 5sec instead of 500sec (which is great), but when the user switches to the next pages, it takes another 5sec for each single one of them.
So the idea is: while the user is staring at the first screen, why not precalculate the next pages in the background? Which leads me to my question above.
Interesting problem :)
One way is to roll out a custom FutureTask class to facilitate changing priorities of tasks dynamically.
DataRow and Product are both taken as just String here for simplicity.
import java.util.*;
import java.util.concurrent.*;

public class Testing {

    private static String calculateDataRowForProduct(String product) {
        try {
            // Dummy operation.
            Thread.sleep(200);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("Computation done for " + product);
        return "data row for " + product;
    }

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        PriorityBlockingQueue<Runnable> customQueue = new PriorityBlockingQueue<Runnable>(1, new CustomRunnableComparator());
        ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, customQueue);

        List<String> products = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            products.add("product" + i);
        }

        Map<Integer, PrioritizedFutureTask<String>> taskIndexMap = new HashMap<>();
        for (int i = 0; i < products.size(); i++) {
            String product = products.get(i);
            Callable<String> callable = () -> calculateDataRowForProduct(product);
            PrioritizedFutureTask<String> dataRowFutureTask = new PrioritizedFutureTask<>(callable, i);
            taskIndexMap.put(i, dataRowFutureTask);
            executor.execute(dataRowFutureTask);
        }

        List<Integer> accessOrder = new ArrayList<>();
        accessOrder.add(4);
        accessOrder.add(7);
        accessOrder.add(2);
        accessOrder.add(9);

        int priority = -1 * accessOrder.size();
        for (Integer nextIndex : accessOrder) {
            PrioritizedFutureTask<String> taskAtIndex = taskIndexMap.get(nextIndex);
            assert (customQueue.remove(taskAtIndex));
            customQueue.offer(taskAtIndex.set_priority(priority++));
            // Now this task will be at the front of the thread pool queue.
            // Hence this task will execute next.
        }

        for (Integer nextIndex : accessOrder) {
            PrioritizedFutureTask<String> dataRowFutureTask = taskIndexMap.get(nextIndex);
            String dataRow = dataRowFutureTask.get();
            System.out.println("Data row for index " + nextIndex + " = " + dataRow);
        }
    }
}

class PrioritizedFutureTask<T> extends FutureTask<T> implements Comparable<PrioritizedFutureTask<T>> {

    private Integer _priority = 0;

    public PrioritizedFutureTask(Callable<T> callable, Integer priority) {
        super(callable);
        _priority = priority;
    }

    public Integer get_priority() {
        return _priority;
    }

    public PrioritizedFutureTask<T> set_priority(Integer priority) {
        _priority = priority;
        return this;
    }

    @Override
    public int compareTo(PrioritizedFutureTask<T> other) {
        if (other == null) {
            throw new NullPointerException();
        }
        return get_priority().compareTo(other.get_priority());
    }
}

class CustomRunnableComparator implements Comparator<Runnable> {

    @Override
    public int compare(Runnable task1, Runnable task2) {
        return ((PrioritizedFutureTask) task1).compareTo((PrioritizedFutureTask) task2);
    }
}
Output:
Computation done for product0
Computation done for product4
Data row for index 4 = data row for product4
Computation done for product7
Data row for index 7 = data row for product7
Computation done for product2
Data row for index 2 = data row for product2
Computation done for product9
Data row for index 9 = data row for product9
Computation done for product1
Computation done for product3
Computation done for product5
Computation done for product6
Computation done for product8
There is one more scope for optimization here.
The customQueue.remove(taskAtIndex) operation has O(n) time complexity with respect to the size of the queue (or the total number of products).
It might not affect much if the number of products is small (<= 10^5), but it might result in a performance issue otherwise.
One solution is to extend PriorityBlockingQueue and add the functionality to remove an element from the priority queue in O(log n) rather than O(n).
We can achieve that by keeping a hashmap inside the PriorityQueue structure. This hashmap keeps a mapping from each element to its index (or indices, in case of duplicates) in the underlying array.
Fortunately, I had already implemented such a heap in Python some time back.
If you have more questions on this optimization, it's probably better to ask a new question altogether.
You could avoid submitting all of the tasks to the executor at the start; instead, only submit one background task, and when it finishes, submit the next. If you want to get the 9000th row, submit it immediately (if it has not already been submitted):
static class FutureDataRow {

    CompletableFuture<DataRow> future;
    int index;
    List<FutureDataRow> list;
    Product product;

    FutureDataRow(List<FutureDataRow> list, Product product) {
        this.list = list;
        index = list.size();
        list.add(this);
        this.product = product;
    }

    public DataRow get() {
        submit();
        return future.join();
    }

    private synchronized void submit() {
        if (future == null) future = CompletableFuture.supplyAsync(() ->
                calculateDataRowForProduct(product), singleThreadedExecutor);
    }

    private void background() {
        submit();
        if (index >= list.size() - 1) return;
        future.whenComplete((dr, t) -> list.get(index + 1).background());
    }
}
...
List<FutureDataRow> dataRows = new ArrayList<>();
products.forEach(p -> new FutureDataRow(dataRows, p));
dataRows.get(0).background();
If you want, you could also submit the next row inside the get method, if you expect that the user will navigate to the next page afterwards.
If you were instead using a multithreaded executor and wanted to run multiple background tasks concurrently, you could modify the background method to find the next unsubmitted task in the list and start it when the current background task has finished:
private synchronized boolean background() {
    if (future != null) return false;
    submit();
    future.whenComplete((dr, t) -> {
        for (int i = index + 1; i < list.size(); i++) {
            if (list.get(i).background()) return;
        }
    });
    return true;
}
You would also need to start the first n tasks in the background instead of just the first one:

int n = 8; // number of active background tasks
for (int i = 0; i < dataRows.size() && n > 0; i++) {
    if (dataRows.get(i).background()) n--;
}
To answer my own question...
There is a surprisingly simple (and surprisingly boring) solution to my problem. I have no idea why it took me three days to find it; I guess it required the right mindset, the one you only have when walking along an endless, tranquilizing beach, looking into the sunset on a quiet Sunday evening.
So, ah, it's a little bit embarrassing to write this, but when I need to fetch a certain value (say, for the 9000th product), and the future has not yet computed that value, I can, instead of somehow forcing the future to produce that value asap (by doing all this reprioritization and scheduling magic), I can, well, I can ... simply ... compute that value myself! Yes! Wait, what? Seriously, that's it?
It's something like this: if (!future.isDone()) { future.complete(supplier.get()); }
I just need to store the original Supplier alongside the CompletableFuture in some wrapper class. This is the wrapper class, which works like a charm; all it needs is a better name:
public static class FuturizedMemoizedSupplier<T> implements Supplier<T> {

    private final CompletableFuture<T> future;
    private volatile Supplier<T> supplier;

    public FuturizedMemoizedSupplier(Supplier<T> supplier) {
        this.supplier = supplier;
        this.future = CompletableFuture.supplyAsync(supplier, singleThreadExecutor);
    }

    public T get() {
        // if the future is not yet completed, we just calculate the value ourselves, and set it into the future
        if (!future.isDone()) {
            Supplier<T> s = supplier; // local copy: another thread may null the field concurrently
            if (s != null) {
                future.complete(s.get());
            }
        }
        supplier = null; // no longer needed; allow it to be garbage collected
        return future.join();
    }
}
Now, I think there is a small chance for a race condition here, which could lead to the supplier being executed twice. But actually, I don't care; it produces the same value anyway.
Afterthoughts:
I have no idea why I didn't think of this earlier. I was completely fixated on the idea that it has to be the CompletableFuture which calculates the value, and it has to run in one of these background threads, and whatnot, and, well, none of that mattered or was in any way a requirement.
I think this whole question is a classic example of "ask what problem you really want to solve" instead of coming up with a half-baked, broken solution and asking how to fix that. In the end, I didn't care about CompletableFuture or any of its features at all; it was just the easiest way that came to my mind to run something in the background.
Thanks for your help!

Passing a Set of Objects between threads

The current project I am working on requires that I implement a way to efficiently pass a set of objects from one thread, which runs continuously, to the main thread. The current setup is something like the following.
I have a main thread which creates a new thread. This new thread operates continuously and calls a method based on a timer. This method fetches a group of messages from an online source and organizes them in a TreeSet.
This TreeSet then needs to be passed back to the main thread so that the messages it contains can be handled independently of the recurring timer.
For better reference, my code looks like the following:
// Called by the main thread on start.
void StartProcesses()
{
    if (this.IsWindowing)
    {
        return;
    }
    this._windowTimer = Executors.newSingleThreadScheduledExecutor();
    Runnable task = new Runnable() {
        public void run() {
            WindowCallback();
        }
    };
    this.CancellationToken = false;
    _windowTimer.scheduleAtFixedRate(task,
        0, this.SQSWindow, TimeUnit.MILLISECONDS);
    this.IsWindowing = true;
}

/////////////////////////////////////////////////////////////////////////////////

private void WindowCallback()
{
    ArrayList<Message> messages = new ArrayList<Message>();
    //TODO create Monitor
    if ((!CancellationToken))
    {
        try
        {
            //TODO fix epochWindowTime
            long epochWindowTime = 0;
            int numberOfMessages = 0;
            Map<String, String> attributes;

            // Setup the SQS client
            AmazonSQS client = new AmazonSQSClient(new
                ClasspathPropertiesFileCredentialsProvider());
            client.setEndpoint(this.AWSSQSServiceUrl);

            // get the NumberOfMessages to optimize how to
            // receive all of the messages from the queue
            GetQueueAttributesRequest attributesRequest =
                new GetQueueAttributesRequest();
            attributesRequest.setQueueUrl(this.QueueUrl);
            attributesRequest.withAttributeNames(
                "ApproximateNumberOfMessages");
            attributes = client.getQueueAttributes(attributesRequest).
                getAttributes();
            numberOfMessages = Integer.valueOf(attributes.get(
                "ApproximateNumberOfMessages")).intValue();

            // determine if we need to receive messages from the Queue
            if (numberOfMessages > 0)
            {
                if (numberOfMessages < 10)
                {
                    // just do it inline; it's less expensive than
                    // spinning threads
                    ReceiveTask(numberOfMessages);
                }
                else
                {
                    //TODO Create a multithreading version for this
                    ReceiveTask(numberOfMessages);
                }
            }

            if (!CancellationToken)
            {
                //TODO testing
                _setLock.lock();
                Iterator<Message> _setIter = _set.iterator();
                //TODO
                while (_setIter.hasNext())
                {
                    Message temp = _setIter.next();
                    Long value = Long.valueOf(temp.getAttributes().
                        get("Timestamp"));
                    if (value.longValue() < epochWindowTime)
                    {
                        messages.add(temp);
                        // remove via the iterator to avoid a ConcurrentModificationException
                        _setIter.remove();
                    }
                }
                _setLock.unlock();

                // TODO deduplicate the messages
                // TODO reorder the messages
                // TODO raise new Event with the results
            }

            if ((!CancellationToken) && (messages.size() > 0))
            {
                if (messages.size() < 10)
                {
                    Pair<Integer, Integer> range =
                        new Pair<Integer, Integer>(Integer.valueOf(0),
                            Integer.valueOf(messages.size()));
                    DeleteTask(messages, range);
                }
                else
                {
                    //TODO Create a way to divide this work among
                    //several threads
                    Pair<Integer, Integer> range =
                        new Pair<Integer, Integer>(Integer.valueOf(0),
                            Integer.valueOf(messages.size()));
                    DeleteTask(messages, range);
                }
            }
        } catch (AmazonServiceException ase) {
            ase.printStackTrace();
        } catch (AmazonClientException ace) {
            ace.printStackTrace();
        }
    }
}
As can be seen from some of the comments, my currently preferred way to handle this is by raising an event in the timer thread when there are messages. The main thread will then listen for this event and handle it appropriately.
Presently I am unfamiliar with how Java handles events, or how to create/listen for them. I also do not know if it is possible to create events and have the information contained within them passed between threads.
Can someone please give me some advice/insight on whether or not my methods are possible? If so, where might I find some information on how to implement them, as my current search attempts are not proving fruitful?
If not, can I get some suggestions on how I would go about this, keeping in mind I would like to avoid having to manage sockets if at all possible?
EDIT 1:
The main thread will also be responsible for issuing commands based on the messages it receives, or issuing commands to get required information. For this reason the main thread cannot wait on receiving messages, and should handle them in an event-based manner.
Producer-Consumer Pattern:
One thread (the producer) continuously puts objects (messages) into a queue, and another thread (the consumer) reads and removes objects from the queue.
If your problem fits this, try BlockingQueue:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html
It is easy and effective.
If the queue is empty, the consumer will be blocked, which means the thread waits (and so does not use CPU time) until the producer puts some objects; otherwise the consumer continuously consumes objects.
And if the queue is full, the producer will be blocked until the consumer consumes some objects to make room in the queue, and vice versa.
Here's an example (the queue must be the same object in both producer and consumer):

// Producer thread
Message message = createMessage();
queue.put(message);

// Consumer thread
Message message = queue.take();
handleMessage(message);
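Applied to the question's setup, the timer thread could hand over each TreeSet as a whole; a minimal sketch, where fetchAndOrganizeMessages and handle stand in for the asker's actual logic:

BlockingQueue<TreeSet<Message>> handoff = new LinkedBlockingQueue<>();

// Timer thread: assemble the batch, then pass the whole set in one put.
TreeSet<Message> batch = fetchAndOrganizeMessages(); // hypothetical helper
handoff.put(batch);

// Main thread: take() blocks until a batch arrives; use poll(timeout, unit)
// instead if the main thread must keep doing other work in the meantime.
TreeSet<Message> received = handoff.take();
for (Message m : received) {
    handle(m); // hypothetical handler
}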

How do I return interim results from Callable?

I have a Callable object executed using an ExecutorService.
How do I return interim results from this Callable?
I know there is javax.swing.SwingWorker#publish(results) for Swing, but I don't use Swing.
There are a couple of ways of doing this. You could do it with a callback or you could do it with a queue.
Here's an example of doing it with a callback:
public static interface Callback<T> {
    public void on(T event);
}

Then, an implementation of the callback that does something with your in-progress events:

final Callback<String> callback = new Callback<String>() {
    public void on(String event) {
        System.out.println(event);
    }
};

Now you can use the callback in your pool:

Future<String> submit = pool.submit(new Callable<String>() {
    public String call() throws Exception {
        for (int i = 0; i < 10; i++) {
            callback.on("process " + i);
        }
        return "done";
    }
});
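And here's a sketch of the queue variant, assuming the caller drains a shared BlockingQueue while waiting for the final result (the enclosing method would have to declare throws InterruptedException, ExecutionException):

final BlockingQueue<String> events = new LinkedBlockingQueue<>();

Future<String> submit = pool.submit(new Callable<String>() {
    public String call() throws Exception {
        for (int i = 0; i < 10; i++) {
            events.put("process " + i); // publish an interim result
        }
        return "done";
    }
});

// Drain interim results until the task is done and the queue is empty:
while (!submit.isDone() || !events.isEmpty()) {
    String event = events.poll(100, TimeUnit.MILLISECONDS);
    if (event != null) {
        System.out.println(event);
    }
}
System.out.println(submit.get());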
It is not clear what an "interim result" really is. The interfaces used in the concurrency package simply do not define this, but assume methods that resemble more or less pure functions.
Hence, instead of this:

interim = compute something
finalresult = compute something else

do something like this:

interim = compute something
final1 = new Pair( interim, fork(new Future() { compute something else }) )

(Pseudocode, intended to convey the idea, not compilable code.)
EDIT The idea is: instead of running a single monolithic block of computations (that happens to reach a state where some "interim results" are available), break it up so that the first task returns the former "interim" result and, at the same time, forks a second task that computes the final result. Of course, a handle to this task must be delivered to the caller so that it eventually can get the final result. Usually, this is done with the Future interface.
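A rough Java rendering of that pseudocode, using a Map.Entry as the pair and plain Strings for both results (names here are illustrative only):

import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InterimDemo {

    // Returns the interim result immediately, plus a Future for the final result.
    static Map.Entry<String, Future<String>> compute(ExecutorService pool) {
        String interim = "interim";                                    // = compute something
        Future<String> fin = pool.submit(() -> interim + " -> final"); // = fork: compute something else
        return new SimpleImmutableEntry<>(interim, fin);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Map.Entry<String, Future<String>> result = compute(pool);
        System.out.println(result.getKey());         // interim result, available at once
        System.out.println(result.getValue().get()); // final result, when ready
        pool.shutdown();
    }
}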
You can pass, say, an AtomicInteger to your class (the one that will be submitted to the executor); inside that class you increment its value, and from the calling thread you check its value.
Something like this:
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LongComputation {

    private final AtomicInteger progress;

    public static void main(String[] args) throws InterruptedException,
            ExecutionException {
        AtomicInteger progress = new AtomicInteger(0);
        LongComputation computation = new LongComputation(progress);
        ExecutorService executor = Executors.newFixedThreadPool(2);
        Future<Integer> result = executor.submit(() -> computation.compute());
        executor.shutdown();
        while (!result.isDone()) {
            System.out.printf("Progress...%d%%%n", progress.intValue());
            TimeUnit.MILLISECONDS.sleep(100);
        }
        System.out.printf("Result=%d%n", result.get());
    }

    public LongComputation(AtomicInteger progress) {
        this.progress = progress;
    }

    public int compute() throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            TimeUnit.MILLISECONDS.sleep(100);
            progress.incrementAndGet();
        }
        return 1_000_000;
    }
}
What you're looking for is java.util.concurrent.Future.

A Future represents the result of an asynchronous computation. Methods are provided to check if the computation is complete, to wait for its completion, and to retrieve the result of the computation. The result can only be retrieved using method get when the computation has completed, blocking if necessary until it is ready. Cancellation is performed by the cancel method. Additional methods are provided to determine if the task completed normally or was cancelled. Once a computation has completed, the computation cannot be cancelled. If you would like to use a Future for the sake of cancellability but not provide a usable result, you can declare types of the form Future<?> and return null as a result of the underlying task.
You would have to roll your own API with something like Observer/Observable if you want to publish intermediate results as a push. A simpler approach would be to just poll the current state through some self-defined method. A sketch of the push style follows.
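For instance, a minimal push-style sketch using java.util.concurrent.SubmissionPublisher (available since Java 9), which spares you writing the Observer plumbing yourself; pool here is any ExecutorService:

SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
publisher.consume(event -> System.out.println("interim: " + event)); // subscriber side

ExecutorService pool = Executors.newSingleThreadExecutor();
Future<String> result = pool.submit(() -> {
    for (int i = 0; i < 10; i++) {
        publisher.submit("process " + i); // push an interim result to subscribers
    }
    publisher.close(); // no more interim results
    return "done";
});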
