Stream generate at fixed rate - java

I'm using Stream.generate to get data from Instagram. As Instagram limits calls per hour, I want generate to run no more often than once every 2 seconds.
I've chosen this title because I moved away from ScheduledExecutorService.scheduleAtFixedRate, and that's what I was searching for. I do realise that stream intermediate operations are lazy and cannot be called on a schedule. If you have a better idea for the title, let me know.
So again, I want at least a 2-second delay between generations.
My attempt, which doesn't take into consideration the time consumed by operations after generate (which might take longer than 2 s):
Stream.generate(() -> {
    List<MediaFeedData> feedDataList = null;
    while (feedDataList == null) {
        try {
            Thread.sleep(2000);
            feedDataList = newData();
        } catch (InstagramException e) {
            notifyError(e.getMessage());
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    return feedDataList;
})

A solution would be to decouple the generator from the Stream, for example using a BlockingQueue
final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(100);

ScheduledExecutorService scheduler = new ScheduledThreadPoolExecutor(1);
scheduler.scheduleAtFixedRate(() -> {
    // Generate new data every 2s, regardless of their processing rate
    ThreadLocalRandom random = ThreadLocalRandom.current();
    queue.offer(random.nextInt(10));
}, 0, 2, TimeUnit.SECONDS);

Stream.generate(() -> {
    try {
        // Accept new data if ready, or wait for some more to be generated
        return queue.take();
    } catch (InterruptedException e) {}
    return -1;
}).forEach(System.out::println);
If the data processing takes more than 2s, new data will be enqueued and wait to be consumed. If it takes less than 2s, the take method in the generator will wait for new data to be produced by the scheduler.
This way, you are guaranteed to make fewer than N calls per hour to Instagram!
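Applied to the question's scenario, a rough sketch of the same pattern might look like this. newData(), notifyError() and InstagramException are the question's own pieces; process() is just a placeholder for whatever happens downstream:
final BlockingQueue<List<MediaFeedData>> feedQueue = new LinkedBlockingQueue<>(100);

ScheduledExecutorService poller = new ScheduledThreadPoolExecutor(1);
poller.scheduleAtFixedRate(() -> {
    try {
        // At most one Instagram call every 2 seconds, no matter how slow the consumer is
        feedQueue.offer(newData());
    } catch (InstagramException e) {
        notifyError(e.getMessage());
    }
}, 0, 2, TimeUnit.SECONDS);

Stream.generate(() -> {
    try {
        return feedQueue.take(); // blocks until the poller has queued something
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return Collections.<MediaFeedData>emptyList();
    }
}).forEach(feedDataList -> process(feedDataList)); // process(...) stands in for the downstream work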

As far as I understand, your question is about solving two problems:
waiting at a fixed rate rather than a fixed delay
creating a stream for an unknown number of items which allows processing until some point of time (i.e. is not infinite)
You can solve the first by using deadline-based waiting and the second by implementing a Spliterator:
Stream<List<MediaFeedData>> stream = StreamSupport.stream(
    new Spliterators.AbstractSpliterator<List<MediaFeedData>>(Long.MAX_VALUE, 0) {
        long lastTime = System.currentTimeMillis();

        @Override
        public boolean tryAdvance(Consumer<? super List<MediaFeedData>> action) {
            if (quitCondition()) return false;
            List<MediaFeedData> feedDataList = null;
            while (feedDataList == null) {
                lastTime += TimeUnit.SECONDS.toMillis(2);
                while (System.currentTimeMillis() < lastTime)
                    LockSupport.parkUntil(lastTime);
                try {
                    feedDataList = newData();
                } catch (InstagramException e) {
                    notifyError(e.getMessage());
                    if (QUIT_ON_EXCEPTION) return false;
                }
            }
            action.accept(feedDataList);
            return true;
        }
    }, false);
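The stream can then be consumed like any other, and it ends once quitCondition() becomes true. For example (process(...) being a placeholder for the downstream work in the question):
stream.forEach(feedDataList -> process(feedDataList));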

Make a Timer and a semaphore. The timer releases a permit on the semaphore every 2 seconds, and in the stream's supplier you acquire a permit on every call.
This keeps the waits to the specified minimum (2 s), and - funnily - would even work with .parallel().
private final Semaphore tickingSemaphore = new Semaphore(1, true);
In its own thread:
Stream.generate(() -> {
    tickingSemaphore.acquire(); // in real code, catch InterruptedException here
    ...
});
In the timer:
tickingSemaphore.release();
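Put together, a minimal self-contained sketch of this idea could look like the following. It uses java.util.Timer, starts the semaphore with zero permits so that even the first element waits for a tick (the answer above starts with one), and replaces the question's newData() with a dummy supplier:
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.Semaphore;
import java.util.stream.Stream;

public class TickingStream {

    private static final Semaphore tickingSemaphore = new Semaphore(0, true);

    public static void main(String[] args) {
        Timer timer = new Timer(true);
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                tickingSemaphore.release(); // one permit every 2 seconds
            }
        }, 0, 2000);

        Stream.generate(() -> {
            try {
                tickingSemaphore.acquire(); // wait for the next tick
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return System.currentTimeMillis(); // stand-in for newData()
        }).limit(5).forEach(System.out::println);
    }
}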


How to check when all CompletableFuture are done?

I have a Stream<Item> which I'm mapping to a CompletableFuture<ItemResult>
What I'd like to do is to know when all the futures are completed.
One may suggest to:
collect all the futures to an array and use CompletableFuture.allOf(). This is somewhat problematic since there could be hundreds of thousands of items
just continue with forEach(CompletableFuture::join). This is problematic too, as calling forEach with join will just block the stream and it will essentially be serial processing, not concurrent
Inject a poisoned item at the end of the stream. This could work but it's not that elegant in my view
check if the executor queue is empty - This is quite limiting because I might use more than one executor in the future. Also, the queue can be momentarily empty
Monitor the database instead and check the number of new items
I feel like all the suggested solutions aren't good enough.
What is the appropriate way to monitor the futures?
Thanks
EDIT:
another (vague) idea I had in mind is to use a counter and wait for it to go down to zero. But again, I need to make sure it's not just momentarily at 0.
Disclaimer: I'm not sure whether Phaser is the right tool here, and if yes, whether it's better to have one root with multiple children or to chain them like I'm proposing below, so feel free to correct me.
Here's one approach that uses Phaser.
A Phaser has a limited number of parties, so we need to create a new child Phaser if that limit is about to be reached:
private Phaser register(Phaser phaser) {
    if (phaser.getRegisteredParties() < 65534) {
        // warning: side-effect,
        // conflicts with AtomicReference#updateAndGet recommendation,
        // might not fit well if the Stream is parallel:
        phaser.register();
        return phaser;
    } else {
        return new Phaser(phaser, 1);
    }
}
Register each CompletableFuture against that Phaser chain, and deregister once done:
private void register(CompletableFuture<?> future, AtomicReference<Phaser> phaser) {
    Phaser registeredPhaser = phaser.updateAndGet(this::register);
    future
        .thenRun(registeredPhaser::arriveAndDeregister)
        .exceptionally(e -> {
            // log e?
            registeredPhaser.arriveAndDeregister();
            return null;
        });
}
Wait for all futures to be finished:
private <T> void await(Stream<CompletableFuture<T>> futures) {
    Phaser rootPhaser = new Phaser(1);
    AtomicReference<Phaser> phaser = new AtomicReference<>(rootPhaser);
    futures.forEach(future -> register(future, phaser));
    rootPhaser.arriveAndAwaitAdvance();
    rootPhaser.arriveAndDeregister();
}
Example:
ExecutorService executor = Executors.newFixedThreadPool(500);

// creating fake stream with 500,000 futures:
Stream<CompletableFuture<Integer>> stream = IntStream
    .rangeClosed(1, 500_000)
    .mapToObj(i -> CompletableFuture.supplyAsync(() -> {
        try {
            TimeUnit.MILLISECONDS.sleep(10);
            if (i % 50_000 == 0) {
                System.out.println(Thread.currentThread().getName() + ": " + i);
            }
            return i;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }, executor));

// usage:
await(stream);
System.out.println("Done");
Outputs:
pool-1-thread-348: 50000
pool-1-thread-395: 100000
pool-1-thread-333: 150000
pool-1-thread-30: 200000
pool-1-thread-120: 250000
pool-1-thread-10: 300000
pool-1-thread-241: 350000
pool-1-thread-340: 400000
pool-1-thread-283: 450000
pool-1-thread-176: 500000
Done

How to enforce timeout and cancel async CompletableFuture Jobs

I am using Java 8, and I want to know the recommended way to enforce a timeout on 3 async jobs that I would like to execute asynchronously and then retrieve the results from the futures. Note that the timeout is the same for all 3 jobs. I also want to cancel a job if it goes beyond the time limit.
I am thinking something like this:
// Submit jobs async
List<CompletableFuture<String>> futures = submitJobs(); // Uses CompletableFuture.supplyAsync
CompletableFuture<Void> allFutures = CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));

try {
    allFutures.get(100L, TimeUnit.MILLISECONDS);
} catch (TimeoutException e) {
    for (CompletableFuture<String> f : futures) {
        if (!f.isDone()) {
            /*
             From the Javadoc:
             @param mayInterruptIfRunning this value has no effect in this
             implementation because interrupts are not used to control
             processing.
            */
            f.cancel(true);
        }
    }
}

List<String> output = new ArrayList<>();
for (CompletableFuture<String> fu : futures) {
    if (!fu.isCancelled()) { // Is this needed?
        output.add(fu.join());
    }
}
return output;
Will something like this work? Is there a better way?
How do I cancel the future properly? The Javadoc says the thread cannot be interrupted. So, if I were to cancel a future and call join(), will I get the result immediately, since the thread will not be interrupted?
Is it recommended to use join() or get() to get the result after waiting is over?
It is worth noting that calling cancel on CompletableFuture is effectively the same as calling completeExceptionally on the current stage. The cancellation will not impact prior stages. With that said:
In principle, something like this will work, assuming upstream cancellation is not necessary (treating the snippet above as pseudocode).
CompletableFuture cancellation will not interrupt the current thread. Cancellation will cause all downstream stages to be triggered immediately with a CancellationException (it will short-circuit the execution flow).
join and get are effectively the same in the case where the caller is willing to wait indefinitely; join wraps the checked exceptions for you. If the caller wants a timeout, get will be needed.
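For instance, a tiny illustration of that last point, with future standing in for any of the question's CompletableFuture<String> instances (the checked exceptions of get are simply propagated here):
static String readBlocking(CompletableFuture<String> future) {
    return future.join(); // waits indefinitely, wraps failures in an unchecked CompletionException
}

static String readWithTimeout(CompletableFuture<String> future)
        throws InterruptedException, ExecutionException, TimeoutException {
    return future.get(100, TimeUnit.MILLISECONDS); // checked exceptions, but supports a timeout
}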
Here is a segment to illustrate the behavior on cancellation. Note how the downstream stages will not be started, but the upstream work continues even after cancellation.
public static void main(String[] args) throws Exception
{
    int maxSleepTime = 1000;
    Random random = new Random();
    AtomicInteger value = new AtomicInteger();
    List<String> calculatedValues = new ArrayList<>();

    Supplier<String> process = () -> {
        try {
            Thread.sleep(random.nextInt(maxSleepTime));
            System.out.println("Stage 1 Running!");
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return Integer.toString(value.getAndIncrement());
    };

    List<CompletableFuture<String>> stage1 = IntStream.range(0, 10)
            .mapToObj(val -> CompletableFuture.supplyAsync(process))
            .collect(Collectors.toList());
    List<CompletableFuture<String>> stage2 = stage1.stream()
            .map(Test::appendNumber)
            .collect(Collectors.toList());
    List<CompletableFuture<String>> stage3 = stage2.stream()
            .map(Test::printIfCancelled)
            .collect(Collectors.toList());

    CompletableFuture<Void> awaitAll = CompletableFuture.allOf(stage2.toArray(new CompletableFuture[0]));
    try
    {
        /* Wait 1/2 the time, some should be complete. Some not complete -> TimeoutException */
        awaitAll.get(maxSleepTime / 2, TimeUnit.MILLISECONDS);
    }
    catch (TimeoutException ex)
    {
        for (CompletableFuture<String> toCancel : stage2)
        {
            boolean irrelevantValue = false;
            if (!toCancel.isDone())
                toCancel.cancel(irrelevantValue);
            else
                calculatedValues.add(toCancel.join());
        }
    }

    System.out.println("All futures Cancelled! But some Stage 1's may still continue printing anyways.");
    System.out.println("Values returned as of cancellation: " + calculatedValues);
    Thread.sleep(maxSleepTime);
}

private static CompletableFuture<String> appendNumber(CompletableFuture<String> baseFuture)
{
    return baseFuture.thenApply(val -> {
        System.out.println("Stage 2 Running");
        return "#" + val;
    });
}

private static CompletableFuture<String> printIfCancelled(CompletableFuture<String> baseFuture)
{
    return baseFuture.thenApply(val -> {
        System.out.println("Stage 3 Running!");
        return val;
    }).exceptionally(ex -> {
        System.out.println("Stage 3 Cancelled!");
        return ex.getMessage();
    });
}
If it is necessary to cancel the upstream process (ex: cancel some network call), custom handling will be needed.
After calling cancel you cannot join the future, since you get an exception.
One way to terminate the computation is to let it have a reference to the future and check it periodically: if it was cancelled, abort the computation from inside.
This can be done if the computation is a loop where at each iteration you can do the check, as in the sketch below.
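A minimal sketch of that idea, assuming the future is created by hand rather than via supplyAsync (the task needs a reference to it before it starts); executor and doOneStep(...) are placeholders, not from the question:
CompletableFuture<Integer> future = new CompletableFuture<>();
executor.submit(() -> {
    int result = 0;
    for (int i = 0; i < 1_000_000; i++) {
        if (future.isCancelled()) {
            return; // someone called cancel(); abort the computation from inside
        }
        result += doOneStep(i); // placeholder for one unit of the real work
    }
    future.complete(result);
});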
Do you need it to be a CompletableFuture? Another way is to avoid CompletableFuture and use a plain Future or a FutureTask instead: if you execute it with an Executor, calling future.cancel(true) will terminate the computation if possible.
Answering the question "if I call join(), will I get the result immediately":
No, you will not get it immediately; it will block and wait for the computation to complete. There is no way to force a computation that takes a long time to finish in a shorter time.
You can call future.complete(value) to provide a value to be used as a default result by other threads that have a reference to that future.
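For example (defaultValue is just whatever placeholder result makes sense in your domain):
if (!future.isDone()) {
    future.complete(defaultValue); // threads blocked in join()/get() on this future now receive defaultValue
}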

How can multithreading help increase performance in this situation?

I have a piece of code like this:
while (...) {
    x = jdbc_readOperation();
    y = getTokens(x);
    jdbc_insertOperation(y);
}
public List<String> getTokens(String divText) {
    List<String> tokenList = new ArrayList<String>();
    Matcher subMatcher = Pattern.compile("\\[[^\\]]*]").matcher(divText);
    while (subMatcher.find()) {
        String token = subMatcher.group();
        tokenList.add(token);
    }
    return tokenList;
}
What I know is that multithreading can save time when one thread gets blocked by I/O or the network. With these synchronous operations, every step has to wait for the previous step to finish. What I want here is to maximize CPU utilization on getTokens().
My first thought was to put getTokens() in the run method of a class and create multiple threads. But I think it will not work, since it seems there is no performance benefit from having multiple threads do pure computation.
Is adopting multithreading going to help increase performance in this case? If so, how can I do that?
It will depend on the pace at which jdbc_readOperation() produces data compared with the pace at which getTokens(x) processes it. Knowing that will help you figure out whether multithreading is going to help you.
You could try something like this (just for you to get the idea):
int workToBeDoneQueueSize = 1000;
int workDoneQueueSize = 1000;
BlockingQueue<String> workToBeDone = new LinkedBlockingQueue<>(workToBeDoneQueueSize);
BlockingQueue<List<String>> workDone = new LinkedBlockingQueue<>(workDoneQueueSize);

new Thread(() -> {
    try {
        while (true) {
            workToBeDone.put(jdbc_readOperation());
        }
    } catch (InterruptedException e) {
        e.printStackTrace();
        // handle InterruptedException here
    }
}).start();

int numOfWorkerThreads = 5; // just an example
for (int i = 0; i < numOfWorkerThreads; i++) {
    new Thread(() -> {
        try {
            while (true) {
                workDone.put(getTokens(workToBeDone.take()));
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
            // handle InterruptedException here
        }
    }).start();
}

new Thread(() -> {
    // you could improve this by making a batch operation
    try {
        while (true) {
            jdbc_insertOperation(workDone.take());
        }
    } catch (InterruptedException e) {
        e.printStackTrace();
        // handle InterruptedException here
    }
}).start();
Or you could learn how to use the ThreadPoolExecutor. (https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html)
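As a rough sketch of that direction, here is what the worker part could look like with an ExecutorService instead of hand-rolled threads. jdbc_readOperation(), getTokens() and jdbc_insertOperation() are the question's methods; the assumption that jdbc_readOperation() returns null when there is nothing left is mine, and here the insert runs on the worker thread rather than on a dedicated one:
void processAll() throws InterruptedException {
    ExecutorService workers = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors()); // one worker per core for the CPU-bound getTokens()

    String x;
    while ((x = jdbc_readOperation()) != null) { // assumed end-of-data convention
        final String divText = x;
        workers.submit(() -> jdbc_insertOperation(getTokens(divText)));
    }

    workers.shutdown();                          // stop accepting new tasks
    workers.awaitTermination(1, TimeUnit.HOURS); // wait for the queued work to finish
}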
Okay, to speed up getTokens() you can split the input String divText using the String.substring() method. Split it into as many substrings as you will have threads running the getTokens() method. Then every thread will run on a certain substring of divText.
Creating more threads than the CPU can handle should be avoided, since context switches create inefficiency.
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#substring-int-int-
An alternative could be splitting the input String of getTokens with the String.split method http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29 e.g. in case the text is made up of words separated by spaces or other symbols. Then specific parts of the resulting String array could be passed to different threads, as sketched below.
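A rough sketch of that splitting idea, using fixed-size chunks and a parallel stream (note, as a caveat, that a [..] token straddling a chunk boundary would be cut in half, so the boundaries would need extra care in real code):
int parts = Runtime.getRuntime().availableProcessors();
int chunkSize = (divText.length() + parts - 1) / parts; // ceiling division

List<String> tokens = IntStream.range(0, parts)
        .parallel()                                      // chunks are tokenized concurrently
        .mapToObj(i -> divText.substring(
                Math.min(i * chunkSize, divText.length()),
                Math.min((i + 1) * chunkSize, divText.length())))
        .flatMap(part -> getTokens(part).stream())       // getTokens is the question's method
        .collect(Collectors.toList());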

run threads according to the time limit

I want to start a maximum of 40 HTTP requests each second, and after 1 second I want it to run another 40 from its own queue (like ThreadPoolTaskExecutor's blocking queue). I am looking for an executor or thread pool implementation for this requirement.
Any recommendations?
Thx
Ali
EDIT: A fixed rate is not efficient, for the obvious reasons. As the queue items start one by one, the ones at the back of the queue will have only just started while the ones started a while ago may already be finished.
Extra EDIT: The problem is to start only 40 requests in a second, not to have a maximum of 40 active. There can be 80 active in some other second, but within any single second only 40 new connections should be created.
One way to do this is to use another architecture; it will make the process that much easier.
1) Create a Thread class that implements Runnable.
2) It takes as a parameter a List of the HTTP requests that you want to make.
3) Make the run() function loop over the entire list (size 40).
4) Let the thread live for one second.
Here is a sample:
class MyClass extends Thread {

    private ArrayList<Req> theList;

    public MyClass(ArrayList<Req> theList) {
        this.theList = theList;
    }

    public void run() {
        // Here, you simply want to loop over the entire list (max 40)
        for (Req r : theList) {
            r.sendRequest();
        }
    }

    public static void main(String args[]) {
        // Create an instance of your thread:
        MyClass t = new MyClass(new ArrayList<Req>());

        // Now that you have your thread, simply do the following:
        while (true) {
            t = new MyClass(/* insert your new list */ new ArrayList<Req>());
            t.start();
            try {
                Thread.sleep(1000);
            } catch (Exception e) {
            }
        }
    }
}
And there you have it
First define a class that implements Callable, which will do your thread's work:
class MyClass implements Callable<String>
{
    /**
     * Consider this as a new Thread.
     */
    @Override
    public String call()
    {
        // Processing...
        return "OK"; // Return whatever you want the thread's result to be; you can then access it and do the desired processing.
    }
}
The next step is to create an ExecutorService (in my example, a thread pool) and throw in some tasks.
int nbThreadToStart = 40;
ExecutorService executor = Executors.newFixedThreadPool(nbThreadToStart /* Your thread pool limit */);
List<Future<String>> allTasks = new ArrayList<Future<String>>(/* Optionally specify an initial capacity */);

for (int i = 0; i < 10; i++) // Number of iterations you want
{
    for (int j = 0; j < nbThreadToStart; j++)
    {
        try
        {
            allTasks.add(executor.submit(new MyClass()));
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
    try
    {
        Thread.sleep(1000);
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}

executor.shutdown();

// You can then access all your threads (tasks), see whether they terminated, and even add a timeout:
try
{
    for (Future<String> task : allTasks)
        task.get(60, TimeUnit.SECONDS); // Timeout of 60 seconds. The get will return what you specified in the call method.
}
catch (TimeoutException te)
{
    ...
}
catch (InterruptedException ie)
{
    ...
}
catch (ExecutionException ee)
{
    ...
}
I'm not sure what you really want, but I think you should handle multithreading with a thread pool, especially if you're planning on making a lot of requests, to avoid any undesired memory leaks etc.
If my example is not clear enough, note that there are many other methods offered by ExecutorService, Future, etc. that are very useful when dealing with threads.
Check this out :
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Future.html
That's it for my recommendations.

How to make the main thread wait for the other threads to complete in ThreadPoolExecutor

I am using the ThreadPoolExecutor to implement threading in my Java Application.
I have an XML document which I need to parse, handing each node to a thread from the pool for processing. My implementation is like this:
parse_tp is the thread pool object and ParseQuotesXML is the class with the run method.
try {
    List children = root.getChildren();
    Iterator iter = children.iterator();
    // Parsing the XML
    while (iter.hasNext()) {
        Element child = (Element) iter.next();
        ParseQuotesXML quote = new ParseQuotesXML(child, this);
        parse_tp.execute(quote);
    }
    System.out.println("Print it after all the threads have completed");
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    System.out.println("Print it in the end.");
    if (!parse_tp.isShutdown()) {
        if (parse_tp.getActiveCount() == 0 && parse_tp.getQueue().size() == 0) {
            parse_tp.shutdown();
        } else {
            try {
                parse_tp.awaitTermination(30, TimeUnit.SECONDS);
            } catch (InterruptedException ex) {
                log.info("Exception while terminating the threadpool " + ex.getMessage());
                ex.printStackTrace();
            }
        }
    }
    parse_tp.shutdown();
}
The problem is that the two print statements are printed before the other threads finish. I want to make the main thread wait for all the other threads to complete.
With a plain Thread I could do this using the join() method, but I can't find a way to achieve the same with ThreadPoolExecutor. I would also like to ask whether the code written in the finally block to close the thread pool is proper.
Thanks,
Amit
A CountDownLatch is designed for this very purpose. Examples may be found here and here. When the number of threads is not known in advance, consider a Phaser, new in Java 1.7, or an UpDownLatch.
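A minimal sketch of the CountDownLatch approach for this case might look like the following. It assumes ParseQuotesXML is given the latch (a hypothetical extra constructor argument) and calls latch.countDown() in a finally block at the end of its run() method:
CountDownLatch latch = new CountDownLatch(children.size());

while (iter.hasNext()) {
    Element child = (Element) iter.next();
    parse_tp.execute(new ParseQuotesXML(child, this, latch)); // hypothetical constructor taking the latch
}

latch.await(); // main thread blocks here until every task has called countDown(); throws InterruptedException
System.out.println("Print it after all the threads have completed");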
To answer your second question, I think you are doing a reasonable job trying to clean up your thread pool.
With respect to your first question, I think the method that you want to use is submit rather than execute. Rather than try to explain it all in text, here's an edited fragment from a unit test that I wrote that makes many tasks, has each of them do a fragment of the total work and then meets back at the starting point to add the results:
final AtomicInteger messagesReceived = new AtomicInteger(0);

// ThreadedListenerAdapter is the class that I'm testing.
// It's not germane to the question other than as a target for a thread pool.
final ThreadedListenerAdapter<Integer> adapter =
    new ThreadedListenerAdapter<Integer>(listener);

int taskCount = 10;
List<FutureTask<Integer>> taskList = new ArrayList<FutureTask<Integer>>();
for (int whichTask = 0; whichTask < taskCount; whichTask++) {
    FutureTask<Integer> futureTask =
        new FutureTask<Integer>(new Callable<Integer>() {
            @Override
            public Integer call() throws Exception {
                // Does useful work that affects messagesSent
                return messagesSent;
            }
        });
    taskList.add(futureTask);
}

for (FutureTask<Integer> task : taskList) {
    LocalExecutorService.getExecutorService().submit(task);
}

for (FutureTask<Integer> task : taskList) {
    int result = 0;
    try {
        result = task.get();
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
    } catch (ExecutionException ex) {
        throw new RuntimeException("ExecutionException in task " + task, ex);
    }
    assertEquals(maxMessages, result);
}

int messagesSent = taskCount * maxMessages;
assertEquals(messagesSent, messagesReceived.intValue());
I think this fragment is similar to what you're trying to do. The key components were the submit and get methods.
First of all, you can use the ThreadPoolExecutor.submit() method, which returns a Future instance; after you have submitted all your work items, you can iterate through those futures and call Future.get() on each of them.
Alternatively, you can prepare your work items as Callables and submit them all at once using ThreadPoolExecutor.invokeAll(), which will wait until all work items have completed; then you can get the execution results or exceptions by calling the same Future.get() method.
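Applied to the question, a short sketch of the invokeAll variant could look like this (wrapping each ParseQuotesXML as a Callable<Void> is my own adaptation; invokeAll itself throws InterruptedException, which is not handled here):
List<Callable<Void>> tasks = new ArrayList<>();
while (iter.hasNext()) {
    Element child = (Element) iter.next();
    ParseQuotesXML quote = new ParseQuotesXML(child, this);
    tasks.add(() -> { quote.run(); return null; }); // adapt the Runnable to a Callable<Void>
}

List<Future<Void>> results = parse_tp.invokeAll(tasks); // blocks until every task has completed
System.out.println("Print it after all the threads have completed");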
