In a non-JavaFX application I would like to have a class similar to Task:
a thread which executes something and is able to report its progress.
Is there something in plain Java that can do this?
The Task class adds a bunch of functionality to a FutureTask, but all of the non-obvious parts are to do with providing observable properties and ensuring they are updated on the FX Application Thread. It sounds like you don't need any of the difficult parts: you are querying the task to check its progress (so you don't need observability, i.e. callbacks to be invoked when the progress changes) and you don't have an FX Application Thread on which to schedule updates.
So, for example, if you want to track progress, just add the appropriate property to your Callable implementation. If you want the progress to be accessible from multiple threads, use an atomic variable such as AtomicLong to represent the progress internally (or at least make the field volatile):
import java.util.concurrent.Callable ;
import java.util.concurrent.atomic.AtomicLong ;
public class MyCountingTask implements Callable<Void> {
private AtomicLong progressCount = new AtomicLong();
private final long max = 1000 ;
@Override
public Void call() throws InterruptedException {
for (int count = 0; count < max ; count++) {
progressCount.set(count);
// in real life, do actual work instead of sleeping...
Thread.sleep(100);
}
progressCount.set(max);
return null ;
}
public double getProgress() {
return 1.0*progressCount.get() / max ;
}
}
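A minimal sketch of how such a task might be polled from another thread. The class and method names here (ProgressPollingDemo, CountingTask, runAndPoll) are made up for illustration, and the task is a shortened copy of the one above so that the example is self-contained:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class ProgressPollingDemo {

    // Hypothetical task: same idea as MyCountingTask, with a shorter sleep.
    static class CountingTask implements Callable<Void> {
        private final AtomicLong progressCount = new AtomicLong();
        private final long max;

        CountingTask(long max) {
            this.max = max;
        }

        @Override
        public Void call() throws InterruptedException {
            for (long count = 0; count < max; count++) {
                progressCount.set(count);
                Thread.sleep(1); // stand-in for real work
            }
            progressCount.set(max);
            return null;
        }

        double getProgress() {
            return 1.0 * progressCount.get() / max;
        }
    }

    // Runs the task on a background thread and polls its progress until it is done.
    static double runAndPoll(long steps) {
        CountingTask task = new CountingTask(steps);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> future = executor.submit(task);
            while (!future.isDone()) {
                System.out.printf("Progress: %.0f%%%n", 100 * task.getProgress());
                Thread.sleep(20);
            }
            future.get(); // rethrows any exception thrown by the task
            return task.getProgress();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Final progress: " + runAndPoll(100));
    }
}
```

The caller keeps its own reference to the task object, because the Future returned by submit() only exposes completion and the final result, not the progress.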
I'm new to using Java and I'm trying to learn threading, so it'd be helpful to get some pointers on this. I don't understand much about using threads, so a for-dummies type of explanation would also help a lot.
I'm working on a project where I have a function foo, and a List of strings which I would like to pass through foo. If the processing time for foo(S) goes over T milliseconds, I will stop processing foo(S), and then I would move on to foo(S+1). I have a large dataset of strings and they can take too long so I tried using threading and termination to speed things up, but I'm not sure how to terminate if processing takes too long. This is more or less what foo does:
public static int foo(String s) {
// use a while loop to manipulate string
// return some integer
}
After some research, I found out that ExecutorService would be a good approach, and this is the structure of what I have so far:
public static void process(List<String> l, int n) {
ExecutorService executor = Executors.newFixedThreadPool(n);
for (final String s: l) {
executor.submit(new Runnable() {
@Override
public void run() {
System.out.println(foo(s));
//interrupt this process if foo(s) takes >T
}
});
}
executor.shutdown();
}
However, if I run my code on a small set of strings, it neither terminates nor prints anything, and I am unsure why. Is it stuck inside the loop in foo? How do I fix that? Can I add a sleep(T) between threads to interrupt a thread that is taking too long? Thanks!
It's not really possible to cancel a task exactly after a specific time. Canceling a task at a specific time (canceling in a sense that the task terminates immediately) would be dangerous, because it could leave data in an inconsistent state, without having the chance to "clear things up".
Therefore, we have interrupts to do a similar thing. You interrupt a task (or a thread), which is basically a notification for the task e.g. to stop its execution by itself. In this way, the task is able to terminate in a controlled manner and leave data in a consistent state.
In your example, it would work like so: Every processing of a string in your list would be a task. You have two ways to implement the timeout-mechanism: The task is canceled externally (using interrupts) or it cancels itself. When the task cancels itself, it periodically checks if it is running longer than a timeout, while processing the string. If the timeout is exceeded, the initial value is the result, otherwise the processed string.
In Java it would be something like this:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
class Main {
static class ProcessTask implements Callable<String> {
final String initialValue;
final long timeout;
ProcessTask(String initialValue, long timeout) {
this.initialValue = initialValue;
this.timeout = timeout;
}
@Override
public String call() throws Exception {
long complexity = (long) (Math.random() * 1_000_000L);
long starttime = System.currentTimeMillis();
for (long i = 0; i < complexity; ++i) {
Math.sqrt(Math.log(i));
if (System.currentTimeMillis() - starttime > timeout) {
return initialValue;
}
}
return "new";
}
}
static void process(List<String> initialList, int parallelism, int timeout) {
ExecutorService executor = Executors.newFixedThreadPool(parallelism);
List<Future<String>> futures = new ArrayList<>();
initialList.forEach(s -> {
futures.add(executor.submit(new ProcessTask(s, timeout)));
});
for (int i = 0; i < initialList.size(); ++i) {
try {
initialList.set(i, futures.get(i).get());
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
executor.shutdown();
}
public static void main(String[] args) {
List<String> myList = new ArrayList<>();
for (int i = 0; i < 1_000; ++i) {
myList.add("old");
}
process(myList, Runtime.getRuntime().availableProcessors(), 10);
myList.forEach(System.out::println);
}
}
This is a very explicit way of doing it, but you asked for it. There are certainly less wordy ways of doing this. If you are unsure about any class here, a little research should clear it up; I haven't used anything exotic.
What you should see is a mix of olds and news in the output, where new just indicates that the processing was finished and old that the timeout was exceeded and the corresponding task was stopped. If you see something else, play around with the timeout parameter of process.
The most important part for you to look at is the call method. It contains the mechanism that stops a task after a timeout by periodically checking the elapsed time. You need to place those checks at suitable points in your processing algorithm so that the task can stop.
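The other option mentioned above, cancelling the task externally, could be sketched roughly as follows. This is only an illustration: slowLength is a hypothetical stand-in for foo, and callWithTimeout is a made-up helper. It relies on Future.get with a timeout and on Future.cancel(true), which interrupts the worker thread; the task still has to check the interrupt flag itself.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ExternalTimeoutDemo {

    // Hypothetical stand-in for foo(s): spins for workMillis, but stops when interrupted.
    static int slowLength(String s, long workMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(workMillis);
        while (System.nanoTime() < deadline) {
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("cancelled");
            }
        }
        return s.length();
    }

    // Returns the task's result, or fallback if it does not finish within timeoutMillis.
    static int callWithTimeout(ExecutorService executor, String s,
                               long workMillis, long timeoutMillis, int fallback) {
        Future<Integer> future = executor.submit(() -> slowLength(s, workMillis));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        System.out.println(callWithTimeout(executor, "fast", 0, 1000, -1));
        System.out.println(callWithTimeout(executor, "slow", 5000, 50, -1));
        executor.shutdownNow();
    }
}
```

Note that in this sketch the caller waits for one task at a time; with many strings you would submit them all first and then apply the timeout while collecting the futures.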
So in Java concurrency, there is the concept of a task, which is really any class implementing Runnable or Callable (and, more specifically, the overridden run() or call() method of that interface).
I'm having a tough time understanding the relationship between:
A task (Runnable/Callable); and
An ExecutorService the task is submitted to; and
An underlying, concurrent work queue or list structure used by the ExecutorService
I believe the relationship is something of the following:
You, the developer, must select which ExecutorService and work structure best suits the task at hand
You initialize the ExecutorService (say, as a ScheduledThreadPool) with the underlying structure to use (say, an ArrayBlockingQueue) (if so, how?!?!)
You submit your task to the ExecutorService which then uses its threading/pooling strategy to populate the given structure (ABQ or otherwise) with copies of the task
Each spawned/pooled thread now pulls copies of the task off of the work structure and executes it
First off, please correct/clarify any of the above assumptions if I am off-base on any of them!
Second, if the task is simply copied/replicated over and over again inside the underlying work structure (e.g., identical copies in each index of a list), then how do you ever decompose a big problem down into smaller (concurrent) ones? In other words, if the task simply does steps A - Z, and you have an ABQ with 1,000 of those tasks, then won't each thread just do A - Z as well? How do you say "some threads should work on A - G, while other threads should work on H, and yet other threads should work on I - Z", etc.?
For this second one I might need a code example to visualize how it all comes together. Thanks in advance.
Your last assumption is not quite right. The ExecutorService does not pull copies of the task. The program must supply all tasks individually to be performed by the ExecutorService. When a task has finished, the next task in the queue is executed.
An ExecutorService is an interface for working with a thread pool. You generally have multiple tasks to be executed on the pool, and each operates on a different part of the problem. As the developer, you must specify which parts of the problem each task should work on when creating it, before sending it to the ExecutorService. The results of each task (assuming they are working on a common problem) should be added to a BlockingQueue or other concurrent collection, where another thread may use the results or wait for all tasks to finish.
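As a rough sketch of both points (choosing the work queue, and giving each task its own part of the problem), one way is to construct a ThreadPoolExecutor directly, since its constructor takes the BlockingQueue to use, and to create one Callable per sub-range. RangeSumDemo and parallelSum are hypothetical names for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class RangeSumDemo {

    // Sums the integers 0..max-1 by splitting the range into one task per worker.
    static long parallelSum(int max, int workers) {
        // The work queue is handed to the ThreadPoolExecutor constructor.
        ExecutorService pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(workers));
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = max / workers;
            for (int w = 0; w < workers; w++) {
                final int start = w * chunk;
                // The last task also takes the remainder of the range.
                final int stop = (w == workers - 1) ? max : start + chunk;
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = start; i < stop; i++) {
                        sum += i;
                    }
                    return sum;
                };
                parts.add(pool.submit(task)); // each task is a distinct object
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that part is finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(10000, 5));
    }
}
```

The queue only ever holds tasks that the program submitted; nothing is copied. Which thread runs which range is left to the pool.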
Here is an article you may want to read about how to use an ExecutorService: http://www.vogella.com/articles/JavaConcurrency/article.html#threadpools
Update: A common use of the ExecutorService is to implement the producer/consumer pattern. Here is an example I quickly threw together to get you started--it is intended for demonstration purposes only, as some details and concerns have been omitted for simplicity. The thread pool contains multiple producer threads and one consumer thread. The job being performed is to sum the numbers from 0...N. Each producer thread sums a smaller interval of numbers, and publishes the result to the BlockingQueue. The consumer thread processes each result added to the BlockingQueue.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class NumberCounter {
private final ExecutorService pool = Executors.newFixedThreadPool(2);
private final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(100);
public void startCounter(int max, int workers) {
// Create multiple tasks to add numbers. Each task submits the result
// to the queue.
int increment = max / workers;
for (int worker = 0; worker < workers; worker++) {
Runnable task = createProducer(worker * increment, (worker + 1) * increment);
pool.execute(task);
}
// Create one more task that will consume the numbers, adding them up
// and printing the results.
pool.execute(new Runnable() {
@Override
public void run() {
int sum = 0;
while (true) {
try {
Integer result = queue.take();
sum += result;
System.out.println("New sum is " + sum);
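} catch (InterruptedException e) {
// Restore the interrupt flag and stop consuming instead of looping forever.
Thread.currentThread().interrupt();
return;
}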
}
}
});
}
private Runnable createProducer(final int start, final int stop) {
return new Runnable() {
@Override
public void run() {
System.out.println("Worker started counting from " + start + " to " + stop);
int count = 0;
for (int i = start; i < stop; i++) {
count += i;
}
queue.add(count);
}
};
}
public static void main(String[] args) throws InterruptedException {
NumberCounter counter = new NumberCounter();
counter.startCounter(10000, 5);
}
}
I have Callable object executed using ExecutorService.
How to return interim results from this callable?
I know there is javax.swing.SwingWorker#publish(results) for Swing but I don't use Swing.
There are a couple of ways of doing this. You could do it with a callback or you could do it with a queue.
Here's an example of doing it with a callback:
public static interface Callback<T> {
public void on(T event);
}
Then, an implementation of the callback that does something with your in progress events:
final Callback<String> callback = new Callback<String>() {
public void on(String event) {
System.out.println(event);
}
};
Now you can use the callback in your pool:
Future<String> submit = pool.submit(new Callable<String>() {
public String call() throws Exception {
for(int i = 0; i < 10; i++) {
callback.on("process " + i);
}
return "done";
}
});
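The queue variant mentioned above might look something like this. It is a sketch only: InterimQueueDemo and runAndCollect are invented names, and the task simply puts strings on a BlockingQueue that the calling thread drains while it waits for the final result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class InterimQueueDemo {

    // Runs a task that publishes interim results to a queue; returns everything
    // the caller saw, followed by the final result.
    static List<String> runAndCollect(int steps) {
        BlockingQueue<String> interim = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        List<String> seen = new ArrayList<>();
        try {
            Future<String> result = pool.submit(() -> {
                for (int i = 0; i < steps; i++) {
                    interim.put("process " + i); // publish an interim result
                }
                return "done";
            });
            // Keep draining until the task has finished and the queue is empty.
            while (!result.isDone() || !interim.isEmpty()) {
                String event = interim.poll(50, TimeUnit.MILLISECONDS);
                if (event != null) {
                    seen.add(event);
                }
            }
            seen.add(result.get());
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        runAndCollect(10).forEach(System.out::println);
    }
}
```

Compared with the callback, the queue decouples the two sides: the interim results are handled on the consuming thread rather than on the pool thread that produced them.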
It is not clear what an "interim result" really is. The interfaces used in the concurrency package simply do not define this, but assume methods that resemble more or less pure functions.
Hence, instead of this:
interim = compute something
finalresult = compute something else
do something like this:
interim = compute something
final1 = new Pair( interim, fork(new Future() { compute something else }) )
(Pseudocode, thought to convey the idea, not compileable code)
EDIT The idea is: instead of running a single monolithic block of computations (that happens to reach a state where some "interim results" are available) break it up so that the first task returns the former "interim" result and, at the same time, forks a second task that computes the final result. Of course, a handle to this task must be delivered to the caller so that it eventually can get the final result. Usually, this is done with the Future interface.
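In compilable Java, that idea could be sketched roughly like this. The Staged class plays the role of the Pair in the pseudocode; all names and the trivial arithmetic are placeholders for illustration:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TwoStageDemo {

    // Holds the interim result plus a handle to the still-running second stage.
    static final class Staged {
        final int interim;
        final Future<Integer> finalResult;

        Staged(int interim, Future<Integer> finalResult) {
            this.interim = interim;
            this.finalResult = finalResult;
        }
    }

    // First stage: computes the interim value and forks the second stage.
    static Future<Staged> start(ExecutorService pool, int input) {
        return pool.submit(() -> {
            int interim = input * 2;                               // "compute something"
            Future<Integer> rest = pool.submit(() -> interim + 1); // "compute something else"
            return new Staged(interim, rest);
        });
    }

    // Convenience wrapper: returns {interim, final}.
    static int[] run(int input) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Staged staged = start(pool, input).get(); // interim is available here
            int last = staged.finalResult.get();      // final result arrives later
            return new int[] {staged.interim, last};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] results = run(5);
        System.out.println("interim=" + results[0] + ", final=" + results[1]);
    }
}
```

The first task never waits for the second one, so this also works on a single-threaded pool.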
You can pass, say, an AtomicInteger to your class (the one that will be submitted to the executor). Inside that class you increment its value, and from the calling thread you check its value.
Something like this:
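import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class LongComputation {
// Shared with the caller, which reads it to display progress.
private final AtomicInteger progress;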
public static void main(String[] args) throws InterruptedException,
ExecutionException {
AtomicInteger progress = new AtomicInteger(0);
LongComputation computation = new LongComputation(progress);
ExecutorService executor = Executors.newFixedThreadPool(2);
Future<Integer> result = executor.submit(() -> computation.compute());
executor.shutdown();
while (!result.isDone()) {
System.out.printf("Progress...%d%%%n", progress.intValue());
TimeUnit.MILLISECONDS.sleep(100);
}
System.out.printf("Result=%d%n", result.get());
}
public LongComputation(AtomicInteger progress) {
this.progress = progress;
}
public int compute() throws InterruptedException {
for (int i = 0; i < 100; i++) {
TimeUnit.MILLISECONDS.sleep(100);
progress.incrementAndGet();
}
return 1_000_000;
}
}
What you're looking for is java.util.concurrent.Future.
A Future represents the result of an asynchronous computation. Methods
are provided to check if the computation is complete, to wait for its
completion, and to retrieve the result of the computation. The result
can only be retrieved using method get when the computation has
completed, blocking if necessary until it is ready. Cancellation is
performed by the cancel method. Additional methods are provided to
determine if the task completed normally or was cancelled. Once a
computation has completed, the computation cannot be cancelled. If you
would like to use a Future for the sake of cancellability but not
provide a usable result, you can declare types of the form Future<?>
and return null as a result of the underlying task.
You would have to roll your own API with something like Observer/Observable if you want to publish intermediate results as a push. A simpler option would be to just poll for the current state through some self-defined method.
So this seems like a pretty common use case, and maybe I'm overthinking it, but I'm having an issue with keeping centralized metrics from multiple threads. Say I have multiple worker threads all processing records, and every 1000 records I want to spit out some metric. Now I could have each thread log individual metrics, but then to get throughput numbers I'd have to add them up manually (and of course time boundaries won't be exact). Here's a simple example:
public class Worker implements Runnable {
private static int count = 0;
private static long processingTime = 0;
public void run() {
while (true) {
...get record
count++;
long start = System.currentTimeMillis();
...do work
long end = System.currentTimeMillis();
processingTime += (end-start);
if (count % 1000 == 0) {
... log some metrics
processingTime = 0;
count = 0;
}
}
}
}
Hope that makes some sense. Also I know the two static variables will probably be AtomicInteger and AtomicLong . . . but maybe not. Interested in what kinds of ideas people have. I had thought about using Atomic variables and using a ReentrantReadWriteLock - but I really don't want the metrics to stop the processing flow (i.e. the metrics should have very very minimal impact on the processing). Thanks.
Offloading the actual processing to another thread can be a good idea. The idea is to encapsulate your data and hand it off to a processing thread quickly so you minimize impact on the threads that are doing meaningful work.
There is a small handoff contention, but that cost is usually so much smaller than any other type of synchronization that it should be a good candidate in many situations. I think M. Jessup's solution is pretty close to mine, but hopefully the following code illustrates the point clearly.
public class Worker implements Runnable {
private static final Metrics metrics = new Metrics();
public void run() {
while (true) {
...get record
long start = System.currentTimeMillis();
...do work
long end = System.currentTimeMillis();
// process the metric asynchronously
metrics.addMetric(end - start);
}
}
private static final class Metrics {
// a single "background" thread that actually handles
// processing
private final ExecutorService metricThread =
Executors.newSingleThreadExecutor();
// data (no synchronization needed)
private int count = 0;
private long processingTime = 0;
public void addMetric(final long time) {
metricThread.execute(new Runnable() {
public void run() {
count++;
processingTime += time;
if (count % 1000 == 0) {
... log some metrics
processingTime = 0;
count = 0;
}
}
});
}
}
}
I would suggest that if you don't want the logging to interfere with the processing, you should have a separate log worker thread and have your processing threads simply provide some type of value object that can be handed off. In the example I chose a LinkedBlockingQueue since offer() blocks for at most an insignificant amount of time, and you can defer the blocking to another thread that pulls the values from the queue. You might need additional logic in the MetricProcessor to order data, etc., depending on your requirements, but even if it is a long-running operation it won't keep the VM thread scheduler from resuming the real processing threads in the meantime.
public class Worker implements Runnable {
public void run() {
while (true) {
... do some stuff
if (count % 1000 == 0) {
... log some metrics
if (MetricProcessor.getInstance().addMetrics(
new WorkerMetrics(processingTime, count, ...))) {
processingTime = 0;
count = 0;
} else {
//the call would have blocked for a more significant
//amount of time, here the results
//could be abandoned or just held and attempted again
//as a larger data set later
}
}
}
}
}
public class WorkerMetrics {
...some interesting data
public WorkerMetrics(... data){
...
}
...getter setters etc
}
public class MetricProcessor implements Runnable {
private final LinkedBlockingQueue<WorkerMetrics> metrics = new LinkedBlockingQueue<>();
public boolean addMetrics(WorkerMetrics m) {
return metrics.offer(m); //This may block, but not for a significant amount of time.
}
public void run() {
while(true) {
WorkerMetrics m = metrics.take(); //wait here for something to come in
//the above call does all the significant blocking without
//interrupting the real processing
...do some actual logging, aggregation, etc of the metrics
}
}
}
If you depend on the state of count and the state of processingTime to be in synch then you would have to be using a Lock. For example if when ++count % 1000 == 0 is true, you want to evaluate the metrics of processingTime at THAT time.
For that case, it would make sense to use a ReentrantLock. I wouldn't use a RRWL because there isn't really an instance where a pure read is occurring. It is always a read/write set. But you would need to lock around all of
count++;
processingTime += (end-start);
if (count % 1000 == 0) {
... log some metrics
processingTime = 0;
count = 0;
}
Whether or not count++ is going to be at that location, you will need to lock around that also.
Finally if you are using a Lock, you do not need an AtomicLong and AtomicInteger. It just adds to the overhead and isn't more thread-safe.
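A minimal sketch of that locking approach, assuming the counters are moved into a small shared object. LockedMetrics, record and demo are hypothetical names, and each "record" is credited with a fixed 1 ms here instead of a measured time:

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedMetrics {
    private final ReentrantLock lock = new ReentrantLock();
    private final int batchSize;
    // Plain fields: they are only touched while the lock is held.
    private int count = 0;
    private long processingTime = 0;
    private int batchesLogged = 0;

    LockedMetrics(int batchSize) {
        this.batchSize = batchSize;
    }

    // Updates both counters and logs/resets them as one atomic step.
    void record(long elapsedMillis) {
        lock.lock();
        try {
            count++;
            processingTime += elapsedMillis;
            if (count % batchSize == 0) {
                System.out.println(count + " records took " + processingTime + " ms");
                batchesLogged++;
                processingTime = 0;
                count = 0;
            }
        } finally {
            lock.unlock();
        }
    }

    int getBatchesLogged() {
        lock.lock();
        try {
            return batchesLogged;
        } finally {
            lock.unlock();
        }
    }

    // Starts several worker threads that each record a fixed number of entries.
    static int demo(int threads, int recordsPerThread, int batchSize) {
        LockedMetrics metrics = new LockedMetrics(batchSize);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    metrics.record(1);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return metrics.getBatchesLogged();
    }

    public static void main(String[] args) {
        System.out.println("Batches logged: " + demo(4, 2500, 1000));
    }
}
```

The logging itself happens while the lock is held in this sketch, which is the cost the question wanted to avoid; it only illustrates where the lock has to go.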
I am using a ThreadPoolExecutor to execute tasks in my Java application. I have a requirement where I want to get the number of active tasks in the executor's queue at any point in time. I looked at the Javadoc for ThreadPoolExecutor and found two relevant methods: getTaskCount() and getCompletedTaskCount().
Per the documentation, I could get the number of scheduled tasks and completed tasks from the above two methods respectively. But I am not able to find a solution for getting the number of active tasks in the queue at any point in time. I can do something like:
getTaskCount() = getCompletedTaskCount() + failed tasks + active tasks
But the number of failed tasks is not directly available to arrive at the intended calculation.
Am I missing something here ?
I don't think you need to know the failed count with the calculation you're trying to use.
long submitted = executor.getTaskCount();
long completed = executor.getCompletedTaskCount();
long notCompleted = submitted - completed; // approximate
Would be (approximately) sufficient.
Alternatively, you can use getQueue() with size():
int queued = executor.getQueue().size();
int active = executor.getActiveCount();
int notCompleted = queued + active; // approximate
This answer presumes you're looking for a "not yet completed" count. Your question contradicts itself so I'm not completely certain what you're asking. Reply to my comment on your question if this is incorrect, and I'll update this answer accordingly.
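To see how these counters relate, here is a small sketch that holds one task inside a single-threaded pool with a latch and then reads the values. PendingCountDemo and snapshot are hypothetical names; in a live system the numbers change while you read them, which is why the Javadoc calls them approximations.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PendingCountDemo {

    // Returns {queued, active, notCompleted} while one task is held running
    // and two more are waiting in the queue.
    static long[] snapshot() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < 3; i++) {
                executor.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // keep the task "active" until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            started.await(); // the first task is now running
            long queued = executor.getQueue().size();
            long active = executor.getActiveCount();
            long notCompleted = executor.getTaskCount() - executor.getCompletedTaskCount();
            return new long[] {queued, active, notCompleted};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let all tasks finish
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("queued=" + s[0] + ", active=" + s[1] + ", notCompleted=" + s[2]);
    }
}
```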
Have you tried using the beforeExecute and afterExecute methods? These are called before and after a task is executed. The after execute method even supplies a throwable as a second argument, so you know when a task has failed.
You could add a hook so that beforeExecute increments the count of active tasks and afterExecute decrements it. Of course, these methods are called on the worker threads, so you would have to synchronize access to the counters on a shared lock object.
To use these methods, just subclass ThreadPoolExecutor and add the hooks there.
For instance, the following code should hopefully work:
public class MyExecutor extends ThreadPoolExecutor {
//Lock object used for synchronization
private final Object lockObject = new Object();
//Contains the active task count
private int activeTaskCount = 0;
//Failed task count
private int failedTaskCount = 0;
private int succeededTaskCount = 0;
public MyExecutor () {
//call super here with your parameters;
}
public int getActiveTaskCount(){
synchronized(lockObject){
return activeTaskCount;
}
}
public int getFailedTaskCount(){
synchronized(lockObject){
return failedTaskCount ;
}
}
public int getSucceededTaskCount(){
synchronized(lockObject){
return succeededTaskCount ;
}
}
protected void beforeExecute(Thread t,
Runnable r){
super.beforeExecute(t,r);
synchronized(lockObject){
activeTaskCount++;
}
}
protected void afterExecute(Runnable r,Throwable t){
super.afterExecute(r,t);
synchronized(lockObject){
activeTaskCount--;
if(t!=null){
failedTaskCount++;
}else{
succeededTaskCount++;
}
}
}
}
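A compact variant of the same idea, sketched with AtomicInteger counters instead of a lock object and with a concrete super(...) call. CountingExecutor and its pool settings are assumptions for illustration. One caveat from the ThreadPoolExecutor Javadoc: tasks passed to submit() are wrapped in a FutureTask that captures their exception, so afterExecute receives a null Throwable for them; the failure count below therefore only reflects tasks started with execute().

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CountingExecutor extends ThreadPoolExecutor {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();
    private final AtomicInteger succeeded = new AtomicInteger();

    CountingExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        active.incrementAndGet();
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        active.decrementAndGet();
        if (t != null) {
            failed.incrementAndGet();
        } else {
            succeeded.incrementAndGet();
        }
    }

    int getActiveTaskCount() { return active.get(); }
    int getFailedTaskCount() { return failed.get(); }
    int getSucceededTaskCount() { return succeeded.get(); }

    // Runs one succeeding and one failing task; returns {active, failed, succeeded}.
    static int[] demo() {
        CountingExecutor executor = new CountingExecutor(2);
        executor.execute(() -> { /* succeeds */ });
        executor.execute(() -> {
            // The uncaught exception's stack trace is printed to stderr.
            throw new IllegalStateException("simulated failure");
        });
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {
            executor.getActiveTaskCount(),
            executor.getFailedTaskCount(),
            executor.getSucceededTaskCount()
        };
    }

    public static void main(String[] args) {
        int[] c = demo();
        System.out.println("active=" + c[0] + ", failed=" + c[1] + ", succeeded=" + c[2]);
    }
}
```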