I want to create a DAG out of tasks in Java, where the tasks may depend upon the output of other tasks. If there is no directed path between two tasks, they may run in parallel. Tasks may be canceled. If any task throws an exception, all tasks are canceled.
I wanted to use CompletableFuture for this, but despite implementing the Future interface (including Future.cancel(boolean)), CompletableFuture does not support interrupting a running task -- the mayInterruptIfRunning argument of CompletableFuture.cancel(true) is simply ignored. (Does anybody know why?)
Therefore, I am resorting to building my own DAG of tasks using Future. It's a lot of boilerplate, and complicated to get right. Is there any better method than this?
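The cancelation behaviour described above can be observed with a small experiment. This is only a sketch: the class name CancelDoesNotInterrupt and the 300 ms sleep standing in for real work are made up for illustration. It cancels a running CompletableFuture and then checks whether the worker thread was ever interrupted.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelDoesNotInterrupt {

    /** Returns {futureIsCancelled, taskWasInterrupted, taskRanToEnd}. */
    static boolean[] demo() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
            started.countDown();
            try {
                Thread.sleep(300); // stands in for real work
            } catch (InterruptedException e) {
                interrupted.set(true);
            }
            finished.countDown();
        });

        started.await();
        future.cancel(true); // completes the future with a CancellationException
        boolean cancelled = future.isCancelled();

        // Wait for the task body; it keeps running because nobody interrupts it.
        boolean ranToEnd = finished.await(5, TimeUnit.SECONDS);
        return new boolean[] { cancelled, interrupted.get(), ranToEnd };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("cancelled=" + r[0] + " interrupted=" + r[1] + " ranToEnd=" + r[2]);
    }
}
```

The future itself reports that it is cancelled, so dependents see a CancellationException, but the task body is expected to run to its end without ever seeing an interrupt. Any real cleanup (such as destroying a Process) therefore has to be triggered separately.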
Here is an example:
I want to call Process process = Runtime.getRuntime().exec(cmd) to start a commandline process, creating a Future<Process>. Then I want to launch (fan out to) three subtasks:
One task that consumes input from process.getInputStream()
One task that consumes input from process.getErrorStream()
One task that calls process.waitFor(), and then waits for the result.
Then I want to wait for all three of the launched sub-tasks to complete (i.e. fan-in / a completion barrier). This should be collected in a final Future<Integer> exitCode that collects the exit code returned by the process.waitFor() task. The two input consumer tasks simply return Void, so their output can be ignored, but the completion barrier should still wait for their completion.
I want a failure in any of the launched subtasks to cause all subtasks to be canceled, and the underlying process destroyed.
Note that Process process = Runtime.getRuntime().exec(cmd) in the first step can throw an exception, which should cause the failure to cascade all the way to exitCode.
@FunctionalInterface
public static interface ConsumerThrowingIOException<T> {
public void accept(T val) throws IOException;
}
public static Future<Integer> exec(
ConsumerThrowingIOException<InputStream> stdoutConsumer,
ConsumerThrowingIOException<InputStream> stderrConsumer,
String... cmd) {
Future<Process> processFuture = executor.submit(
() -> Runtime.getRuntime().exec(cmd));
AtomicReference<Future<Void>> stdoutProcessorFuture = //
new AtomicReference<>();
AtomicReference<Future<Void>> stderrProcessorFuture = //
new AtomicReference<>();
AtomicReference<Future<Integer>> exitCodeFuture = //
new AtomicReference<>();
Runnable cancel = () -> {
try {
processFuture.get().destroy();
} catch (Exception e) {
// Ignore (exitCodeFuture.get() will still detect exceptions)
}
if (stdoutProcessorFuture.get() != null) {
stdoutProcessorFuture.get().cancel(true);
}
if (stderrProcessorFuture.get() != null) {
stderrProcessorFuture.get().cancel(true);
}
if (exitCodeFuture.get() != null) {
exitCodeFuture.get().cancel(true);
}
};
if (stdoutConsumer != null) {
stdoutProcessorFuture.set(executor.submit(() -> {
try {
InputStream inputStream = processFuture.get()
.getInputStream();
stdoutConsumer.accept(inputStream != null
? inputStream
: new ByteArrayInputStream(new byte[0]));
return null;
} catch (Exception e) {
cancel.run();
throw e;
}
}));
}
if (stderrConsumer != null) {
stderrProcessorFuture.set(executor.submit(() -> {
try {
InputStream errorStream = processFuture.get()
.getErrorStream();
stderrConsumer.accept(errorStream != null
? errorStream
: new ByteArrayInputStream(new byte[0]));
return null;
} catch (Exception e) {
cancel.run();
throw e;
}
}));
}
exitCodeFuture.set(executor.submit(() -> {
try {
return processFuture.get().waitFor();
} catch (Exception e) {
cancel.run();
throw e;
}
}));
// Async completion barrier -- wait for process to exit,
// and for output processors to complete
return executor.submit(() -> {
Exception exception = null;
int exitCode = 1;
try {
exitCode = exitCodeFuture.get().get();
} catch (InterruptedException | CancellationException
| ExecutionException e) {
cancel.run();
exception = e;
}
if (stderrProcessorFuture.get() != null) {
try {
stderrProcessorFuture.get().get();
} catch (InterruptedException | CancellationException
| ExecutionException e) {
cancel.run();
if (exception == null) {
exception = e;
} else if (e instanceof ExecutionException) {
exception.addSuppressed(e);
}
}
}
if (stdoutProcessorFuture.get() != null) {
try {
stdoutProcessorFuture.get().get();
} catch (InterruptedException | CancellationException
| ExecutionException e) {
cancel.run();
if (exception == null) {
exception = e;
} else if (e instanceof ExecutionException) {
exception.addSuppressed(e);
}
}
}
if (exception != null) {
throw exception;
} else {
return exitCode;
}
});
}
Note: I realize that Runtime.getRuntime().exec(cmd) should be non-blocking, so doesn't require its own Future, but I wrote the code using one anyway, to make the point about DAG construction.
No way. Process has no asynchronous interface (except for Process.onExit()). So you have to use threads to wait for process creation and while reading from InputStreams. Other components of your DAG can be async tasks (CompletableFutures).
This is not a big problem. The only advantage of async tasks over threads is lower memory consumption. Your Process consumes a lot of memory anyway, so there is not much point in saving memory here.
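For the parts of the DAG that can be CompletableFutures, the fan-in with "cancel everything on the first failure" might be sketched as below. The names FanOutFanIn and allOrCancel are hypothetical, and this is one possible approach rather than an established idiom. Note that cancel() here only marks the other stages as cancelled; it does not interrupt their threads, so a real version would also call process.destroy() in the failure branch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FanOutFanIn {

    /**
     * Completion barrier: completes normally once every stage has completed,
     * and fails as soon as any stage fails, cancelling the remaining stages.
     */
    static CompletableFuture<Void> allOrCancel(List<? extends CompletableFuture<?>> stages) {
        CompletableFuture<Void> barrier = new CompletableFuture<>();
        CompletableFuture.allOf(stages.toArray(new CompletableFuture<?>[0]))
                .whenComplete((ignored, failure) -> {
                    if (failure == null) {
                        barrier.complete(null);
                    }
                });
        for (CompletableFuture<?> stage : stages) {
            stage.whenComplete((value, failure) -> {
                // completeExceptionally returns true only for the first failure
                if (failure != null && barrier.completeExceptionally(failure)) {
                    stages.forEach(other -> other.cancel(false));
                }
            });
        }
        return barrier;
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> slow = new CompletableFuture<>(); // never completes on its own
        CompletableFuture<Integer> bad = CompletableFuture.supplyAsync(() -> {
            throw new IllegalStateException("boom");
        });
        CompletableFuture<Void> barrier = allOrCancel(Arrays.asList(slow, bad));
        try {
            barrier.join();
        } catch (RuntimeException e) {
            System.out.println("barrier failed: " + e);
        }
        System.out.println("slow cancelled: " + slow.isCancelled());
    }
}
```

The exit code would then be obtained with something like barrier.thenApply(v -> exitCodeStage.join()), where exitCodeStage is whichever stage wraps process.waitFor().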
Recently, I found some BlockingOperationExceptions in my Netty 4 project.
Some people say that calling sync() when starting Netty's ServerBootstrap can cause a deadlock, because sync() invokes await(), and await() calls a method named checkDeadLock.
But I don't think so. ServerBootstrap uses the EventLoopGroup called bossGroup, while each Channel uses the workerGroup for I/O. I don't think they influence each other, since they have different EventExecutors.
And in my experience, the exception does not appear during Netty startup; it mostly occurs when awaiting the result of writeAndFlush on a Channel.
Reading the source of checkDeadLock, the BlockingOperationException is thrown when the current thread is the same as the executor's thread.
My project code is below:
private void channelWrite(T message) {
boolean success = true;
boolean sent = true;
int timeout = 60;
try {
ChannelFuture cf = cxt.write(message);
cxt.flush();
if (sent) {
success = cf.await(timeout);
}
if (cf.isSuccess()) {
logger.debug("send success.");
}
Throwable cause = cf.cause();
if (cause != null) {
this.fireError(new PushException(cause));
}
} catch (LostConnectException e) {
this.fireError(new PushException(e));
} catch (Exception e) {
this.fireError(new PushException(e));
} catch (Throwable e) {
this.fireError(new PushException("Failed to send message", e));
}
if (!success) {
this.fireError(new PushException("Failed to send message"));
}
}
I know the Netty maintainers advise against using sync() or await(), but I want to know in which situations they cause deadlocks, i.e. when the current thread and the executor's thread are the same.
I changed my project code:
private void pushMessage0(T message) {
try {
ChannelFuture cf = cxt.writeAndFlush(message);
cf.addListener(new ChannelFutureListener() {
@Override
public void operationComplete(ChannelFuture future) throws PushException {
if (future.isSuccess()) {
logger.debug("send success.");
} else {
throw new PushException("Failed to send message.");
}
Throwable cause = future.cause();
if (cause != null) {
throw new PushException(cause);
}
}
});
} catch (LostConnectException e) {
this.fireError(new PushException(e));
} catch (Exception e) {
this.fireError(new PushException(e));
} catch (Throwable e) {
this.fireError(new PushException(e));
}
}
But now I face a new problem: I can't get the PushException out of the ChannelFutureListener.
A BlockingOperationException will be thrown by Netty if you call sync*() or await*() on a Future in the same thread that the EventExecutor is using and to which the Future is tied. This is usually the EventLoop that is used by the Channel itself.
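To illustrate the rule without pulling in Netty, here is a toy single-threaded executor with a similar guard. This is not Netty's code: Loop, inLoop and await are hypothetical names, and IllegalStateException stands in for BlockingOperationException. The point is only that blocking on the loop thread for work that the same thread must perform can never finish, so the guard refuses it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DeadlockCheckSketch {

    /** A toy single-threaded "event loop"; not Netty code. */
    static class Loop {
        private volatile Thread loopThread;
        private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "toy-loop");
            t.setDaemon(true);
            loopThread = t; // remember which thread runs the loop
            return t;
        });

        <T> Future<T> submit(Callable<T> task) {
            return executor.submit(task);
        }

        boolean inLoop() {
            return Thread.currentThread() == loopThread;
        }

        /** Blocks for the result, but refuses to block the loop thread on unfinished work. */
        <T> T await(Future<T> future) throws Exception {
            if (!future.isDone() && inLoop()) {
                throw new IllegalStateException("await() on the loop thread would deadlock");
            }
            return future.get();
        }
    }

    static String demo() throws Exception {
        Loop loop = new Loop();
        Future<String> outer = loop.submit(() -> {
            // This task is queued behind the one currently running on the same thread.
            Future<Integer> inner = loop.submit(() -> 42);
            try {
                return "got " + loop.await(inner);
            } catch (IllegalStateException e) {
                return "refused: " + e.getMessage();
            }
        });
        // Called from the caller's thread, not the loop thread, so blocking is fine.
        return loop.await(outer);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Awaiting from any other thread is allowed, which matches the observation that the exception shows up inside handlers rather than during server startup.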
That await cannot be called in the I/O thread is understandable. However, there are two points to note.
1. If you call the code below in a channel handler, usually no exception is reported, because most of the time the isDone check in await returns true: you are in the I/O thread, the I/O thread writes the data synchronously, and so the data has already been written when await is called.
ChannelFuture f = ctx.writeAndFlush(msg);
f.await();
2. If you add the handler with a different EventExecutorGroup, this check does not fire, since that executor is newly created and is not the same one as the channel's I/O executor.
I have a Producer-Consumer problem to implement in Java, where I want the producer thread to run for a specific amount of time e.g. 1 day, putting objects in a BlockingQueue -specifically tweets, streamed from Twitter Streaming API via Twitter4j- and the consumer thread to consume these objects from the queue and write them to file. I've used the PC logic from Read the 30Million user id's one by one from the big file, where producer is the FileTask and consumer is the CPUTask (check first answer; my approach uses the same iterations/try-catch blocks with it). Of course I adapted the implementations accordingly.
My main function is:
public static void main(String[] args) {
....
final int threadCount = 2;
// BlockingQueue with a capacity of 200
BlockingQueue<Tweet> tweets = new ArrayBlockingQueue<>(200);
// create thread pool with given size
ExecutorService service = Executors.newFixedThreadPool(threadCount);
Future<?> f = service.submit(new GathererTask(tweets));
try {
f.get(1,TimeUnit.MINUTES); // Give specific time to the GathererTask
} catch (InterruptedException | ExecutionException | TimeoutException e) {
f.cancel(true); // Stop the Gatherer
}
try {
service.submit(new FileTask(tweets)).get(); // Wait til FileTask completes
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
service.shutdownNow();
try {
service.awaitTermination(7, TimeUnit.DAYS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
Now, the problem is that, although it does stream the tweets and writes them to file, it never terminates and never gets to the f.cancel(true) part. What should I change for it to work properly? Also, could you explain in your answer what went wrong here with the thread logic, so I learn from my mistake? Thank you in advance.
These are the run() functions of my PC classes:
Producer:
@Override
public void run() {
StatusListener listener = new StatusListener(){
public void onStatus(Status status) {
try {
tweets.put(new Tweet(status.getText(),status.getCreatedAt(),status.getUser().getName(),status.getHashtagEntities()));
} catch (InterruptedException e) {
e.printStackTrace();
Thread.currentThread().interrupt(); // Also tried this command
}
}
public void onException(Exception ex) {
ex.printStackTrace();
}
};
twitterStream.addListener(listener);
... // More Twitter4j commands
}
Consumer:
public void run() {
Tweet tweet;
try(PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("out.csv", true)))) {
while(true) {
try {
// block if the queue is empty
tweet = tweets.take();
writeTweetToFile(tweet,out);
} catch (InterruptedException ex) {
break; // GathererTask has completed
}
}
// poll() returns null if the queue is empty
while((tweet = tweets.poll()) != null) {
writeTweetToFile(tweet,out);
}
} catch (IOException e) {
e.printStackTrace();
}
}
You should check if your Thread classes are handling the InterruptedException, if not, they will wait forever. This might help.
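One plausible reading of the question (an assumption, since the elided Twitter4j calls are not shown) is that the producer's run() returns as soon as the listener is registered, so f.get(...) returns without a timeout and f.cancel(true) is never reached, while the consumer blocks forever in take() because nobody interrupts it. A common way around relying on interrupts is a "poison pill": after the time limit, put a sentinel on the queue and let the consumer stop when it sees it. The sketch below uses plain strings instead of tweets; TimedProducerConsumer, POISON and run are hypothetical names.

```java
import java.util.concurrent.*;

public class TimedProducerConsumer {
    // Sentinel compared by identity; new String(...) guarantees a distinct object
    private static final String POISON = new String("POISON");

    /** Runs a producer for the given time and returns how many items the consumer handled. */
    static int run(long producerMillis) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(200);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> producer = pool.submit(() -> {
                try {
                    for (int i = 0; ; i++) {
                        queue.put("item-" + i); // blocks while the queue is full
                        Thread.sleep(1);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop producing
                }
            });
            Future<Integer> consumer = pool.submit(() -> {
                int written = 0;
                while (true) {
                    String item = queue.take(); // blocks while the queue is empty
                    if (item == POISON) {
                        return written;
                    }
                    written++; // a real consumer would write the item to a file here
                }
            });
            try {
                producer.get(producerMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                producer.cancel(true); // time is up: interrupt the producer
            }
            queue.put(POISON); // tell the consumer that no more items will arrive
            return consumer.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("items consumed: " + run(200));
    }
}
```

With a listener-based producer such as the one in the question, the equivalent steps would be to sleep for the allotted time in main, shut the stream down, and then enqueue the sentinel.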
I am trying to get a very basic RxJava based application to work. I have defined the following Observable class which reads and returns lines from a file:
public Observable<String> getObservable() throws IOException
{
return Observable.create(subscribe -> {
InputStream in = getClass().getResourceAsStream("/trial.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line = null;
try {
while((line = reader.readLine()) != null)
{
subscribe.onNext(line);
}
} catch (IOException e) {
subscribe.onError(e);
}
finally {
subscribe.onCompleted();
}
});
}
Next I have defined the subscriber code:
public static void main(String[] args) throws IOException, InterruptedException {
Thread thread = new Thread(() ->
{
RxObserver observer = new RxObserver();
try {
observer.getObservable()
.observeOn(Schedulers.io())
.subscribe( x ->System.out.println(x),
t -> System.out.println(t),
() -> System.out.println("Completed"));
} catch (IOException e) {
e.printStackTrace();
}
});
thread.start();
thread.join();
}
The file has close to 50000 records. When running the app I get an "rx.exceptions.MissingBackpressureException". I have gone through some of the documentation and, as suggested, tried adding ".onBackpressureBuffer()" to the call chain. Then I no longer get the exception, but the completed callback isn't fired either.
What is the right way to handle a scenario in which we have a fast-producing Observable?
The first problem is that your readLine logic ignores backpressure. You can apply onBackpressureBuffer() just before observeOn to start with, but there is a recent addition, SyncOnSubscribe, that lets you generate values one by one and takes care of backpressure:
SyncOnSubscribe.createSingleState(() -> {
// Open the resource once per subscriber; this becomes the state "s" below
InputStream in = getClass().getResourceAsStream("/trial.txt");
return new BufferedReader(new InputStreamReader(in));
},
(s, o) -> {
try {
String line = s.readLine();
if (line == null) {
o.onCompleted();
} else {
o.onNext(line);
}
} catch (IOException ex) {
// Signal the error on the observer "o", not on the reader "s"
o.onError(ex);
}
},
s -> {
try {
s.close();
} catch (IOException ex) {
// Nothing useful can be done if closing fails
}
});
The second problem is that your Thread will complete well before all elements have been delivered on the io thread, and thus the main program exits. Either remove the observeOn, add .toBlocking, or use a CountDownLatch.
RxObserver observer = new RxObserver();
try {
CountDownLatch cdl = new CountDownLatch(1);
observer.getObservable()
.observeOn(Schedulers.io())
.subscribe( x ->System.out.println(x),
t -> { System.out.println(t); cdl.countDown(); },
() -> { System.out.println("Completed"); cdl.countDown(); });
cdl.await();
} catch (IOException | InterruptedException e) {
e.printStackTrace();
}
The problem here is the observeOn operator: since each of the Observer's onNext() calls is scheduled to run on a separate thread, your Observable keeps producing those scheduled calls in a loop regardless of the subscriber's (observeOn's) capacity.
If you keep this synchronous, the Observable will not emit the next element until the subscriber is done with the previous one, since everything happens on one thread, and you will not have backpressure problems anymore.
If you still want to use observeOn, you will have to implement backpressure logic in your Observable's OnSubscribe#call method.
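To show what "backpressure logic" means in practice without depending on a particular RxJava version, here is the same idea expressed with the JDK's own java.util.concurrent.Flow API (Java 9 or later). This is a swapped-in technique, not RxJava code, and FlowBackpressureSketch and publishAndCount are hypothetical names. The subscriber asks for one item at a time, and the producer's submit() call blocks whenever the subscriber's buffer is full, so a fast producer cannot overrun a slow consumer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FlowBackpressureSketch {

    /** Publishes the given number of lines and returns how many the subscriber received. */
    static int publishAndCount(int items) throws InterruptedException {
        AtomicInteger received = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<String>() {
                private Flow.Subscription subscription;

                @Override
                public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1); // ask for the first item only
                }

                @Override
                public void onNext(String line) {
                    received.incrementAndGet();
                    subscription.request(1); // ready for the next item
                }

                @Override
                public void onError(Throwable t) {
                    done.countDown();
                }

                @Override
                public void onComplete() {
                    done.countDown();
                }
            });
            for (int i = 0; i < items; i++) {
                publisher.submit("line " + i); // blocks while the subscriber's buffer is full
            }
        } // close() signals onComplete once the buffered items have been delivered

        if (!done.await(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("subscriber did not complete in time");
        }
        return received.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("received " + publishAndCount(50_000) + " lines");
    }
}
```

The latch plays the same role as the CountDownLatch in the answer above: the calling thread must not exit before the asynchronous delivery has finished.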
I use ScheduledExecutorService to schedule some tasks which need to run periodically.
I want to know whether this code works to recover the schedule when an exception happens.
ScheduledExecutorService service = Executors.newScheduledThreadPool(1);
this.startMemoryUpdateSchedule(service);//See below method
//Recursive method to handle exception when run schedule task
private void startMemoryUpdateSchedule(ScheduledExecutorService service) {
ScheduledFuture<?> future = service.scheduleWithFixedDelay(new MemoryUpdateThread(), 1, UPDATE_MEMORY_SCHEDULE, TimeUnit.MINUTES);
try {
future.get();
} catch (ExecutionException e) {
e.printStackTrace();
logger.error("Exception thrown for thread",e);
future.cancel(true);
this.startMemoryUpdateSchedule(service);
} catch(Exception e) {
logger.error("Other exception ",e);
}
}
You should probably enclose the try block in a while(true) loop because if the first run does not throw an exception, you will exit your method and if the second call throws one, you won't catch it.
I would also run the recursive call in its own thread to avoid the risk of a StackOverflowError if things go bad.
So it would look like this:
private void startMemoryUpdateSchedule(final ScheduledExecutorService service) {
final ScheduledFuture<?> future = service.scheduleWithFixedDelay(new MemoryUpdateThread(), 1, UPDATE_MEMORY_SCHEDULE, TimeUnit.MINUTES);
Runnable watchdog = new Runnable() {
@Override
public void run() {
while (true) {
try {
future.get();
} catch (ExecutionException e) {
//handle it
startMemoryUpdateSchedule(service);
return;
} catch (InterruptedException e) {
//handle it
return;
}
}
}
};
new Thread(watchdog).start();
}
ScheduledExecutorService.scheduleWithFixedDelay(Runnable, long, long, TimeUnit) throws RejectedExecutionException (a child of RuntimeException) ==> We can catch it & retry submission once more.
Now as future.get() is supposed to return the result of one execution, we need to invoke it in a loop.
Also, a failure in one task does not kill the executor's worker thread or the other tasks scheduled on it, which differentiates the ScheduledExecutorService from java.util.Timer, whose single thread dies on an uncaught exception and takes every scheduled TimerTask with it (http://stackoverflow.com/questions/409932/java-timer-vs-executorservice). Note, however, that according to the Javadoc of scheduleWithFixedDelay, if any execution of the task throws an exception, subsequent executions of that same task are suppressed.
We just need to catch all three exceptions thrown by Future.get(), but we cannot rethrow them, because then we won't be able to get the result of the subsequent executions.
The code could be:
public void startMemoryUpdateSchedule(final ScheduledExecutorService service) {
final ScheduledFuture<?> future;
try {
future = service.scheduleWithFixedDelay(new MemoryUpdateThread(),
1, UPDATE_MEMORY_SCHEDULE, TimeUnit.SECONDS);
} catch (RejectedExecutionException ree) {
startMemoryUpdateSchedule(service);
return;
}
while (true) {
try {
future.get();
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
} catch (ExecutionException ee) {
Throwable cause = ee.getCause();
// take action, log etc.
} catch (CancellationException e) {
// safety measure if task was cancelled by some external agent.
}
}
}
Try to use VerboseRunnable class from jcabi-log, which is designed exactly for this purpose:
import com.jcabi.log.VerboseRunnable;
Runnable runnable = new VerboseRunnable(
new Runnable() {
public void run() {
// do business logic; an exception may be thrown here
}
},
true // it means that all exceptions will be swallowed and logged
);
Now, when anybody calls runnable.run() no exceptions are thrown. Instead, they are swallowed and logged (to SLF4J).
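The same idea can be sketched without jcabi-log by wrapping the task in a Runnable that catches and logs. This matters because, per the ScheduledExecutorService Javadoc, an exception escaping one execution suppresses all later executions of that task. ResilientSchedule, swallowing and runFor are hypothetical names, and System.err stands in for a real logger.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ResilientSchedule {

    /** Wraps a task so that runtime exceptions are logged instead of escaping. */
    static Runnable swallowing(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("Scheduled task failed: " + e);
            }
        };
    }

    /** Schedules a task that fails on every second run; returns how often it ran. */
    static int runFor(long millis) throws InterruptedException {
        ScheduledExecutorService service = Executors.newScheduledThreadPool(1);
        AtomicInteger runs = new AtomicInteger();
        Runnable flaky = () -> {
            if (runs.incrementAndGet() % 2 == 0) {
                throw new IllegalStateException("simulated failure");
            }
        };
        // Without the wrapper, the schedule would stop after the first failing run.
        service.scheduleWithFixedDelay(swallowing(flaky), 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(millis);
        service.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs: " + runFor(300));
    }
}
```

With this wrapper there is no need to block on future.get() or to reschedule recursively; the periodic schedule simply keeps going.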
I've added the loop as discussed.
public void startMemoryUpdateSchedule(final ScheduledExecutorService service) {
boolean retry = false;
do {
ScheduledFuture<?> future = null;
try {
retry = false;
future = service.scheduleWithFixedDelay(new MemoryUpdateThread(), 1, UPDATE_MEMORY_SCHEDULE, TimeUnit.SECONDS);
future.get();
} catch (ExecutionException e) {
// handle
future.cancel(true);
retry = true;
} catch(Exception e) {
// handle
}
} while (retry);
}
This question is about java.lang.Process and its handling of stdin, stdout and stderr.
We have a class in our project that is an extension to org.apache.commons.io.IOUtils. There we have a fairly new method for closing the std-streams of a Process object appropriately. Or is it not appropriate?
/**
* Method closes all underlying streams from the given Process object.
* If Exit-Code is not equal to 0 then Process will be destroyed after
* closing the streams.
*
* It is guaranteed that everything possible is done to release resources
* even when Throwables are thrown in between.
*
* If multiple Throwables occur, the first one that occurred
* will be thrown as Error, RuntimeException or (masked) IOException.
*
* The method is null-safe.
*/
public static void close(@Nullable Process process) throws IOException {
if(process == null) {
return;
}
Throwable t = null;
try {
close(process.getOutputStream());
}
catch(Throwable e) {
t = e;
}
try{
close(process.getInputStream());
}
catch(Throwable e) {
t = (t == null) ? e : t;
}
try{
close(process.getErrorStream());
}
catch (Throwable e) {
t = (t == null) ? e : t;
}
try{
try {
if(process.waitFor() != 0){
process.destroy();
}
}
catch(InterruptedException e) {
t = (t == null) ? e : t;
process.destroy();
}
}
catch (Throwable e) {
t = (t == null) ? e : t;
}
if(t != null) {
if(t instanceof Error) {
throw (Error) t;
}
if(t instanceof RuntimeException) {
throw (RuntimeException) t;
}
throw t instanceof IOException ? (IOException) t : new IOException(t);
}
}
public static void closeQuietly(@Nullable Logger log, @Nullable Process process) {
try {
close(process);
}
catch (Exception e) {
//log if Logger provided, otherwise discard
logError(log, "Error closing the Process object (incl. underlying streams)!", e);
}
}
public static void close(@Nullable Closeable closeable) throws IOException {
if(closeable != null) {
closeable.close();
}
}
Methods like these are basically used in finally-blocks.
What I really want to know is if I am safe with this implementation? Considering things like: Does a process object always return the same stdin, stdout and stderr streams during its lifetime? Or may I miss closing streams previously returned by process' getInputStream(), getOutputStream() and getErrorStream() methods?
There is a related question on StackOverflow.com: java: closing subprocess std streams?
Edit
As pointed out by me and others here:
InputStreams have to be consumed completely. Otherwise the subprocess may not terminate, because there is outstanding data in its output streams.
All three std-streams have to be closed, regardless of whether they were used before or not.
When the subprocess terminates normally, everything should be fine. When it does not, it has to be terminated forcibly.
When an exit code is returned by the subprocess, we do not need to destroy() it. It has terminated. (Not necessarily normally with exit code 0, but it has terminated.)
We need to monitor waitFor() and interrupt it when a timeout is exceeded, to give the process a chance to terminate normally but kill it when it hangs.
Unanswered parts:
Consider the pros and cons of consuming the InputStreams in parallel. Or must they be consumed in a particular order?
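The points above might be combined as in the following sketch, which assumes Java 8 or later for Process.waitFor(long, TimeUnit) and destroyForcibly(). ProcessDrainSketch, drain and runAndDrain are hypothetical names. Both output streams are drained in parallel so that neither pipe buffer can fill up and stall the child, and the wait is bounded by a timeout.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ProcessDrainSketch {

    /** Reads the stream until end-of-file and returns everything that was read. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Runs the command, drains stdout and stderr in parallel, and returns the exit code. */
    static int runAndDrain(long timeoutSeconds, String... cmd) throws Exception {
        Process process = new ProcessBuilder(cmd).start();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            process.getOutputStream().close(); // this sketch sends no input to the child
            Future<byte[]> stdout = pool.submit(() -> drain(process.getInputStream()));
            Future<byte[]> stderr = pool.submit(() -> drain(process.getErrorStream()));
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new TimeoutException("process did not exit within " + timeoutSeconds + "s");
            }
            stdout.get(); // make sure both streams were read to the end
            stderr.get();
            return process.exitValue();
        } finally {
            pool.shutdownNow();
            process.destroyForcibly(); // has no effect if the process already exited
        }
    }

    public static void main(String[] args) throws Exception {
        String java = System.getProperty("java.home") + "/bin/java";
        System.out.println("exit code: " + runAndDrain(30, java, "-version"));
    }
}
```

Draining in parallel avoids having to pick an order: reading one stream to the end first can hang if the child blocks while writing to the other.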
An attempt at simplifying your code:
public static void close(@Nullable Process process) throws IOException
{
if(process == null) { return; }
try
{
close(process.getOutputStream());
close(process.getInputStream());
close(process.getErrorStream());
if(process.waitFor() != 0)
{
process.destroy();
}
}
catch(InterruptedException e)
{
process.destroy();
}
catch (RuntimeException e)
{
// Wrap unchecked exceptions; a RuntimeException can never be an IOException
throw new IOException(e);
}
}
By catching Throwable I assume you wish to catch all unchecked exceptions, that is, any subclass of RuntimeException or Error. However, an Error should never be caught, so I have replaced Throwable with RuntimeException.
(It is still not a good idea to catch all RuntimeExceptions.)
As the question you linked to states, it is better to read and discard the output and error streams. If you are using apache commons io, something like,
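new Thread(new Runnable() {public void run() {try {IOUtils.copy(process.getInputStream(), new NullOutputStream());} catch (IOException e) {/* process is probably dead */}}}).start();
new Thread(new Runnable() {public void run() {try {IOUtils.copy(process.getErrorStream(), new NullOutputStream());} catch (IOException e) {/* process is probably dead */}}}).start();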
You want to read and discard stdout and stderr in a separate thread to avoid problems such as the process blocking when it writes enough info to stderr or stdout to fill the buffer.
If you are worried about having too many threads, see this question
I don't think you need to worry about handling IOExceptions when copying stdout and stderr to NullOutputStream (beyond the catch that the compiler requires, since IOException is checked): if there is an IOException reading from the process's stdout/stderr, it is probably because the process itself is dead, and writing to NullOutputStream will never throw an exception.
You don't need to check the return status of waitFor().
Do you want to wait for the process to complete? If so, you can do,
while(true) {
try
{
process.waitFor();
break;
} catch(InterruptedException e) {
//ignore, spurious interrupted exceptions can occur
}
}
Looking at the link you provided you do need to close the streams when the process is complete, but destroy will do that for you.
So in the end, the method becomes,
public void close(final Process process) {
if(process == null) return;
new Thread(new Runnable() {public void run() {try {IOUtils.copy(process.getInputStream(), new NullOutputStream());} catch (IOException e) {/* process is probably dead */}}}).start();
new Thread(new Runnable() {public void run() {try {IOUtils.copy(process.getErrorStream(), new NullOutputStream());} catch (IOException e) {/* process is probably dead */}}}).start();
while(true) {
try
{
process.waitFor();
//this will close stdin, stdout and stderr for the process
process.destroy();
break;
} catch(InterruptedException e) {
//ignore, spurious interrupted exceptions can occur
}
}
}
Just to let you know what I have currently in our codebase:
public static void close(@Nullable Process process) throws IOException {
if (process == null) {
return;
}
Throwable t = null;
try {
flushQuietly(process.getOutputStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
close(process.getOutputStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
skipAllQuietly(null, TIMEOUT, process.getInputStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
close(process.getInputStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
skipAllQuietly(null, TIMEOUT, process.getErrorStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
close(process.getErrorStream());
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
try {
try {
Thread monitor = ThreadMonitor.start(TIMEOUT);
process.waitFor();
ThreadMonitor.stop(monitor);
}
catch (InterruptedException e) {
t = mostImportantThrowable(t, e);
process.destroy();
}
}
catch (Throwable e) {
t = mostImportantThrowable(t, e);
}
if (t != null) {
if (t instanceof Error) {
throw (Error) t;
}
if (t instanceof RuntimeException) {
throw (RuntimeException) t;
}
throw t instanceof IOException ? (IOException) t : new IOException(t);
}
}
skipAllQuietly(...) consumes InputStreams completely. Internally it uses an implementation similar to org.apache.commons.io.ThreadMonitor to interrupt consumption if a given timeout is exceeded.
mostImportantThrowable(...) decides which Throwable should be returned. Errors take precedence over everything else; otherwise the one that occurred first has higher priority than later ones. Nothing very important here, since these Throwables are most probably discarded later anyway. We want to keep working here and can only throw one, so we have to decide what to throw at the end, if anything.
close(...) are null-safe implementations for closing things, which throw an Exception when something goes wrong.
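The actual implementation of mostImportantThrowable(...) is not shown above, so the following is only a guess at how the stated rule (Errors beat everything, otherwise first come first served) could be written. The class name ThrowablePriority is hypothetical.

```java
public class ThrowablePriority {

    /**
     * Picks which of two Throwables to keep: an Error outranks anything that
     * is not an Error; otherwise the one recorded first (current) is kept.
     */
    static Throwable mostImportantThrowable(Throwable current, Throwable candidate) {
        if (current == null) {
            return candidate;
        }
        if (candidate == null) {
            return current;
        }
        if (candidate instanceof Error && !(current instanceof Error)) {
            return candidate;
        }
        return current;
    }

    public static void main(String[] args) {
        Throwable first = new java.io.IOException("first");
        Throwable later = new OutOfMemoryError("later");
        System.out.println(mostImportantThrowable(first, later));
    }
}
```

An alternative that loses nothing would be to keep the first Throwable and attach the others with addSuppressed(...), at the cost of a slightly different contract.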