I'm trying to create a small service that accepts a file upload, unzips it, and then deletes the uploaded file. Those three steps should be chained as futures. I'm using the Google Guava library.
The workflow is:
A future downloads the file; once that completes, a future unzips it; once unzipping is done, a future deletes the original uploaded file.
But honestly, it isn't clear to me how I would chain the futures, or even how to create them the Guava way. The documentation is terse and unclear: OK, there is a transform method, but no concrete example at all, and the chain method is deprecated.
I miss the RxJava library.
Futures.transform is not fluently chainable like RxJava, but you can still use it to set up Futures that depend on one another. Here is a concrete example:
final ListeningExecutorService service =
        MoreExecutors.listeningDecorator(Executors.newCachedThreadPool());
final ListenableFuture<FileClass> fileFuture = service.submit(() -> fileDownloader.download());
final ListenableFuture<UnzippedFileClass> unzippedFileFuture = Futures.transform(fileFuture,
        // the lambda needs a cast so the compiler picks the right transform overload
        (Function<FileClass, UnzippedFileClass>) file -> fileUnzipper.unzip(file));
final ListenableFuture<Void> deletedFileFuture = Futures.transform(unzippedFileFuture,
        (Function<UnzippedFileClass, Void>) unzippedFile -> {
            fileDeleter.delete(unzippedFile); // assuming delete() returns void
            return null;
        });
deletedFileFuture.get(); // or however you want to wait for the result
This example assumes fileDownloader.download() returns an instance of FileClass, fileUnzipper.unzip() returns an UnzippedFileClass, etc. If fileDownloader.download() instead returns a ListenableFuture<FileClass>, use AsyncFunction instead of Function.
This example also uses Java 8 lambdas for brevity. If you are not using Java 8, pass in anonymous implementations of Function or AsyncFunction instead:
Futures.transform(fileFuture, new AsyncFunction<FileClass, UnzippedFileClass>() {
    @Override
    public ListenableFuture<UnzippedFileClass> apply(final FileClass input) throws Exception {
        // wrapping the synchronous unzip in an immediate future for illustration
        return Futures.immediateFuture(fileUnzipper.unzip(input));
    }
});
More info on transform here: http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/util/concurrent/Futures.html#transform (scroll or search for "transform" -- deep linking appears to be broken currently)
Guava extends the Future interface with ListenableFuture for this purpose.
Something like this should work:
Runnable downloader = ...; // downloads the file
Runnable unzipper = ...;   // unzips it (and deletes the original)
ListeningExecutorService service =
        MoreExecutors.listeningDecorator(Executors.newCachedThreadPool());
service.submit(downloader).addListener(unzipper, service);
I would include deleting the file in the unzipper, since deletion is near-instantaneous and separating it would complicate the code.
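If you need to react to failures as well, Futures.addCallback distinguishes success from failure, which a plain Runnable listener does not. A minimal sketch, assuming hypothetical downloadFile() and unzipAndDelete() methods (recent Guava versions also require the executor argument):
Futures.addCallback(service.submit(() -> downloadFile()), new FutureCallback<File>() {
    @Override
    public void onSuccess(File file) {
        unzipAndDelete(file); // runs only if the download succeeded
    }
    @Override
    public void onFailure(Throwable t) {
        // log or retry the download here
    }
}, service);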
I have some code that looks like this (simplified pseudo-code):
[...]
// stream constructed of series of web service calls
Stream<InputStream> slowExternalSources = StreamSupport.stream(spliterator, false);
[...]
then this
public Stream<String> getLines(Stream<InputStream> slowExternalSources) {
    // onClose must be attached to the inner stream, where `is` is still in scope
    return slowExternalSources.flatMap(is ->
            new BufferedReader(new InputStreamReader(is)).lines()
                    .onClose(() -> { try { is.close(); } catch (IOException e) { throw new UncheckedIOException(e); } }));
}
and later this
Stream<String> lineStream = getLines(slowExternalSources);
lineStream.parallel().forEach(line -> { /* some fast CPU-intensive work */ });
I've been struggling to get this code to execute with some level of parallelisation.
Inspection in jps/jstack/jmc shows that all the InputStream reading is occurring in the main thread, and not paralleling at all.
Possible culprits:
BufferedReader.lines() uses a Spliterator with parallel=false to construct the stream (source: see Java sources)
I think I read an article saying that flatMap does not interact well with parallel(), but I am not able to locate it right now.
How can I fix this code so that it runs in parallel?
I would like to retain the Java 8 Streams if possible, to avoid rewriting existing code that expects a Stream.
NOTE I added java.util.concurrent to the tags because I suspect it might be part of the answer, even though it's not part of the question.
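One commonly suggested workaround is to materialize each source's lines before flatMapping, so the slow reading happens per element of the outer stream, which does parallelize. A minimal sketch, assuming the source spliterator splits reasonably well (this trades memory for parallelism, since each source is buffered in full):
public Stream<String> getLines(Stream<InputStream> slowExternalSources) {
    return slowExternalSources
            .parallel()
            .map(is -> {
                // eager read: the I/O becomes a per-element operation of the outer stream
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(is))) {
                    return reader.lines().collect(Collectors.toList());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            })
            .flatMap(List::stream);
}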
I am working with Akka's Java API; in one of the actors I want to receive a callback and process it on completion.
I want to achieve something like:
Future future = Patterns.ask(actorRefMap.get(order.getInstrument()), order, 500);
future.onComplete(getSender().tell(String.format("{} order processed for instrument {} with price {}", order.getOrderType(), order.getInstrument(), order.getPrice()), getSelf()), getContext().dispatcher());
With my current code I am getting the error "wrong first argument: found 'void', required 'scala.Function1'". How do we implement scala.Function1 in Java?
You need to pass a function, not the result of evaluating one:
Future<Object> future = Patterns.ask(actorRefMap.get(order.getInstrument()), order, 500);
future.onComplete(result -> {
    getSender().tell(String.format("%s order processed for instrument %s with price %s",
            order.getOrderType(), order.getInstrument(), order.getPrice()), getSelf());
    return null;
}, getContext().dispatcher());
... the essential part is:
future.onComplete(result -> ...)
instead of
future.onComplete(...)
Note also that String.format uses %s placeholders; the {} syntax is for SLF4J loggers and would be printed literally.
And if it requires scala.Function1 instead of java.util.function.Function, make sure you import the Java DSL (akka.actor.typed.javadsl.AskPattern), not the Scala DSL.
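Note that a Java lambda can only satisfy scala.Function1 on Scala 2.12 or later. For the classic (untyped) API, Akka also ships a Java-friendly adapter, akka.dispatch.OnComplete, which avoids implementing scala.Function1 by hand. A sketch under the question's assumptions, capturing the sender up front because getSender() is not reliable inside an asynchronous callback:
final ActorRef replyTo = getSender();
Future<Object> future = Patterns.ask(actorRefMap.get(order.getInstrument()), order, 500);
future.onComplete(new OnComplete<Object>() {
    @Override
    public void onComplete(Throwable failure, Object result) {
        if (failure == null) {
            replyTo.tell(String.format("%s order processed for instrument %s with price %s",
                    order.getOrderType(), order.getInstrument(), order.getPrice()), getSelf());
        }
    }
}, getContext().dispatcher());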
Is there a way in Java to copy one file into another asynchronously? Something similar to Stream.CopyToAsync in C# is what I'm trying to find.
What I'm trying to achieve is to download a series of ~40 files from the Internet, and this is the best I've come up with for each file:
CompletableFuture.allOf(myFiles.stream()
        .map(file -> CompletableFuture.runAsync(() -> syncDownloadFile(file)))
        .toArray(CompletableFuture[]::new))
        .thenRun(() -> doSomethingAfterAllDownloadsAreComplete());
Where syncDownloadFile is:
private void syncDownloadFile(MyFile file) {
try (InputStream is = file.mySourceUrl.openStream()) {
long actualSize = Files.copy(is, file.myDestinationNIOPath);
// size validation here
} catch (IOException e) {
throw new RuntimeException(e);
}
}
But that means I have blocking calls inside the tasks I submit, and I'd like to avoid that so I don't tie up too many executor threads at once.
I'm not sure if the C# method internally does the same (I mean, something has to be downloading that file right?).
Is there a better way to accomplish this?
AsynchronousFileChannel (AFC for short) is the right way to manage files in Java with non-blocking IO. Unfortunately, it does not provide a promise-based API (the equivalent of Task in .NET) such as .NET's CopyToAsync(Stream).
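For reference, bridging the AFC's callback API to a CompletableFuture yourself is straightforward. A minimal sketch (it assumes the file fits in memory and that a single read suffices; channel close is omitted for brevity):
static CompletableFuture<ByteBuffer> readAllAsync(Path path) throws IOException {
    AsynchronousFileChannel channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
    ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
    CompletableFuture<ByteBuffer> promise = new CompletableFuture<>();
    channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
        @Override
        public void completed(Integer bytesRead, ByteBuffer buf) {
            buf.flip(); // prepare the buffer for reading by the continuation
            promise.complete(buf);
        }
        @Override
        public void failed(Throwable exc, ByteBuffer attachment) {
            promise.completeExceptionally(exc);
        }
    });
    return promise;
}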
The alternative RxIo library is built on top of the AFC and provides the AsyncFiles asynchronous API with different calling idioms: callbacks based, CompletableFuture (equivalent to .net Task) and also reactive streams.
For instance, copying from one file to another asynchronously can be done through:
Path in = Paths.get("input.txt");
Path out = Paths.get("output.txt");
AsyncFiles
        .readAllBytes(in)
        .thenCompose(bytes -> AsyncFiles.writeBytes(out, bytes))
        .thenAccept(index -> { /* invoked on completion */ });
Note that continuations are invoked by a thread from the background AsynchronousChannelGroup.
Thus you may solve your problem by using a non-blocking HTTP client with a CompletableFuture-based API, chained with the AsyncFiles use. For instance, AHC is a valid choice. See usage here: https://github.com/AsyncHttpClient/async-http-client#using-continuations
I'm writing a program in Java for Spark 1.6.0 (so, please, don't supply Scala or Python code in your answers), and this is the code I'd like to implement:
double i = 0d;
JavaRDD<Vector> ideas = objects.map(
new Function<BSONObject, Vector>()
{
@Override public Vector call(final BSONObject t) throws Exception
{
double[] xy = new double[2];
xy[0] = i++;
xy[1] = ((Long)((Date)t.get("date")).toInstant().toEpochMilli()).doubleValue();
return Vectors.dense(xy);
}
}
);
but NetBeans shows an error: "Local variables referenced from an inner class must be final or effectively final".
I also tried to use Spark accumulators, but if I call the value() method from the call() method I'm defining, a SparkException is raised during the job, telling me that "Task is not serializable", and the job fails.
So, how can I achieve my goal?
I apologize in advance if my English is not perfect (it's not my native language), and if my question could appear noob-ish, but I can't find any solution online.
Even if it compiled, it wouldn't work as you expect. Each executor gets its own copy of the variables referenced inside the closure, and any modifications are strictly local and are not propagated back to the original source. Spark supports writable accumulators, which can be used as follows:
Accumulator<Double> accum = sc.accumulator(0d);
objects.map(
...
accum.add(1d)
...
)
but these provide very weak guarantees (at-least-once execution) when used inside transformations and, as you've already realized, are write-only from the worker perspective.
Regarding your code, it looks like all you need is zipWithIndex:
objects.zipWithIndex().map(...)
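Applied to the question's snippet, a sketch might look like this (the pair's Long index replaces the mutable counter i):
JavaRDD<Vector> ideas = objects.zipWithIndex().map(
    new Function<Tuple2<BSONObject, Long>, Vector>()
    {
        @Override public Vector call(final Tuple2<BSONObject, Long> t) throws Exception
        {
            double[] xy = new double[2];
            xy[0] = t._2().doubleValue(); // per-element index, no shared mutable state
            xy[1] = (double) ((Date) t._1().get("date")).toInstant().toEpochMilli();
            return Vectors.dense(xy);
        }
    }
);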
I need to find a way to execute mutually dependent tasks.
First task has to download a zip file from remote server.
Second tasks goal is to unzip the file downloaded by the first task.
Third task has to process files extracted from zip.
So the third task depends on the second, and the second on the first.
Naturally, if one of the tasks fails, the others shouldn't be executed. Since the first task downloads files from a remote server, there should be a mechanism for restarting the task if the server is not available.
The tasks have to be executed daily.
Any suggestions, patterns or java API?
Regards!
It seems that you do not need to divide them into separate tasks; just do it like this:
process(unzip(download(uri)));
It depends a bit on external requirements. Is there any user involvement? Monitoring? Alerting?...
The simplest approach would be plain methods, each checking that the previous step has done what it should:
download() downloads file to specified place.
unzip() extracts the file to a specified place if the downloaded file is in place.
process() processes the data if it has been extracted.
A more "formal" way of doing it would be to use a workflow engine. Depending on requirements, you can get some that do everything from fancy UIs, to some that follow formal standardised .XML-definitions of the workflow - and any in between.
http://java-source.net/open-source/workflow-engines
Create one public method to execute the full chain and private methods for the tasks:
public void doIt() {
if (download() == false) {
// download failed
} else if (unzip() == false) {
    // unzip failed
} else if (process() == false) {
// processing failed
}
}
private boolean download() {/* ... */}
private boolean unzip() {/* ... */}
private boolean process() {/* ... */}
So you have an API that guarantees that all steps are executed in the correct sequence, and that a step is only executed if certain conditions are met (the above example just illustrates this pattern).
For daily execution you can use the Quartz Framework.
As the tasks depend on each other, I would recommend evaluating the error codes or exceptions the tasks return, and only continuing if the previous task was successful.
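A minimal Quartz 2.x sketch for the daily trigger (DailyChainJob, which would call doIt(), is a hypothetical Job implementation):
JobDetail job = JobBuilder.newJob(DailyChainJob.class)
        .withIdentity("downloadUnzipProcess")
        .build();
Trigger trigger = TriggerBuilder.newTrigger()
        .withSchedule(CronScheduleBuilder.dailyAtHourAndMinute(2, 0)) // every day at 02:00
        .build();
Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
scheduler.scheduleJob(job, trigger);
scheduler.start();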
The normal way to perform these tasks is to call each task in order and throw an exception when a failure prevents the following tasks from being performed. Something like:
try {
download();
unzip();
process();
} catch(Exception failed) {
failed.printStackTrace();
}
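To cover the restart requirement from the question, download() can be wrapped in a simple retry loop. A sketch, assuming download() signals an unavailable server with an IOException; the attempt count and backoff are illustrative:
private void downloadWithRetry() throws Exception {
    int maxAttempts = 3;
    for (int attempt = 1; ; attempt++) {
        try {
            download();
            return; // success, stop retrying
        } catch (IOException serverUnavailable) {
            if (attempt >= maxAttempts) {
                throw serverUnavailable; // give up and let the caller skip the remaining tasks
            }
            Thread.sleep(1000L * attempt); // simple linear backoff
        }
    }
}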
I think what you are interested in is some kind of transaction definition.
I.e.
- Define TaskA (e.g. download)
- Define TaskB (e.g. unzip)
- Define TaskC (e.g. process)
Assuming that your intention is to also have tasks that work independently (e.g. only download a file, without also executing TaskB and TaskC), you could define Transaction1 composed of TaskA, TaskB and TaskC, and Transaction2 composed of only TaskA.
The semantics concerning Transaction1, i.e. that TaskA, TaskB and TaskC should be executed sequentially and all-or-none, can be captured in your transaction definitions.
The definitions can live in XML configuration files, and you can use a framework such as Quartz for scheduling.
A higher-level construct would then check the transactions and execute them as defined.
Dependent task execution is made easy with Dexecutor.
Disclaimer: I am the owner of the library.
Basically, you need the following pattern, using the Dexecutor.addDependency method:
DefaultDexecutor<Integer, Integer> executor = newTaskExecutor(); // helper that wires up the execution engine and task provider
//Building: 1 -> 2 -> 3 -> 4 -> 5, each task runs only after its predecessor succeeds
executor.addDependency(1, 2);
executor.addDependency(2, 3);
executor.addDependency(3, 4);
executor.addDependency(4, 5);
//Execution
executor.execute(ExecutionConfig.TERMINATING);