Encapsulating a multi-threaded operation in Java

I have a situation where a large number of classes need read-only file access. This is part of a web app running on top of OSGi, so there will be a lot of concurrent access.
So I'm building an OSGi service that accesses the file system on behalf of all the other pieces that need it and provides centralized access, which also simplifies configuring file locations, etc.
It occurs to me that a multi-threaded approach with a thread pool makes the most sense.
So the question is this:
If I do this and I have a service with an interface like:
FileService.getFileAsClass(class);
and the method getFileAsClass(class) looks roughly like this (it's a sketch, so it may not be perfect Java code):
public <T> T getFileAsClass(Class<T> clazz) throws InterruptedException, ExecutionException {
    Future<InputStream> classFuture = threadpool.submit(new Callable<InputStream>() {
        /* initialization block */
        {
            // any setup from configs.
        }
        /* implement Callable */
        public InputStream call() {
            InputStream stream = null; // to be opened from the file location
            boolean giveUp = false;
            while (null == stream && !giveUp) {
                // Code that tries to read in the file 4
                // times with a Thread.sleep(), then gives up.
                // This is here to make sure we aren't busy updating the file.
            }
            return stream;
        }
    });
    // Once we have the file, convert it and return it.
    return InputStreamToClassConverter.<T>convert(classFuture.get());
}
Will that correctly wait until the relevant operation is done before calling InputStreamToClassConverter.convert?
This is my first time writing multithreaded Java code, so I'm not sure what behavior to expect. I don't care about the order in which threads complete, only that the file handling happens asynchronously and that the converter is used only once the file pull is done.
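A minimal sketch, assuming Java 8 or later, of the behavior being asked about: Future.get() blocks the calling thread until the Callable has returned (or thrown), so anything done after get() only sees a completely read result. FutureGetDemo and loadAndConvert are hypothetical names, and decoding to a String stands in for the real InputStreamToClassConverter. One design choice differs from the question: the bytes are read inside call(), because returning an open InputStream would leave the actual I/O, and closing the stream, to the calling thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureGetDemo {

    // Reads the whole stream into memory.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Reads the file on a pool thread, then "converts" it on the caller's thread.
    public static String loadAndConvert(ExecutorService pool, final Path file)
            throws InterruptedException, ExecutionException {
        Future<byte[]> future = pool.submit(new Callable<byte[]>() {
            @Override
            public byte[] call() throws IOException {
                // Do the actual I/O here and close the stream before returning.
                try (InputStream in = Files.newInputStream(file)) {
                    return readAll(in);
                }
            }
        });
        // get() blocks until call() has returned or thrown (a thrown exception
        // is rethrown wrapped in ExecutionException), so the conversion below
        // never runs on a partially read file.
        byte[] bytes = future.get();
        return new String(bytes, StandardCharsets.UTF_8); // stand-in for the real converter
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(loadAndConvert(pool, tmp));
        } finally {
            pool.shutdown();
            Files.delete(tmp);
        }
    }
}
```

Keep in mind that the caller still blocks inside get(); the pool mainly bounds how many reads run at once, and the caller only gains concurrency if it does other work between submit() and get().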


get input from multiple threads and upload file with fixed size to S3

I wrote a thread-safe class that gets input from multiple threads and uploads the result to S3 once it reaches a fixed size.
S3Exporter class
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import com.google.common.base.Preconditions; // Guava

// this class is thread safe.
public class S3Exporter {
    private static final int BUFFER_PADDING = 1000;
    private final int targetSize;
    private final ByteArrayOutputStream buf;
    private volatile boolean started;

    public S3Exporter(final int targetSize) {
        buf = new ByteArrayOutputStream(targetSize + BUFFER_PADDING);
        this.targetSize = targetSize;
        started = false;
    }

    public synchronized void start() {
        started = true;
    }

    public synchronized void end() {
        started = false;
        flush();
    }

    public synchronized void export(byte[] data) throws IOException {
        Preconditions.checkState(started, "Not started!");
        // append the whole array: the offset argument indexes into data, not buf
        buf.write(data, 0, data.length);
        flushIfNeeded();
    }

    private void flushIfNeeded() {
        if (buf.size() >= targetSize) {
            flush();
        }
    }

    public synchronized void flush() {
        if (buf.size() > 0) {
            // upload buf to s3, it's a time-consuming operation
            buf.reset();
        }
    }
}
The client calls the export method to pass data; if an exception is thrown, the client passes that data again later.
To avoid losing data when restarting the application, I added a shutdown hook when creating the S3Exporter object:
S3Exporter exporter = new S3Exporter(10000);
Runtime.getRuntime().addShutdownHook(new Thread(() -> exporter.end()));
My concern is that the class is not scalable: it could become a bottleneck of the system as the data volume grows. I came up with two ways to improve the situation:
Do the time-consuming upload operation asynchronously: use an executor for the upload and call ThreadPoolExecutor.awaitTermination() in the shutdown hook.
Just put data into a LinkedBlockingQueue in the export method and use multiple threads to consume it. (As I understand it, this is more scalable than the first approach.)
In either case, I need to do more work in the shutdown hook thread to make sure no accepted data is lost, and as far as I know that is not a good idea. Otherwise I take the risk of losing data when restarting the application, which is the last thing I want to see.
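A minimal sketch of the first option, assuming Java 8+. AsyncS3Exporter is a hypothetical name, and the Consumer<byte[]> uploader is a placeholder for the real S3 call, not an AWS SDK API. export() only appends to memory under the lock; when the buffer is full it is swapped for a fresh one and the old contents are handed to a single upload thread, so callers never wait for S3. end() is what the shutdown hook would call: it flushes the remainder, shuts the executor down and waits a bounded time with awaitTermination().

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class AsyncS3Exporter {
    private final int targetSize;
    private final Consumer<byte[]> uploader; // hypothetical stand-in for the S3 upload call
    // One upload thread keeps uploads sequential; a larger pool would run them in parallel.
    private final ExecutorService uploadPool = Executors.newSingleThreadExecutor();
    private ByteArrayOutputStream buf = new ByteArrayOutputStream();

    public AsyncS3Exporter(int targetSize, Consumer<byte[]> uploader) {
        this.targetSize = targetSize;
        this.uploader = uploader;
    }

    // Only appends to memory; callers never wait for S3.
    public synchronized void export(byte[] data) {
        buf.write(data, 0, data.length);
        if (buf.size() >= targetSize) {
            flush();
        }
    }

    // Hands the current buffer to the upload thread and starts a fresh one.
    // After end(), the executor rejects new work (RejectedExecutionException).
    public synchronized void flush() {
        if (buf.size() == 0) {
            return;
        }
        final byte[] chunk = buf.toByteArray();
        buf = new ByteArrayOutputStream();
        uploadPool.execute(() -> uploader.accept(chunk));
    }

    // For the shutdown hook: flush the remainder and stop accepting work
    // atomically with respect to export(), then wait (bounded) for uploads.
    public boolean end(long timeout, TimeUnit unit) throws InterruptedException {
        synchronized (this) {
            flush();
            uploadPool.shutdown();
        }
        return uploadPool.awaitTermination(timeout, unit);
    }
}
```

Retrying a failed upload is left out here, and anything still in memory is lost if the process dies without running the hook; that is the weakness the answer below focuses on.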
My questions
Is my concern about scalability a real problem? (To make the question more concrete, let's say each piece of data is a few bytes and the export method is called 500 times per second.)
If the answer to the first question is yes, are my proposed improvements right? How should I do the cleanup work to avoid losing data?
Scalability depends on requirements, constraints, desired service level, personal preferences, expected user growth rate, and especially money: given infinite resources, every piece of software can be scaled. You didn't mention any of these, so I guess you don't have any actual figures. In this phase, as a programmer, your job is to make a correct program that uses a predictable amount of resources.
Your program seems correct, and most of your assumptions are correct, too. However, I suggest immediately storing chunks in some local persistent database (or the raw filesystem), having a periodic job, run in a separate thread, that uploads groups of chunks to S3, and removing any shutdown hooks (you can use Camel for the boring parts). This is because such hooks are unreliable and should only be used as a last resort for quick and optional cleanup (optional in the sense that you must be prepared for the cleanup not to have run all the way to the end).
By using a file instead of memory, your data can survive fatal errors, and the working memory required by your application is almost independent of the load: the cost is a negligible amount of extra CPU and some disk I/O, which is much cheaper than memory.
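A minimal sketch of the "persist first, upload periodically" design, assuming Java 8+ and using plain JDK classes instead of Camel. SpoolingExporter and its Uploader interface are hypothetical; Uploader stands in for the real S3 upload of a group of chunks. Each export() goes straight to its own file (written under a .tmp name and then renamed, so the job never picks up a half-written chunk), and a scheduled job uploads whatever has accumulated, deleting files only after the upload succeeded.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SpoolingExporter {
    // Hypothetical hook for the real S3 upload of a group of chunk files.
    public interface Uploader {
        void upload(List<Path> chunks) throws IOException;
    }

    private final Path spoolDir;
    private final Uploader uploader;

    public SpoolingExporter(Path spoolDir, Uploader uploader) throws IOException {
        this.spoolDir = Files.createDirectories(spoolDir);
        this.uploader = uploader;
    }

    // Persist each chunk immediately; safe to call from many threads because
    // every call gets its own uniquely named file.
    public void export(byte[] data) throws IOException {
        Path tmp = Files.createTempFile(spoolDir, "chunk-", ".tmp");
        Files.write(tmp, data);
        String done = tmp.getFileName().toString().replace(".tmp", ".chunk");
        Files.move(tmp, spoolDir.resolve(done), StandardCopyOption.ATOMIC_MOVE);
    }

    // One run of the periodic job: upload all finished chunks as one group and
    // delete them only afterwards (at-least-once: a crash between upload and
    // delete means the group is uploaded again).
    public int uploadPending() throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(spoolDir, "*.chunk")) {
            for (Path p : ds) {
                chunks.add(p);
            }
        }
        if (chunks.isEmpty()) {
            return 0;
        }
        uploader.upload(chunks);
        for (Path p : chunks) {
            Files.delete(p);
        }
        return chunks.size();
    }

    // Runs uploadPending() on a separate thread every periodSeconds.
    public ScheduledExecutorService startUploader(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> {
            try {
                uploadPending();
            } catch (IOException e) {
                // Chunks stay on disk and are retried on the next run.
                e.printStackTrace();
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

No shutdown hook is involved: whatever was accepted is already on disk, and a restarted process simply uploads it on its next scheduled run.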

Monitoring directory for changes from web service

I don't know if it's clear from the title, so I'll explain in more detail.
First, the limitations: IBM Java 1.5.
This is the situation:
I have a Spring web service that receives requests with a PDF document in them. I need to put this PDF into an input directory that an AFP application (not important here) monitors. The AFP application takes the PDF, does something with it, and writes the result to an output directory that I need to monitor. Monitoring the output directory would take some time, probably 30 seconds. Also, I know the exact file name that I expect to appear in the output directory. If nothing appears within 30 seconds, I would return a fault response.
Because of my limited knowledge of web services and multithreading, I don't know what problems I might run into.
Also, searching the internet, I see that most people recommend WatchService for directory monitoring, but it was introduced in Java 7.
Any suggestion, link, idea would be helpful.
So, the scenario is simple. In a main method, the following actions are done in order:
call the AFP service;
poll the directory for the output file;
deal with the output file.
Here, outputFile is assumed to be a File containing the absolute path to the generated file, and this method returns void; adapt as needed:
// We poll every second, so...
private static final int SAMPLES = 30;

public void dealWithAFP(whatever, arguments, are, there)
    throws WhateverIsNecessary
{
    callAfpService(here);
    try {
        for (int i = 0; i < SAMPLES; i++) {
            TimeUnit.SECONDS.sleep(1);
            if (outputFile.exists()) {
                dealWithOutputFile(outputFile);
                return;
            }
        }
        // Nothing appeared within SAMPLES seconds: report a fault.
        throw new WhateverIsNecessary();
    } catch (InterruptedException e) {
        // Throw it back if the method does, otherwise the minimum is to:
        Thread.currentThread().interrupt();
        throw new WhateverIsNecessary();
    }
}
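The same polling can be factored into a reusable helper. This is a sketch restricted to Java 5 APIs (System.nanoTime, TimeUnit.toNanos, TimeUnit.sleep), so it should fit the IBM Java 1.5 constraint; FilePoller and waitForFile are hypothetical names. It waits against a deadline instead of counting samples, so a slow file system check does not stretch the total wait, and it never sleeps past the deadline.

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

public final class FilePoller {
    private FilePoller() {
    }

    /**
     * Polls until the file exists or the timeout elapses.
     *
     * @return true if the file appeared in time, false on timeout
     * @throws InterruptedException if the waiting thread is interrupted
     */
    public static boolean waitForFile(File file, long timeout, long pollInterval, TimeUnit unit)
            throws InterruptedException {
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (file.exists()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            // Sleep one poll interval, but never past the deadline.
            TimeUnit.NANOSECONDS.sleep(Math.min(unit.toNanos(pollInterval), remaining));
        }
    }
}
```

In dealWithAFP this would become `if (!FilePoller.waitForFile(outputFile, 30, 1, TimeUnit.SECONDS)) throw new WhateverIsNecessary();`. Two things the sketch does not address: each request still ties up a web service thread for up to 30 seconds, and the file may become visible before the AFP application has finished writing it, depending on how that application writes its output.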

interrupt all threads in Java in shutdown hook

I have a simple Java program that creates a series of temporary files stored in a local tmp directory. I have added a simple shutdown hook that walks through all files and deletes them, then deletes the tmp directory, before exiting the program. Here is the code:
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
    @Override
    public void run() {
        File tmpDir = new File("tmp/");
        for (File f : tmpDir.listFiles()) {
            f.delete();
        }
        tmpDir.delete();
    }
}));
My problem is that the thread that creates these files may not have terminated when the shutdown hook starts, so a file may be created after listFiles() is called. This causes the tmp dir not to get deleted. I have come up with 2 hacks around this:
Hack # 1:
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
    @Override
    public void run() {
        File tmpDir = new File("tmp/");
        while (!tmpDir.delete()) {
            for (File f : tmpDir.listFiles()) {
                f.delete();
            }
        }
    }
}));
Hack # 2:
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        File tmpDir = new File("tmp/");
        for (File f : tmpDir.listFiles()) {
            f.delete();
        }
        tmpDir.delete();
    }
}));
Neither is a particularly good solution. What would be ideal is to have the shutdown hook wait until all threads have terminated before continuing. Does anyone know if this can be done?
Just keep track of all your running threads and then .join() them before shutting down the program.
This is an answer to the question title, as ewok has said he can't use .deleteOnExit().
What Tyler said, but with a little more detail:
Keep references to the threads where the shutdown hook can access them.
Have the shutdown hook call interrupt on the threads.
Review the code of the threads to make sure they actually respond to interruption (instead of eating the InterruptedException and blundering on, which is typical of a lot of code). An interrupt should prompt the thread to stop looping or blocking, wrap up unfinished business, and terminate.
For each thread where you don't want to proceed until it finishes, check whether the thread is alive and if so call join on it, setting a timeout in case it doesn't finish in a reasonable time, in which case you can decide whether to delete the file or not.
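A minimal sketch of those steps, assuming Java 8+. WorkerRegistry, stopAll and installShutdownHook are hypothetical names. Workers are registered when started so the hook can reach them; the hook interrupts all of them, joins each with a timeout, and runs the cleanup (for example deleting tmp/) only if every worker has actually terminated. With a per-thread timeout the worst case is N times the timeout.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class WorkerRegistry {
    private final List<Thread> workers = new CopyOnWriteArrayList<>();

    // Starts a worker and remembers it so the shutdown hook can reach it.
    public Thread start(Runnable task) {
        Thread t = new Thread(task);
        workers.add(t);
        t.start();
        return t;
    }

    // Interrupts every worker, then waits up to timeoutMillis for each one.
    // Returns true only if all workers have terminated.
    public boolean stopAll(long timeoutMillis) {
        for (Thread t : workers) {
            t.interrupt();
        }
        boolean allDone = true;
        for (Thread t : workers) {
            try {
                t.join(timeoutMillis); // returns immediately if t is already dead
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            allDone &= !t.isAlive();
        }
        return allDone;
    }

    // Cleanup runs only when no worker can still be creating files.
    public void installShutdownHook(long timeoutMillis, Runnable cleanup) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            if (stopAll(timeoutMillis)) {
                cleanup.run();
            }
        }));
    }
}
```

This only helps if the workers respond to interruption, as described above: a worker that swallows InterruptedException keeps running and stopAll() reports false after the timeout.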
UPDATE: Tyler Heiks accurately pointed out that deleteOnExit() isn't a valid solution since the OP tried it and it did not work. I am providing an alternate solution. It is again indirect, but mainly because the original design using threads and a ShutdownHook is fatally flawed.
Use finally blocks to delete the temp files.
Relying on ShutdownHooks for resource management is a very bad idea and makes the code very difficult to compose or reuse in a larger system. It's an even worse idea to hand resources from thread to thread. Resources like files and streams are among the most dangerous things to share between threads. There is likely very little to gain from this and it would make far more sense for each thread to independently obtain temp files using the library's createTempFile methods and manage their use and deletion using try/finally.
The convention for dealing with temporary files on the system is to treat them as black boxes where:
location on disk is opaque (irrelevant to and not used directly by the program)
filename is irrelevant
filename is guaranteed to be mutually exclusive
The third above is very difficult to achieve if you hand-roll code to create and name temp files yourself. It is likely to be brittle and fail at the worst times (3AM pager anyone?).
The algorithm you present could delete files created by other processes that coincidentally share the same parent directory. That is unlikely to be a good thing for the stability of those other programs.
Here's the high-level process:
Get a Path with Files.createTempFile() (or, in legacy pre-Java 7 code, a File with File.createTempFile())
Use temp file however desired
Delete file
This is similar to InputStream or other resources that need to be manually managed.
That general pattern for explicit resource management (when AutoCloseable and try-with-resources aren't available) is as follows.
Resource r = allocateResource();
try {
useResource(r);
} finally {
releaseResource(r);
}
In the case of Path it looks like this:
Path tempDir = Paths.get("tmp/"); // must already exist
try {
    Path p = Files.createTempFile(tempDir, "example", ".tmp");
    try {
        useTempFile(p);
    } finally {
        Files.delete(p);
    }
} finally {
    Files.delete(tempDir);
}
On pre-Java 7 legacy, the use with File looks like this:
File tempDir = new File("tmp/"); // must already exist
try {
    // argument order is (prefix, suffix, directory)
    File f = File.createTempFile("example", ".tmp", tempDir);
    try {
        useTempFile(f);
    } finally {
        if (!f.delete()) {
            handleFailureToDeleteTempFile(f);
        }
    }
} finally {
    if (!tempDir.delete()) {
        handleFailureToDeleteTempDir(tempDir);
    }
}
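A runnable sketch combining the two answers, assuming Java 8+: each worker thread obtains its own temp file with Files.createTempFile and deletes it in finally, and the directory is removed only after all workers have been joined. PerThreadTempFiles, work and runWorkers are hypothetical names, and Files.createTempDirectory replaces the fixed "tmp/" directory so nothing shared with other processes is touched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PerThreadTempFiles {

    // Each worker creates, uses and deletes its own temp file in try/finally,
    // so nothing is left for a shutdown hook to clean up.
    static void work(Path dir, int id) throws IOException {
        Path f = Files.createTempFile(dir, "worker" + id + "-", ".tmp");
        try {
            Files.write(f, ("data from worker " + id).getBytes(StandardCharsets.UTF_8));
            // ... use the file ...
        } finally {
            Files.delete(f);
        }
    }

    // Starts the workers, joins them, then deletes the directory. Every worker
    // has finished (and cleaned up) before the delete, so the directory is
    // empty; Files.delete would throw DirectoryNotEmptyException otherwise.
    public static Path runWorkers(int count) throws IOException, InterruptedException {
        final Path dir = Files.createTempDirectory("example");
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                try {
                    work(dir, id);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        Files.delete(dir);
        return dir;
    }
}
```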

Multi-threaded code and condition variable usage

A multi-threaded piece of code accesses a resource (eg: a filesystem) asynchronously.
To achieve this, I'll use condition variables. Suppose the FileSystem is an interface like:
class FileSystem {
    // sends a read request to the file system
    void read(String fileName) {
        // ...
        // upon completion, execute a callback
        callback(returnCode, buffer);
    }
}
I have now an application accessing the FileSystem. Suppose I can issue multiple reads through a readFile() method.
The operation should write data to the byte buffer passed to it.
// fields, initialized in the constructor
private final FileSystem disk = ...;
private boolean readReady = false;
private final Lock lock = new ReentrantLock();
private final Condition responseReady = lock.newCondition();

// the read file method in question
public void readFile(String file) {
    lock.lock(); // let's imagine this operation needs a lock
    try {
        // this operation may take a while to complete,
        // but the method should return immediately
        disk.read(file);
        while (!readReady) { // <<< THIS
            responseReady.awaitUninterruptibly();
        }
    } finally {
        lock.unlock();
    }
}

public void callback(int returnCode, byte[] buffer) {
    // other code snipped...
    readReady = true; // <<< AND THIS
    responseReady.signal();
}
Is this the correct way to use condition variables? Will readFile() return immediately?
(I know there is some silliness in using locks for reads, but writing to a file is also an option.)
There's a lot missing from your question (e.g. no specific mention of threads), but I will try to answer anyway.
Neither the lock nor the condition variables give you background capabilities; they are just used for a thread to wait for signals from other threads. Although you don't mention it, the disk.read(file) method could spawn a thread to do the IO and then return immediately, but the caller is going to sit in the readReady loop anyway, which seems pointless. If the caller has to wait, then it could perform the IO itself.
A better pattern could be to use something like the Java 5 Executors service:
ExecutorService pool = Executors.newFixedThreadPool(numThreads);
You can then call pool.submit(Callable), which submits the job to be performed in the background in another thread (when the pool next has one available). submit() returns a Future, which the caller can use to check whether the background task has finished. It can return a result object as well. The concurrent classes take care of the locking and condition signal/wait logic for you.
Hope this helps.
P.S. You should also make readReady volatile, since it is not synchronized.
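For completeness, a minimal sketch of the condition-variable pattern done correctly, assuming Java 5+ java.util.concurrent.locks; PendingRead, complete and get are hypothetical names. Two details differ from the question's code. The callback takes the same lock before setting the flag and signalling, because ReentrantLock's Condition.signal() throws IllegalMonitorStateException when the calling thread does not hold the lock; and because the flag is only touched under the lock, it does not need to be volatile. Waiting is separated from issuing the read, so the method that starts the read can return immediately and the caller waits only when it actually needs the data.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// One outstanding read: the callback thread completes it, a reader waits for it.
public class PendingRead {
    private final Lock lock = new ReentrantLock();
    private final Condition responseReady = lock.newCondition();
    private boolean readReady = false; // guarded by lock
    private byte[] result;             // guarded by lock

    // Called from the file system's callback thread.
    public void complete(byte[] buffer) {
        lock.lock();
        try {
            result = buffer;
            readReady = true;
            responseReady.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Blocks until complete() has been called or the timeout elapses.
    // Returns null on timeout.
    public byte[] get(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!readReady) { // the loop also guards against spurious wakeups
                if (nanos <= 0) {
                    return null;
                }
                nanos = responseReady.awaitNanos(nanos);
            }
            return result;
        } finally {
            lock.unlock();
        }
    }
}
```

The sketch uses awaitNanos() with a timeout rather than awaitUninterruptibly(), so a lost callback cannot hang the caller forever; an ExecutorService and Future, as suggested above, give the same guarantees without hand-written locking.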

Locking file across services

What is the best way to share a file between two "writer" services in the same application?
Edit:
Sorry, I guess I should have given more details.
I have a Service that saves entries into a buffer. When the buffer gets full, it writes all the entries to the file (and so on). Another running Service will at some point read the file (essentially copy/compress it) and then empty it.
Here is a general idea of what you can do:
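import java.io.FileWriter;
import java.io.IOException;

public class FileManager
{
    private final FileWriter writer;
    private final Object sync = new Object();

    public FileManager() throws IOException
    {
        writer = new FileWriter("SomeFile.txt");
    }

    public void writeBuffer(String buffer) throws IOException
    {
        synchronized(sync)
        {
            writer.write(buffer);
            writer.flush(); // make the entries visible to the copying service
        }
    }

    public void copyAndCompress()
    {
        synchronized(sync)
        {
            // copy and/or compress
        }
    }
}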
You will have to do some extra work to make it all work safely, but this is just a basic example to give you an idea of how it looks.
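A runnable sketch of the same idea, assuming Java 7+ NIO. SharedLogFile is a hypothetical name, and opening the file per write in APPEND mode (instead of keeping a FileWriter open, as FileManager does) is an assumption made so that emptying the file after the copy is straightforward. GZIP stands in for whatever compression is actually needed. Both operations hold the same lock, so no entries can be appended between the copy and the truncation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPOutputStream;

public class SharedLogFile {
    private final Object sync = new Object();
    private final Path file;

    public SharedLogFile(Path file) {
        this.file = file;
    }

    // Writer service: append a full buffer to the file.
    public void writeBuffer(String buffer) throws IOException {
        synchronized (sync) {
            try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                w.write(buffer);
            }
        }
    }

    // Reader service: gzip the current contents into target, then empty the file.
    public void copyAndCompress(Path target) throws IOException {
        synchronized (sync) {
            if (!Files.exists(file)) {
                return;
            }
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
                Files.copy(file, out);
            }
            // Truncate only after the compressed copy was written successfully.
            Files.write(file, new byte[0]);
        }
    }
}
```

This lock only coordinates threads inside one JVM; if the two services ever run as separate processes, something like the lock file approach below is needed instead.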
A common method for locking is to create a second file in the same location as the main file. The second file may contain locking data or be blank. The benefit of having locking data (such as a process ID) is that you can easily detect a stale lock file, which is an inevitability you must plan for. A PID might not be the best locking data in your case, though: if both services run in the same process, they share a PID.
example:
Service1:
creates myfile.lock
creates/opens myfile
Service2:
Notices that myfile.lock is present and pauses/blocks/waits
When myfile.lock goes away, it creates it and then opens myfile.
It would also be advantageous to double-check that the file contains your locking information (identification specific to your service) right after creating it, just in case two or more services are waiting and create a lock at the exact same time. The last one succeeds, so all other services should notice that their locking data is no longer in the file. Also, pause a few milliseconds before checking its contents.
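A different technique from the check-after-write described above, sketched under the assumption of Java 7+ and a local file system: opening the lock file with StandardOpenOption.CREATE_NEW makes "check that it does not exist, then create it" a single atomic operation, so two services cannot both win. LockFile and ownerId are hypothetical names; the owner id written into the file plays the role of the locking data used for stale-lock detection. Atomicity on network file systems is not guaranteed by this sketch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class LockFile {
    private final Path lockPath;
    private final String ownerId;

    public LockFile(Path lockPath, String ownerId) {
        this.lockPath = lockPath;
        this.ownerId = ownerId;
    }

    // Fails instead of overwriting when the lock file already exists.
    public boolean tryAcquire() throws IOException {
        try {
            Files.write(lockPath, ownerId.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Who holds the lock right now (useful for spotting a stale lock), or null.
    public String currentOwner() throws IOException {
        try {
            return new String(Files.readAllBytes(lockPath), StandardCharsets.UTF_8);
        } catch (NoSuchFileException e) {
            return null;
        }
    }

    // Only the owner removes the lock file.
    public void release() throws IOException {
        if (ownerId.equals(currentOwner())) {
            Files.delete(lockPath);
        }
    }
}
```

A waiting service would call tryAcquire() in a loop with a short sleep, and could inspect currentOwner() to decide whether a lock that has been held for too long is stale.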
