I have a temporary file which I want to send to the client from the controller in the Play Framework. Can I delete the file after opening a connection using FileInputStream? For example, can I do something like this:
File file = getFile();
InputStream is = new FileInputStream(file);
file.delete();
renderBinary(is, "name.txt");
What if the file is a large file? If I delete the file, will subsequent read()s on the InputStream give an error? I have tried with files of around 1 MB and I don't get an error.
Sorry if this is a very naive question, but I could not find anything related to this, and I am pretty new to Java.
I just encountered this exact same scenario in some code I was asked to work on. The programmer was creating a temp file, getting an input stream on it, deleting the temp file and then calling renderBinary. It seems to work fine even for very large files, even into the gigabytes.
I was surprised by this and am still looking for some documentation that indicates why this works.
UPDATE: We did finally encounter a file that caused this thing to bomb. I think it was over 3 GB. At that point, it became necessary to NOT delete the file while the rendering was in progress. I actually ended up using the Amazon Queue service to queue up messages for these files. The messages are then retrieved by a scheduled deletion job. Works out nicely, even with clustered servers behind a load balancer.
It seems counter-intuitive that the FileInputStream can still read after the file is removed.
DiskLruCache, a popular library in the Android world originating from the libcore of the Android platform, even relies on this "feature", as follows:
// Open all streams eagerly to guarantee that we see a single published
// snapshot. If we opened streams lazily then the streams could come
// from different edits.
InputStream[] ins = new InputStream[valueCount];
try {
    for (int i = 0; i < valueCount; i++) {
        ins[i] = new FileInputStream(entry.getCleanFile(i));
    }
} catch (FileNotFoundException e) {
    ....
As @EJP pointed out in his comment on a similar question, "That's how Unix and Linux behave. Deleting a file is really deleting its name from the directory: the inode and the data persist while any processes have it open."
But I don't think it is a good idea to rely on it.
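To see the behavior concretely, here is a minimal, self-contained sketch (assuming a Unix-like OS; readAllBytes requires Java 9+):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class DeleteWhileOpen {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("demo", ".txt");
        Files.write(file.toPath(), "hello".getBytes(StandardCharsets.UTF_8));

        try (InputStream in = new FileInputStream(file)) {
            // On Unix-like systems this only unlinks the name; the inode
            // and the data stay alive while a descriptor is open.
            file.delete();
            // Still readable through the already-open stream.
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        // The OS reclaims the storage once the last descriptor is closed.
    }
}

On Windows, by contrast, the delete() call itself will typically fail while the stream is open, which is one more reason not to rely on this.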
Related
This is an issue I have had in many applications.
I want to change the information inside a file, which has an outdated version.
In this instance, I am updating the file that records playlists after adding a song to a playlist. (For reference, I am creating an app for Android.)
The problem is if I run this code:
FileOutputStream output = new FileOutputStream(file);
output.write(data.getBytes());
output.close();
If an IOException occurs while trying to write to the file, the data is lost, since creating an instance of FileOutputStream empties the file. Is there a better way to do this, so that if an IOException occurs the old data remains intact? Or does this error only occur when the file is read-only, so I just need to check for that?
My only "work around" is to inform the user of the error, and give said user the correct data, which the user has to manually update. While this might work for a developer, there is a lot of issues that could occur if this happens. Additionally, in this case, the user doesn't have permission to edit the file themselves, so the "work around" doesn't work at all.
Sorry if someone else has asked this. I couldn't find a result when searching.
Thanks in advance!
One way you could ensure that you do not wipe the file is by creating a new file with a different name first. If writing that file succeeds, you could delete the old file and rename the new one.
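A minimal sketch of that write-then-swap idea using java.nio.file (note the atomic move is not supported on every filesystem, in which case Files.move throws AtomicMoveNotSupportedException):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Write to a sibling temp file first; the original is only replaced
// after the new contents have been written completely.
static void safeWrite(Path file, String data) throws IOException {
    Path temp = file.resolveSibling(file.getFileName() + ".tmp");
    Files.write(temp, data.getBytes(StandardCharsets.UTF_8));
    Files.move(temp, file,
            StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
}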
There is the possibility that renaming fails. To be completely safe from that, your files could be named according to the time at which they are created. For instance, if your file is named save.dat, you could add the time at which the file was saved (from System.currentTimeMillis()) to the end of the file's name. Then, no matter what happens later (including failure to delete the old file or rename the new one), you can recover the most recent successful save. I have included a sample implementation below which represents the time as a 16-digit zero-padded hexadecimal number appended to the file extension. A file named save.dat will be instead saved as save.dat00000171ed431353 or something similar.
// name includes the file extension (e.g. "save.dat").
static File fileToSave(File directory, String name) {
    return new File(directory, name + String.format("%016x", System.currentTimeMillis()));
}

// Returns the most recent save. Use the entire sorted array instead if you
// need older versions for which deletion failed, e.g. to purge any
// unnecessary older versions.
static File fileToLoad(File directory, String name) {
    File[] files = directory.listFiles((dir, n) -> n.startsWith(name));
    Arrays.sort(files, Comparator.comparingLong(
            (File file) -> Long.parseLong(file.getName().substring(name.length()), 16)).reversed());
    return files[0];
}
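Hypothetical usage of those helpers (the directory, data, and file name here are just examples):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

static void demo(File directory, String data) throws IOException {
    // Save: write the new version under a fresh timestamped name.
    File out = fileToSave(directory, "save.dat");
    try (FileOutputStream fos = new FileOutputStream(out)) {
        fos.write(data.getBytes(StandardCharsets.UTF_8));
    }
    // Load: pick the most recent successful save.
    File in = fileToLoad(directory, "save.dat");
    System.out.println("latest save: " + in.getName());
}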
I'm having trouble figuring out a safe way to append to files in HDFS.
I'm using a small, 3-node Hadoop cluster (CDH v5.3.9, to be specific). Our process is a multi-threaded (8 threads) data pipeliner, and it has a stage which appends lines of delimited text to files in a dedicated directory on HDFS. I'm using locks to synchronize the threads' access to the buffered writers which append the data.
My first issue is deciding on the approach generally.
Approach A is to open the file, append to it, and close it for every line appended. This seems slow and would seem to create too many small blocks, or at least that is the sentiment I see in various posts.
Approach B is to cache the writers but periodically refresh them to make sure the list of writers doesn't grow unbounded (currently, it's one writer per input file processed by the pipeliner). This seems like a more efficient approach, but I imagine having streams open over a period of time, however controlled, may be an issue, especially for readers of the output files(?)
Beyond this, my real issues are two. I am using the FileSystem Java Hadoop API to do the appending and am intermittently getting these two exceptions:
org.apache.hadoop.ipc.RemoteException: failed to create file /output/acme_20160524_1.txt for DFSClient_NONMAPREDUCE_271210261_1 for client XXX.XX.XXX.XX because current leaseholder is trying to recreate file.
org.apache.hadoop.ipc.RemoteException: BP-1999982165-XXX.XX.XXX.XX-1463070000410:blk_1073760252_54540 does not exist or is not under Construction blk_1073760252_54540{blockUCState=UNDER_RECOVERY, primaryNodeIndex=1, replicas=[ReplicaUnderConstruction[[DISK]DS-ccdf4e55-234b-4e17-955f-daaed1afdd92:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-1f66db61-759f-4c5d-bb3b-f78c260e338f:NORMAL|RBW]]}
Anyone have any ideas on either of those?
For the first problem, I've tried instrumenting the logic discussed in this post, but it didn't seem to help.
I'm also interested in the role of the dfs.support.append property, if at all applicable.
My code for getting the file system:
userGroupInfo = UserGroupInformation.createRemoteUser("hdfs");
Configuration conf = new Configuration();
conf.set(key1, val1);
...
conf.set(keyN, valN);
fileSystem = userGroupInfo.doAs(new PrivilegedExceptionAction<FileSystem>() {
    public FileSystem run() throws Exception {
        return FileSystem.get(conf);
    }
});
My code for getting the OutputStream:
org.apache.hadoop.fs.Path file = ...
public OutputStream getOutputStream(boolean append) throws IOException {
    OutputStream os = null;
    synchronized (file) {
        if (isFile()) {
            os = (append) ? fs.append(file) : fs.create(file, true);
        } else if (append) {
            // Create the file first, to avoid a "failed to append to non-existent file" exception
            FSDataOutputStream dos = fs.create(file);
            dos.close();
            // or, this can be: fs.createNewFile(file);
            os = fs.append(file);
        } else {
            // Creating a new file
            os = fs.create(file);
        }
    }
    return os;
}
I got file appending working with CDH 5.3 / HDFS 2.5.0. My conclusions so far are as follows:
Cannot have multiple threads appending concurrently, whether that means one dedicated append thread per file or multiple threads writing to multiple files, and whether we write via one and the same instance of the HDFS FileSystem API or via different instances.
Cannot refresh (i.e. close and reopen) the writers; they must stay open.
Attempting such refreshes leads to an occasional, relatively rare ClosedChannelException, which appears to be recoverable (by retrying the append).
We use a single-threaded executor service with a blocking queue (one queue for appending to all files); there is a writer per file, and the writers stay open (until the end of processing, when they are closed).
When we upgrade to a CDH newer than 5.3, we'll want to revisit this and see what threading strategy makes sense: one and only one thread, one thread per file, or multiple threads writing to multiple files. Additionally, we'll want to see whether the writers can be, or need to be, periodically closed and reopened.
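A rough sketch of that arrangement (the class name, queue wiring, and error handling here are my own simplifications, not the production code): a single-threaded executor serializes every append, and each file keeps one writer that is never refreshed.

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsAppender {
    private final FileSystem fs;
    // One writer per file; writers stay open until the end of processing.
    private final Map<String, FSDataOutputStream> writers = new ConcurrentHashMap<>();
    // A single thread performs all appends, so no two appends ever race.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    HdfsAppender(FileSystem fs) {
        this.fs = fs;
    }

    void append(String file, String line) {
        executor.submit(() -> {
            try {
                FSDataOutputStream out = writers.computeIfAbsent(file, f -> {
                    try {
                        Path p = new Path(f);
                        return fs.exists(p) ? fs.append(p) : fs.create(p);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
                out.writeBytes(line + "\n");
            } catch (IOException | RuntimeException e) {
                // A rare ClosedChannelException appeared recoverable by
                // retrying the append; retry logic is omitted here.
                System.err.println("append failed: " + e);
            }
        });
    }
}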
In addition, I have seen the following error as well, and was able to make it go away by setting 'dfs.client.block.write.replace-datanode-on-failure.policy' to 'NEVER' on the client side.
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[XXX.XX.XXX.XX:50010, XXX.XX.XXX.XX:50010], original=[XXX.XX.XXX.XX:50010, XXX.XX.XXX.XX:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:969) ~[hadoop-hdfs-2.5.0.jar:?]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1035) ~[hadoop-hdfs-2.5.0.jar:?]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184) ~[hadoop-hdfs-2.5.0.jar:?]
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:532) ~[hadoop-hdfs-2.5.0.jar:?]
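For reference, setting that policy on the client side only takes one configuration entry (a sketch; the rest of the Configuration setup is as in the question above):

Configuration conf = new Configuration();
// Don't try to replace a failed datanode in the write pipeline; on a small
// (e.g. 3-node) cluster there may be no spare datanode to replace it with.
conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");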
I need to write a custom batch file renamer. I've got the bulk of it done, except I can't figure out how to check if a file is already open. I'm just using the java.io.File class, and there is a canWrite() method, but that doesn't seem to test if the file is in use by another program. Any ideas on how I can make this work?
Using the Apache Commons IO library...
boolean isFileUnlocked = false;
try {
    org.apache.commons.io.FileUtils.touch(yourFile);
    isFileUnlocked = true;
} catch (IOException e) {
    isFileUnlocked = false;
}

if (isFileUnlocked) {
    // Do stuff you need to do with a file that is NOT locked.
} else {
    // Do stuff you need to do with a file that IS locked.
}
(The Q&A is about how to deal with Windows "open file" locks ... not how implement this kind of locking portably.)
This whole issue is fraught with portability issues and race conditions:
You could try to use FileLock, but it is not necessarily supported for your OS and/or filesystem.
It appears that on Windows you may be unable to use FileLock if another application has opened the file in a particular way.
Even if you did manage to use FileLock or something else, you've still got the problem that something may come in and open the file between you testing the file and doing the rename.
A simpler though non-portable solution is to just try the rename (or whatever it is you are trying to do) and diagnose the return value and/or any Java exceptions that arise due to opened files.
Notes:
If you use the Files API instead of the File API you will get more information in the event of a failure (see the sketch after these notes).
On systems (e.g. Linux) where you are allowed to rename a locked or open file, you won't get any failure result or exceptions. The operation will just succeed. However, on such systems you generally don't need to worry if a file is already open, since the OS doesn't lock files on open.
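A minimal sketch of the "just try it and diagnose the failure" approach with the Files API (the paths here are placeholders):

import java.io.IOException;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;

static void tryRename(Path source, Path target) {
    try {
        Files.move(source, target);
        // The rename succeeded, so nothing was holding a Windows lock on it.
    } catch (FileSystemException e) {
        // Unlike File.renameTo's bare boolean, this carries a reason string.
        System.err.println("rename failed: " + e.getReason());
    } catch (IOException e) {
        System.err.println("rename failed: " + e);
    }
}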
// To check whether a file is opened or not (not for .txt files):

// the file we want to check
String fileName = "C:\\Text.xlsx";
File file = new File(fileName);

// try to rename the file with the same name
File sameFileName = new File(fileName);
if (file.renameTo(sameFileName)) {
    // the file was renamed, so it is closed
    System.out.println("file is closed");
} else {
    // the file didn't accept the renaming operation, so it is opened
    System.out.println("file is opened");
}
On Windows I found the answer https://stackoverflow.com/a/13706972/3014879 using
fileIsLocked = !file.renameTo(file)
most useful, as it avoids false positives when processing write-protected (or read-only) files.
org.apache.commons.io.FileUtils.touch(yourFile) doesn't check if your file is open or not. Instead, it changes the timestamp of the file to the current time.
I used the IOException and it works just fine:

try {
    String filePath = "C:\\sheet.xlsx";
    FileWriter fw = new FileWriter(filePath);
    fw.close();
} catch (IOException e) {
    System.out.println("File is open");
}
I don't think you'll ever get a definitive solution for this, the operating system isn't necessarily going to tell you if the file is open or not.
You might get some mileage out of java.nio.channels.FileLock, although the javadoc is loaded with caveats.
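For what it's worth, a minimal tryLock probe looks like this (a sketch; note that FileLock only guards against other cooperating FileLock users, not against arbitrary programs that simply have the file open):

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

static boolean canLock(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE);
         FileLock lock = channel.tryLock()) {
        // tryLock returns null when another process already holds a lock.
        return lock != null;
    }
}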
Hi, I really hope this helps.
I tried all the options before and none really work on Windows. The only thing that helped me accomplish this was trying to move the file, even to the same place, under an ATOMIC_MOVE. If the file is being written by another program or Java thread, this will definitely produce an exception.
try {
    Files.move(Paths.get(currentFile.getPath()),
            Paths.get(currentFile.getPath()), StandardCopyOption.ATOMIC_MOVE);
    // DO YOUR STUFF HERE SINCE IT IS NOT BEING WRITTEN BY ANOTHER PROGRAM
} catch (Exception e) {
    // DO NOT WRITE THEN SINCE THE FILE IS BEING WRITTEN BY ANOTHER PROGRAM
}
If the file is in use, FileOutputStream fileOutputStream = new FileOutputStream(file); throws a java.io.FileNotFoundException with 'The process cannot access the file because it is being used by another process' in the exception message.
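A small variation on that idea as a reusable probe (a sketch; opening in append mode avoids truncating the file when it is not in use, though it will create the file if it does not exist):

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

static boolean isInUse(File file) {
    // Append mode: a successful open does not wipe the file's contents.
    try (FileOutputStream ignored = new FileOutputStream(file, true)) {
        return false; // opening for write succeeded
    } catch (FileNotFoundException e) {
        return true;  // "being used by another process" on Windows
    } catch (IOException e) {
        return true;  // closing failed for some other reason
    }
}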
I am working on a program that integrates Hadoop's MapReduce framework with Xuggle. For that, I am implementing an IURLProtocolHandlerFactory class that reads from and writes to in-memory Hadoop data objects.
You can see the relevant code here:
https://gist.github.com/4191668
The idea is to register each BytesWritable object in the IURLProtocolHandlerFactory class with a UUID so that, when I later refer to that name while opening the file, it returns an IURLProtocolHandler instance that is attached to that BytesWritable object, and I can read from and write to memory.
The problem is that I get an exception like this:
java.lang.RuntimeException: could not open: byteswritable:d68ce8fa-c56d-4ff5-bade-a4cfb3f666fe
at com.xuggle.mediatool.MediaReader.open(MediaReader.java:637)
(see also under the posted link)
When debugging I see that the objects are correctly found in the factory; what's more, they are even being read from in the protocol handler. If I remove the listeners from/to the output file, the same happens, so the problem is already with the input. Digging deeper into the Xuggle code I reach the JNI code (which tries to open the file) and I can't get further than this. It apparently returns an error code.
XugglerJNI.IContainer_open__SWIG_0
I would really appreciate some hint where to go next, how should I continue debugging. Maybe my implementation has a flaw, but I can't see it.
I think the problem you are running into is that a lot of the types of inputs/outputs are converted to a native file descriptor in the IContainer JNI code, but the thing you are passing cannot be converted. It may not be possible to create your own IURLProtocolHandler in this way, because it would, after a trip through XuggleIO.map(), just end up calling IContainer again and then into the IContainer JNI code which will probably try to get a native file descriptor and call avio_open().
However, there may be a couple of things that you can open in IContainer which are not files/have no file descriptors, and which would be handled correctly. The things you can open can be seen in the IContainer code, namely java.io.DataOutput and java.io.DataOutputStream (and the corresponding inputs). I recommend making your own DataInput/DataOutput implementation which wraps around BytesWritable, and opening that in IContainer.
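A sketch of the wrapping half of that suggestion (the helper name is mine; whether the resulting DataInput is accepted end-to-end depends on the IContainer.open overloads described above):

import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import org.apache.hadoop.io.BytesWritable;

// Expose a BytesWritable's valid bytes as a DataInput that could be handed
// to an IContainer.open overload taking java.io.DataInput.
static DataInput asDataInput(BytesWritable writable) {
    // getBytes() returns the backing array; only getLength() bytes are valid.
    return new DataInputStream(
            new ByteArrayInputStream(writable.getBytes(), 0, writable.getLength()));
}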
If that doesn't work, then write your inputs to a temp file and read the outputs from a temp file :)
You can copy the file to local disk first and then try to open the container:
filePath = split.getPath();
final FileSystem fileSystem = filePath.getFileSystem(job);
Path localFile = new Path(filePath.getName());
fileSystem.createNewFile(localFile);
fileSystem.copyToLocalFile(filePath, localFile);
int result = container.open(filePath.getName(), IContainer.Type.READ, null);
This code works for me in the RecordReader class.
In your case you may copy the file to local disk first and then try to create the MediaReader.
What I am doing is reading in an HTML file and looking for a specific location in the HTML where I want to enter some text.
So I am using a BufferedReader to read in the HTML file and split it by the tag </HEAD>. I want to enter some text before this tag, but I am not sure how to do this. The HTML would then be along the lines of ...(newText)</HEAD>.
Would I need a PrintWriter to the same file, and if so, how would I tell it to write in the correct location?
I am not sure which way would be most efficient to do something like this.
Please Help.
Thanks in advance.
Here is part of my Java code:

try {
    File f = new File("newFile.html");
    FileOutputStream fos = new FileOutputStream(f);
    PrintWriter pw = new PrintWriter(fos);
    BufferedReader read = new BufferedReader(new FileReader("file.html"));
    String str;
    int i = 0;
    boolean found = false;
    while ((str = read.readLine()) != null) {
        String[] data = str.split("</HEAD>");
        if (found == false) {
            pw.write(data[0]);
            System.out.println(data[0]);
            pw.write("</script>");
            found = true;
        }
        if (i < 1) {
            pw.write(data[1]);
            System.out.println(data[1]);
            i++;
        }
        pw.write(str);
        System.out.println(str);
    }
} catch (Exception e) {
    e.printStackTrace();
}
When I do this it gets to a point in the file and I get these errors:
FATAL ERROR: MERLIN: Unable to connect to EDG API,
Cannot find .edg_properties file.,
java.lang.OutOfMemoryError: unable to create new native thread,
Cannot truncate table,
EXCEPTION:Cannot open connection to server: SQLExceptio,
Caught IOException: java.io.IOException: JZ0C0: Connection is already closed, ...
I'm not sure why I get these or what all of these mean. Please help.
Should be pretty easy:
Read file into a String
Split into before/after chunks
Open a temp file for writing
Write before chunk, your text, after chunk
Close up, and move temp file to original
Sounds like you are wondering about the last couple steps in particular. Here is the essential code:
File htmlFile = ...;
...
File tempFile = File.createTempFile("foo", ".html");
FileWriter writer = new FileWriter(tempFile);
writer.write(before);
writer.write(yourText);
writer.write(after);
writer.close();
tempFile.renameTo(htmlFile);
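For the first two steps, a minimal sketch of reading the file into a String and splitting it on the tag (this assumes </HEAD> occurs exactly once):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

String html = new String(Files.readAllBytes(htmlFile.toPath()), StandardCharsets.UTF_8);
int cut = html.indexOf("</HEAD>");
String before = html.substring(0, cut); // everything up to the tag
String after = html.substring(cut);     // keeps the tag itself in "after"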
Most people suggest writing to a temporary file and then copying the temporary file over the original on successful completion.
The forum thread has some ideas of how to do it.
GL.
For reading and writing you can use FileReaders/FileWriters or the corresponding IO stream classes.
For the editing, I'd suggest using an HTML parser to handle the document. It can read the HTML document into an internal data structure, which simplifies your effort to search for content and apply modifications. (Most?) parsers can serialize the document back to HTML again.
At least you're sure not to corrupt the HTML document structure.
Following up on the list of errors in your edit, a lot of that possibly stems from the OutOfMemoryError. That means you simply ran out of memory in the JVM, so Java was unable to allocate objects. This may be caused by a memory leak in your application, or it could simply be that the work you're trying to do transiently needs more memory than you have allocated.
You can increase the amount of memory that the JVM starts up with by providing the -Xmx argument to the java executable, e.g.:
-Xmx1024m
would set the maximum heap size to 1024 megabytes.
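For example, when launching from the command line (the class name is just a placeholder):

java -Xmx1024m MyMainClass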
The other issues might possibly be caused by this; when objects can't reliably be created or modified, lots of weird things tend to happen. That said, there are a few things that look like you can take action on. In particular, whatever MERLIN is, it looks like it can't do its work because it needs a property file for EDG, which it's unable to find in the location it's looking. You'll probably need to either put a config file there, or tell it to look at another location.
The other IOExceptions are fairly self-explanatory. Your program could not establish a connection to the server because of a SQLException (the underlying exception itself will probably be found in the logs); and some other part of the program tried to communicate to a remote machine using a closed connection.
I'd look at fixing the properties file (if it's not a benign error) and the memory issues first, and then seeing if any of the remaining problems still manifest.