Java: deleting gz files failed occasionally

UPDATE 2: Please close this question
After further debugging, it turns out the problem is not in the inner try block but in a bug inside the 'while' loop. An exception was thrown there and not caught, which skips the inner try block entirely. Apologies for my mistake; please delete this thread.
UPDATE: added logging to capture errors during delete.
I am downloading around 8000 GZ files from a server, processing their content locally, then deleting the downloaded copy upon completion. I am running this over a number of threads, each processing a disjoint batch of GZ files. But I do not understand why my code occasionally (not always) fails to delete the GZ files. The code generally looks like this:
....
private static final Logger LOG = Logger.getLogger(....class.getName());
.....
for (String inputGZFile : gzFiles) { // gzFiles is a list of URLs to be processed by this thread
    try {
        URL downloadFrom = new URL(inputGZFile); // (elided in the original snippet)
        File downloadTo = new File(this.outFolder + "/" + new File(downloadFrom.getPath()).getName());
        FileUtils.copyURLToFile(downloadFrom, downloadTo);
        InputStream fileStream = new FileInputStream(downloadTo);
        InputStream gzipStream = new GZIPInputStream(fileStream);
        Reader decoder = new InputStreamReader(gzipStream, Charset.forName("utf8"));
        Scanner inputScanner = new Scanner(decoder);
        inputScanner.useDelimiter(" .");
        while (inputScanner.hasNextLine() && (content = inputScanner.nextLine()) != null) {
            // do something
        }
        try {
            inputScanner.close();
            FileUtils.forceDelete(downloadTo);
        } catch (Exception e) {
            LOG.info("\t thread " + id + " deleting gz file error " + inputGZFile);
            LOG.info("\t thread " + id + ExceptionUtils.getFullStackTrace(e));
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The only reason I can think of is that the scanner did not close the file or release the file handle. But that would be strange because I already call the close method to close the scanner.
Any suggestions highly appreciated.

Without the ability to look into your log files, or debug your system first hand, it is close to impossible to tell you what is going wrong here.
But one thing you can definitely do: make that call to FileUtils.forceDelete(downloadTo) from within a finally block, for example.
The whole point of try/catch/finally is to enable you to enforce that specific actions always take place, no matter what happened in the try block!
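For illustration, here is a minimal sketch of how the loop body from the question might look with try-with-resources plus a finally block. It reuses the question's names (downloadTo, LOG, id, outFolder) and is an assumption about the surrounding code, not a drop-in fix:
URL downloadFrom = new URL(inputGZFile);
File downloadTo = new File(this.outFolder, new File(downloadFrom.getPath()).getName());
FileUtils.copyURLToFile(downloadFrom, downloadTo);
try (InputStream fileStream = new FileInputStream(downloadTo);
     InputStream gzipStream = new GZIPInputStream(fileStream);
     Reader decoder = new InputStreamReader(gzipStream, StandardCharsets.UTF_8);
     Scanner inputScanner = new Scanner(decoder)) {
    while (inputScanner.hasNextLine()) {
        String content = inputScanner.nextLine();
        // process content; an exception here no longer skips the cleanup below
    }
} finally {
    try {
        FileUtils.forceDelete(downloadTo); // runs even if the parsing above threw
    } catch (IOException e) {
        LOG.info("thread " + id + " deleting gz file error " + inputGZFile);
    }
}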
Also note: if you are unable to tell what your code is doing, add logging support to it, so that instead of printStackTrace() you log the whole exception to a place where it does not get lost.
Meaning: the real answer here is that you step back and take the necessary actions to find out where your problems are coming from.

Related

Prevent Text File From Being Deleted When Accessed From Multiple Threads

I'm trying to debug a problem that just surfaced in my program. Until now, I've been writing, reading and updating a props file with no problem using the following code structure:
public void setAndReplacePropValue(String dir, String key, String value) throws FileNotFoundException, IOException {
    if (value != null) {
        File file = new File(dir);
        if (!file.exists()) {
            System.out.println("File: " + dir + " is not present. Attempting to create new file now..");
            new FilesAndFolders().createTextFileWithDirsIfNotPresent(dir);
        }
        if (file.exists()) {
            try {
                FileInputStream fileInputStream = new FileInputStream(file);
                if (fileInputStream != null) {
                    Properties properties = new Properties();
                    properties.load(fileInputStream);
                    fileInputStream.close();
                    if (properties != null) {
                        FileOutputStream fileOutputStream = new FileOutputStream(file);
                        properties.setProperty(key, value);
                        properties.store(fileOutputStream, null);
                        fileOutputStream.close();
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            System.out.println("File: " + dir + " does not exist and attempt to create new file failed");
        }
    }
}
However, recently I noticed that a specific file (let's call it: C:\\Users\\Admin\\Desktop\\props.txt) is being deleted after being updated from multiple threads. I'm not sure of the exact source of this error, as it seems to happen randomly.
I thought that, perhaps, if two threads call setAndReplacePropValue() and the first thread calls FileOutputStream fileOutputStream = new FileOutputStream(file); (which truncates the file) before it has a chance to re-write the data (via properties.store(fileOutputStream, null)), then the second thread might call fileInputStream = new FileInputStream(file); on an empty file, causing that thread to delete the previous data when writing the 'empty' data back to the file.
To test my hypothesis I tried calling setAndReplacePropValue() from multiple threads several hundred to thousand times in a row while making changes to setAndReplacePropValue() as needed. Here are my results:
1. If setAndReplace() is declared as static + synchronized, the original props data is preserved. This remains true even when I add a random delay after calling FileOutputStream, as long as the JVM exits normally. If the JVM is killed/terminated (after FileOutputStream is called), the previous data is deleted.
2. If I remove both the static and synchronized modifiers from setAndReplace() and call setAndReplace() 5,000 times, the old data is still preserved (why?), as long as the JVM exits normally. This appears to be true even when I add a random delay in setAndReplace() (after calling FileOutputStream).
3. When I try modifying the props file using an ExecutorService (I occasionally access setAndReplacePropValue() via an ExecutorService in my program), the file content is preserved as long as there is no delay after FileOutputStream. If I add a delay and the delay is greater than the 'timeout' value set in future.get() (so an interrupted exception is thrown), the data is NOT preserved. This remains true even if I add the static + synchronized keywords to the method.
In short, my question is: what is the most likely explanation for why the file is being deleted? (I thought point 3 might explain the error, but I'm not actually sleeping after calling new FileOutputStream(), so presumably that would not prevent the data from being written back to the file.) Is there another possibility I didn't think of?
Also, why is point 2 true? If the method is not declared static/synchronized, shouldn't one thread eventually create an InputStream from an empty file? Thanks.
Unfortunately it is very difficult to provide feedback on your code without a ton of additional information, but hopefully my comments will be helpful.
In general, having multiple threads reading and writing from the same file is a really bad idea. I can't agree more with #Hovercraft-Full-Of-Eels who recommends that you have 1 thread do the reading/writing and the other threads just add updates to a shared BlockingQueue.
But that said here are some comments.
If setAndReplace() is declared as static + synchronized the original props data is preserved.
Right, this stops the terrible race condition in your code where 2 threads could be trying to write to the output file at the same time. Or it could be that 1 thread starts to write and another thread reads an empty file causing data to be lost.
If JVM is killed/terminated (after FileOutputStream is called) then previous data will be deleted.
I don't quite understand this part, but your code should have good try/finally clauses to make sure that the files are closed appropriately when the JVM terminates. If the JVM is hard-killed then the file may have been opened but not written yet (depending on timing). In this case, I would recommend that you write to a temporary file and then rename it to your properties file, which is an atomic operation. Then you might miss the update if the JVM is killed, but the file will never be left overwritten and empty.
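A rough sketch of that temp-file-and-rename idea using java.nio.file; the path is a placeholder and props is the already-updated Properties object:
Path target = Paths.get("C:\\Users\\Admin\\Desktop\\props.txt");
// write the complete new contents to a temp file in the same directory
Path temp = Files.createTempFile(target.getParent(), "props", ".tmp");
try (OutputStream out = Files.newOutputStream(temp)) {
    props.store(out, null);
}
// swap it in; ATOMIC_MOVE either fully succeeds or fails, so readers never see a
// half-written file (whether an existing target is replaced is platform-dependent,
// but it works on NTFS and the usual Linux filesystems)
Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);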
If I remove both static and synchronized modifiers from setAndReplace() and call setAndReplace() 5,000 times, the old data is still preserved (why?)
No idea. Depends on race conditions. Maybe you are just getting lucky.
When I try modifying props file using ExecutorService (I occasionally access setAndReplacePropValue() via ExecutorService in my program), file content is preserved as long as there's no delay after FileOutputStream. If I add delay and the delay is > 'timeout' value set in future.get() (so interrupted exception is thrown) the data is NOT preserved. This remains true even if I add static + synchronized keywords to method.
I can't answer that without seeing the specific code.
This would actually be a good idea: if you had a fixed thread pool with 1 thread, then each of the threads that wants to update a value would just submit the key/value pair to the thread pool. This is approximately what #Hovercraft-Full-Of-Eels was talking about.
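As a hedged sketch of that single-writer pattern (the executor wiring and the PropsHelper class used to reach setAndReplacePropValue are assumptions, not the poster's actual code):
// One dedicated thread owns the props file; everyone else just submits updates.
private static final ExecutorService PROPS_WRITER = Executors.newSingleThreadExecutor();

public static Future<?> submitPropUpdate(final String dir, final String key, final String value) {
    return PROPS_WRITER.submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            // only this single thread ever reads/writes the file, so writes cannot interleave;
            // PropsHelper stands in for whatever class declares setAndReplacePropValue
            new PropsHelper().setAndReplacePropValue(dir, key, value);
            return null;
        }
    });
}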
Hope this helps.

How to check that file is opened by another process in Java? [duplicate]

I need to write a custom batch file renamer. I've got the bulk of it done, except I can't figure out how to check whether a file is already open. I'm just using java.io.File, and there is a canWrite() method, but that doesn't seem to test whether the file is in use by another program. Any ideas on how I can make this work?
Using the Apache Commons IO library...
boolean isFileUnlocked = false;
try {
    org.apache.commons.io.FileUtils.touch(yourFile);
    isFileUnlocked = true;
} catch (IOException e) {
    isFileUnlocked = false;
}

if (isFileUnlocked) {
    // Do stuff you need to do with a file that is NOT locked.
} else {
    // Do stuff you need to do with a file that IS locked
}
(The Q&A is about how to deal with Windows "open file" locks ... not how to implement this kind of locking portably.)
This whole issue is fraught with portability issues and race conditions:
You could try to use FileLock, but it is not necessarily supported for your OS and/or filesystem.
It appears that on Windows you may be unable to use FileLock if another application has opened the file in a particular way.
Even if you did manage to use FileLock or something else, you've still got the problem that something may come in and open the file between you testing the file and doing the rename.
A simpler though non-portable solution is to just try the rename (or whatever it is you are trying to do) and diagnose the return value and / or any Java exceptions that arise due to opened files.
Notes:
If you use the Files API instead of the File API you will get more information in the event of a failure.
On systems (e.g. Linux) where you are allowed to rename a locked or open file, you won't get any failure result or exceptions. The operation will just succeed. However, on such systems you generally don't need to worry if a file is already open, since the OS doesn't lock files on open.
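To illustrate the point about the Files API giving more information, here is a small sketch of attempting the rename via Files.move and reading the reason out of the exception; the paths are placeholders:
Path source = Paths.get("C:\\data\\report.xlsx");
Path target = Paths.get("C:\\data\\report-renamed.xlsx");
try {
    Files.move(source, target);
    System.out.println("renamed OK");
} catch (FileAlreadyExistsException e) {
    System.out.println("target already exists: " + e.getFile());
} catch (FileSystemException e) {
    // on Windows a sharing violation typically shows up here, with a reason such as
    // "The process cannot access the file because it is being used by another process"
    System.out.println("rename failed for " + e.getFile() + ": " + e.getReason());
} catch (IOException e) {
    System.out.println("rename failed: " + e);
}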
// TO CHECK WHETHER A FILE IS OPENED
// OR NOT (not for .txt files)

// the file we want to check
String fileName = "C:\\Text.xlsx";
File file = new File(fileName);

// try to rename the file with the same name
File sameFileName = new File(fileName);
if (file.renameTo(sameFileName)) {
    // if the file is renamed
    System.out.println("file is closed");
} else {
    // if the file didn't accept the renaming operation
    System.out.println("file is opened");
}
On Windows I found the answer https://stackoverflow.com/a/13706972/3014879 using
fileIsLocked = !file.renameTo(file)
most useful, as it avoids false positives when processing write-protected (or read-only) files.
org.apache.commons.io.FileUtils.touch(yourFile) doesn't check if your file is open or not. Instead, it changes the timestamp of the file to the current time.
I used IOException and it works just fine:
try
{
    String filePath = "C:\\sheet.xlsx";
    FileWriter fw = new FileWriter(filePath);
    fw.close(); // release the handle we just opened
}
catch (IOException e)
{
    System.out.println("File is open");
}
I don't think you'll ever get a definitive solution for this; the operating system isn't necessarily going to tell you whether the file is open or not.
You might get some mileage out of java.nio.channels.FileLock, although the javadoc is loaded with caveats.
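For what it's worth, a minimal sketch of probing with FileLock via tryLock(); whether this detects locks held by other applications is platform- and filesystem-dependent, as those caveats warn (the path is a placeholder):
Path path = Paths.get("C:\\data\\report.xlsx");
try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE);
     FileLock lock = channel.tryLock()) {
    if (lock == null) {
        System.out.println("file appears to be locked by another process");
    } else {
        System.out.println("got the lock, so at least nothing else holds a FileLock on it");
    }
} catch (OverlappingFileLockException e) {
    System.out.println("locked by another thread/channel inside this JVM");
} catch (IOException e) {
    System.out.println("could not open the file at all: " + e);
}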
Hi, I really hope this helps.
I tried all the options before and none really work on Windows. The only thing that helped me accomplish this was trying to move the file, even to the same place, with an ATOMIC_MOVE. If the file is being written by another program or Java thread, this will definitely produce an Exception.
try {
    Files.move(Paths.get(currentFile.getPath()),
               Paths.get(currentFile.getPath()), StandardCopyOption.ATOMIC_MOVE);
    // DO YOUR STUFF HERE SINCE IT IS NOT BEING WRITTEN BY ANOTHER PROGRAM
} catch (Exception e) {
    // DO NOT WRITE THEN SINCE THE FILE IS BEING WRITTEN BY ANOTHER PROGRAM
}
If the file is in use, FileOutputStream fileOutputStream = new FileOutputStream(file); throws a java.io.FileNotFoundException with 'The process cannot access the file because it is being used by another process' in the exception message.

Pause the execution of Java if files are used

My application writes to Excel files. Sometimes a file may be in use; in that case a FileNotFoundException is thrown, and I do not know how best to handle it.
I am telling the user that the file is in use, and after that message I do not want to close the application, but rather to pause and wait until the file is available (assuming it is opened by the same user). But I do not understand how to implement this. file.canWrite() doesn't work; it returns true even when the file is open. To use FileLock and check whether the lock is available I need to open a stream, but that already throws FileNotFoundException. (I've been thinking about checking the lock in a busy-wait loop; I know that is not a good solution, but I can't find another one.)
This is a part of my code if it can help somehow to understand my problem:
File file = new File(filename);
FileOutputStream out = null;
try {
    out = new FileOutputStream(file);
    FileChannel channel = out.getChannel();
    FileLock lock = channel.lock();
    if (lock == null) {
        new Message("lock not available");
        // to stop the program here and wait when the file is available, then resume
    }
    // write here
    lock.release();
}
catch (IOException e) {
    new Message("Blocked");
    // or to stop here and then create another stream when the file is available
}
What makes it more difficult is that the program writes to several different files. If the first file is available but the second is not, it will update one file and then stop; if I then restart the program, it will update the first file again. So I can't allow the program to write to any of the files until all of them are available.
I believe there should be a standard solution, since dealing with such cases must be a common issue on Windows, but I can't find it.
To wait until a file exists you can make a simple loop:
File file = new File(filename);
while (!file.exists()) {
    try {
        Thread.sleep(100);
    } catch (InterruptedException ie) { /* safe to ignore */ }
}
A better solution could be using WatchService but it's more code to implement.
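A rough sketch of the WatchService variant, assuming the file is expected to appear in a known directory (the directory and file name are placeholders):
Path dir = Paths.get("C:\\reports");
Path expected = dir.resolve("report.xlsx");
try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
    dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
    // block until something is created in the directory, then re-check for our file
    while (!Files.exists(expected)) {
        WatchKey key = watcher.take();   // waits without polling
        key.pollEvents();                // drain the events for this key
        key.reset();
    }
} catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
} catch (IOException e) {
    // fall back to the sleep loop above, or report the error
}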
The File.canWrite method only tells you if a path can be written to; if the path names a file that doesn't exist it will return false. You could use the canRead method instead of exists in a loop like above.
To use file locks, the file has to exist first, so that wouldn't work either.
The only way to be sure you can write to a file is to try to open it. If the file doesn't exist, the java.io API will create it. To open a file for writing without creating it, you can use the java.nio.file.Files class:
try (OutputStream out = Files.newOutputStream(file.toPath(),
        StandardOpenOption.WRITE))
{
    // exists and is writable
} catch (IOException e) {
    // doesn't exist or can't be opened for writing
}

Rolling file implementation

I have always been curious how a rolling file is implemented in logging frameworks.
How would one even start creating a file-writing class, in any language, that ensures the file size is not exceeded?
The only possible solution I can think of is this:
write method:
    size = file size + size of string to write
    if (size > limit)
        close the file writer
        open file reader
        read the file
        close file reader
        open file writer (clears the whole file)
        remove the size from the beginning to accommodate the new string to write
        write the new truncated string
    write the string we received
This seems like a terrible implementation, but I cannot think of anything better.
Specifically, I would love to see a solution in Java.
EDIT: By "remove the size from the beginning" I mean: let's say I have a 20-byte string (which is the limit) and I want to write another 3-byte string; I remove 3 bytes from the beginning, am left with 17 bytes, and by appending the new string I have 20 bytes again.
Because your question made me look into it, here's an example from the logback logging framework. The RollingFileAppender#rollover() method looks like this:
public void rollover() {
    synchronized (lock) {
        // Note: This method needs to be synchronized because it needs exclusive
        // access while it closes and then re-opens the target file.
        //
        // make sure to close the hereto active log file! Renaming under windows
        // does not work for open files
        this.closeOutputStream();
        try {
            rollingPolicy.rollover(); // this actually does the renaming of files
        } catch (RolloverFailure rf) {
            addWarn("RolloverFailure occurred. Deferring roll-over.");
            // we failed to roll-over, let us not truncate and risk data loss
            this.append = true;
        }
        try {
            // update the currentlyActiveFile
            currentlyActiveFile = new File(rollingPolicy.getActiveFileName());
            // This will also close the file. This is OK since multiple
            // close operations are safe.
            // COMMENT MINE: this also sets the new OutputStream for the new file
            this.openFile(rollingPolicy.getActiveFileName());
        } catch (IOException e) {
            addError("setFile(" + fileName + ", false) call failed.", e);
        }
    }
}
As you can see, the logic is pretty similar to what you posted. They close the current OutputStream, perform the rollover, then open a new one (openFile()). Obviously, this is all done in a synchronized block since many threads are using the logger, but only one rollover should occur at a time.
A RollingPolicy is a policy on how to perform a rollover and a TriggeringPolicy is when to perform a rollover. With logback, you usually base these policies on file size or time.
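To make the pattern concrete, here is a small, simplified sketch of a size-based rolling writer along the same close-rename-reopen lines (single archive index, no compression, plain java.io; very much not production code):
/** Minimal sketch: rolls the log file once it exceeds maxBytes (close, rename, reopen). */
class SimpleRollingWriter implements Closeable {
    private final File active;
    private final long maxBytes;
    private Writer out;
    private int index = 0;

    SimpleRollingWriter(File active, long maxBytes) throws IOException {
        this.active = active;
        this.maxBytes = maxBytes;
        this.out = new BufferedWriter(new FileWriter(active, true)); // append to existing file
    }

    synchronized void write(String line) throws IOException {
        if (active.length() + line.length() > maxBytes) {
            rollover();
        }
        out.write(line);
        out.write(System.lineSeparator());
        out.flush();
    }

    private void rollover() throws IOException {
        out.close(); // must close before renaming, especially on Windows
        File archived = new File(active.getParentFile(), active.getName() + "." + (++index));
        if (!active.renameTo(archived)) {
            throw new IOException("rollover failed for " + active);
        }
        out = new BufferedWriter(new FileWriter(active, false)); // fresh, empty active file
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }
}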

How to find out which thread is locking a file in java?

I'm trying to delete a file that another thread within my program has previously worked with.
I'm unable to delete the file but I'm not sure how to figure out which thread may be using the file.
So how do I find out which thread is locking the file in java?
I don't have a straight answer (and I don't think there is one either; this is controlled at the OS level, natively, not at the JVM level), and I don't really see the value of the answer (you still can't close the file programmatically once you've found out which thread holds it). But I suspect you don't yet know that the inability to delete is usually caused by the file still being open. This can happen when you do not explicitly call Closeable#close() on the InputStream, OutputStream, Reader or Writer that was constructed around the File in question.
Basic demo:
public static void main(String[] args) throws Exception {
    File file = new File("c:/test.txt"); // Precreate this test file first.
    FileOutputStream output = new FileOutputStream(file); // This opens the file!
    System.out.println(file.delete()); // false
    output.close(); // This explicitly closes the file!
    System.out.println(file.delete()); // true
}
In other words, ensure that throughout your entire Java IO stuff the code is properly closing the resources after use. The normal idiom is to do this in the try-with-resources statement, so that you can be certain that the resources will be freed up anyway, even in case of an IOException. E.g.
try (OutputStream output = new FileOutputStream(file)) {
    // ...
}
Do it for any InputStream, OutputStream, Reader, Writer, and so on (whatever implements AutoCloseable) that you're opening yourself (using the new keyword).
This is technically not needed on certain implementations, such as ByteArrayOutputStream, but for the sake of clarity, just adhere to the close-in-finally idiom everywhere to avoid misconceptions and refactoring bugs.
If you're not on Java 7 or newer yet, use the try-finally idiom below instead.
OutputStream output = null;
try {
    output = new FileOutputStream(file);
    // ...
} finally {
    if (output != null) try { output.close(); } catch (IOException logOrIgnore) {}
}
Hope this helps to nail down the root cause of your particular problem.
About this question: I also tried to find the answer to this, asked a similar question, and found the following:
Every time a JVM thread locks a file exclusively, the JVM also locks some Java objects; for example, in my case I found:
sun.nio.fs.NativeBuffer
sun.nio.ch.Util$BufferCache
So you just need to find those locked Java objects and analyze them, and you will find which thread locked your file.
I'm not sure whether this works if the file is merely open (without being locked exclusively), but I am sure it works if the file is locked exclusively by a thread (using java.nio.channels.FileLock, java.nio.channels.FileChannel and so on).
See this question for more info.
