I need an efficient algorithm to keep only the ten latest files on disk in a particular folder, to support a publishing process. Only 10 files should be present in this folder at any point in time. Please advise what should be used here.
You can call listFiles() on the File for the directory; if there are more than 9 files, sort them by lastModified() and delete the oldest ones (smallest timestamps) to trim down to 9.
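The list-sort-delete approach could look roughly like the sketch below. The class and method names (NewestFilesKeeper, trimToNewest) are made up for illustration, and the number of files to keep is a parameter, so it can be called with 9 before publishing or 10 after.

```java
import java.io.File;
import java.util.Arrays;

public class NewestFilesKeeper {

    /**
     * Deletes the oldest regular files in dir until at most `keep` remain.
     * Returns the number of files actually deleted.
     */
    public static int trimToNewest(File dir, int keep) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null || files.length <= keep) {
            return 0; // not a directory, an I/O error, or nothing to trim
        }
        // Sort newest first, so everything from index `keep` onward is surplus.
        Arrays.sort(files, (a, b) -> Long.compare(b.lastModified(), a.lastModified()));
        int deleted = 0;
        for (int i = keep; i < files.length; i++) {
            if (files[i].delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

Note that this relies on lastModified(), which is the modification time and not necessarily the time the file arrived in the folder.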
How about using a file system watcher like JNotify?
Register for events that you are interested in (for instance, Created event);
Update your internal count of files on every Created event.
As soon as you reach the 11th file, remove the file with the oldest creation date.
Or use Commons JCI FileAlterationMonitor (FAM) to monitor local filesystems and get notified about changes:
ReloadingClassLoader classloader = new ReloadingClassLoader(this.getClass().getClassLoader());
ReloadingListener listener = new ReloadingListener();
listener.addReloadNotificationListener(classloader);
FilesystemAlterationMonitor fam = new FilesystemAlterationMonitor();
fam.addListener(directory, listener);
fam.start();
This discussion may help you with file system watchers.
You'd have to poll the directory at regular intervals and delete everything that's older than the 10th newest file in it.
Of course that leaves open the question of what the "10th newest file" actually is. The timestamp on the file might not indicate the date/time it was added to the folder after all.
So your system might actually need some independent way to keep track of files in the folder to determine when each was added, in order to delete files based on when they were put there rather than how old the file actually is.
But that's a business requirement you don't provide (do you even know it yourself?).
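One way to track arrival order independently of file timestamps is sketched below. It assumes the publishing code calls register() itself each time it puts a file in the folder; ArrivalOrderTracker and its methods are hypothetical names, and the bookkeeping is in memory only, so it would not survive a restart.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

public class ArrivalOrderTracker {
    private final int maxFiles;
    // Paths in the order they were registered, earliest arrival first.
    private final Deque<Path> arrivals = new ArrayDeque<>();

    public ArrivalOrderTracker(int maxFiles) {
        this.maxFiles = maxFiles;
    }

    /** Records a newly published file and deletes the earliest arrivals beyond the limit. */
    public synchronized void register(Path file) throws IOException {
        arrivals.addLast(file);
        while (arrivals.size() > maxFiles) {
            Files.deleteIfExists(arrivals.removeFirst());
        }
    }

    /** Number of files currently tracked. */
    public synchronized int trackedCount() {
        return arrivals.size();
    }
}
```

Because eviction follows registration order, a file copied in with an old timestamp is still treated as the newest arrival.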
I'm adding code to a large JSP web application, integrating functionality to convert CGM files to PDFs (or PDFs to CGMs) to display to the user.
It looks like I can create the converted files and store them in the directory designated by System.getProperty("java.io.tmpdir"). How do I manage their deletion, though? The program resides on a Linux-based server. Will the OS automatically delete from /tmp or will I need to come up with functionality myself? If it's the latter scenario, what are good ways to go about doing it?
EDIT: I see I can use deleteOnExit() (relevant answer elsewhere), but I think the JVM runs more or less continuously in the background so I'm not sure if the exits would be frequent enough.
I don't think I need to cache any converted files--just convert a file anew every time it's needed.
You can do this:
File file = File.createTempFile("base_name", ".tmp", new File(temporaryFolderPath));
file.deleteOnExit();
The file will be deleted when the virtual machine terminates.
Edit:
If you want to delete it after the job is done, just do it:
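File file = null;
try {
    file = File.createTempFile("webdav", ".tmp", new File(temporaryFolderPath));
    // do something with the file
} finally {
    // file is still null if createTempFile threw, so guard against a NullPointerException
    if (file != null) {
        file.delete();
    }
}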
There are ways to have the JVM delete files when the JVM exits using deleteOnExit() but I think there are known memory leaks using that method. Here is a blog explaining the leak: http://www.pongasoft.com/blog/yan/java/2011/05/17/file-dot-deleteOnExit-is-evil/
A better solution would be either to delete old files using a cron job or, if you know you aren't going to use the file again, to just delete it after processing.
From your comment :
Also, could I just create something that checks to see if the size of my files exceeds a certain amount, and then deletes the oldest ones if that's true? Or am I overthinking it?
You could create a class that keeps track of the created files with a size limit. When the size of the created files, after creating a new one, goes over the limit, it deletes the oldest one. Beware that this may delete a file that still needs to exist even if it is the oldest one. You might need a way to know which files still need to be kept and delete only those that are not needed anymore.
You could have a timer in the class to check periodically instead of after each creation. This solution is tied to your application while using a cron isn't.
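A minimal sketch of such a tracking class follows, assuming every temporary file is reported to it right after creation. The names (SizeLimitedFileTracker, add, totalBytes) are hypothetical, and the sketch deliberately ignores the caveat above: it has no notion of files that are still in use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

public class SizeLimitedFileTracker {

    /** A tracked file together with the size it had when it was added. */
    private static final class Tracked {
        final Path path;
        final long size;
        Tracked(Path path, long size) {
            this.path = path;
            this.size = size;
        }
    }

    private final long maxTotalBytes;
    private final Deque<Tracked> files = new ArrayDeque<>(); // oldest first
    private long totalBytes;

    public SizeLimitedFileTracker(long maxTotalBytes) {
        this.maxTotalBytes = maxTotalBytes;
    }

    /** Registers a newly created file, then deletes the oldest files while over the limit. */
    public synchronized void add(Path file) throws IOException {
        long size = Files.size(file);
        files.addLast(new Tracked(file, size));
        totalBytes += size;
        // Never evict the file that was just added, even if it alone exceeds the limit.
        while (totalBytes > maxTotalBytes && files.size() > 1) {
            Tracked oldest = files.removeFirst();
            Files.deleteIfExists(oldest.path);
            totalBytes -= oldest.size;
        }
    }

    /** Combined recorded size of the files still tracked. */
    public synchronized long totalBytes() {
        return totalBytes;
    }
}
```

To check periodically instead of on each creation, the eviction loop could be moved into a task run by a timer or a ScheduledExecutorService.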
I saw this nifty guide on how to do streaming file uploads via Apache Commons. This got me thinking: where is the data stored? And is it necessary to "close" or "clean" that location?
Thanks!
where is the data stored?
I don't think it is stored.
The Streaming API doesn't use DiskFileItemFactory. But it does use a buffer for copying data as BalusC has posted.
Once you have the stream of the upload, you can use
long bytesCopied = Streams.copy(yourInputStream, yourOutputStream, true);
Look at the API
Here is the javadoc for DiskFileItemFactory.
The default FileItemFactory implementation. This implementation
creates FileItem instances which keep their content either in memory,
for smaller items, or in a temporary file on disk, for larger items.
The size threshold, above which content will be stored on disk, is
configurable, as is the directory in which temporary files will be
created.
If not otherwise configured, the default configuration values are as
follows:
Size threshold is 10KB.
Repository is the system default temp directory, as returned by System.getProperty("java.io.tmpdir").
Temporary files, which are created for file items, should be deleted
later on. The best way to do this is using a FileCleaningTracker,
which you can set on the DiskFileItemFactory. However, if you do use
such a tracker, then you must consider the following: Temporary files
are automatically deleted as soon as they are no longer needed. (More
precisely, when the corresponding instance of File is garbage
collected.) This is done by the so-called reaper thread, which is
started automatically when the class FileCleaner is loaded. It might
make sense to terminate that thread, for example, if your web
application ends. See the section on "Resource cleanup" in the users
guide of commons-fileupload.
So yes, closing and cleanup are necessary, as a FileItem may denote a real file on disk.
It's stored as a byte[] in the Java memory.
Every 5 minutes, a thread creates new files and stores them in a folder.
Every day at 11:10 A.M., I have to delete the old files. However, there is one condition: a file may only be deleted if it was created before 11:00 A.M. that day. Files created after 11:00 should not be deleted. How can I list the files at 11:10 and delete only those from before 11:00? Can anyone help me?
There are various methods available in the File class which can help.
To list the files in a directory use the listFiles method. This will return an array of Files which you can iterate over.
To check when a file was last modified use the lastModified method.
To delete a file use the delete method.
You also need to work out the timestamp of the 11:00 a.m. cutoff so that it can be compared to the file's last modified time. You can use the Calendar class for this.
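Put together, those File and Calendar calls might be used as in the sketch below. OldFileCleaner and its method names are invented for the example, and it assumes each file is written once, so that lastModified() is a usable stand-in for the creation time.

```java
import java.io.File;
import java.util.Calendar;

public class OldFileCleaner {

    /** Returns today's date at the given hour and minute, as epoch milliseconds. */
    public static long todayAt(int hourOfDay, int minute) {
        Calendar cal = Calendar.getInstance();
        cal.set(Calendar.HOUR_OF_DAY, hourOfDay);
        cal.set(Calendar.MINUTE, minute);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTimeInMillis();
    }

    /**
     * Deletes regular files in dir that were last modified strictly before
     * cutoffMillis. Returns the number of files deleted.
     */
    public static int deleteOlderThan(File dir, long cutoffMillis) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or an I/O error occurred
        }
        int deleted = 0;
        for (File f : files) {
            if (f.isFile() && f.lastModified() < cutoffMillis && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

The 11:10 job would then call something like OldFileCleaner.deleteOlderThan(folder, OldFileCleaner.todayAt(11, 0)), where folder is the File for the directory in question.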
First you should create a cron job or a scheduled task that runs your Java application at around 11:10.
To determine whether a file needs to be deleted, check out the API of "File" (e.g. "lastModified()" and "delete()"):
http://download.oracle.com/javase/6/docs/api/java/io/File.html
We have to monitor changes to files on a remote system that we access through FTP or SMB.
We do not have any SSH access to the remote system / OS. Our only view of the remote system is what FTP or Samba lets us see.
What we do today:
Periodically scan the whole directory, construct a representation in memory for doing our stuff, and then merge it with what we have in the database.
What we would like to do:
Be able to determine whether the directory has changed, and thus whether parsing is needed. Ideally, never have to do a full parse. We don't want to rely too much on OS capabilities (inodes...) because they could change from one installation to another.
Main goal: this process becomes slow when the amount of data is very large. Only a few percent of this data is new and needs to be parsed. How can we parse and add to our database only this part?
The leads we discuss at this moment :
Checking the size of folder
Using a checksum on files
Checking the last date of modification of folder / file
What we really want :
Some input and best practices, because this problem seems pretty common and should have been discussed already, and we don't want to end up doing something overly complicated on this point.
Thanks in advance, a bunch of fellow developers ;-)
We use a Java/Spring/Hibernate stack, but I don't think that matters much here.
Edit: basically, we access an FTP server or equivalent. A local copy is not an option, since the amount of data is way too large.
The Remote Directory Poller for Java (rdp4j) library can help you out by polling your FTP location and notifying you of the following events: file Added/Removed/Modified in a directory. It uses the lastModified date of each file in the directory and compares it with the previous poll.
See the complete User Guide, which contains implementations of the FtpDirectory and MyListener classes used in the quick tutorial of the API below:
package example;

import java.util.concurrent.TimeUnit;
import com.github.drapostolos.rdp4j.DirectoryPoller;
import com.github.drapostolos.rdp4j.spi.PolledDirectory;

public class FtpExample {
    public static void main(String[] args) throws Exception {
        String host = "ftp.mozilla.org";
        String workingDirectory = "pub/addons";
        String username = "anonymous";
        String password = "anonymous";
        // FtpDirectory and MyListener are not part of rdp4j itself;
        // their implementations are given in the User Guide.
        PolledDirectory polledDirectory = new FtpDirectory(host, workingDirectory, username, password);
        // Poll the directory every 10 minutes and report changes to MyListener.
        DirectoryPoller dp = DirectoryPoller.newBuilder()
            .addPolledDirectory(polledDirectory)
            .addListener(new MyListener())
            .setPollingInterval(10, TimeUnit.MINUTES)
            .start();
        // Let the poller run for two hours, then shut it down.
        TimeUnit.HOURS.sleep(2);
        dp.stop();
    }
}
You cannot use directory sizes or modification dates to tell if subdirectories have changed. Full stop. At a minimum you have to do a full directory listing of the whole tree.
You may be able to avoid reading file contents if you are satisfied you can rely on the combination of the modification date and time.
My suggestion is use off-the-shelf software to create a local clone (e.g. rsync, robocopy) then do the comparison/parse on the local clone. The question "is it updated" is then a question for rsync to answer.
As previously mentioned, there is no way you can track directories via FTP or SMB. What you can do is to list all files on the remote server and construct a snapshot that contains:
for file: name, size and modification date,
for directory: name and latest modification date among its contents,
Using this information you will be able to determine which directories need to be looked into and which files need to be transferred.
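Comparing two such snapshots could be sketched as follows. The listing itself (over FTP or SMB) is left out; the sketch assumes each snapshot has already been flattened into a map from remote path to size and modification time. SnapshotDiff and FileMeta are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SnapshotDiff {

    /** Metadata visible in a remote listing: size and modification time. */
    public static final class FileMeta {
        final long size;
        final long modifiedMillis;

        public FileMeta(long size, long modifiedMillis) {
            this.size = size;
            this.modifiedMillis = modifiedMillis;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FileMeta)) {
                return false;
            }
            FileMeta other = (FileMeta) o;
            return size == other.size && modifiedMillis == other.modifiedMillis;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(size) * 31 + Long.hashCode(modifiedMillis);
        }
    }

    public final Set<String> added = new TreeSet<>();
    public final Set<String> removed = new TreeSet<>();
    public final Set<String> modified = new TreeSet<>();

    /** Compares the previous snapshot with the current one, keyed by remote path. */
    public static SnapshotDiff compare(Map<String, FileMeta> previous,
                                       Map<String, FileMeta> current) {
        SnapshotDiff diff = new SnapshotDiff();
        for (Map.Entry<String, FileMeta> e : current.entrySet()) {
            FileMeta old = previous.get(e.getKey());
            if (old == null) {
                diff.added.add(e.getKey());
            } else if (!old.equals(e.getValue())) {
                diff.modified.add(e.getKey());
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                diff.removed.add(path);
            }
        }
        return diff;
    }
}
```

Only the paths in added and modified would then need to be fetched and parsed. A file whose content changes without its size or modification time changing goes unnoticed by this scheme.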
The safe and portable solution is to use a strong hash/checksum such as SHA1 or (preferably) SHA512. The hash can be mapped to whatever representation you want to compute and store. You can use the following recursive recipe (adapted from the Git version control system):
The hash of a file is the hash of its contents, disregarding the name;
to hash a directory, consider it as a sorted list of filename-hash pairs in a textual representation and hash that.
Maybe prepend f to every file and d to every directory representation before hashing.
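The recursive recipe could be sketched like this. For simplicity the example walks a local directory with java.nio.file; over FTP or SMB the file contents would have to be downloaded to be hashed, which is the cost of this approach. TreeHasher is a hypothetical name, and the exact textual encoding (one "name hash" line per child) is an assumption of the sketch, not Git's format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TreeHasher {

    /** Returns the SHA-512 hash (hex) of a file's contents or of a directory tree. */
    public static String hash(Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        if (Files.isDirectory(path)) {
            List<Path> children;
            try (Stream<Path> stream = Files.list(path)) {
                children = stream.collect(Collectors.toList());
            }
            // One "name hash" line per child, sorted so listing order does not matter.
            List<String> lines = new ArrayList<>();
            for (Path child : children) {
                lines.add(child.getFileName() + " " + hash(child));
            }
            Collections.sort(lines);
            md.update((byte) 'd'); // marks a directory representation
            md.update(String.join("\n", lines).getBytes(StandardCharsets.UTF_8));
        } else {
            md.update((byte) 'f'); // marks a file representation
            // Reads the whole file into memory; large files would need streaming.
            md.update(Files.readAllBytes(path));
        }
        return toHex(md.digest());
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

If the stored hash of the root directory matches the newly computed one, nothing below it has changed; otherwise only the subdirectories whose hashes differ need to be descended into.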
You could also put the directory under version control using Git (or Mercurial, or whatever you like), periodically git add everything in it, use git status to find out what was updated, and git commit the changes.
I want to save video files in C:\ with an incrementing file name, e.g. video001.avi, video002.avi, video003.avi, etc. I want to do this in Java. The program is on
Problem in java programming on windows7 (working well in windows xp)
How do I increment the file name so that it saves without replacing the older file?
Using File.createNewFile() you can atomically create a file and determine whether the current thread actually created it: the method returns a boolean that tells you whether a new file was created. Simply checking whether a file exists before you create it will help, but it does not guarantee that the file you then create and write to was created by the current thread.
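Applied to the numbered video files from the question, createNewFile() could be used as in this sketch. SequentialFileNamer and createNext are invented names, and the three-digit counter (001 to 999) is an assumption taken from the example file names.

```java
import java.io.File;
import java.io.IOException;

public class SequentialFileNamer {

    /**
     * Atomically claims the first unused name of the form prefix + NNN + suffix
     * in dir and returns the newly created, empty file.
     */
    public static File createNext(File dir, String prefix, String suffix) throws IOException {
        for (int i = 1; i <= 999; i++) {
            File candidate = new File(dir, String.format("%s%03d%s", prefix, i, suffix));
            // createNewFile() returns true only if this call created the file,
            // so two writers can never claim the same name.
            if (candidate.createNewFile()) {
                return candidate;
            }
        }
        throw new IOException("No free file name left for prefix " + prefix);
    }
}
```

A call such as SequentialFileNamer.createNext(new File("C:\\"), "video", ".avi") would return the first free file among video001.avi, video002.avi, and so on; the video data can then be written to the returned file.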
You have two options:
Just increment a counter, and rely on the fact that you're the only running process writing these files (and that none exist already), so you don't need to check for clashes. This is (obviously) prone to error.
Use the File object (or Apache Commons FileUtils) to get the list of files, then increment a counter and determine if the corresponding file exists. If not, then write to it and exit. This is a brute force approach, but unless you're writing thousands of files, is quite acceptable performance-wise.