File deletion in Java

Every 5 minutes, a thread creates new files and stores them in a folder.
Every day at 11:10 A.M. I have to delete the old files, with one condition: a file may only be deleted if it was created before 11:00 A.M. that day. Files created after 11:00 must not be deleted. How can I list the files at 11:10 and delete only those created before 11:00? Can anyone help?

There are various methods available in the File class which can help.
To list the files in a directory, use the listFiles method. It returns an array of Files that you can iterate over.
To check when a file was last modified, use the lastModified method.
To delete a file, use the delete method.
You also need to work out the time value of 11:00 A.M. today (the cutoff, not the 11:10 run time) so that it can be compared with each file's last-modified time. You can use the Calendar class for this.
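A minimal sketch of those steps, assuming the folder path is passed as the first command-line argument and that a file's last-modified time is an acceptable stand-in for its creation time (reasonable here if each file is written once, right after it is created). The class and method names are illustrative, not from the original answer.

```java
import java.io.File;
import java.util.Calendar;

class OldFileCleaner {

    /** Epoch milliseconds of 11:00:00.000 today, in the JVM's default time zone. */
    static long todayAtElevenAm() {
        Calendar cal = Calendar.getInstance();
        cal.set(Calendar.HOUR_OF_DAY, 11);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTimeInMillis();
    }

    /** Deletes regular files in dir last modified before cutoffMillis; returns how many were deleted. */
    static int deleteFilesOlderThan(File dir, long cutoffMillis) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // dir is not a directory, or it could not be read
        }
        int deleted = 0;
        for (File f : files) {
            // Files written at or after the cutoff (11:00 onward) are left alone.
            if (f.isFile() && f.lastModified() < cutoffMillis) {
                if (f.delete()) {
                    deleted++;
                } else {
                    System.err.println("Could not delete " + f);
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        File dir = new File(args[0]); // assumption: folder path given on the command line
        int n = deleteFilesOlderThan(dir, todayAtElevenAm());
        System.out.println("Deleted " + n + " file(s)");
    }
}
```

Because the cutoff is computed from the calendar date rather than "now minus 10 minutes", a run that starts a little late still uses 11:00 as the boundary.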

First, create a cron job or a scheduled task that runs your Java application at around 11:10.
To determine whether a file needs to be deleted, check the API of "File" (e.g. "lastModified()" and "delete()"):
http://download.oracle.com/javase/6/docs/api/java/io/File.html
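If the 5-minute thread already runs inside a long-lived JVM, an in-process alternative to cron is a ScheduledExecutorService (a different technique from the cron job suggested above). This sketch computes the delay until the next 11:10 and then repeats every 24 hours; the cleanup body is a hypothetical placeholder for the deletion code.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DailyCleanupScheduler {

    /** Milliseconds from 'now' until the next occurrence of 'time': later today, or tomorrow if it has passed. */
    static long millisUntilNext(LocalDateTime now, LocalTime time) {
        LocalDateTime next = now.toLocalDate().atTime(time);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next).toMillis();
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Hypothetical placeholder; catch exceptions inside the real body, because an
        // uncaught exception suppresses all later runs of a periodic task.
        Runnable cleanup = () -> System.out.println("Delete files from before 11:00 here");
        long initialDelay = millisUntilNext(LocalDateTime.now(), LocalTime.of(11, 10));
        // Fixed 24-hour period: the run time drifts by an hour across daylight-saving changes.
        scheduler.scheduleAtFixedRate(cleanup, initialDelay, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }
}
```

The trade-off matches the later discussion of cron: this schedule lives and dies with your application, whereas a cron job runs even if the application is down.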

Related

Amazon S3: copy 1.5 million objects while renaming the folder

I have a folder with 1.5 million objects (about 5 TB of data), containing subfolders named in the format 123-John.
I need to copy the contents of all these subfolders into new folders renamed to the format 123.
I want to do this in Java.
Obviously I can't just do it one by one like this:
ObjectListing objectListing = s3.listObjects(listObjectsRequest);
boolean processable = true;
while (processable) {
    processable = objectListing.isTruncated();
    renameAndCopyOneByOne(objectListing.getObjectSummaries()); // edits each key and calls s3.copyObject()
    if (processable) {
        objectListing = s3.listNextBatchOfObjects(objectListing);
    }
}
That would mean making about 1.5 million calls to
s3.copyObject(bucket, sourceKey, bucket, destinationKey)
I wanted to use S3 Batch Operations, but a batch job can only be created from a manifest file in CSV format, with lines like
bucketName,keyName
That manifest only lists the objects to act on. I can't specify where each copy should be saved or give it the edited folder name. I would also still have to split the 1.5-million-line CSV into smaller files and send several requests to S3 to create several jobs, which would be hard to track.
Could you give me a hint as to which AWS tool would cover all my needs for this task?
After spending some time on how to do this properly, I think the only way is to run the migration as a batch job from Java that splits the load,
because AWS does not have a suitable tool for this case.
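A minimal sketch of splitting the load in Java: copy each listed key to its renamed destination from a fixed-size thread pool. This does not reduce the number of copy calls; it only runs them concurrently. ObjectCopier is a hypothetical interface standing in for a call such as s3.copyObject(bucket, sourceKey, bucket, destinationKey), and renameKey assumes the 123-John folder is a full path segment followed by "/". Error handling and retries are left out.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelRenameCopy {

    /** Hypothetical abstraction; a real one would call s3.copyObject(bucket, sourceKey, bucket, destinationKey). */
    interface ObjectCopier {
        void copy(String sourceKey, String destinationKey);
    }

    /** Rewrites the first path segment like "123-John/" to "123/", leaving the rest of the key unchanged. */
    static String renameKey(String key) {
        return key.replaceFirst("(^|/)(\\d+)-[^/]+/", "$1$2/");
    }

    /** Copies every key to its renamed destination on 'threads' worker threads and waits until all are done. */
    static void copyAll(List<String> keys, ObjectCopier copier, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (String key : keys) {
                pool.execute(() -> copier.copy(key, renameKey(key)));
            }
        } finally {
            pool.shutdown(); // accept no new tasks; already queued copies still run
        }
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    }
}
```

Calling copyAll once per listing page (the keys from one getObjectSummaries() batch) keeps the task queue small instead of holding all 1.5 million keys in memory at once.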

How to manage the creation and deletion of temporary files

I'm adding code to a large JSP web application, integrating functionality to convert CGM files to PDFs (or PDFs to CGMs) to display to the user.
It looks like I can create the converted files and store them in the directory designated by System.getProperty("java.io.tmpdir"). How do I manage their deletion, though? The program resides on a Linux-based server. Will the OS automatically delete from /tmp or will I need to come up with functionality myself? If it's the latter scenario, what are good ways to go about doing it?
EDIT: I see I can use deleteOnExit() (relevant answer elsewhere), but I think the JVM runs more or less continuously in the background so I'm not sure if the exits would be frequent enough.
I don't think I need to cache any converted files--just convert a file anew every time it's needed.
You can do this:
File file = File.createTempFile("base_name", ".tmp", new File(temporaryFolderPath));
file.deleteOnExit();
The file will be deleted when the virtual machine terminates normally.
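Edit:
If you want to delete the file as soon as the job is done, do it explicitly:
File file = File.createTempFile("webdav", ".tmp", new File(temporaryFolderPath));
try {
    // do something with the file
} finally {
    // Runs even if the processing throws. createTempFile is called outside the try,
    // so file can never be null here.
    file.delete();
}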
You can have the JVM delete files when it exits by using deleteOnExit(), but that method is known to leak memory: every registered path is kept in memory until the JVM exits, so calling it once per request in a long-running server keeps growing memory. Here is a blog post explaining the leak: http://www.pongasoft.com/blog/yan/java/2011/05/17/file-dot-deleteOnExit-is-evil/
A better solution is either to delete old files with a cron job or, if you know you won't use a file again, simply to delete it right after processing.
From your comment:
Also, could I just create something that checks to see if the size of my files exceeds a certain amount, and then deletes the oldest ones if that's true? Or am I overthinking it?
You could create a class that keeps track of the created files with a size limit. When the total size of the tracked files goes over the limit after a new one is created, the class deletes the oldest one. Beware that the oldest file may still be needed. You might need a way to know which files must be kept, and delete only those that are no longer needed.
You could also give the class a timer that checks periodically instead of after each creation. This solution is tied to your application, whereas a cron job is not.
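A sketch of the size-limited tracker described above, assuming each file is registered right after it has been fully written (its size is recorded at that moment). TempFileTracker is a hypothetical name. As the answer warns, it deletes the oldest file regardless of whether a request is still reading it.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: tracks created files and deletes the oldest once their total size exceeds a limit. */
class TempFileTracker {

    private static final class Entry {
        final File file;
        final long size; // size recorded when the file was registered

        Entry(File file, long size) {
            this.file = file;
            this.size = size;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>(); // oldest first
    private final long maxTotalBytes;
    private long totalBytes;

    TempFileTracker(long maxTotalBytes) {
        this.maxTotalBytes = maxTotalBytes;
    }

    /** Call after the file is fully written; may delete older tracked files to get back under the limit. */
    synchronized void register(File file) {
        long size = file.length();
        entries.addLast(new Entry(file, size));
        totalBytes += size;
        // Never delete the file that was just registered, even if it alone exceeds the limit.
        while (totalBytes > maxTotalBytes && entries.size() > 1) {
            Entry oldest = entries.removeFirst();
            totalBytes -= oldest.size;
            if (!oldest.file.delete()) {
                System.err.println("Could not delete " + oldest.file); // no longer tracked either way
            }
        }
    }

    synchronized long totalBytes() {
        return totalBytes;
    }
}
```

The methods are synchronized because a web application will register files from several request threads at once.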

In Pentaho Kettle, how do I check whether a filename already exists?

I am new to Pentaho Kettle.
I have a folder containing many .txt files,
for example 20121012.txt, 20121014.txt, and so on.
Every time I run the Kettle job, it picks up all of these files and imports them into the database.
I need to check before importing to prevent duplicate data.
The problem is: how can Kettle tell which filenames have already been imported?
For example:
20121012.txt <= once this file has been imported, the next run should check its filename and, if it is the same, not import it again.
In this case, I cannot simply name a specific file such as "20121012.txt" in the "Check if files exists" step, because there are too many txt files. If each filename refers to a day, one year means 365 or 366 files, and I cannot hard-code them all.
So the possible approach is to check, before importing into the database, whether the filename of the file being processed has already been seen.
My question is how to do this: which steps or workflow should I use?
Could anyone describe the steps in detail?
I am looking forward to hearing from you and please let me know if you need more information.
Thanks all for helping!
You can do this by storing the list of already processed files somewhere, such as a database table. Load that table in another step, merge the two streams, and pass through only those files from the file-loading step that are not in the other stream.
Make sure to update the processed-files table with any newly processed files afterwards.
You can use the "Get File Names" step. In this step, set the folder(s) that store your files, then set the wildcard (for example ".*" if you want all files in the folder).
If your database stores the already imported filenames, you can make your transformation idempotent by using a "Database Lookup" step to check whether each filename is already in the database, and then filtering the stream to pass only the filenames that weren't found.
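Inside Kettle this is done with steps rather than code, but the check both answers describe is a set difference: keep the found filenames that are not in the already-imported list. As a plain-Java illustration of that logic only (not a Kettle API), with a hypothetical class name:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UnprocessedFiles {

    /** Returns the names in 'found' that are not in 'alreadyImported', in their original order. */
    static List<String> filterNew(List<String> found, Collection<String> alreadyImported) {
        Set<String> imported = new HashSet<>(alreadyImported); // constant-time lookup per name
        List<String> result = new ArrayList<>();
        for (String name : found) {
            if (!imported.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

After a successful import, the returned names are what the first answer says to append to the processed-files table, so the next run skips them.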

Saving a video file by incrementing the file name in a loop

I want to save a video file in C:\, incrementing the file name each time, e.g. video001.avi, video002.avi, video003.avi, etc. I want to do this in Java. The program is in:
Problem in java programming on windows7 (working well in windows xp)
How do I increment the file name so that each save doesn't replace an older file?
With File.createNewFile() you can atomically create a file and find out whether the current thread actually created it: the method returns true only if a new file was created. Merely checking whether a file exists before creating it helps, but does not guarantee that the file you then write to was created by the current thread.
You have two options:
1. Just increment a counter and rely on your process being the only one writing these files (and on none existing already), so you don't need to check for clashes. This is obviously error-prone.
2. Use the File object (or Apache Commons FileUtils) to get the list of files, then increment a counter until you find a name whose file does not exist, write to it, and stop. This is a brute-force approach, but unless you are writing thousands of files, its performance is quite acceptable.
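A sketch combining the createNewFile() approach with a counter, assuming names of the form video001.avi through video999.avi. claimNext is a hypothetical helper: it returns an empty file that this call created, which you can then open with a FileOutputStream and write the video into.

```java
import java.io.File;
import java.io.IOException;
import java.util.Locale;

class NextFreeFile {

    /**
     * Claims the first free name prefix + NNN + suffix in dir (e.g. video001.avi, video002.avi, ...).
     * createNewFile() atomically returns false when the name is taken, so two threads or
     * processes can never claim the same file.
     */
    static File claimNext(File dir, String prefix, String suffix) throws IOException {
        for (int i = 1; i <= 999; i++) {
            File candidate = new File(dir, String.format(Locale.ROOT, "%s%03d%s", prefix, i, suffix));
            if (candidate.createNewFile()) {
                return candidate; // this call created the empty file, so it is ours to write to
            }
        }
        throw new IOException("No free file name left in " + dir);
    }
}
```

Scanning from 1 on every call is the brute-force cost the second option mentions; it only matters once the folder holds many files.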

Ten latest files on disk

I need an efficient algorithm to keep only the ten latest files in a particular folder on disk, to support a publishing process. Only 10 files should be present in this folder at any point in time. Please advise what I should use here.
Call listFiles() on the File for the directory. If there are more than 9 files, sort them by lastModified() and delete the oldest ones (smallest values) to trim the folder down to 9.
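A minimal sketch of that approach, with the number of files to keep as a parameter (the answer trims to 9, presumably so that the next file brings the count back to 10). KeepNewest is a hypothetical name. Files with equal timestamps are ordered arbitrarily, and lastModified() reflects the last write, not the moment a file entered the folder.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

class KeepNewest {

    /** Deletes the oldest regular files in dir (by lastModified) so that at most 'keep' remain; returns how many were deleted. */
    static int trimToNewest(File dir, int keep) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null || files.length <= keep) {
            return 0; // unreadable directory, or nothing to trim
        }
        Arrays.sort(files, Comparator.comparingLong(File::lastModified)); // oldest first
        int excess = files.length - keep;
        int deleted = 0;
        for (int i = 0; i < excess; i++) {
            if (files[i].delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```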
How about using a file system watcher like JNotify?
Register for the events you are interested in (for instance, the Created event).
Update an internal count of files on every Created event.
As soon as you reach the 11th file, remove the file with the oldest creation date.
Or use the Commons JCI FilesystemAlterationMonitor (FAM) to monitor local file systems and get notified about changes:
ReloadingClassLoader classloader = new ReloadingClassLoader(this.getClass().getClassLoader());
ReloadingListener listener = new ReloadingListener();
listener.addReloadNotificationListener(classloader);
FilesystemAlterationMonitor fam = new FilesystemAlterationMonitor();
fam.addListener(directory, listener);
fam.start();
This discussion may help you with file system watchers.
You'd have to poll the directory at regular intervals and delete everything that's older than the 10th newest file in it.
Of course, that leaves open the question of what the "10th newest file" actually is. After all, a file's timestamp might not indicate the date/time it was added to the folder.
So your system might need some independent way to track when each file was added to the folder, so that it deletes files based on when they were put there rather than on how old the files actually are.
But that's a business requirement you haven't stated (do you even know it yourself?).
