Find first file in directory with Java

I have some sort of batch program that should pick up a file from a directory and process it.
Since this program is supposed to:
run on JDK 5,
be small (no external libraries)
and fast! (with a bang)
...what is the best way to pick only one file from the directory, without using File.list() (there might be hundreds of files)?

In Java 7 you could use a DirectoryStream, but in Java 5, the only ways to get directory entries are list() and listFiles().
Note that listing a directory with hundreds of files is not ideal but still probably no big deal compared to processing one of the files. But it would probably start to be problematic once the directory contains many thousands of files.
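For reference, a minimal Java 7 sketch of the DirectoryStream approach (not an option on JDK 5; the directory path is a placeholder). It stops at the first regular file without building a full listing:
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FirstEntry {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("/some/dir"); // placeholder path
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    // first regular file found; no need to read the rest of the directory
                    System.out.println("Picked: " + entry);
                    break;
                }
            }
        }
    }
}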

Use a FileFilter (or FilenameFilter) written to accept only once, for example:
File dir = new File("/some/dir");
File[] files = dir.listFiles(new FileFilter() {
    boolean first = true;
    public boolean accept(final File pathname) {
        if (first) {
            first = false;
            return true;
        }
        return false;
    }
});

It seems from what you said that you want to process every file in the directory once (including files that get added to the directory later). You can do the following: set a monitor on the directory that generates notifications when files are added, and then process each file that you get notified about. Since you use JDK 5, I suggest using jpathwatch. Note that you need to make sure the file has finished being written before trying to process it. After starting the monitor (to ensure you will process every new file), do a one-time listing of the directory to process its current content. A sketch of this pattern is below.
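As a rough illustration of that watch-and-process pattern, here is a sketch using the WatchService that Java 7 later added to the JDK (jpathwatch offers a similar facility for older JDKs; the directory path and the processing step are placeholders):
import java.io.IOException;
import java.nio.file.*;

public class DirWatcher {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get("/some/dir"); // placeholder path
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        // one-time listing after the monitor is started, so nothing is missed
        try (DirectoryStream<Path> existing = Files.newDirectoryStream(dir)) {
            for (Path p : existing) {
                process(p);
            }
        }

        // then process every file we are notified about
        while (true) {
            WatchKey key = watcher.take();
            for (WatchEvent<?> event : key.pollEvents()) {
                Path created = dir.resolve((Path) event.context());
                process(created); // make sure writing has finished before real processing
            }
            key.reset();
        }
    }

    private static void process(Path file) {
        System.out.println("Processing " + file); // placeholder for the real work
    }
}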

Edit: My implementation made use of .list(), which you said you didn't want, but it may hold some value anyway :)
If you look at the File implementation, the public String[] list() method seems to have less overhead than File[] listFiles(). So the fastest should be:
String[] ss = myDir.list();
File toProcess = null;
for (int i = 0; i < ss.length; i++) {
    toProcess = new File(myDir, ss[i]);
    if (toProcess.isFile()) break;
}
From File.class
public File[] listFiles() {
    String[] ss = list();
    if (ss == null) return null;
    int n = ss.length;
    File[] fs = new File[n];
    for (int i = 0; i < n; i++) {
        fs[i] = new File(ss[i], this);
    }
    return fs;
}

If one looks at the class FileSystem, which filesystem access ultimately boils down to, there is only the list method, so there seems to be no other way in "pure" Java to select a file than to list them all into a String array.

There is no good solution here on Java 1.5. You can use a filter so that only one file is returned, but Java will still walk over all of the entries anyway. If you don't need the actual File object, you could try something like Runtime.getRuntime().exec("cmd /c dir /b"), split the returned output on \n and take the first line :-P (a sketch is below)
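A rough sketch of that hack, assuming a Windows machine (the directory path is a placeholder):
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public class FirstDirEntry {
    public static void main(String[] args) throws IOException {
        // dir is a cmd built-in, so it must be run through cmd /c; /b prints bare names only
        Process p = Runtime.getRuntime().exec(
                new String[]{"cmd", "/c", "dir", "/b"}, null, new File("C:\\some\\dir"));
        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String first = reader.readLine(); // first entry, or null if the directory is empty
        System.out.println(first);
        reader.close();
    }
}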

Related

How to create a function that takes as an argument the name of an operating system folder

How do I create a function in Java that takes as an argument the name of an operating system folder and returns the number of files in the folder as well as all its subfolders?
returns the number of files in the folder as well as all its subfolders?
It's easy to do using the Java File API. The methods you are interested in start with the word list.
Number of files in a dir
int numFilesInDir = new File("<directory path>").listFiles().length;
Getting subdirectory list
File myDir = new File("<directory path>");
String[] myDirSubdirectoryNames = myDir.list(new FilenameFilter() {
    @Override
    public boolean accept(File current, String name) {
        return new File(current, name).isDirectory();
    }
});
Does this solve your problem? Leave a comment and let me know.
If you are looking to count all files in the directory and all subdirectories you likely need a file tree walker:
long count = Files.walk(startPath)
        .filter(p -> !Files.isDirectory(p))
        .count();
There are lots of options for what type of files to include in the walk and maximum depth etc. You can also write your own FileVisitor to do more sophisticated things for each file. See Javadoc https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/nio/file/Files.html for all the details.
One particular thing to be aware of is that the walk keeps directories open until the stream is closed. The easiest way to ensure they are closed, even if an exception is thrown, is a try-with-resources block.
For example:
try (var walk = Files.walk(Path.of("/mypath"))) {
    long count = walk.filter(p -> !Files.isDirectory(p)).count();
    ...
} catch (IOException | SecurityException e) {
    ...
}

Java: How to get File.listOfFiles working non-recursively on linux?

I use this piece of code to find XML files that another part of my program creates in a given directory:
String fileName;
File folder = new File(mainController.XML_FILES_LOCATION);
File[] listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
    if (listOfFiles[i].isFile()) {
        fileName = listOfFiles[i].getName();
        if (fileName.endsWith(".xml")) {
            Document readFile = readFoundXmlFile(fileName);
            if (readFile != null) {
                boolean postWasSuccesful = mainController.sendXmlFile(readFile, fileName);
                reproduceXmlFile(readFile, fileName, postWasSuccesful);
                deleteXmlFile(fileName);
            }
        }
    }
}
What it does is read every XML file that gets placed in the given directory, send it to a URL, copy it to a subdirectory (either 'sent' or 'failed', based on the boolean postWasSuccesful) and delete the original so it won't be sent again.
In Windows this works as expected, but I've transferred the working code to a Linux machine and all of a sudden it gets into this loop of sending bla.xml and a second later sent\bla.xml and again a second later sent\sent\bla.xml followed by sent\sent\sent\bla.xml, etc.
Why is Linux deciding for itself that listFiles() is recursive? And, better, how do I prevent that? I could add an extra check to the if-statement looking for files ending with .xml so that no directory character is allowed in fileName, but that's a workaround I don't want: the number of files in the pick-up directory will never be high, whereas the number of files in the sent subdirectory can get quite high after a while, and I wouldn't want this piece of code to become slow.
My psychic powers tell me that reproduceXmlFile() builds the target pathname using a hard-coded backslash ("\"), and therefore you're actually creating files with backslashes in their names.
You need to use File.separator rather than that hard-coded "\". Or use something like new File("sent", fileName).toString() to generate your output pathnames.
(Apologies if I'm wrong!)
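A minimal sketch of the fix, assuming reproduceXmlFile() builds the target path itself (the names below are hypothetical):
import java.io.File;

public class PathBuilding {
    public static void main(String[] args) {
        String fileName = "bla.xml";

        // On Linux this creates a single file literally named "sent\bla.xml"
        File broken = new File("sent\\" + fileName);

        // Portable alternatives: let File join the parts, or use File.separator
        File fixed = new File("sent", fileName);
        File alsoFixed = new File("sent" + File.separator + fileName);

        System.out.println(broken + " vs " + fixed + " / " + alsoFixed);
    }
}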

Iterate through a log file in Java. Pull files scanned

I currently have a log file (see below) that I need to iterate through and pull out a list of files that were scanned. Then, using that list of files, copy the scanned files into a different directory.
So in this case, it would go through and pull
c:\tools\baregrep.exe
c:\tools\baretail.exe
etc etc
and then move them to a folder, say c:\folder\SafeFolder, with the same file structure.
I wish I had a sample of what the output is on a failed scan, but this will give me a good head start and I can probably figure the rest out.
Symantec Image of Log File
Thanks in advance, I really appreciate any help that you can lend me.
This question is tagged as Java, and as much as I love Java, this problem is something that would be easier and quicker to solve in a language such as Perl (so if you only want the end result and do not need to run in a particular environment then you may wish to use a scripting language instead).
Not a working implementation, but code along the lines of the below is all it would take in Perl (syntax untested and likely broken as is; it only serves as a guideline, as it's been a while since I wrote any Perl):
use File::Copy;
my $outdir = "c:/out/";
while (<>)
{
    my ($path) = /Processing File\s+\'([^\']+)\'/;
    next unless $path;
    my ($file) = $path =~ /([^\\]+)$/;    # file name after the last backslash
    if (($file) && (-e $path))
    {
        copy($path, $outdir . $file);
    }
}
This should do the trick. Now, just adapt for your solution!
// Requires java.io.*, java.util.ArrayList and java.util.regex.* imports.
public static void find(String logPath, String safeFolder) throws FileNotFoundException, IOException {
    ArrayList<File> files = new ArrayList<File>();
    BufferedReader br = new BufferedReader(new FileReader(logPath));
    // Capture the quoted Windows path, e.g. 'c:\tools\baregrep.exe'
    Pattern pattern = Pattern.compile("'([a-zA-Z]:\\\\.+?)'");
    String line = null;
    while ((line = br.readLine()) != null) {
        Matcher matcher = pattern.matcher(line);
        if (matcher.find()) {
            files.add(new File(matcher.group(1)));
            System.out.println("Got a new file! " + files.get(files.size() - 1));
        }
    }
    br.close();
    for (File f : files) {
        // Make sure we get a file indeed
        if (f.exists()) {
            if (!f.renameTo(new File(safeFolder, f.getName()))) {
                System.err.println("Unable to move file! " + f);
            }
        } else {
            System.out.println("I got a wrong file! " + f);
        }
    }
}
It's straightforward:
Read the log file line by line using NEW_LINE as your delimiter. If this is a small file, feel free to load it and process it via String.split("\n") or StringTokenizer.
As you loop over each line, do a simple test to detect whether the string contains 'Processing File'.
If it does, use a regular expression (harder) or simple parsing to capture the file name. It should be within single quotes ['], so detect the first occurrence, then the second, and take the string in between.
If the string is a valid, existing path (you may test it using java.io.File), record the file name; I would advise against copying the files themselves in Java, for memory reasons, for starters.
Instead, collect the file names into a batch file and copy them all at once using an OS script such as a Windows BAT or a Bash script, e.g. cp 'filename_from' 'copy_to_dir'. A sketch of this approach is below.
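A minimal sketch of the quote-extraction approach described above, assuming the log lines look like Processing File 'c:\tools\baregrep.exe' and that a copy script is written instead of copying in Java (the log and script names are placeholders):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class BuildCopyScript {
    public static void main(String[] args) throws IOException {
        BufferedReader log = new BufferedReader(new FileReader("scan.log")); // placeholder
        PrintWriter script = new PrintWriter("copy_scanned.bat");            // placeholder
        String line;
        while ((line = log.readLine()) != null) {
            if (!line.contains("Processing File")) {
                continue;
            }
            // take the text between the first and second single quote
            int first = line.indexOf('\'');
            int second = line.indexOf('\'', first + 1);
            if (first >= 0 && second > first) {
                String path = line.substring(first + 1, second);
                script.println("copy \"" + path + "\" \"c:\\folder\\SafeFolder\"");
            }
        }
        script.close();
        log.close();
    }
}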
Let me know if you need a working example.
Regards

How to iterate over the files of a certain directory, in Java? [duplicate]

Possible Duplicate:
Best way to iterate through a directory in java?
I want to process each file in a certain directory using Java.
What is the easiest (and most common) way of doing this?
If you have the directory name in myDirectoryPath,
import java.io.File;
...
File dir = new File(myDirectoryPath);
File[] directoryListing = dir.listFiles();
if (directoryListing != null) {
    for (File child : directoryListing) {
        // Do something with child
    }
} else {
    // Handle the case where dir is not really a directory.
    // Checking dir.isDirectory() above would not be sufficient
    // to avoid race conditions with another process that deletes
    // directories.
}
I guess there are many ways to do what you want. Here's one that I use: with the commons-io library you can iterate over the files in a directory. Use the FileUtils.iterateFiles method and process each file.
You can find the information here: http://commons.apache.org/proper/commons-io/download_io.cgi
Here's an example:
Iterator it = FileUtils.iterateFiles(new File("C:/"), null, false);
while (it.hasNext()) {
    System.out.println(((File) it.next()).getName());
}
You can replace null with a list of extensions if you want to filter, e.g. {"xml", "java"} (the extensions are given without the leading dot).
Here is an example that lists all the files on my desktop. You should change the path variable to your path.
Instead of printing the file's name with System.out.println, you should place your own code to operate on the file.
public static void main(String[] args) {
    File path = new File("c:/documents and settings/Zachary/desktop");
    File[] files = path.listFiles();
    for (int i = 0; i < files.length; i++) {
        if (files[i].isFile()) { // this line weeds out other directories/folders
            System.out.println(files[i]);
        }
    }
}
Use java.io.File.listFiles
Or, if you want to filter the list prior to iteration (or for any more complicated use case), use apache-commons FileUtils.listFiles.

How to list a 2 million files directory in java without having an "out of memory" exception

I have to deal with a directory of about 2 million XML files to be processed.
I've already solved the processing distributing the work between machines and threads using queues and everything goes right.
But now the big problem is the bottleneck of reading the directory with the 2 million files in order to fill the queues incrementally.
I've tried using the File.listFiles() method, but it gives me a java.lang.OutOfMemoryError: Java heap space. Any ideas?
First of all, do you have any possibility to use Java 7? There you have a FileVisitor and the Files.walkFileTree, which should probably work within your memory constraints.
Otherwise, the only way I can think of is to use File.listFiles(FileFilter filter) with a filter that always returns false (ensuring that the full array of files is never kept in memory), but that catches the files to be processed along the way, and perhaps puts them in a producer/consumer queue or writes the file-names to disk for later traversal.
Alternatively, if you control the names of the files, or if they are named in some nice way, you could process the files in chunks using a filter that accepts filenames of the form file0000000 to file0001000, then file0001000 to file0002000, and so on.
If the names are not nice like this, you could try filtering based on the hash-code of the file name, which should be fairly evenly distributed over the set of integers.
Update: Sigh. Probably won't work. Just had a look at the listFiles implementation:
public File[] listFiles(FilenameFilter filter) {
    String ss[] = list();
    if (ss == null) return null;
    ArrayList v = new ArrayList();
    for (int i = 0; i < ss.length; i++) {
        if ((filter == null) || filter.accept(this, ss[i])) {
            v.add(new File(ss[i], this));
        }
    }
    return (File[]) (v.toArray(new File[v.size()]));
}
so it will probably fail at the first line anyway... Sort of disappointing. I believe your best option is to put the files in different directories.
Btw, could you give an example of a file name? Are they "guessable"? Like
for (int i = 0; i < 100000; i++)
tryToOpen(String.format("file%05d", i))
If Java 7 is not an option, this hack will work (for UNIX):
Process process = Runtime.getRuntime().exec(new String[]{"ls", "-f", "/path"});
BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
String line;
while (null != (line = reader.readLine())) {
    if (line.startsWith("."))
        continue;
    System.out.println(line);
}
The -f parameter will speed it up (from man ls):
-f do not sort, enable -aU, disable -lst
In case you can use Java 7, it can be done this way and you won't have those out-of-memory problems.
Path path = FileSystems.getDefault().getPath("C:\\path\\with\\lots\\of\\files");
Files.walkFileTree(path, new FileVisitor<Path>() {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        // here you have the files to process
        System.out.println(file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
        return FileVisitResult.TERMINATE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
        return FileVisitResult.CONTINUE;
    }
});
Use File.list() instead of File.listFiles() - the String objects it returns consume less memory than the File objects, and (more importantly, depending on the location of the directory) they don't contain the full path name.
Then, construct File objects as needed when processing the result.
However, this will not work for arbitrarily large directories either. It's an overall better idea to organize your files in a hierarchy of directories so that no single directory has more than a few thousand entries.
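A minimal sketch of that idea (the directory path and the processing step are placeholders):
import java.io.File;

public class ListNames {
    public static void main(String[] args) {
        File dir = new File("/path/with/lots/of/files"); // placeholder
        String[] names = dir.list(); // plain names, much lighter than File objects
        if (names == null) {
            return; // not a directory, or an I/O error
        }
        for (String name : names) {
            // construct the File only when it is actually needed
            File f = new File(dir, name);
            process(f);
        }
    }

    private static void process(File f) {
        System.out.println(f); // placeholder for the real work
    }
}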
You can do that with the Apache Commons IO FileUtils library. No memory problem; I checked with VisualVM.
Iterator<File> it = FileUtils.iterateFiles(folder, null, true);
while (it.hasNext())
{
    File fileEntry = it.next();
}
Hope that helps.
bye
This also requires Java 7, but it's simpler than the Files.walkFileTree answer if you just want to list the contents of a directory and not walk the whole tree:
Path dir = Paths.get("/some/directory");
try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
    for (Path path : stream) {
        handleFile(path.toFile());
    }
} catch (IOException e) {
    handleException(e);
}
The implementation of DirectoryStream is platform-specific and never calls File.list or anything like it, instead using the Unix or Windows system calls that iterate over a directory one entry at a time.
Since you're on Windows, it seems like you should have simply used ProcessBuilder to start something like "cmd /c dir /b target_directory", capture the output of that, and route it into a file. You can then process that file a line at a time, reading the file names out and dealing with them. A sketch is below.
Better late than never? ;)
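A rough sketch of that approach, assuming a Windows machine and Java 7+ for the ProcessBuilder redirect (the paths are placeholders):
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class DirListingViaCmd {
    public static void main(String[] args) throws IOException, InterruptedException {
        File listing = new File("C:\\temp\\listing.txt"); // placeholder output file

        // /c (rather than /k) lets cmd exit once dir finishes; /b prints bare names only
        ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir", "/b", "C:\\target_directory");
        pb.redirectOutput(listing);
        pb.redirectErrorStream(true);
        pb.start().waitFor();

        // now process the listing one line (one file name) at a time
        BufferedReader reader = new BufferedReader(new FileReader(listing));
        String name;
        while ((name = reader.readLine()) != null) {
            System.out.println(name); // placeholder for the real processing
        }
        reader.close();
    }
}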
Why do you store 2 million files in the same directory anyway? I can imagine it slows down access terribly on the OS level already.
I would definitely want to have them divided into subdirectories (e.g. by date/time of creation) already before processing. But if it is not possible for some reason, could it be done during processing? E.g. move 1000 files queued for Process1 into Directory1, another 1000 files for Process2 into Directory2 etc. Then each process/thread sees only the (limited number of) files portioned for it.
At first you could try to increase the memory of your JVM by passing e.g. -Xmx1024m.
Please post the full stack trace of the OOM exception to identify where the bottleneck is, as well as a short, complete Java program showing the behaviour you see.
It is most likely because you collect all of the two million entries in memory, and they don't fit. Can you increase heap space?
If file names follow certain rules, you can use File.list(filter) instead of File.listFiles to get manageable portions of file listing.
I faced the same problem when I developed a malware scanning application. My solution was to execute a shell command to list all files. It's faster than recursive methods that browse folder by folder.
See more about shell commands here: http://adbshell.com/commands/adb-shell-ls
Process process = Runtime.getRuntime().exec("ls -R /");
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
//TODO: Read the stream to get a list of file path.
You could use listFiles with a special FilenameFilter. The first time the FilenameFilter is passed to listFiles, it accepts the first 1000 files and then remembers them as visited.
The next time the FilenameFilter is passed to listFiles, it ignores the first 1000 visited files and returns the next 1000, and so on until complete. A sketch of such a filter is below.
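A minimal sketch of such a filter (the class name and chunk size are illustrative; note that, as other answers point out, listFiles still builds the full name list internally on every pass, so this only bounds the number of File objects returned at a time):
import java.io.File;
import java.io.FilenameFilter;
import java.util.HashSet;
import java.util.Set;

class ChunkFilter implements FilenameFilter {
    private final Set<String> visited = new HashSet<String>();
    private final int chunkSize;
    private int acceptedThisPass;

    ChunkFilter(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    // call before each listFiles pass
    void startPass() {
        acceptedThisPass = 0;
    }

    public boolean accept(File dir, String name) {
        if (acceptedThisPass >= chunkSize || visited.contains(name)) {
            return false;
        }
        visited.add(name);
        acceptedThisPass++;
        return true;
    }
}

// Usage: repeat passes until one returns no files.
// ChunkFilter filter = new ChunkFilter(1000);
// File[] chunk;
// do {
//     filter.startPass();
//     chunk = dir.listFiles(filter);
//     // process chunk
// } while (chunk != null && chunk.length > 0);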
As a first approach you might try tweaking some JVM memory settings, e.g. increasing the heap size as was suggested, or even using the AggressiveHeap option.
Given the large number of files this may not help, in which case I would suggest working around the problem: create several files containing the filenames, say 500k filenames per file, and read from them. A sketch is below.
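A rough sketch of that workaround, assuming the listing is obtained with the ls hack shown earlier (the paths and file names are placeholders):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class SplitListing {
    public static void main(String[] args) throws IOException {
        int perFile = 500000; // filenames per chunk file
        int chunk = 0;
        int inChunk = 0;

        Process ls = Runtime.getRuntime().exec(new String[]{"ls", "-f", "/path"}); // placeholder path
        BufferedReader names = new BufferedReader(new InputStreamReader(ls.getInputStream()));
        PrintWriter out = new PrintWriter("names-0.txt");

        String name;
        while ((name = names.readLine()) != null) {
            if (name.startsWith(".")) {
                continue;
            }
            if (inChunk == perFile) {
                out.close();
                out = new PrintWriter("names-" + (++chunk) + ".txt");
                inChunk = 0;
            }
            out.println(name);
            inChunk++;
        }
        out.close();
        names.close();
        // workers can later read names-N.txt line by line and process each file
    }
}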
Try this, it works for me, but I didn't have so many documents...
File dir = new File("directory");
String[] children = dir.list();
if (children == null) {
    // Either dir does not exist or is not a directory
    System.out.print("Directory doesn't exist\n");
} else {
    for (int i = 0; i < children.length; i++) {
        // Get filename of file or directory
        String filename = children[i];
    }
}
