This should be really simple. If I have a String like this:
../Test?/sample*.txt
then what is a generally accepted way to get a list of files that match this pattern? (e.g. it should match ../Test1/sample22b.txt and ../Test4/sample-spiffy.txt but not ../Test3/sample2.blah or ../Test44/sample2.txt)
I've taken a look at org.apache.commons.io.filefilter.WildcardFileFilter and it seems like the right beast but I'm not sure how to use it for finding files in a relative directory path.
I suppose I could look at the source for Ant since it uses wildcard syntax, but I must be missing something pretty obvious here.
(edit: the above example was just a sample case. I'm looking for a way to parse general paths containing wildcards at runtime. I figured out how to do it based on mmyers' suggestion but it's kind of annoying. Not to mention that the Java JRE seems to auto-expand simple wildcards in main(String[] arguments) from a single argument to "save" me time and hassle... I'm just glad I didn't have non-file arguments in the mix.)
Try FileUtils from Apache commons-io (listFiles and iterateFiles methods):
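import java.io.File;
import java.io.FileFilter;
import org.apache.commons.io.filefilter.WildcardFileFilter;
File dir = new File(".");
FileFilter fileFilter = new WildcardFileFilter("sample*.java");
File[] files = dir.listFiles(fileFilter);
for (int i = 0; i < files.length; i++) {
    System.out.println(files[i]);
}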
To solve your issue with the TestX folders, I would first iterate through the list of folders:
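// Cast to FileFilter: WildcardFileFilter implements both FileFilter and FilenameFilter,
// so File.listFiles(...) would otherwise be ambiguous
File[] dirs = new File(".").listFiles((FileFilter) new WildcardFileFilter("Test*"));
for (int i = 0; i < dirs.length; i++) {
    File dir = dirs[i];
    if (dir.isDirectory()) {
        File[] files = dir.listFiles((FileFilter) new WildcardFileFilter("sample*.java"));
        // process the matching files here
    }
}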
Quite a 'brute force' solution, but it should work fine. If this doesn't fit your needs, you can always use RegexFileFilter.
Consider DirectoryScanner from Apache Ant:
DirectoryScanner scanner = new DirectoryScanner();
scanner.setIncludes(new String[]{"**/*.java"});
scanner.setBasedir("C:/Temp");
scanner.setCaseSensitive(false);
scanner.scan();
String[] files = scanner.getIncludedFiles();
You'll need to reference ant.jar (~ 1.3 MB for ant 1.7.1).
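Here are examples of listing files by pattern, powered by Java 7 NIO globbing and Java 8 lambdas. Note that Files.newDirectoryStream(dir, glob) matches the glob only against the file names of the entries directly inside dir, so a pattern that contains a separator, such as "Test?/sample*.txt", matches nothing there. Walking the tree and matching each path relative to the base directory works instead:
Path base = Paths.get("..");
PathMatcher globMatcher = FileSystems.getDefault()
        .getPathMatcher("glob:Test?/sample*.txt");
try (Stream<Path> paths = Files.walk(base, 2)) {
    paths.filter(path -> globMatcher.matches(base.relativize(path)))
         .forEach(path -> System.out.println(path));
}
or, with a regular expression instead of a glob (the regex sees the relative path with the platform's separator, so the '/' here assumes a Unix-like system):
PathMatcher regexMatcher = FileSystems.getDefault()
        .getPathMatcher("regex:Test[^/]/sample[^/]*\\.txt");
try (Stream<Path> paths = Files.walk(base, 2)) {
    paths.filter(path -> regexMatcher.matches(base.relativize(path)))
         .forEach(path -> System.out.println(path));
}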
Since Java 8 you can use the Files#find method from java.nio.file directly.
public static Stream<Path> find(Path start,
int maxDepth,
BiPredicate<Path, BasicFileAttributes> matcher,
FileVisitOption... options)
Example usage
Files.find(startingPath,
Integer.MAX_VALUE,
(path, basicFileAttributes) -> path.toFile().getName().matches(".*\\.pom")
);
Or an example of putting items in a simple string collection:
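import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.stream.Stream;
final Collection<String> simpleStringCollection = new ArrayList<>();
String wildCardValue = "*.txt";
// "*.txt" is a glob, not a regex (String.matches("*.txt") throws
// PatternSyntaxException), so match it with a glob PathMatcher
final PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + wildCardValue);
final Path dir = Paths.get(".");
// try-with-resources closes the directory handles held by the stream
try (Stream<Path> results = Files.find(dir,
        Integer.MAX_VALUE,
        (path, basicFileAttributes) -> matcher.matches(path.getFileName()))) {
    results.forEach(p -> simpleStringCollection.add(p.toString()));
} catch (IOException e) {
    throw new UncheckedIOException(e);
}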
You could convert your wildcard string to a regular expression and use that with String's matches method. Following your example:
String original = "../Test?/sample*.txt";
// Escape literal dots first; '?' matches exactly one character, '*' any run
String regex = original.replace(".", "\\.").replace("?", ".").replace("*", ".*");
This works for your examples:
Assert.assertTrue("../Test1/sample22b.txt".matches(regex));
Assert.assertTrue("../Test4/sample-spiffy.txt".matches(regex));
And counter-examples:
Assert.assertTrue(!"../Test3/sample2.blah".matches(regex));
Assert.assertTrue(!"../Test44/sample2.txt".matches(regex));
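The replace-based conversion is easy to break, because any regex metacharacter in the wildcard string that is not explicitly handled (for example '+', '(' or '$') keeps its regex meaning. A slightly more robust sketch, assuming only '*' and '?' are wildcards and that '/' separates directories, quotes the literal parts with Pattern.quote; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical helper: converts a simple glob ('*' and '?' only) into a regex.
// '*' matches any run of characters except '/', '?' matches exactly one
// character except '/', and everything else is quoted literally.
class GlobToRegex {
    static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush pending literal text, quoted so '.', '+' etc. are not special
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? "[^/]*" : "[^/]");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("../Test?/sample*.txt");
        System.out.println("../Test1/sample22b.txt".matches(regex));     // true
        System.out.println("../Test4/sample-spiffy.txt".matches(regex)); // true
        System.out.println("../Test3/sample2.blah".matches(regex));      // false
        System.out.println("../Test44/sample2.txt".matches(regex));      // false
    }
}
```

Unlike the replace-based version, '*' and '?' here never match across a '/', which is closer to how shell globs behave.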
Might not help you right now, but JDK 7 is intended to have glob and regex file name matching as part of "More NIO Features".
The wildcard library efficiently does both glob and regex filename matching:
http://code.google.com/p/wildcard/
The implementation is succinct -- JAR is only 12.9 kilobytes.
A simple way, without any external library, is to use this method.
I created CSV files named billing_201208.csv, billing_201209.csv and billing_201210.csv, and it seems to work fine.
The output will be the following if the files listed above exist:
found billing_201208.csv
found billing_201209.csv
found billing_201210.csv
import java.io.File;
public static void main(String[] args) {
String pathToScan = ".";
String target_file ; // fileThatYouWantToFilter
File folderToScan = new File(pathToScan);
File[] listOfFiles = folderToScan.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isFile()) {
target_file = listOfFiles[i].getName();
if (target_file.startsWith("billing")
&& target_file.endsWith(".csv")) {
//You can add these files to fileList by using "list.add" here
System.out.println("found" + " " + target_file);
}
}
}
}
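The same filter can be written more compactly with a FilenameFilter lambda (Java 8+), which also guards against listFiles returning null for a missing or unreadable directory. A sketch; the class and method names are hypothetical:

```java
import java.io.File;

class PrefixSuffixFinder {
    // Lists plain files in dir whose names start with prefix and end with suffix.
    // Returns an empty array if dir does not exist or cannot be read.
    static File[] find(File dir, String prefix, String suffix) {
        File[] matches = dir.listFiles((parent, name) ->
                name.startsWith(prefix) && name.endsWith(suffix)
                        && new File(parent, name).isFile());
        return matches != null ? matches : new File[0];
    }

    public static void main(String[] args) {
        for (File f : find(new File("."), "billing", ".csv")) {
            System.out.println("found " + f.getName());
        }
    }
}
```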
As posted in another answer, the wildcard library works for both glob and regex filename matching: http://code.google.com/p/wildcard/
I used the following code to match glob patterns, both absolute and relative, on *nix-style file systems:
String filePattern = ...; // the glob pattern to match
String baseDir = "./";
// If absolute path. TODO handle windows absolute path?
if (filePattern.charAt(0) == File.separatorChar) {
    baseDir = File.separator;
    filePattern = filePattern.substring(1);
}
// Paths here is the wildcard library's class, not java.nio.file.Paths
Paths paths = new Paths(baseDir, filePattern);
List files = paths.getFiles();
I spent some time trying to get the FileUtils.listFiles methods in the Apache Commons IO library (see Vladimir's answer) to do this, but had no success (I now think they can only handle pattern matching for one directory or file at a time).
Additionally, using regex filters (see Fabian's answer) to process arbitrary user-supplied absolute glob patterns without searching the entire file system would require some preprocessing of the supplied glob to determine the longest prefix that contains no regex/glob characters.
Of course, Java 7 may handle the requested functionality nicely, but unfortunately I'm stuck with Java 6 for now. The library is relatively minuscule at 13.5kb in size.
Note to the reviewers: I attempted to add the above to the existing answer mentioning this library but the edit was rejected. I don't have enough rep to add this as a comment either. Isn't there a better way...
You should be able to use the WildcardFileFilter. Just use System.getProperty("user.dir") to get the working directory. Try this:
public static void main(String[] args) {
    // Cast needed: WildcardFileFilter is both a FileFilter and a FilenameFilter
    File[] files = new File(System.getProperty("user.dir")).listFiles((FileFilter) new WildcardFileFilter(args));
    //...
}
You should not need to replace * with [.*], assuming the wildcard filter uses java.util.regex.Pattern. I have not tested this, but I do use patterns and file filters constantly.
Java 7 glob support: see "Finding Files" in the Java Tutorials (it includes a sample).
The Apache filter is built for iterating files in a known directory. To allow wildcards in the directory also, you would have to split the path on '\' or '/' and do a filter on each part separately.
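A sketch of that splitting approach using only the JDK (Java 7+): each segment of the pattern is used as a glob for Files.newDirectoryStream, which matches entry file names. It assumes the pattern is relative to a known base directory and contains no "." or ".." segments (a DirectoryStream never returns those entries), so a leading "../" belongs in the base path. Because it splits on '\' as well as '/', a backslash cannot be used as a glob escape. The class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class SegmentGlob {
    // Expands a relative pattern such as "Test?/sample*.txt" one path segment
    // at a time; each segment is a glob matched against entry file names.
    static List<Path> expand(Path base, String pattern) {
        List<Path> current = new ArrayList<>();
        current.add(base);
        // Split the pattern on either '/' or '\'
        String[] segments = pattern.split("[/\\\\]");
        for (int i = 0; i < segments.length; i++) {
            boolean last = i == segments.length - 1;
            List<Path> next = new ArrayList<>();
            for (Path dir : current) {
                try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, segments[i])) {
                    for (Path entry : entries) {
                        // Intermediate segments must name directories
                        if (last || Files.isDirectory(entry)) {
                            next.add(entry);
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // The question's example, with "../" moved into the base path
        for (Path match : expand(Paths.get(".."), "Test?/sample*.txt")) {
            System.out.println(match);
        }
    }
}
```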
Using Java streams only
Path testPath = Paths.get("C:\\");
// Files.find holds open directory handles, so close the stream when done
try (Stream<Path> stream =
        Files.find(testPath, 1,
                (path, basicFileAttributes) -> {
                    File file = path.toFile();
                    return file.getName().endsWith(".java");
                })) {
    // Print all files found
    stream.forEach(System.out::println);
}
Why not do something like:
File myRelativeDir = new File("../../foo");
String fullPath = myRelativeDir.getCanonicalPath();
String wildCard = fullPath + File.separator + "*.txt";
// now you have a fully qualified path
Then you won't have to worry about relative paths and can do your wildcarding as needed.
Implement the JDK FileVisitor interface. Here is an example http://wilddiary.com/list-files-matching-a-naming-pattern-java/
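A minimal sketch of that idea, similar in spirit to the "Finding Files" tutorial: a SimpleFileVisitor that collects regular files whose names match a glob PathMatcher. The class name GlobVisitor is hypothetical, and unreadable entries are skipped rather than aborting the walk, which is a design choice you may not want:

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Collects every regular file under a start directory whose file name
// matches a glob, using the JDK's SimpleFileVisitor.
class GlobVisitor extends SimpleFileVisitor<Path> {
    private final PathMatcher matcher;
    private final List<Path> matches = new ArrayList<>();

    GlobVisitor(String glob) {
        this.matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        // Match on the file name only, not the full path
        if (attrs.isRegularFile() && matcher.matches(file.getFileName())) {
            matches.add(file);
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) {
        // Skip unreadable entries instead of aborting the walk
        return FileVisitResult.CONTINUE;
    }

    static List<Path> find(Path start, String glob) throws IOException {
        GlobVisitor visitor = new GlobVisitor(glob);
        Files.walkFileTree(start, visitor);
        return visitor.matches;
    }

    public static void main(String[] args) throws IOException {
        for (Path p : find(Paths.get("."), "sample*.txt")) {
            System.out.println(p);
        }
    }
}
```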
Util Method:
public static boolean isFileMatchTargetFilePattern(final File f, final String targetPattern) {
String regex = targetPattern.replace(".", "\\."); //escape the dot first
regex = regex.replace("?", ".").replace("*", ".*"); // '?' matches exactly one character
return f.getName().matches(regex);
}
jUnit Test:
@Test
public void testIsFileMatchTargetFilePattern() {
String dir = "D:\\repository\\org\\my\\modules\\mobile\\mobile-web\\b1605.0.1";
String[] regexPatterns = new String[] {"_*.repositories", "*.pom", "*-b1605.0.1*","*-b1605.0.1", "mobile*"};
File fDir = new File(dir);
File[] files = fDir.listFiles();
for (String regexPattern : regexPatterns) {
System.out.println("match pattern [" + regexPattern + "]:");
for (File file : files) {
System.out.println("\t" + file.getName() + " matches:" + FileUtils.isFileMatchTargetFilePattern(file, regexPattern));
}
}
}
Output:
match pattern [_*.repositories]:
mobile-web-b1605.0.1.pom matches:false
mobile-web-b1605.0.1.war matches:false
_remote.repositories matches:true
match pattern [*.pom]:
mobile-web-b1605.0.1.pom matches:true
mobile-web-b1605.0.1.war matches:false
_remote.repositories matches:false
match pattern [*-b1605.0.1*]:
mobile-web-b1605.0.1.pom matches:true
mobile-web-b1605.0.1.war matches:true
_remote.repositories matches:false
match pattern [*-b1605.0.1]:
mobile-web-b1605.0.1.pom matches:false
mobile-web-b1605.0.1.war matches:false
_remote.repositories matches:false
match pattern [mobile*]:
mobile-web-b1605.0.1.pom matches:true
mobile-web-b1605.0.1.war matches:true
_remote.repositories matches:false
The simplest way, using the java.io.File class, would be:
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;
String startingdir = "The directory name";
final String filenameprefix = "The file pattern"; // a regular expression, since it is passed to String.matches
File startingDirFile = new File(startingdir);
final File[] listFiles = startingDirFile.listFiles(new FilenameFilter() {
    public boolean accept(File arg0, String arg1) {
        System.out.println(arg0 + arg1);
        return arg1.matches(filenameprefix);
    }
});
System.out.println(Arrays.toString(listFiles));
Related
I want to get the part of the path after a given token, "html", which is fixed. The file path is below:
String token = "html";
Path path = D:\data\test\html\css\Core.css
Expected Output : css\Core.css
Below is the input folder for the program; it is defined as a constant in the code:
public static final String INPUT_DIR = "D:\\data\\test\\html";
It contains the input HTML, CSS and JS files, and I want to copy these files to a different location, E:\data\test\html\, so I just need to extract the subpath after html from each input file path and append it to the output path.
Let's say the input files are:
D:\data\test\html\css\Core.css
D:\data\test\html\css\Core.html
D:\data\test\html\css\Core.js
So I want to extract css\Core.css, css\Core.html and css\Core.js and append them to the destination path E:\data\test\html\ to copy them.
I tried the following:
String [] array = path.toString().split("html");
String subpath = array[1];
Output : \css\Core.css
which is not the expected output; the expected output is css\Core.css.
Also, the above code does not work for the path below:
Path path = D:\data\test\html\bla\bla\html\css\Core.css;
String [] array = path.toString().split("html");
String subpath = array[1];
In this case I get something like \bla\bla\, which is not expected.
If you only need the path as a string, another solution is to use this code:
String path = "D:\\data\\test\\html\\css\\Core.css";
// Include the trailing separator so the result has no leading backslash
String keyword = "\\html\\";
System.out.println(path.substring(path.lastIndexOf(keyword) + keyword.length()).trim());
You can replace the path with file.getAbsolutePath() as mentioned above.
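When the token can appear more than once, as in the second example in the question, matching whole path elements avoids both the leading backslash and the earlier "html" directory. A sketch using java.nio.file.Path, assuming the part after the last element named exactly "html" is wanted; the class and method names are hypothetical:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class AfterToken {
    // Returns the part of path that follows the last name element equal to
    // token, or null if no element matches or nothing follows it. Working on
    // elements means "html" inside a longer name (e.g. "xhtml") is ignored.
    static Path after(Path path, String token) {
        for (int i = path.getNameCount() - 1; i >= 0; i--) {
            if (path.getName(i).toString().equals(token)) {
                // subpath(begin, end) requires begin < end
                return i + 1 < path.getNameCount()
                        ? path.subpath(i + 1, path.getNameCount())
                        : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Path p = Paths.get("D:", "data", "test", "html", "bla", "bla", "html", "css", "Core.css");
        System.out.println(after(p, "html"));
    }
}
```

The relative result can then be appended to the destination with Paths.get("E:\\data\\test\\html").resolve(...). When the input root is known (INPUT_DIR in the question), Path.relativize against it is simpler still.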
import java.io.File;
public class Main {
    public static void main(String[] args) {
        // Create a File object for the directory that you want to start from
        File directory = new File("/path/to/starting/directory");
        // Recursively search the directory and its subdirectories for the target file
        findFile(directory, "target-file.txt");
    }
    public static void findFile(File directory, String targetFileName) {
        // Get a list of all files and directories in the directory
        File[] files = directory.listFiles();
        // listFiles() returns null if the path is not a readable directory
        if (files == null) {
            return;
        }
        // Iterate through the list of files and directories
        for (File file : files) {
            // Check if the file is a directory
            if (file.isDirectory()) {
                // If it's a directory, recursively search for the file
                findFile(file, targetFileName);
            } else if (file.getName().equals(targetFileName)) {
                // If it's the target file, print the file path
                System.out.println(file.getAbsolutePath());
            }
        }
    }
}
This code uses a recursive function to search through all subdirectories of the starting directory and print the file path of the target file (in this case, "target-file.txt") if it is found.
You can modify this code to suit your specific needs, such as changing the starting directory or target file name. You can also modify the code to perform different actions on the target file, such as reading its contents or copying it to another location.
Your question lacks details.
Is the "path" a Path or a String?
How do you determine which part of the "path" you want?
Do you know the entire structure of the "path" or do you just have the delimiting part, for example the html?
Here are six different ways (without iterating, as you stated in your comment). The first two use methods of java.nio.file.Path. The next two use methods of java.lang.String. The last two use regular expressions. Note that there are probably also other ways.
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class PathTest {
public static void main(String[] args) {
// D:\data\test\html\css\Core.css
Path path = Paths.get("D:", "data", "test", "html", "css", "Core.css");
System.out.println("Path: " + path);
Path afterHtml = Paths.get("D:", "data", "test", "html").relativize(path);
System.out.println("After 'html': " + afterHtml);
System.out.println("subpath(3): " + path.subpath(3, path.getNameCount()));
String str = path.toString();
System.out.println("replace: " + str.replace("D:\\data\\test\\html\\", ""));
System.out.println("substring: " + str.substring(str.indexOf("html") + 5));
System.out.println("split: " + str.split("\\\\html\\\\")[1]);
Pattern pattern = Pattern.compile("\\\\html\\\\(.*$)");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println("regex: " + matcher.group(1));
}
}
}
Running the above code produces the following output:
Path: D:\data\test\html\css\Core.css
After 'html': css\Core.css
subpath(3): css\Core.css
replace: css\Core.css
substring: css\Core.css
split: css\Core.css
regex: css\Core.css
I assume you know how to modify the above in order to
I want to get file path after /test
I have a class which reads the list of files available in a particular location.
The following is my code:
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class ExceptionInFileHandling {
@SuppressWarnings({ "rawtypes", "unchecked" })
public static void GetDirectory(String a_Path, List a_files, List a_folders) throws IOException {
try {
File l_Directory = new File(a_Path);
File[] l_files = l_Directory.listFiles();
for (int c = 0; c < l_files.length; c++) {
if (l_files[c].isDirectory()) {
a_folders.add(l_files[c].getName());
} else {
a_files.add(l_files[c].getName());
}
}
} catch (Exception ex){
ex.printStackTrace();
}
}
@SuppressWarnings("rawtypes")
public static void main(String args[]) throws IOException {
String filesLocation = "asdfasdf/sdfsdf/";
List l_Files = new ArrayList(), l_Folders = new ArrayList();
GetDirectory(filesLocation, l_Files, l_Folders);
System.out.println("Files");
System.out.println("---------------------------");
for (Object file : l_Files) {
System.out.println(file);
}
System.out.println("Done");
}
}
Here the file path can be passed as an argument, and it should be handled according to the OS:
filePath.replaceAll("\\\\|/", "\\" + System.getProperty("file.separator"))
Is this correct?
There are better ways to use file paths...
// Don't do this
filePath.replaceAll("\\\\|/", "\\" + System.getProperty("file.separator"))
Use java.nio.file.Path:
import java.nio.file.*;
Path path = Paths.get(somePathString);
// Here is your system independent path
path.toAbsolutePath();
// Or this works too
Paths.get(somePathString).toAbsolutePath();
Use File.separator:
// You can also pass a String that has a proper file separator, like so
String filePath = "SomeDirectory" + File.separator;
// Then call your directory method
try{
ExceptionInFileHandling.GetDirectory(filePath, ..., ...);
} catch (Exception e){}
So a simple change to your method will now work cross platform:
@SuppressWarnings({ "rawtypes", "unchecked" })
public static void GetDirectory(String a_Path, List a_files, List a_folders) throws IOException {
try {
// File object is instead constructed
// with a URI by using Path.toUri()
// Change is done here
File l_Directory = new File(Paths.get(a_Path).toUri());
File[] l_files = l_Directory.listFiles();
for (int c = 0; c < l_files.length; c++) {
if (l_files[c].isDirectory()) {
a_folders.add(l_files[c].getName());
} else {
a_files.add(l_files[c].getName());
}
}
} catch (Exception ex){
ex.printStackTrace();
}
}
You can use forward slashes as directory separators on Windows as well when calling File constructor.
Why not use Java's own file separator instead of building a string and then replacing the separators?
Try it like this:
String filesLocation = "asdfasdf"+File.separator+"sdfsdf"+File.separator;
Your approach should be correct. There are other similar threads with answers:
Java regex to replace file path based on OS
Platform independent paths in Java
You could generate a Path object from the passed argument. Then you would not need to handle the file separator on your own.
public static void main(String[] args) {
Path path = Paths.get(args[0]);
System.out.println("path = " + path.toAbsolutePath());
}
The code is able to handle following passed arguments.
foo\bar
foo/bar
foo\bar/baz
foo\\bar
foo//baz
foo\\bar//baz
...
First, you should not use a relative path like asdfasdf/sdfsdf/. It's a big source of bugs, as your path depends on your working directory.
That said, your replaceAll is quite good, but it can be improved like this:
filePath.replaceAll(
"[/\\\\]+",
Matcher.quoteReplacement(System.getProperty("file.separator")));
Using quoteReplacement is advised in the replaceAll documentation:
Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. The String produced will match the sequence of characters in s treated as a literal sequence. Slashes ('\') and dollar signs ('$') will be given no special meaning.
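A small runnable version of the improved replaceAll, with the separator passed in so the effect is visible on any OS. Without quoteReplacement, a replacement string consisting of a single backslash makes replaceAll throw IllegalArgumentException. Note that collapsing runs of separators also collapses the leading \\ of a Windows UNC path. The class and method names are hypothetical:

```java
import java.io.File;
import java.util.regex.Matcher;

class SeparatorNormalizer {
    // Replaces every run of '/' or '\' with the given separator. The
    // replacement goes through Matcher.quoteReplacement, so a backslash
    // separator is inserted literally rather than treated as an escape.
    static String normalize(String path, String separator) {
        return path.replaceAll("[/\\\\]+", Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        // Uses the platform separator, as in the answer above
        System.out.println(normalize("asdfasdf//sdfsdf\\file.txt", File.separator));
    }
}
```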
You need to use java.io.File.separatorChar, the system-dependent default name-separator character:
String location = "usr"+java.io.File.separatorChar+"local"+java.io.File.separatorChar;
org.apache.commons.io.FilenameUtils contains a lot of useful methods, for instance separatorsToSystem(String path) converts separators in given path in accordance with OS which you are using.
Use the method below to get the OS file separator, then use replaceAll to replace the previous separators with it:
System.getProperty("file.separator");
Why not just use "/"? It is accepted as a path separator on both Linux and Windows.
I've searched, but can't seem to figure out how to write java code that takes as input a String containing a wildcard (asterisk), and outputs a String with the wildcard resolved.
I have a special situation where I know there is either 1 or 0 matching filespecs, so I'd like to have the returned String either be a valid filespec, or null.
I've gotten some example code to work using Files.walkFileTree(), but it doesn't do exactly what I want. I want to get the resolved filename back as a String that I can use in subsequent code...
I simply want to pass some code a String filename that includes an asterisk
e.g.: input this String: filename*.tr
and get back a String with the asterisk resolved to the 1st matching filename (or null):
e.g.: get back this String: filename_201402041230.tr
The directory where these files reside contains several thousand files, so iterating over all files in the directory and parsing the names myself isn't an attractive option.
Any help or pointers would be greatly appreciated.
Apologize for the outburst... Thanks for the tip... Here's what I was trying before:
However, as I said this isn't what I want, but it's as close as I could get from my RESEARCH.
Path startDir = Paths.get("C:\\huge_dir");
String pattern = "filename*.tr";
FileSystem fs = FileSystems.getDefault();
final PathMatcher matcher = fs.getPathMatcher("glob:" + pattern);
FileVisitor<Path> matcherVisitor = new SimpleFileVisitor<Path>()
{
@Override public FileVisitResult visitFile(Path file, BasicFileAttributes attribs)
{
    Path name = file.getFileName();
    if (matcher.matches(name))
    {
        System.out.println(file);
        return FileVisitResult.TERMINATE;
    }
    return FileVisitResult.CONTINUE;
}
};
try
{
Files.walkFileTree(startDir, matcherVisitor);
}
catch (Exception e){System.out.println(e);}
You can use the nio2 Files.newDirectoryStream method with an additional pattern matcher to only list files which match the pattern. As your string already is a glob pattern, you can just pass it as the second argument:
Path dir = Paths.get("C:\\huge_dir"); // the directory to search
String pattern = "filename*.tr";
try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, pattern)) {
//iterate over all matching files
List<Path> paths = new ArrayList<>();
for (Path path : ds) {
paths.add(path);
}
if (paths.isEmpty()) {
//no file found
} else if (paths.size() == 1) {
//found one result
Path result = paths.get(0); // now do whatever
} else {
//more than one match - probably an error in your case?
}
}
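A sketch of the single-result case the question asks for: return the matching file name as a String, or null. It relies on the question's guarantee that at most one file matches; DirectoryStream iteration order is unspecified, so with several matches the "first" one is arbitrary. The JDK still reads the directory entries internally, but you no longer parse the names yourself. The class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;

class WildcardResolver {
    // Returns the file name of the first entry of dir matching the glob,
    // or null if nothing matches.
    static String resolve(Path dir, String glob) {
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, glob)) {
            Iterator<Path> it = ds.iterator();
            return it.hasNext() ? it.next().getFileName().toString() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Directory to search: first argument, or the current directory
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(resolve(dir, "filename*.tr"));
    }
}
```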
Using a wildcard string, I want to process files in a directory. If a wildcard is specified, I want to process only the files that match it; otherwise I'll process all the files. Here's my code:
List<File> fileList;
File folder = new File("Directory");
File[] listOfFiles = folder.listFiles();
if(prop.WILD_CARD!=null) {
Pattern wildCardPattern = Pattern.compile(".*"+prop.WILD_CARD+"(.*)?.csv",Pattern.CASE_INSENSITIVE);
for(File file: listOfFiles) {
Matcher match = wildCardPattern.matcher(file.getName());
while(match.find()){
String fileMatch = match.group();
if(file.getName().equals(fileMatch)) {
fileList.add(file); // doesn't work
}
}
}
}
else
fileList = new LinkedList<File>( Arrays.asList(folder.listFiles()));
I'm not able to put the files that match the wildcard into a separate file list. Please help me modify my code so that I can put all the files that match the wildcard in a separate file list. Here I concatenate prop.WILD_CARD into my regex, and it can be any string; for instance, if the wildcard is test, my pattern is .*test(.*)?.csv. I want to store the files matching this wildcard in a file list.
I just tested this code and it runs pretty well. You should check for logical errors somewhere else.
public static void main(String[] args) {
String WILD_CARD = "";
List<File> fileList = new LinkedList<File>();
File folder = new File("d:\\");
File[] listOfFiles = folder.listFiles();
if(WILD_CARD!=null) {
Pattern wildCardPattern = Pattern.compile(".*"+WILD_CARD+"(.*)?.mpp",Pattern.CASE_INSENSITIVE);
for(File file: listOfFiles) {
Matcher match = wildCardPattern.matcher(file.getName());
while(match.find()){
String fileMatch = match.group();
if(file.getName().equals(fileMatch)) {
fileList.add(file); // works once fileList is initialized above
}
}
}
}
else
fileList = new LinkedList<File>( Arrays.asList(folder.listFiles()));
for (File f: fileList) System.out.println(f.getName());
}
This returns a list of all *.mpp files on my D: drive.
I'd also suggest using
for (File file : listOfFiles) {
Matcher match = wildCardPattern.matcher(file.getName());
if (match.matches()) {
fileList.add(file);
}
}
I would suggest you look into the FilenameFilter class and see if it helps simplify your code. As for your regex expression, I think you need to escape the "." character for it to work.
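Putting these suggestions together (a FilenameFilter plus an escaped dot), a sketch that keeps the question's behaviour: files whose names contain the wildcard string and end in .csv, case-insensitively, or all files when no wildcard is set. Quoting the wildcard with Pattern.quote is an assumption about intent; drop it if prop.WILD_CARD is meant to contain regex syntax. The class and method names are hypothetical:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardCsvFilter {
    // Lists .csv files in folder whose names contain wildCard (case-insensitive).
    // A null wildCard means "all files". Pattern.quote keeps characters such as
    // '.' in wildCard literal, and "\\.csv" escapes the extension dot.
    static List<File> find(File folder, String wildCard) {
        File[] files;
        if (wildCard == null) {
            files = folder.listFiles();
        } else {
            Pattern p = Pattern.compile(".*" + Pattern.quote(wildCard) + ".*\\.csv",
                    Pattern.CASE_INSENSITIVE);
            files = folder.listFiles((dir, name) -> p.matcher(name).matches());
        }
        // listFiles returns null if folder is missing or unreadable
        return files == null ? new ArrayList<>() : new ArrayList<>(Arrays.asList(files));
    }

    public static void main(String[] args) {
        for (File f : find(new File("Directory"), "test")) {
            System.out.println(f.getName());
        }
    }
}
```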
I am working on a desktop application for Windows using Java. In my application there is a requirement to search for all .doc and .docx files in MyDocuments/Documents (depending on the OS) on the local system and display their names and file sizes.
I can't find a way to list all the *.doc, *.docx, *.xls, *.xlsx, *.csv, *.txt, *.pdf, *.ppt and *.pptx files present in Documents/MyDocuments.
Please give me your suggestions, or point me to any link that will help me write code for a faster search that lists each file with its name, size and type.
You can use Apache Commons IO, in particular the FileUtils class. That would give something like:
import java.io.File;
import java.util.Collection;
import org.apache.commons.io.*;
import org.apache.commons.io.filefilter.*;
public class SearchDocFiles {
public static String[] EXTENSIONS = { "doc", "docx" };
public Collection<File> searchFilesWithExtensions(final File directory, final String[] extensions) {
return FileUtils.listFiles(directory,
extensions,
true);
}
public Collection<File> searchFilesWithCaseInsensitiveExtensions(final File directory, final String[] extensions) {
IOFileFilter fileFilter = new SuffixFileFilter(extensions, IOCase.INSENSITIVE);
return FileUtils.listFiles(directory,
fileFilter,
DirectoryFileFilter.INSTANCE);
}
public static void main(String... args) {
// Case sensitive
Collection<File> documents = new SearchDocFiles().searchFilesWithExtensions(
new File("/tmp"),
SearchDocFiles.EXTENSIONS);
for (File document: documents) {
System.out.println(document.getName() + " - " + document.length());
}
// Case insensitive
Collection<File> caseInsensitiveDocs = new SearchDocFiles().searchFilesWithCaseInsensitiveExtensions(
new File("/tmp"),
SearchDocFiles.EXTENSIONS);
for (File document: caseInsensitiveDocs) {
System.out.println(document.getName() + " - " + document.length());
}
}
}
Check this method.
public void getFiles(String path) {
File dir = new File(path);
String[] children = dir.list();
if (children != null) {
for (int i = 0; i < children.length; i++) {
// Get filename of file or directory
String filename = children[i];
File file = new File(path + File.separator + filename);
if (!file.isDirectory()) {
if (file.getName().endsWith(".doc") || file.getName().endsWith(".docx")) {
System.out.println("File Name " + filename + "(" + file.length()+" bytes)");
}
} else {
getFiles(path + File.separator + filename);
}
}
}
}
If you want to find all the files with .doc(x) extensions, you can use the java.io.File.listFiles(FileFilter) method, like this:
public java.util.List mswordFiles(java.io.File dir) {
java.util.List res = new java.util.ArrayList();
_mswordFiles(dir, res);
return res;
}
protected void _mswordFiles(java.io.File dir, java.util.List res) {
java.io.File [] files = dir.listFiles(new java.io.FileFilter() {
public boolean accept(java.io.File f) {
String name = f.getName().toLowerCase();
return !f.isDirectory() && (name.endsWith(".doc") || name.endsWith(".docx"));
}
});
for(java.io.File f:files) {res.add(f);}
java.io.File [] dirs = dir.listFiles(new java.io.FileFilter() {
public boolean accept(java.io.File f) {
return f.isDirectory();
}
});
for(java.io.File d:dirs) {_mswordFiles(d, res);}
}
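A JDK-only alternative (Java 8+) that avoids the extra dependency: walk the tree with Files.walk and keep regular files whose lowercased extension is in a set, which covers the full list of extensions from the question. Resolving the Documents folder as user.home plus "Documents" is an assumption; redirected or localized Documents folders will differ. Files.walk also stops with an UncheckedIOException if it hits a directory it cannot read. The class name is hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DocumentFinder {
    // Extensions from the question, compared case-insensitively
    static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
            "doc", "docx", "xls", "xlsx", "csv", "txt", "pdf", "ppt", "pptx"));

    // Lowercased text after the last '.', or "" if the name has no dot
    static String extension(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Recursively lists regular files under root with one of the given extensions
    static List<Path> find(Path root, Set<String> extensions) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> extensions.contains(extension(p)))
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path docs = Paths.get(System.getProperty("user.home"), "Documents");
        for (Path p : find(docs, EXTENSIONS)) {
            System.out.println(p.getFileName() + " - " + Files.size(p) + " bytes - " + extension(p));
        }
    }
}
```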
I don't have enough reputation to comment so have to submit this as an 'answer':
@khachik You can ignoreCase or upper/lower case as you need. – Martijn Verburg Nov 10 '10 at 12:02
It took me a while to figure this out, but I finally found how to ignore case with this solution:
Add
public static final IOFileFilter filter = new SuffixFileFilter(EXTENSIONS, IOCase.INSENSITIVE);
Then modify the searchFilesWithExtensions method to:
return FileUtils.listFiles(directory, filter, DirectoryFileFilter.DIRECTORY);
You may want to look into extracting MSWord text using Apache POI and indexing them through Lucene (for accuracy, flexibility, and speed of searching). Nutch and Solr both have helper libraries for Lucene which you can use to speed things up (that is if Lucene core is not sufficient).
[update] I misunderstood the original question (before the update). If you just need to search the file system, the Java API can do that. Apache also has a library (Commons IO) that includes a file utility to list all files under a directory, including its subdirectories, given a filter. I've used it before, e.g. FileUtils.listFiles(dir, filefilter, dirfilter) or FileUtils.listFiles(dir, extensions[], recursive). Then run your search over that list.