Java code to search all .doc and .docx files from local system

I am working on a Windows desktop application in Java. One requirement is to search for all .doc and .docx files in the MyDocuments/Documents folder (depending on the OS version) on the local system and display their names and file sizes.
I cannot find a way to list all the *.doc, *.docx, *.xls, *.xlsx, *.csv, *.txt, *.pdf, *.ppt, *.pptx files present in Documents/MyDocuments.
Please suggest an approach, or a link, that will help me write code for a faster search that lists each file's name, size, and type.

You can use Apache Commons IO, in particular the FileUtils class. That would give something like:
import java.io.File;
import java.util.Collection;
import org.apache.commons.io.*;
import org.apache.commons.io.filefilter.*;
public class SearchDocFiles {
public static String[] EXTENSIONS = { "doc", "docx" };
public Collection<File> searchFilesWithExtensions(final File directory, final String[] extensions) {
return FileUtils.listFiles(directory,
extensions,
true);
}
public Collection<File> searchFilesWithCaseInsensitiveExtensions(final File directory, final String[] extensions) {
IOFileFilter fileFilter = new SuffixFileFilter(extensions, IOCase.INSENSITIVE);
return FileUtils.listFiles(directory,
fileFilter,
DirectoryFileFilter.INSTANCE);
}
public static void main(String... args) {
// Case sensitive
Collection<File> documents = new SearchDocFiles().searchFilesWithExtensions(
new File("/tmp"),
SearchDocFiles.EXTENSIONS);
for (File document: documents) {
System.out.println(document.getName() + " - " + document.length());
}
// Case insensitive
Collection<File> caseInsensitiveDocs = new SearchDocFiles().searchFilesWithCaseInsensitiveExtensions(
new File("/tmp"),
SearchDocFiles.EXTENSIONS);
for (File document: caseInsensitiveDocs) {
System.out.println(document.getName() + " - " + document.length());
}
}
}

Check this method.
public void getFiles(String path) {
File dir = new File(path);
String[] children = dir.list();
if (children != null) {
for (int i = 0; i < children.length; i++) {
// Get filename of file or directory
String filename = children[i];
File file = new File(path + File.separator + filename);
if (!file.isDirectory()) {
if (file.getName().endsWith(".doc") || file.getName().endsWith(".docx")) {
System.out.println("File Name " + filename + "(" + file.length()+" bytes)");
}
} else {
getFiles(path + File.separator + filename);
}
}
}
}

If you want to find all the files with .doc(x) extensions, you can use the java.io.File.listFiles(FileFilter) method, for example:
public java.util.List mswordFiles(java.io.File dir) {
java.util.List res = new java.util.ArrayList();
_mswordFiles(dir, res);
return res;
}
protected void _mswordFiles(java.io.File dir, java.util.List res) {
java.io.File [] files = dir.listFiles(new java.io.FileFilter() {
public boolean accept(java.io.File f) {
String name = f.getName().toLowerCase();
return !f.isDirectory() && (name.endsWith(".doc") || name.endsWith(".docx"));
}
});
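// listFiles returns null if dir is not a directory or cannot be read
if (files != null) {for(java.io.File f:files) {res.add(f);}}
java.io.File [] dirs = dir.listFiles(new java.io.FileFilter() {
public boolean accept(java.io.File f) {
return f.isDirectory();
}
});
if (dirs != null) {for(java.io.File d:dirs) {_mswordFiles(d, res);}}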
}

I don't have enough reputation to comment so have to submit this as an 'answer':
@khachik You can ignoreCase or upper/lower case as you need. – Martijn Verburg Nov 10 '10 at 12:02
This took me a bit to figure out and finally found how to ignore case with this solution:
Add
public static final IOFileFilter filter = new SuffixFileFilter(EXTENSIONS, IOCase.INSENSITIVE);
Then modify the searchFilesWithExtensions method to return
FileUtils.listFiles(directory, filter, DirectoryFileFilter.DIRECTORY);

You may want to look into extracting MSWord text using Apache POI and indexing them through Lucene (for accuracy, flexibility, and speed of searching). Nutch and Solr both have helper libraries for Lucene which you can use to speed things up (that is if Lucene core is not sufficient).
[update] I misunderstood the original question (before the update). You just need to search the filesystem using Java? The Java API can do that. Apache also has a library (Commons IO) that includes a file utility to list all files under a directory, including its subdirectories, given a filter. I've used it before, e.g. FileUtils.listFiles(dir, filefilter, dirfilter) or FileUtils.listFiles(dir, extensions[], recursive). Then run your search over that list.
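The plain Java API route mentioned above could look roughly like the sketch below, which uses java.nio.file.Files.walk (Java 8+). The class name DocumentLister and the <user.home>/Documents location are assumptions made for illustration; the real Documents folder may be redirected elsewhere on some Windows setups. Note also that Files.walk throws an UncheckedIOException if it reaches a directory it cannot read, so a production version may prefer Files.walkFileTree with a visitor that skips such directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of any library.
final class DocumentLister {
    // Extensions taken from the question, lower case, without the dot.
    static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
            "doc", "docx", "xls", "xlsx", "csv", "txt", "pdf", "ppt", "pptx"));

    // Returns the lower-cased extension of the file name, or "" if there is none.
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Walks root recursively and collects regular files with a wanted extension.
    static List<Path> find(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> EXTENSIONS.contains(extensionOf(p)))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the documents folder is <user.home>/Documents.
        Path root = Paths.get(System.getProperty("user.home"), "Documents");
        for (Path p : find(root)) {
            System.out.println(p.getFileName() + " | " + Files.size(p) + " bytes | " + extensionOf(p));
        }
    }
}
```

The extension check is case-insensitive, so REPORT.DOCX and report.docx are both listed.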


How to find specific directory and its files according to the keyword passed in java and loading in memory approach

I have a project structure like below:
My problem: I have to iterate over the resources folder and, given a key, find the matching folder and its files.
For that, I have written the code below using a recursive approach, but I am not getting the intended output:
public class ConfigFileReader {
public static void main(String[] args) throws Exception {
System.out.println("Print L");
String path = "C:\\...\\ConfigFileReader\\src\\resources\\";
//FileReader reader = new FileReader(path + "\\Encounter\\Encounter.properties");
//Properties p = new Properties();
//p.load(reader);
File[] files = new File(path).listFiles();
String resourceType = "Encounter";
System.out.println(navigateDirectoriesAndFindTheFile(resourceType, files));
}
public static String navigateDirectoriesAndFindTheFile(String inputResourceString, File[] files) {
String entirePathOfTheIntendedFile = "";
for (File file : files) {
if (file.isDirectory()) {
navigateDirectoriesAndFindTheFile(inputResourceString, file.listFiles());
System.out.println("Directory: " + file.getName());
if (file.getName().startsWith(inputResourceString)) {
entirePathOfTheIntendedFile = file.getPath();
}
} else {
System.out.print("Inside...");
entirePathOfTheIntendedFile = file.getPath();
}
}
return entirePathOfTheIntendedFile;
}
}
Output:
The output should return C:\....\Encounter\Encounter.properties as the path.
First, if it finds the string while traversing, it should return the file inside that folder without navigating any further. Second, what is the best way to iterate over, say, 1k files? I can't use this method every time because it doesn't seem efficient. How can I use an in-memory approach for this problem? Please guide me through it.
You will need to check the output of the recursive call and pass it back when a match is found.
Always use File or Path to handle filenames.
Assuming that I've understood the logic of the search, try this, which scans for files of the form XXX\XXXyyyy:
public class ConfigReader
{
public static void main(String[] args) throws Exception {
System.out.println("Print L");
File path = new File(args[0]).getAbsoluteFile();
String resourceType = "Encounter";
System.out.println(navigateDirectoriesAndFindTheFile(resourceType, path));
}
public static File navigateDirectoriesAndFindTheFile(String inputResourceString, File path) {
File[] files = path.listFiles();
File found = null;
for (int i = 0; found == null && files != null && i < files.length; i++) {
File file = files[i];
if (file.isDirectory()) {
found = navigateDirectoriesAndFindTheFile(inputResourceString, file);
} else if (file.getName().startsWith(inputResourceString) && file.getParentFile().getName().equals(inputResourceString)) {
found = file;
}
}
return found;
}
}
If this is slow, especially for 1K files, rewrite it with Files.walkFileTree, which would be much faster than File.list() in recursion.
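A minimal sketch of the Files.walkFileTree variant, assuming the same matching rule as the code above (a file whose name starts with the key and whose parent directory is named exactly the key). The class name ResourceFinder is hypothetical. Returning FileVisitResult.TERMINATE stops the walk at the first match, which is the early exit the question asks for.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper class, not part of any library.
final class ResourceFinder {
    // Returns the first file named key* inside a directory named key, or null if none exists.
    static Path findFirst(Path root, String key) throws IOException {
        Path[] found = new Path[1]; // one-element holder so the visitor can store its result
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                Path parent = file.getParent();
                boolean match = file.getFileName().toString().startsWith(key)
                        && parent != null
                        && parent.getFileName() != null
                        && parent.getFileName().toString().equals(key);
                if (match) {
                    found[0] = file;
                    return FileVisitResult.TERMINATE; // stop walking at the first hit
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return found[0];
    }

    public static void main(String[] args) throws IOException {
        // The root directory is passed as the first argument, as in the answer above.
        System.out.println(findFirst(Paths.get(args[0]).toAbsolutePath(), "Encounter"));
    }
}
```

If many keys have to be looked up repeatedly, one possible in-memory approach is to walk the tree once and store the results in a Map keyed by directory name, then answer each lookup from the map.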

how to search for a filename in a list of files

I need to find a file name from the list of filenames and to initiate two methods according to the found result. I tried:
FileList result = service.files().list()
.setPageSize(10)
.setFields("nextPageToken, files(id, name)")
.execute();
List<File> files = result.getFiles();
if (files == null || files.size() == 0) {
System.out.println("No files found.");
} else {
System.out.println("Files:");
for (File file : files) {
System.out.printf("%s (%s)\n", file.getName(), file.getId());
Boolean found = files.contains("XYZ");
if(found)
{
insertIntoFolder();
} else {
createFolder();
}
}
}
I need to find XYZ (the filename) from a list of file names (like sjh, jsdhf, XYZ, ASDF). Once I've found it I need to stop the search. If the name doesn't match the list of names I need to create a folder only once after checking all names from that list.
Boolean found = files.contains("XYZ");
This line is problematic. files is a list of File objects, none of which will match the String "XYZ". List.contains() essentially calls Object.equals() on every element of the list, and File.equals("XYZ") will always return false.
If you're programming in an IDE like Eclipse it should show a warning on this line, since it's a bug that can be detected at compile-time.
To determine if a File in a List<File> has a filename matching a given string you need to operate on the filename itself, so the above line should instead be:
boolean found = file.getName().equals("XYZ");
Depending on what exactly you're trying to match you might want to use .getName(), .getAbsolutePath(), or .toString().
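Because the question wants the search to stop at the first match and the folder to be created only once, the name check can be moved out of the printing loop into a helper that returns as soon as it finds the name. The sketch below is generic so that it runs without the Drive client library; the class name NameLookup and the nameOf parameter are assumptions for illustration. With the question's list you would pass File::getName as nameOf.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper class, not part of any library.
final class NameLookup {
    // Returns true as soon as one item's name equals wanted; false after checking all items.
    static <T> boolean containsName(List<T> items, Function<T, String> nameOf, String wanted) {
        for (T item : items) {
            if (wanted.equals(nameOf.apply(item))) {
                return true; // stop at the first match
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Plain strings stand in for the file objects from the question.
        List<String> names = Arrays.asList("sjh", "jsdhf", "XYZ", "ASDF");
        boolean found = containsName(names, Function.identity(), "XYZ");
        // In the question's code this is where insertIntoFolder() or createFolder() would be called, once.
        System.out.println(found ? "found: insert into folder" : "not found: create folder");
    }
}
```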
It's also a good idea to use the Path API introduced in Java 7, rather than File, which is essentially a legacy class at this point.
If you want a more elegant solution than manually looping over files looking for a match you can use Files.newDirectoryStream(Path, Filter) which allows you to define a Filter predicate that only matches certain files, e.g.
Files.newDirectoryStream(myDirectory, p -> p.getFileName().toString().equals("XYZ"))
File.list(FilenameFilter) is a similar feature for working with File objects, but again, prefer to use the Path API if possible.
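As a rough, self-contained illustration of the Files.newDirectoryStream approach: the sketch below lists the entries of a single directory (it does not recurse) whose file name equals a given string. The class and method names are hypothetical. The stream is opened in try-with-resources because a DirectoryStream holds an open directory handle until it is closed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class DirectoryNameFilter {
    // Returns the direct children of dir whose file name equals wanted.
    static List<Path> entriesNamed(Path dir, String wanted) throws IOException {
        DirectoryStream.Filter<Path> byName = p -> p.getFileName().toString().equals(wanted);
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, byName)) {
            for (Path entry : stream) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Looks for an entry called XYZ in the current directory.
        System.out.println(entriesNamed(Paths.get("."), "XYZ"));
    }
}
```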
Here is an example:
/**
* return true if file is in filesList else return false
*/
static boolean isFileInList(File file, List<File> filesList) {
for(File f: filesList) {
if (f.equals(file)) {
return true;
}
}
return false;
}
public static void main(String[] args) {
List<File> files = new ArrayList<File>(); // the file list; fill it with your files
File file = new File("XYZ"); // the file you want to test
if (isFileInList(file, files)) {
// file is present
} else {
// file is not present
createFolder();
}
}
package test;
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
public class DirectoryContents {
public static void main(String[] args) throws IOException {
File f = new File("."); // current directory
FilenameFilter textFilter = new FilenameFilter() {
public boolean accept(File dir, String name) {
String lowercaseName = name.toLowerCase();
if (lowercaseName.endsWith(".txt")) {
return true;
} else {
return false;
}
}
};
File[] files = f.listFiles(textFilter);
for (File file : files) {
if (file.isDirectory()) {
System.out.print("directory:");
} else {
System.out.print(" file:");
}
System.out.println(file.getCanonicalPath());
}
}
}

Check if file is in (sub)directory

I would like to check whether an existing file is in a specific directory or a subdirectory of that.
I have two File objects.
File dir;
File file;
Both are guaranteed to exist. Let's assume
dir = /tmp/dir
file = /tmp/dir/subdir1/subdir2/file.txt
I want this check to return true
For now I am doing the check this way:
String canonicalDir = dir.getCanonicalPath() + File.separator;
boolean subdir = file.getCanonicalPath().startsWith(canonicalDir);
This seems to work in my limited tests, but I am unsure whether it might cause problems on some operating systems. I also do not like that getCanonicalPath() can throw an IOException, which I have to handle.
Is there a better way? Possibly in some library?
Thanks
In addition to the answer from rocketboy, use getCanonicalPath() instead of getAbsolutePath() so that \dir\dir2\..\file is converted to \dir\file:
boolean areRelated = file.getCanonicalPath().contains(dir.getCanonicalPath() + File.separator);
System.out.println(areRelated);
or
boolean areRelated = child.getCanonicalPath().startsWith(parent.getCanonicalPath() + File.separator);
Do not forget to handle the IOException that getCanonicalPath() can throw, using try {...} catch (IOException e) {...}.
NOTE: You can use FileSystems.getDefault().getSeparator() instead of File.separator. The 'correct' way of doing this is to get the getCanonicalPath() of the directory you are checking against as a String, then check whether it ends with File.separator and, if not, append File.separator to it, to avoid double slashes. This way you avoid odd behaviour if Java ever returns directories with a trailing slash, or if your directory string comes from somewhere other than java.io.File.
NOTE2: Thanks to @david for pointing out the File.separator problem.
I would create a small utility method:
public static boolean isInSubDirectory(File dir, File file) {
if (file == null)
return false;
if (file.equals(dir))
return true;
return isInSubDirectory(dir, file.getParentFile());
}
This method looks pretty solid:
/**
* Checks, whether the child directory is a subdirectory of the base
* directory.
*
* @param base the base directory.
* @param child the suspected child directory.
* @return true, if the child is a subdirectory of the base directory.
* @throws IOException if an I/O error occurred during the test.
*/
public boolean isSubDirectory(File base, File child)
throws IOException {
base = base.getCanonicalFile();
child = child.getCanonicalFile();
File parentFile = child;
while (parentFile != null) {
if (base.equals(parentFile)) {
return true;
}
parentFile = parentFile.getParentFile();
}
return false;
}
Source
It is similar to the solution by dacwe but doesn't use recursion (though that shouldn't make a big difference in this case).
If you plan to work with files and filenames heavily, check the Apache Commons IO FileUtils and FilenameUtils classes. They are full of useful (and portable, if portability is mandatory) functions.
public class Test {
public static void main(String[] args) {
File root = new File("c:\\test");
String fileName = "a.txt";
try {
boolean recursive = true;
Collection files = FileUtils.listFiles(root, null, recursive);
for (Iterator iterator = files.iterator(); iterator.hasNext();) {
File file = (File) iterator.next();
if (file.getName().equals(fileName))
System.out.println(file.getAbsolutePath());
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
You can traverse the file tree starting from your specific directory.
Since Java 7, there is the Files.walkFileTree method. You only have to write your own visitor
that checks whether the current node is the file you are searching for. More documentation:
http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#walkFileTree%28java.nio.file.Path,%20java.util.Set,%20int,%20java.nio.file.FileVisitor%29
You can do this, however it won't catch every use case e.g. dir = /somedir/../tmp/dir/etc..., unless that's how the file was defined also.
import java.nio.file.Path;
import java.nio.file.Paths;
public class FileTest {
public static void main(final String... args) {
final Path dir = Paths.get("/tmp/dir").toAbsolutePath();
final Path file = Paths.get("/tmp/dir/subdir1/subdir2/file.txt").toAbsolutePath();
System.out.println("Dir: " + dir);
System.out.println("File: " + file);
final boolean valid = file.startsWith(dir);
System.out.println("Valid: " + valid);
}
}
For the checks to work correctly, you really need to resolve these paths using toRealPath() or, in your example, getCanonicalPath(). You then have to handle the exceptions these methods can throw, which is the correct thing to do anyway.
Since Java 7 you can just do this:
file.toPath().startsWith(dir.toPath());
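One reason to prefer Path.startsWith over a string prefix test is that it compares whole path elements, so /tmp/dir2/file.txt is not treated as being inside /tmp/dir. The sketch below wraps that check and normalizes ".." segments first. The class name PathContainment is hypothetical, and symbolic links are not resolved here; that would need toRealPath(), which can throw an IOException.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of any library.
final class PathContainment {
    // True if file is dir itself or lies somewhere below it, compared element by element.
    static boolean isInside(Path dir, Path file) {
        Path base = dir.toAbsolutePath().normalize();
        Path candidate = file.toAbsolutePath().normalize();
        return candidate.startsWith(base);
    }

    public static void main(String[] args) {
        Path dir = Paths.get("/tmp/dir");
        // A file below dir, a sibling directory with a similar name, and a path that escapes via "..".
        System.out.println(isInside(dir, Paths.get("/tmp/dir/subdir1/subdir2/file.txt")));
        System.out.println(isInside(dir, Paths.get("/tmp/dir2/file.txt")));
        System.out.println(isInside(dir, Paths.get("/tmp/dir/../other/file.txt")));
    }
}
```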
How about comparing the paths?
boolean areRelated = file.getAbsolutePath().contains(dir.getAbsolutePath());
System.out.println(areRelated);
or
boolean areRelated = child.getAbsolutePath().startsWith(parent.getAbsolutePath());

Java iterative reading of Files

At the moment I'm having a problem writing a tool for my company. I have 384 XML files that I have to read and parse with a SAX parser into txt files.
What I have so far is the parsing of all XML files into one txt file of 43 MB. With a BufferedReader and line.startsWith I want to extract all relevant information from the text file.
Edit: Done
(So my problem is how to solve this more efficiently. I have an idea (but unfortunately not in code, as you might think), though I don't know if it's possible: I want to iterate through a directory, find the XML file I want, then parse it and create a new txt file with the parsed content. Once that is done for all 384 XML files, I want to do the same for the 384 txt files: read them with a BufferedReader to get my relevant information. It's important to read them one at a time. Another problem is the directory path, which is a bit complex: "C:\Users\xxx\Documents\Data\ProjectName\A1\1\1SLin\wanted.xml"; each file has its own directory. The variable part is A1, which ranges over A-P and 1-24. Alternatively, I have all the relevant files with their absolute paths in an ArrayList, so it's also fine to iterate over this list if that's easier.)
Edit:
I came to a solution. The code below contains a method that searches directories and a method that parses the XML files of a list into the same directory, with the same filename but a different file extension:
public List<File> searchFile(File dir, String find) {
File[] files = dir.listFiles();
List<File> matches = new ArrayList<File>();
if (files != null) {
for (int i = 0; i < files.length; i++) {
if (files[i].isDirectory()) {
matches.addAll(searchFile(files[i], find));
} else if (files[i].getName().equalsIgnoreCase(find)) {
matches.add(files[i]);
}
}
}
Collections.sort(matches);
return matches;
}
public static void main(String[] args) throws IOException {
Import_Files im = new Import_Files();
File dir = new File("C:\\Users\\xxx\\Desktop\\MS-Daten\\");
String name = "snp_result_5815.xml";
List<File> matches = im.searchFile(dir, name);
System.out.println(matches);
for (int i=0; i<matches.size(); i++) {
String j = String.valueOf(i);
String xml_name = matches.get(i).getAbsolutePath();
// replaceFirst takes a regex: escape the dot and anchor the match at the end of the path
File f = new File(matches.get(i).getAbsolutePath().replaceFirst("(?i)\\.xml$", ".txt"));
System.setOut(new PrintStream(new FileOutputStream(f)));
System.out.println("\nstarting File: "+ i + "\n");
xml_parse myReader = new xml_parse(xml_name);
myReader.setContentHandler(new MyContentHandler());
myReader.setErrorHandler(new MyErrorHandler());
myReader.run();
}
}
The searchFolder method below takes a path and a file-name pattern, searches the path and all sub-directories, and passes every matching file to the processFile method.
public static void main(String[] args) {
String path = "c:\\temp";
Pattern filePattern = Pattern.compile("(?i).*\\.xml$");
searchFolder(path, filePattern);
}
public static void searchFolder(String searchPath, Pattern filePattern){
File dir = new File(searchPath);
File[] items = dir.listFiles();
if(items == null) return; // not a directory, or it cannot be read
for(File item : items){
if(item.isDirectory()){
//recursively search subdirectories
searchFolder(item.getAbsolutePath(), filePattern);
} else if(item.isFile() && filePattern.matcher(item.getName()).matches()){
processFile(item);
}
}
}
public static void processFile(File aFile){
String filename = aFile.getAbsolutePath();
String txtFilename = filename.substring(0, filename.lastIndexOf(".")) + ".txt";
//Do your xml file parsing and write to txtFilename
}
The complexity of the path makes no difference, just specify the root path to search (looks like C:\Users\xxx\Documents\Data\ProjectName in your case) and it will find all the files.
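The step that derives the output name can also be expressed with java.nio.file.Path, which avoids string surgery on the full path. This is only a sketch; the class and method names are made up for illustration. It replaces whatever follows the last dot of the file name and leaves the directory part untouched, so a directory name that happens to contain a dot is not affected.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of any library.
final class ExtensionSwap {
    // Returns a path in the same directory with the file's extension replaced by newExt.
    static Path withExtension(Path file, String newExt) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot); // name without its extension
        return file.resolveSibling(base + "." + newExt);
    }

    public static void main(String[] args) {
        // The file name is the one used in the question's own solution.
        Path xml = Paths.get("Data", "A1", "snp_result_5815.xml");
        System.out.println(withExtension(xml, "txt"));
    }
}
```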

How to list only non hidden and non system file in jtree

File f=new File("C:/");
File fList[] = f.listFiles();
When I use this, it lists all system files as well as hidden files,
and this causes a NullPointerException when I use it to populate a JTree like this:
public void getList(DefaultMutableTreeNode node, File f) {
if(f.isDirectory()) {
DefaultMutableTreeNode child = new DefaultMutableTreeNode(f);
node.add(child);
File fList[] = f.listFiles();
for(int i = 0; i < fList.length; i++)
getList(child, fList[i]);
}
}
What should I do so that it does not throw a NullPointerException and shows only non-hidden, non-system files in the JTree?
Do this for hidden files:
File root = new File(yourDirectory);
File[] files = root.listFiles(new FileFilter() {
@Override
public boolean accept(File file) {
return !file.isHidden();
}
});
This will not return hidden files.
As for system files, I believe that is a Windows concept and therefore might not be supported by the File API, which tries to be system independent. You can use command-line commands though, if those exist.
Or use what @Reimeus had in his answer.
Possibly like
File root = new File("C:\\");
File[] files = root.listFiles(new FileFilter() {
@Override
public boolean accept(File file) {
Path path = Paths.get(file.getAbsolutePath());
DosFileAttributes dfa;
try {
dfa = Files.readAttributes(path, DosFileAttributes.class);
} catch (IOException e) {
// bad practice
return false;
}
return (!dfa.isHidden() && !dfa.isSystem());
}
});
DosFileAttributes was introduced in Java 7.
When running under Windows, you can use DosFileAttributes (introduced in Java 7), which enables system and hidden files to be filtered. This can be used in conjunction with a FileFilter:
Path srcFile = Paths.get("myDirectory");
DosFileAttributes dfa = Files.readAttributes(srcFile, DosFileAttributes.class);
System.out.println("System File? " + dfa.isSystem());
System.out.println("Hidden File? " + dfa.isHidden());
If you are trying to list all files in C:/, keep in mind that there are also files that are neither hidden nor system files but still cannot be opened because they require special privileges/permissions. So:
String[] files = file.list();
if (files!=null) {
for (String f : files) open(f);
}
So just check whether the array is null, and design your recursion so that it skips those files for which list() returns null.
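Combining the null check with the hidden-file filter, a recursive tree builder might look like the sketch below. The class name FileTreeBuilder is hypothetical. It only filters on File.isHidden(); excluding Windows system files as well would need the DosFileAttributes check, which is left out here so that the sketch runs on any platform.

```java
import java.io.File;
import java.io.FileFilter;
import javax.swing.tree.DefaultMutableTreeNode;

// Hypothetical helper class, not part of any library.
final class FileTreeBuilder {
    // Accepts only entries that the platform does not report as hidden.
    private static final FileFilter VISIBLE = new FileFilter() {
        @Override
        public boolean accept(File file) {
            return !file.isHidden();
        }
    };

    // Builds a tree node for f and, if f is a readable directory, for its visible children.
    static DefaultMutableTreeNode build(File f) {
        // Drive or filesystem roots have an empty name, so fall back to the full path.
        String label = f.getName().isEmpty() ? f.getPath() : f.getName();
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(label);
        File[] children = f.listFiles(VISIBLE);
        if (children != null) { // null for plain files and for unreadable directories
            for (File child : children) {
                node.add(build(child));
            }
        }
        return node;
    }

    public static void main(String[] args) {
        // Prints how many visible entries the current directory has.
        System.out.println(build(new File(".").getAbsoluteFile()).getChildCount());
    }
}
```

The returned node can be passed to the JTree constructor as the root. Building the whole of C:/ eagerly can take a long time, so loading children lazily when a node is expanded may be preferable for large trees.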
private void nodes(DefaultMutableTreeNode top, File f) throws IOException {
if (f.isDirectory()) {
File[] listFiles = f.listFiles();
if (listFiles != null) {
DefaultMutableTreeNode b1[] = new DefaultMutableTreeNode[listFiles.length];
for (int i = 0; i < b1.length; i++) {
b1[i] = new DefaultMutableTreeNode(listFiles[i].toString());
top.add(b1[i]);
File g = new File(b1[i].toString());
nodes(b1[i], g);
}
}
}
}
Here is the code I used to create a window file explorer using jtree.
