I'm currently working on a search output system that searches a directory for a specific phrase in a file, matches it, then outputs it to a log file. I have a problem snippet of code that looks like this:
int j = 0;
for (String currentMatch : lineMatch) {
    String[] split = fileList.get(j).toString().split("\\\\");
    match.write(split[3] + " : " + currentMatch + "\r\n");
    match.flush();
    j++;
}
Here fileList is an ArrayList of the files with a matching result, and filePath is an ArrayList of their paths. I use split[3] to get the name of the fourth folder in the path, which is the one I'm interested in.
The output file then becomes a little funky. This directory in question has roughly 40 unique names, but the log ends up looking like this:
dir1 : matchingline
dir2 : matchingline
dir3 : matchingline
dir3 : matchingline
... (x543)
dir4 : matchingline
And so on. Directory 3 is only supposed to have 88 matching lines and ends up with an additional 455 lines that belong to other directories. Any idea on why this happens? Is it because I'm using an assignment in the middle of a PrintWriter, or am I missing something simple here?
Edit: Variables listed for clarity.
match = PrintWriter object used to print to an output.
lineMatch = ArrayList() - contains the directory path of the current matched file
fileMatch = ArrayList() - contains the file name that was matched.
split[3] is used because the matched files are consistently found in the 4th directory in, ex. C:\User\Programs\Programname\
\r\n is used to keep formatting on Windows.
This is a personal project, so I'm not too concerned with making it portable.
Edited to add the method used for initializing the arraylist.
public static void addFiles(String dirPath) {
    File dir = new File(dirPath);
    File[] files = dir.listFiles();
    try {
        if (files.length == 0) {
            emptyFilePath.add(dirPath);
        }
        else {
            for (File currentFile : files) {
                if (currentFile.isFile()) {
                    fileList.add(currentFile);
                    filePath.add(currentFile.getPath());
                }
                else if (currentFile.isDirectory()) {
                    addFiles(currentFile.getAbsolutePath());
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
And the code that generates lineMatch:
while (i < fileList.size()) {
    File files = new File(filePath.get(i));
    Scanner file = new Scanner(files);
    try {
        while (file.hasNextLine()) {
            String currentLine = file.nextLine();
            if (currentLine.contains(searchString)) {
                lineMatch.add(currentLine);
            }
        }
    } finally {
        file.close();
    }
    i++;
}
There are a number of things that are suspicious about your code.
Are LineMatch and FileList variables? If so, then you should write them like variables, that is, lineMatch and fileList (lowerCamelCase). Doing otherwise confuses readers and syntax highlighters alike.
You use split[3], which looks suspicious.
If you are using split("\\\\") to get the parts of the directory path, beware that your code is not portable: it will work on Windows only. If you want to split a path into its parts, it's better to use the API.
To understand the problem, it would be useful to see how LineMatch and FileList are generated. Without that, it's not possible to tell what's going wrong in your code.
If match is a PrintWriter or PrintStream, you should use println() or format("...%n") instead of write(... + "\r\n"), again because your code is not portable. On Unix, line endings are \n only, not \r\n.
The actual problem is with your program logic. Your variable lineMatch contains the hits from all files found, because you don't generate a separate lineMatch for each file but a single one for all files. Your loop then pairs the j-th hit with the j-th file, which is only correct if every file contributes exactly one hit. At least that's how it looks from the code that you've posted so far.
It looks like what you want to program is a simple version of grep (or, on DOS, find). Part of your logic is correct, for example how you use recursion to descend into the directory tree. Instead of collecting all matches and then printing them, find and print the matches while you're traversing the directory tree.
In general, you will end up with fewer errors if you avoid global variables. You ran into this problem in the first place because your variables LineMatch and FileList are global variables. Avoid global variables, avoid reusing variables, and avoid variable re-assignment.
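The advice above (print matches while traversing, avoid globals, use the path API instead of split) could be sketched roughly as follows. The class name SimpleGrep, the method names and the labelIndex parameter are made up for illustration, and the sketch assumes the files are UTF-8 text small enough to read into memory. Note that Path.getName(int) does not count the drive root, so the folder that split[3] picked out of C:\User\Programs\Programname\ would be index 2 here.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not the asker's code: a tiny recursive grep.
class SimpleGrep {

    // Walks dir recursively and writes one "label : line" entry per matching line.
    // The label is the path element at labelIndex (0-based, root excluded),
    // or the file name when the path has fewer elements than that.
    static void search(File dir, String searchString, int labelIndex, PrintWriter out)
            throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or it could not be read
        }
        for (File child : children) {
            if (child.isDirectory()) {
                search(child, searchString, labelIndex, out);
            } else if (child.isFile()) {
                printMatches(child.toPath(), searchString, labelIndex, out);
            }
        }
    }

    // Every hit is printed together with the file it came from,
    // so hits and files can never get out of step.
    private static void printMatches(Path file, String searchString, int labelIndex,
                                     PrintWriter out) throws IOException {
        Path absolute = file.toAbsolutePath();
        String label = labelIndex < absolute.getNameCount()
                ? absolute.getName(labelIndex).toString()
                : absolute.getFileName().toString();
        // Assumption: UTF-8 text files that fit in memory
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (String line : lines) {
            if (line.contains(searchString)) {
                out.println(label + " : " + line);
            }
        }
    }
}
```

Because the label is computed from the same file whose lines are being scanned, no shared lists or counters are needed.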
I use this piece of code to find XML files that another part of my program creates in a given directory:
String fileName;
File folder = new File(mainController.XML_FILES_LOCATION);
File[] listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
    if (listOfFiles[i].isFile()) {
        fileName = listOfFiles[i].getName();
        if (fileName.endsWith(".xml")) {
            Document readFile = readFoundXmlFile(fileName);
            if (readFile != null) {
                boolean postWasSuccesful = mainController.sendXmlFile(readFile, fileName);
                reproduceXmlFile(readFile, fileName, postWasSuccesful);
                deleteXmlFile(fileName);
            }
        }
    }
}
It reads every XML file that gets placed in the given directory, sends it to a URL, copies it to a subdirectory (either 'sent' or 'failed', based on the boolean postWasSuccesful) and deletes the original so it won't be sent again.
On Windows this works as expected, but I've transferred the working code to a Linux machine and all of a sudden it gets into a loop of sending bla.xml, a second later sent\bla.xml, again a second later sent\sent\bla.xml, followed by sent\sent\sent\bla.xml, etc.
Why is Linux deciding for itself that listFiles() is recursive? And, better, how do I prevent that? I could add an extra check to the if-statement that looks for files ending with .xml, to reject any fileName containing a directory separator character, but that's a workaround I don't want: the number of files in the pick-up directory will never be high, whereas the number of files in the sent subdirectory can get quite high after a while, and I wouldn't want this piece of code to become slow.
My psychic powers tell me that reproduceXmlFile() builds the target pathname using a hard-coded backslash ("\"), and therefore you're actually creating files with backslashes in their names.
You need to use File.separator rather than that hard-coded "\". Or use something like new File("sent", fileName).toString() to generate your output pathnames.
(Apologies if I'm wrong!)
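A minimal sketch of the second suggestion, assuming the target is built from a pick-up directory, a subdirectory name and a file name. The class TargetPaths and the method targetFor are invented for this illustration and are not part of the asker's program.

```java
import java.io.File;

// Hypothetical helper showing how to build a path without a hard-coded separator.
class TargetPaths {

    // Builds <pickupDir>/<subDir>/<fileName>; File inserts the platform's own separator.
    static File targetFor(File pickupDir, String subDir, String fileName) {
        return new File(new File(pickupDir, subDir), fileName);
    }
}
```

With this, targetFor(new File("pickup"), "sent", "bla.xml") is a file named bla.xml inside the sent directory on both Windows and Linux, whereas "sent\\" + fileName on Linux is a single file name that happens to contain a backslash.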
I am trying to parse multiple file names (.doc files) in Java.
How should I go about doing this?
I asked about this in a previous post and got an answer on how to parse a single file name in Java.
Thanks for that.
So in a directory, I have multiple files(with different names). For instance, there are files
AA_2322_1
AA_2342_1
BB_2324_1
CC_2342_1
I want to parse only the middle number, which has 4 or 5 digits.
Suppose you have a directory C:\XYZ with the files you listed above, with .doc extensions on them. Taking advantage of a FileFilter, you can get a list of the numbers you are looking for with the following code:
File directory = new File("C:/XYZ");
final ArrayList<String> innerDigits = new ArrayList<String>();
FileFilter filter = new FileFilter() {
    @Override
    public boolean accept(File pathname) {
        if (!pathname.isFile() || !pathname.getName().endsWith(".doc"))
            return false;
        // Extract whatever you need from the file object
        String[] parts = pathname.getName().split("_");
        if (parts.length < 2)
            return false; // name does not follow the PREFIX_NUMBER_SUFFIX pattern
        innerDigits.add(parts[1]);
        return true;
    }
};
// No need to store the results, we extracted the info in the filter method
directory.listFiles(filter);
for (String data : innerDigits)
    System.out.println(data);
Getting the filenames of all files in a folder - Use this question to get the name of all the files in the directory. Then use the String.split() function to parse the file names.
You can use the split method:
String[] parts = filename.split("_");
The number you need is then in parts[1].
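If some names in the directory might not follow the PREFIX_NUMBER_SUFFIX shape, a regular expression can check the shape and extract the number in one step. This is only a sketch: the class FileNameParser and the method middleDigits are made-up names, and the pattern assumes exactly 4 or 5 digits between the first two underscores, as described in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for names such as AA_2322_1.doc
class FileNameParser {

    // prefix without underscores, "_", 4 or 5 digits (group 1), "_", anything
    private static final Pattern MIDDLE = Pattern.compile("^[^_]+_(\\d{4,5})_.*$");

    // Returns the middle number as a string, or null if the name has a different shape.
    static String middleDigits(String fileName) {
        Matcher m = MIDDLE.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }
}
```

Unlike a bare parts[1], this returns null for a name such as readme.doc instead of throwing an ArrayIndexOutOfBoundsException.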
I currently have a log file (see below) that I need to iterate through to pull out a list of the files that were scanned. Then, using that list, I need to copy the scanned files into a different directory.
So in this case, it would go through and pull
c:\tools\baregrep.exe
c:\tools\baretail.exe
etc etc
and then move them to a folder, say c:\folder\SafeFolder with the same file structure
I wish I had a sample of what the output was on a failed scan, but this will get me a good head start and I can probably figure the rest out
Symantec Image of Log File
Thanks in advance, I really appreciate any help that you can lend me.
This question is tagged as Java, and as much as I love Java, this problem is something that would be easier and quicker to solve in a language such as Perl (so if you only want the end result and do not need to run in a particular environment then you may wish to use a scripting language instead).
Not a working implementation, but code along the lines of the below is all it would take in perl: (Syntax untested and likely broken as is, only serves as a guideline.. been awhile since I wrote any perl).
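use File::Copy;
my $outdir = "c:/out/";
while (<>)
{
    # Skip lines that do not mention a processed file
    my ($path) = /Processing File\s+'([^']+)'/ or next;
    # The file name is everything after the last backslash
    my ($file) = $path =~ /([^\\]+)$/;
    if (($file) && (-e $path))
    {
        copy($path, $outdir . $file);
    }
}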
This should do the trick. Now, just adapt for your solution!
public static void find(String logPath, String safeFolder) throws FileNotFoundException, IOException {
    ArrayList<File> files = new ArrayList<File>();
    // A quoted Windows path such as 'c:\tools\baregrep.exe'; group 1 is the path without the quotes
    Pattern pattern = Pattern.compile("'([a-zA-Z]:\\\\.+?)'");
    BufferedReader br = new BufferedReader(new FileReader(logPath));
    try {
        String line = null;
        while ((line = br.readLine()) != null) {
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                files.add(new File(matcher.group(1)));
                System.out.println("Got a new file! " + files.get(files.size() - 1));
            }
        }
    } finally {
        br.close();
    }
    for (File f : files) {
        // Make sure we get a file indeed
        if (f.exists()) {
            if (!f.renameTo(new File(safeFolder, f.getName()))) {
                System.err.println("Unable to move file! " + f);
            }
        } else {
            System.out.println("I got a wrong file! " + f);
        }
    }
}
It's straightforward.
Read the log file line by line, using the newline as your delimiter. If this is a small file, feel free to load it and process it via String.split("\n") or StringTokenizer.
As you loop over the lines, do a simple test to detect whether the line contains 'Processing File '.
If it does, use a regular expression (harder) or simple parsing to capture the file name. It should be within single quotes ('), so find the first occurrence of ['], find the second, and take the string in between.
If the string is a valid path to an existing file (you can test this using java.io.File), you can write the file name to another file. I would advise against copying the files in Java, because of memory restrictions for starters.
Instead, write the list of file names into a batch file that copies them all at once using an OS script such as a Windows BAT file or a Bash script, e.g. cp 'filename_from' 'copy_to_dir'
Let me know if you need a working example.
regards
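The "simple parsing" variant of the steps above could look roughly like this. The log format itself is not shown in the thread, so the marker text and the single-quote convention are assumptions taken from this answer; LogLineParser and extractPath are invented names.

```java
// Hypothetical helper: pulls the quoted path out of one log line.
class LogLineParser {

    // Assumed marker, taken from the description above
    private static final String MARKER = "Processing File ";

    // Returns the text between the first pair of single quotes after the marker,
    // or null if the line has no marker or no complete pair of quotes.
    static String extractPath(String line) {
        int marker = line.indexOf(MARKER);
        if (marker < 0) {
            return null;
        }
        int open = line.indexOf('\'', marker + MARKER.length());
        if (open < 0) {
            return null;
        }
        int close = line.indexOf('\'', open + 1);
        if (close < 0) {
            return null;
        }
        return line.substring(open + 1, close);
    }
}
```

Each non-null result can then be checked with new File(path).exists() before it is written to the list of files to copy.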
I have some sort of batch program that should pick up a file from a directory and process it.
Since this program is supposed to:
run on JDK 5,
be small (no external libraries)
and fast! (with a bang)
...what is the best way to only pick one file from the directory - without using File.list() (might be hundreds of files)?
In Java 7 you could use a DirectoryStream, but in Java 5, the only ways to get directory entries are list() and listFiles().
Note that listing a directory with hundreds of files is not ideal but still probably no big deal compared to processing one of the files. But it would probably start to be problematic once the directory contains many thousands of files.
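For readers who are not bound to JDK 5, the DirectoryStream approach mentioned above might be sketched like this. It needs Java 7 or later, so it does not satisfy the asker's constraint. FirstFile and pickOne are made-up names, and the iteration order of a directory stream is unspecified, so "first" simply means whichever entry the stream yields first.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: picks one regular file without building a full listing.
class FirstFile {

    // Returns the first regular file the stream yields, or null if there is none.
    static Path pickOne(Path dir) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    return entry; // stop early; the stream is closed by try-with-resources
                }
            }
        }
        return null;
    }
}
```

The stream is read lazily, so returning early avoids materializing an array of all entries the way list() and listFiles() do.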
Use a FileFilter (or FilenameFilter) written to accept only once, for example:
File dir = new File("/some/dir");
File[] files = dir.listFiles(new FileFilter() {
    boolean first = true;
    public boolean accept(final File pathname) {
        if (first) {
            first = false;
            return true;
        }
        return false;
    }
});
It seems from what you said that you want to process every file in the directory once (including files that get added to the directory later). You can do the following: set a monitor on the directory that generates notifications when files are added, then process each file that you get notified about. Since you use JDK 5, I suggest using jpathwatch. Note that you need to make sure the file has been completely written before trying to process it. After starting the monitor, which ensures that you will process every new file, list the directory once to process its current content.
Edit: My implementation makes use of .list(), which you said you wanted to avoid, but it may hold some value anyway :)
If you look at the File implementation, the public String[] list() method seems to have less overhead than File[] listFiles(). So the fastest approach should be:
String[] ss = myDir.list();
File toProcess = null;
if (ss != null) {
    for (int i = 0; i < ss.length; i++) {
        File candidate = new File(myDir, ss[i]);
        if (candidate.isFile()) {
            toProcess = candidate; // first regular file found
            break;
        }
    }
}
From File.class
public File[] listFiles() {
    String[] ss = list();
    if (ss == null) return null;
    int n = ss.length;
    File[] fs = new File[n];
    for (int i = 0; i < n; i++) {
        fs[i] = new File(ss[i], this);
    }
    return fs;
}
If one looks at the FileSystem class, which is what filesystem access boils down to, there is only the list method, so there seems to be no other way in "pure" Java to select a file than to list them all in a String array.
There is no good solution here on Java 1.5. You can use a filter to get only one file, but then Java will return only one file while still iterating over all of them anyway. If you don't need the actual File object, you could try something like Runtime.getRuntime().exec("dir"), split the returned output on \n and print out the first line :-P
Possible Duplicate:
Best way to iterate through a directory in java?
I want to process each file in a certain directory using Java.
What is the easiest (and most common) way of doing this?
If you have the directory name in myDirectoryPath,
import java.io.File;
...
File dir = new File(myDirectoryPath);
File[] directoryListing = dir.listFiles();
if (directoryListing != null) {
    for (File child : directoryListing) {
        // Do something with child
    }
} else {
    // Handle the case where dir is not really a directory.
    // Checking dir.isDirectory() above would not be sufficient
    // to avoid race conditions with another process that deletes
    // directories.
}
I guess there are many ways to do what you want. Here's the one that I use. With the Apache Commons IO library you can iterate over the files in a directory: use the FileUtils.iterateFiles method and process each file.
You can download the library here: http://commons.apache.org/proper/commons-io/download_io.cgi
Here's an example:
Iterator it = FileUtils.iterateFiles(new File("C:/"), null, false);
while (it.hasNext()) {
    System.out.println(((File) it.next()).getName());
}
You can replace null with an array of extensions if you want to filter. Example: new String[] {"xml", "java"} (extensions are given without the leading dot).
Here is an example that lists all the files on my desktop. You should change the path variable to your path.
Instead of printing the file's name with System.out.println, you should place your own code to operate on the file.
public static void main(String[] args) {
    File path = new File("c:/documents and settings/Zachary/desktop");
    File[] files = path.listFiles();
    for (int i = 0; i < files.length; i++) {
        if (files[i].isFile()) { // this line weeds out other directories/folders
            System.out.println(files[i]);
        }
    }
}
Use java.io.File.listFiles.
Or, if you want to filter the list prior to iteration (or have any more complicated use case), use FileUtils.listFiles from Apache Commons IO.