Is there any better file search algorithm than recursion? - java

I have used recursion to search for a particular type of file (.pdf files are used here as an example).
My recursive algorithm searches all subfolders.
However, I found that it performs poorly when there are many nested levels: subfolders, sub-subfolders, sub-sub-subfolders, and so on.
I want to know if there is a better algorithm for file searching.
Below is my recursive code for file searching, using .pdf files as an example:
import java.io.File;
public class Find {
    public static void main(String[] args) {
        File f = new File("D:/");
        find(f);
    }
    public static void find(File f) {
        File[] list = f.listFiles();
        // listFiles() returns null if f is not a directory or cannot be read
        if (list == null) return;
        for (File file : list) {
            String name = file.getName();
            // parenthesize the || so that isFile() applies to both extensions
            if (file.isFile() && (name.contains(".pdf") || name.contains(".PDF")))
                System.out.println(file.getAbsolutePath());
            if (file.isDirectory()) find(file);
        }
    }
}
This code is about as fast as, or somewhat faster than, the search option in File Explorer. Is there any faster algorithm than this?

Try the iterative way:
import java.io.File;
import java.util.Stack;
public class Find {
    public static void main(String[] args) {
        // explicit stack of directories still to visit, replacing the call stack
        Stack<File> stack = new Stack<File>();
        stack.push(new File("D:/"));
        while (!stack.empty()) {
            File[] list = stack.pop().listFiles();
            if (list == null) continue; // not a directory, or not readable
            for (File file : list) {
                String name = file.getName();
                if (file.isFile() && (name.contains(".pdf") || name.contains(".PDF")))
                    System.out.println(file.getAbsolutePath());
                if (file.isDirectory()) stack.push(file);
            }
        }
    }
}

The problem with threading is that launching threads has a cost, so the speed-up in file browsing and recursion has to outweigh the additional cost of N folders/threads.
This is a simple method that uses a loop (the classical replacement for recursion):
static boolean avoidRecursion(String target){
    File currentDir = new File(System.getProperty("user.home"));
    // directories still to be visited
    Stack<File> dirs = new Stack<File>();
    dirs.push(currentDir);
    do{
        File[] children = dirs.pop().listFiles();
        if (children == null) continue; // unreadable directory: skip it
        for(File f : children){
            if (f.isDirectory())
                dirs.push(f);
            else{
                if (f.getName().equals(target))
                    return true;
            }
        }
    }while(!dirs.isEmpty());
    return false;
}
Measure both approaches and choose the one that is faster.

Probably you could use multithreading.
Each time you enter a folder, you start a new thread. Even if you have more threads than CPU cores, it is not a problem, since Windows can run many more threads than that.

Use the Files.walk() method, which returns a Java 8 Stream. You can parallelize the traversal quite easily by using a parallel stream.
Use the following convenient try-with-resources idiom:
try (Stream<Path> vals = Files.walk(rootPath)) {
.... }
For rootPath, you can use Paths.get("root location") to point at the root location.
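As a minimal sketch of that idiom applied to the original .pdf search: the class name PdfWalk and the method findPdfs are made up for illustration, and the filter matches file names ending in ".pdf" in any letter case, which is slightly stricter than the contains() check in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PdfWalk {

    // Returns every regular file under root whose name ends in ".pdf" (any case).
    static List<Path> findPdfs(Path root) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the start directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        findPdfs(root).forEach(System.out::println);
    }
}
```

One caveat: if a directory cannot be read during the traversal, Files.walk throws an UncheckedIOException. Files.walkFileTree with a visitFileFailed handler is the usual alternative when unreadable folders should simply be skipped.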

Related

How to determine line number for the method with java ASM?

I need to determine the line number of a specific method in a class using the ObjectWeb ASM library. The line number of the method declaration and the first line of the method's body are equally acceptable answers (6 or 7 in the example).
Example:
1. public class Foo {
...
6. public void bar() {
7. try {
8. try {
9. System.out.println(); //first executable line
I tried to use MethodVisitor's visitLineNumber method, but it visits only the first executable line (line 9 in the example).
I found a solution for this problem with the Javassist library (link). But is there a way to solve this with ASM?
EDIT:
The following snippet gave the same result: line 9 instead of 6 or 7.
public static int getLineNumber(String path) throws IOException {
final File f = new File(path);
try (FileInputStream fis = new FileInputStream(f)) {
ClassReader reader = new ClassReader(fis);
ClassNode clNode = new ClassNode(Opcodes.ASM5);
reader.accept(clNode, 0); // second argument is the parsing-options bit mask (0 = none), not an API version
for (MethodNode mNode : (List<MethodNode>) clNode.methods) {
if (mNode.name.equals("bar")) {
ListIterator<AbstractInsnNode> it = mNode.instructions.iterator();
while (it.hasNext()) {
AbstractInsnNode inNode = it.next();
if (inNode instanceof LineNumberNode) {
return ((LineNumberNode) inNode).line;
}
}
}
}
}
return -1;
}
The line numbers provided by any bytecode processing library are based on the LineNumberTable attribute, which maps executable instructions of the method to line numbers. So it is a fundamental limitation that you cannot find source code lines in the class file which do not cause the generation of executable bytecode.
Sometimes it even depends on the compiler which source code line a construct spanning multiple lines gets assigned to.
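The following helper walks backwards from a given instruction to the nearest preceding LineNumberNode (Validate is from Apache Commons Lang):
public static LineNumberNode findLineNumberForInstruction(InsnList insnList, AbstractInsnNode insnNode) {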
Validate.notNull(insnList);
Validate.notNull(insnNode);
int idx = insnList.indexOf(insnNode);
Validate.isTrue(idx != -1);
// Get index of labels and insnNode within method
ListIterator<AbstractInsnNode> insnIt = insnList.iterator(idx);
while (insnIt.hasPrevious()) {
AbstractInsnNode node = insnIt.previous();
if (node instanceof LineNumberNode) {
return (LineNumberNode) node;
}
}
return null;
}

having an if statement inside a loop without calling it in each iteration

I have the following code:
public static void main(final String[] args) {
if (args.length == 0) {
System.out.println("Please provide a correct directory path as an argument!");
} else {
System.out.println("Thanks for using our CodeMetrics\n"
+ "The process Might take a long time, please wait!\n"
+ "Please check the CSV file for the final results!");
File ad = new File(args[0]);
File[] list = ad.listFiles();
for (File f : list) {
CodeMetrics codeMetrics = new CodeMetrics();
codeMetrics.parseCommandLine(f.toString());
codeMetrics.countComplexity(codeMetrics.sourceCodeFile);
// Count LOC (Lines of Code)
codeMetrics.countLines(codeMetrics.sourceCodeFile);
codeMetrics.countTestLines(codeMetrics.testFiles);
codeMetrics.printReport();
codeMetrics.writeReport();
}
}
}
Right now I would like to give the user the opportunity to choose whether to call only printReport(), or to call printReport() and writeReport() together. The problem is that if I add an if statement, it will be inside the for-each loop, and the user will have to choose on every iteration.
The only idea I can see is to implement two different methods, as follows:
public static void onlyPrint(final String args){
}
public static void printAndWrit (final String args){
}
Both methods would have the same code, except that one of them would call only printReport() and the other would call both methods. But I'm not really satisfied with that solution, as I believe it introduces a lot of code redundancy. Is there a better solution?
Thanks
What Drew Kennedy said is most likely the easiest answer to implement.
public static void main(final String[] args) {
if (args.length == 0) {
System.out.println("Please provide a correct directory path as an argument!");
} else {
System.out.println("Thanks for using our CodeMetrics\n"
+ "The process Might take a long time, please wait!\n"
+ "Please check the CSV file for the final results!");
File ad = new File(args[0]);
File[] list = ad.listFiles();
//ask the user here, once, before the loop
Scanner sc = new Scanner(System.in);
System.out.println("Please enter a number : Would you like to (1) print, or (2) print & write?");
int answer = sc.nextInt();
boolean write = (answer == 2);
for (File f : list) {
CodeMetrics codeMetrics = new CodeMetrics();
codeMetrics.parseCommandLine(f.toString());
codeMetrics.countComplexity(codeMetrics.sourceCodeFile);
// Count LOC (Lines of Code)
codeMetrics.countLines(codeMetrics.sourceCodeFile);
codeMetrics.countTestLines(codeMetrics.testFiles);
//check whether the user wants to write or not
if (write == false) {
codeMetrics.printReport();
} else {
codeMetrics.printReport();
codeMetrics.writeReport();
}
}
}
}
This should ask the user just once, and do what you need to get done.
I agree with Childishforlife that the solution with the boolean is the easiest way. I just want to mention a small simplification.
You can turn
//check whether the user wants to write or not
if (write == false) {
codeMetrics.printReport();
} else {
codeMetrics.printReport();
codeMetrics.writeReport();
}
into:
codeMetrics.printReport();
if (write) {
codeMetrics.writeReport();
}
If I understood everything correctly, the report is printed in both situations. Only the writing depends on the user's choice.
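Putting the two answers together, a self-contained sketch might look like this. CodeMetrics is not available here, so the hypothetical processAll method just records which actions it would take per file; the class and method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

class ReportChoice {

    // Hypothetical stand-in for the per-file work; returns the actions taken.
    static List<String> processAll(List<String> files, boolean write) {
        List<String> actions = new ArrayList<>();
        for (String file : files) {
            actions.add("print " + file);      // printing happens in both modes
            if (write) {
                actions.add("write " + file);  // writing only when requested
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        System.out.println("(1) print, or (2) print & write?");
        // The question is asked once, before the loop starts.
        boolean write = scanner.hasNextInt() && scanner.nextInt() == 2;
        processAll(Arrays.asList("A.java", "B.java"), write)
                .forEach(System.out::println);
    }
}
```

The loop body only reads the flag, so the user's decision is made once no matter how many files are processed.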

Merge sort java.lang.StackOverflowError

I am working on a project for school, and things were going well until I tried to perform a merge sort on my ArrayList.
It runs, but then it errors out. The first error of many is Exception in thread "main" java.lang.StackOverflowError.
I have looked over the code and can't find out why the error is occurring.
It does give me a location (line 74: first_half = mergeSort(first_half);), but I don't see the issue.
public static void main(String[] args) throws IOException {
// URL url = new
// URL("https://www.cs.uoregon.edu/Classes/15F/cis212/assignments/phonebook.txt");
FileReader fileReader = new FileReader("TestSort.txt");
BufferedReader bufferReader = new BufferedReader(fileReader);
String entry = bufferReader.readLine();
// Scanner s = new Scanner(url.openStream());
// int count = 0;
while (entry != null) {
// String person = s.nextLine();
String phoneNum = entry.substring(0, 7);
String name = entry.substring(9);
PhonebookEntry newentry = new PhonebookEntry(name, phoneNum);
phoneBook.add(newentry);
entry = bufferReader.readLine();
}
// ********************Selection
// Sort*************************************
ArrayList<PhonebookEntry> sortList = new ArrayList<PhonebookEntry>(phoneBook);
for (int min = 0; min < sortList.size(); min++) {
for (int i = min; i < sortList.size(); i++) {
int res = sortList.get(min).getName().compareTo(sortList.get(i).getName());
if (res > 0) {
PhonebookEntry temp = sortList.get(i);
sortList.set(i, sortList.get(min));
sortList.set(min, temp);
}
}
}
for (PhonebookEntry sortentry : sortList) {
System.out.println(sortentry);
}
System.out.println(mergeSort(mergeSortList));
}
// *****************************merge sort******************************************
static int mergecounter = 0;
static ArrayList<PhonebookEntry> mergeSortList = new ArrayList<PhonebookEntry>(appMain.phoneBook);
public static ArrayList<PhonebookEntry> mergeSort(ArrayList<PhonebookEntry> mergeSortLists) {
if (mergeSortLists.size() == 1) {
return mergeSortLists;
}
int firstHalf = mergeSortLists.size() % 2 == 0 ? mergeSortLists.size() / 2 : mergeSortLists.size() / 2 + 1;
ArrayList<PhonebookEntry> first_half = new ArrayList<PhonebookEntry>(mergeSortLists.subList(0, firstHalf));
ArrayList<PhonebookEntry> mergeSortHalf2 = new ArrayList<PhonebookEntry>(
mergeSortLists.subList(first_half.size(), mergeSortLists.size()));
System.out.println(++mergecounter);
first_half = mergeSort(first_half);
mergeSortHalf2 = mergeSort(mergeSortHalf2);
return merge(first_half, mergeSortHalf2);
}
public static ArrayList<PhonebookEntry> merge(ArrayList<PhonebookEntry> first_half,
ArrayList<PhonebookEntry> mergeSortHalf2) {
ArrayList<PhonebookEntry> returnMerge = new ArrayList<PhonebookEntry>();
while (first_half.size() > 0 && mergeSortHalf2.size() > 0) {
if (first_half.get(0).getName().compareTo(mergeSortHalf2.get(0).getName()) > 0) {
returnMerge.add(mergeSortHalf2.get(0));
mergeSortHalf2.remove(0);
}
else {
returnMerge.add(first_half.get(0));
first_half.remove(first_half.get(0));
}
}
while (first_half.size() > 0) {
returnMerge.add(first_half.get(0));
first_half.remove(first_half.get(0));
}
while (mergeSortHalf2.size() > 0) {
returnMerge.add(mergeSortHalf2.get(0));
mergeSortHalf2.remove(mergeSortHalf2.get(0));
}
return returnMerge;
}
}
In my opinion there is no error in the code.
How can I be so sure?
I ran your code in my environment and it executed without any error,
with the text file I found at https://www.cs.uoregon.edu/Classes/15F/cis212/assignments/phonebook.txt as input
and a simple implementation of PhonebookEntry.
Then why does this error occur?
First of all, try to understand the error, I mean why a StackOverflowError occurs. As there are lots of explanations available, I am not going to explain it here.
But please read the top answers of these two threads and I am sure you will know why this happens.
Thread 1: What is a StackOverflowError?
Thread 2: What actually causes a Stack Overflow error?
If you read those, I hope you understand that the summary is: you ran out of memory.
Then why didn't I get that error? A possible reason:
In my environment I configured the JVM to run with more memory, 1024m to 1556m (as an Eclipse parameter).
Now let's analyze your case and a solution:
Input: you have a big input here (50,000).
To check your code, try to shorten the input and test.
You have executed two algorithms in a single method over this big input:
When a method executes, all its variables stay in memory until it completes its execution,
so when you call merge sort, all previously used variables and others stay in memory, which can contribute to this situation.
Now if you use separate methods and call them from the main method, for example a method for selection sort, all the variables it used will go out of scope
and possibly be freed (if the GC collects them) after the selection sort is over.
So write two separate methods for reading the input file and for selection sort.
And please, please close() the FileReader and BufferedReader.
Get rid of those static methods. Make them non-static, create an object of the class, and call them from the main method.
So it is all about code optimization.
You can also just increase the memory for the JVM and test, like this: java -Xmx1556m -Xms1024m when running the app from the command line.
BTW, thanks for asking this question; it gave me something to think about.
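Separately from memory settings, one thing worth checking in the question's code is the base case, which only handles a list of size 1. If mergeSort is ever handed an empty list (for example, assuming the static mergeSortList is copied from phoneBook before main has filled it, which the posted code does not show either way), both halves are empty again and the recursion never ends. The sketch below uses plain strings instead of PhonebookEntry and a hypothetical class name; it is an illustration of the safer base case, not the assignment's required implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeSortSketch {

    static List<String> mergeSort(List<String> items) {
        // size() <= 1 also covers the empty list, so the recursion always terminates.
        if (items.size() <= 1) {
            return new ArrayList<>(items);
        }
        int mid = (items.size() + 1) / 2; // first half gets the extra element
        List<String> left = mergeSort(items.subList(0, mid));
        List<String> right = mergeSort(items.subList(mid, items.size()));
        return merge(left, right);
    }

    // Merges two sorted lists with index cursors instead of repeated remove(0).
    static List<String> merge(List<String> left, List<String> right) {
        List<String> merged = new ArrayList<>(left.size() + right.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                merged.add(left.get(i++));
            } else {
                merged.add(right.get(j++));
            }
        }
        while (i < left.size()) merged.add(left.get(i++));
        while (j < right.size()) merged.add(right.get(j++));
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(mergeSort(Arrays.asList("Smith", "Adams", "Lee"))); // [Adams, Lee, Smith]
        System.out.println(mergeSort(new ArrayList<>()));                      // []
    }
}
```

With this base case the recursion depth stays around log2(n) even for 50,000 entries, so the default stack size should be ample.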

Threaded sort running slower than non threaded sorting

I am trying to sort files using threading. Here is Sort.java:
This function sorts with the help of threading:
public static String[] threadedSort(File[] files) throws IOException {
String sortedData[] = new String[0];
int counter = 0;
boolean allThreadsTerminated = false;
SortingThread[] threadList = new SortingThread[files.length];
for (File file : files) {
String[] data = getData(file);
threadList[counter] = new SortingThread(data);
threadList[counter].start();
counter++;
}
while(!allThreadsTerminated) {
allThreadsTerminated = true;
for(counter=0; counter<files.length; counter++) {
if(threadList[counter].getState() != Thread.State.TERMINATED) {
allThreadsTerminated = false;
}
}
}
for(counter=0; counter<files.length; counter++) {
sortedData = MergeSort.merge(sortedData, threadList[counter].data);
}
return sortedData;
}
This function sorts normally, without threads:
public static String[] sort(File[] files) throws IOException {
String[] sortedData = new String[0];
for (File file : files) {
String[] data = getData(file);
data = MergeSort.mergeSort(data);
sortedData = MergeSort.merge(sortedData, data);
}
return sortedData;
}
Now when I sort both ways, the normal sort is faster than the threaded version. What can be the reason for that? Have I missed something?
My SortingThread looks like this:
public class SortingThread extends Thread {
String[] data;
SortingThread(String[] data) {
this.data = data;
}
public void run() {
data = MergeSort.mergeSort(data);
}
}
When I compare the performance of my threaded implementation with the original non-threaded implementation, I find the second one faster. What can be the reason for such behavior? If I am not wrong, we would expect the threaded implementation to be faster.
EDIT: Assume I have a properly functioning MergeSort; there is no use posting its code here. Also, the getData() function just reads input from a file.
I think the problem lies in the fact that I am reading the whole file into an array. I think I should give different lines to different threads:
private static String[] getData(File file) throws IOException {
ArrayList<String> data = new ArrayList<String>();
BufferedReader in = new BufferedReader(new FileReader(file));
while (true) {
String line = in.readLine();
if (line == null) {
break;
}
else {
data.add(line);
}
}
in.close();
return data.toArray(new String[0]);
}
First of all, how do you measure elapsed time? Do you execute both tests in the same program? If so, keep in mind that mergesort will probably undergo HotSpot compilation while the first test is executed. I suggest you run each method twice, measuring the time on the second run.
How many CPUs/cores do you have? One problem with this code is that the main thread spends CPU time in the "while(!allThreadsTerminated)" loop, actively checking thread state. If you have one CPU, you are wasting it instead of doing actual sorting.
Replace the while loop with the following (note that join() throws InterruptedException, which the method must catch or declare):
for(counter=0; counter<files.length; counter++) {
threadList[counter].join();
}
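A minimal, self-contained sketch of waiting with join() instead of polling thread state. Arrays.sort stands in for the MergeSort class from the question, and JoinSketch and sortAll are hypothetical names.

```java
import java.util.Arrays;

class JoinSketch {

    // Sorts each array on its own thread and blocks until all of them are done.
    static void sortAll(String[][] chunks) throws InterruptedException {
        Thread[] workers = new Thread[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            final String[] chunk = chunks[i];
            workers[i] = new Thread(() -> Arrays.sort(chunk));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join(); // the caller sleeps until this worker ends; no busy polling
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String[][] chunks = { {"pear", "apple"}, {"kiwi", "fig"} };
        sortAll(chunks);
        System.out.println(Arrays.deepToString(chunks)); // [[apple, pear], [fig, kiwi]]
    }
}
```

join() also guarantees that everything a worker wrote is visible to the caller once it returns, so the sorted arrays can be read safely afterwards.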
You should use Stream and standard sort:
static String[] sort(File[] files, boolean parallel) {
return (parallel ? Stream.of(files).parallel() : Stream.of(files))
.flatMap(f -> {
try {
return Files.lines(f.toPath());
} catch (Exception e) {
e.printStackTrace();
return null;
}
})
.sorted()
.toArray(String[]::new);
}
static String[] sort(File[] files) {
return sort(files, false);
}
static String[] threadSort(File[] files) {
return sort(files, true);
}
In my environment threadSort is faster.
sort:
files=511 sorted lines=104419 elapse=4784ms
threadSort:
files=511 sorted lines=104419 elapse=3060ms
You can use java.util.concurrent.ExecutorService, which runs all your tasks on a specified number of threads. Once all tasks have finished, you get a list of Future objects holding the result of each task. The list of Future objects is in the same order as the Callable objects you inserted into the task list.
For that, the first thing you need is to have your SortingThread implement the Callable interface so that you can get the result of each task.
Each Callable object has to implement the call() method; its return value is what the corresponding Future hands back from get().
public class SortingThread implements Callable<String[]> {
String[] data;
SortingThread(String[] data) {
this.data = data;
}
@Override
public String[] call() throws Exception {
data = MergeSort.mergeSort(data);
return data;
}
}
Next, use ExecutorService for thread management.
public static String[] sortingExampleWithMultiThreads(File[] files)
        throws IOException, InterruptedException, ExecutionException {
    String[] sortedData = new String[0];
    List<Callable<String[]>> callableList = new ArrayList<Callable<String[]>>();
    for (File file : files) {
        String[] data = getData(file);
        callableList.add(new SortingThread(data)); // prepare the Callable list passed to invokeAll()
    }
    // Fixed-size thread pool, one thread per file (at least one, since a pool of size 0 is rejected)
    ExecutorService service = Executors.newFixedThreadPool(Math.max(1, files.length));
    try {
        // invokeAll() blocks until every task has completed
        List<Future<String[]>> futureObjects = service.invokeAll(callableList);
        for (Future<String[]> future : futureObjects) {
            // get() returns what call() of SortingThread returned
            sortedData = MergeSort.merge(sortedData, future.get());
        }
    } finally {
        service.shutdown(); // otherwise the idle pool threads keep the JVM alive
    }
    return sortedData;
}
This way you avoid the busy-waiting while loop, which burns CPU time that the sorting threads could use: on a single-core CPU it can reach 100% utilization, and on a dual-core CPU about 50%.
Also, using ExecutorService for thread management is a better approach to multithreading than starting threads yourself and monitoring them for results, so you can expect better performance.
I have not run it, so you may need to change it here and there, but it outlines the approach.
P.S.: When measuring performance, to get clean and precise results, always start a new JVM instance for each run.
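For reference, a runnable sketch of the invokeAll() pattern that does not depend on the question's MergeSort or getData(): Arrays.sort stands in for the sort, in-memory arrays stand in for file contents, and the names PoolSortSketch and sortAll are hypothetical. The pool is capped at the number of available processors, which is an assumption about a reasonable size rather than a measured optimum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolSortSketch {

    // Sorts a copy of every chunk on a thread pool; results keep the input order.
    static List<String[]> sortAll(List<String[]> chunks)
            throws InterruptedException, ExecutionException {
        List<Callable<String[]>> tasks = new ArrayList<>();
        for (String[] chunk : chunks) {
            tasks.add(() -> {
                String[] copy = chunk.clone();
                Arrays.sort(copy);
                return copy;
            });
        }
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = Math.max(1, Math.min(chunks.size(), cores));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String[]> sorted = new ArrayList<>();
            // invokeAll blocks until all tasks are done; futures follow task order.
            for (Future<String[]> future : pool.invokeAll(tasks)) {
                sorted.add(future.get());
            }
            return sorted;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String[]> chunks = Arrays.asList(
                new String[] {"pear", "apple"}, new String[] {"kiwi", "fig"});
        for (String[] chunk : sortAll(chunks)) {
            System.out.println(Arrays.toString(chunk));
        }
    }
}
```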

How can I recursively print a File Array?

I would like to know how to recursively print a File[]. I have made a program but it seems that the program is going out of bounds and I don't know how to fix it. Can someone please give me a few pointers or hints on how to solve this problem? Thanks.
import java.io.*;
public class RecursiveDir {
static BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
public static void main(String[]args) throws IOException {
System.out.print("Please enter a directory name: ");
File f = new File(br.readLine());
FileFilter filter = new FileFilter() {
public boolean accept(File f) {
if(f.isDirectory()) {
return true;
}
return false;
}
};
File[] list = f.listFiles(filter);
System.out.println(returnDir(list,list.length));
}
public static File returnDir(File[] file,int counter) {
File f = file[counter];
if(counter == 0) {
return file[0];
}else {
return f = returnDir(file,counter--);
}
}
}
EDIT: I followed the comments below and changed return f = returnDir(file,counter--); to
return f = returnDir(file,--counter); and also changed returnDir(list,list.length); to
returnDir(list,list.length-1);, my code runs fine but now nothing is printing.
You are going out of the array bound because you need to pass list.length - 1 to the method.
Even if you did that, though, you would have an infinite recursion, because counter-- will use the value of counter, and then decrement it. So that means you are calling returnDir with the current value of counter. Use either --counter, or counter - 1.
What do you expect to happen here? You don't seem to be doing anything with the files as you visit them. There is no need for recursion to loop through the files in a directory; recursion is needed when you hit an entry in the list that is itself a directory.
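To illustrate that point, here is a small sketch that loops over the entries of a directory and recurses only when an entry is itself a directory. It collects indented names into a list and prints them at the end; DirTree and collect are hypothetical names, not part of the question's program.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class DirTree {

    // Adds one indented line per entry; recurses only into directories.
    static void collect(File dir, String indent, List<String> out) {
        File[] entries = dir.listFiles();
        if (entries == null) { // not a directory, or not readable
            return;
        }
        for (File entry : entries) {
            out.add(indent + entry.getName());
            if (entry.isDirectory()) {
                collect(entry, indent + "  ", out);
            }
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        // Hypothetical usage: pass the start directory as the first argument.
        collect(new File(args.length > 0 ? args[0] : "."), "", lines);
        lines.forEach(System.out::println);
    }
}
```

The plain for loop handles the array, so there is no index to decrement and nothing that can run past the array bounds.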
You are indeed going out of bounds. You need to change
returnDir(list,list.length);
to
returnDir(list,list.length - 1 );
You seem to be missing your System.out.println() calls. You are looping through the files and not doing anything with them.
Your initial call to returnDir should be
returnDir(list,list.length-1);
Paul
