Scanning for files that contain variable name - java

I have a simple piece of code that currently uses Tesseract OCR to read the text in any given image and then counts how many lines it produces. However, I would like to search a directory for any document whose name contains a given string (such as M000123456), return how many documents match, and compare that with the number Tesseract output. The documents are named like so: M000123456_V987654_05-07-2000.pdf. What's the best way to do this?
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
public class Main {
public static void main(String[] args) throws TesseractException {
Tesseract tesseract = new Tesseract();
tesseract.setDatapath("C:\\Users\\mmx0409\\Downloads\\Tess4J-3.4.8-src\\Tess4J\\tessdata");
// the path of your tess data folder
// inside the extracted file
String text
= tesseract.doOCR(new File("C:\\Users\\mmx0409\\Downloads\\testimage.png"));
// path of your image file
System.out.print(text);
System.out.println(text.lines().count()); // count the number of lines tesseract saw
}
}

You can use the function below to count the files in a directory whose names contain searchString. Note that listFiles() returns null if the path is not a directory or cannot be read, so that case is handled explicitly.
public int countDocuments(String directoryPath, String searchString) {
    File folder = new File(directoryPath);
    File[] listOfFiles = folder.listFiles();
    if (listOfFiles == null) {
        return 0; // not a directory, or an I/O error occurred
    }
    int count = 0;
    for (File file : listOfFiles) {
        // count only regular files whose name contains the search string
        if (file.isFile() && file.getName().contains(searchString)) {
            count++;
        }
    }
    return count;
}
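To compare the result with the OCR output, as the question asks, something like the following could work. It is only a sketch: the class name DocumentCounter and the helpers countMatching and fileNamesIn are made up for illustration, it uses java.nio rather than java.io.File, and ocrLineCount is a stand-in for text.lines().count() from the question's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of Tess4J or the original answer.
final class DocumentCounter {

    // Counts the names that contain the given token (case-sensitive).
    static long countMatching(List<String> fileNames, String token) {
        return fileNames.stream().filter(name -> name.contains(token)).count();
    }

    // Lists the names of the regular files directly inside dir (no recursion).
    static List<String> fileNamesIn(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        long ocrLineCount = 3; // assumption: replace with text.lines().count()
        long found = countMatching(fileNamesIn(dir), "M000123456");
        System.out.println(found == ocrLineCount
                ? "Counts match: " + found
                : "Counts differ: " + found + " files vs " + ocrLineCount + " OCR lines");
    }
}
```

Keeping the counting separate from the directory listing makes it easy to check the matching rule on a plain list of names before pointing it at a real folder.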

Related

Filter out the text files

The code separates files from directories.
I am trying to filter out the text files (.txt) and print out the files that remain.
I don't want the text files to be printed at all. I want the check to come right after the if statement if (listOfFiles[i].isFile()) {, so that the code first checks whether a given entry is an actual file and then whether it is a text file, and only adds it to the files list if it is a file that is not a text file.
Need help
import java.io.BufferedInputStream;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
public class Exc_3 {
public static void main(String[] args) {
File folder = new File("C:\\Users\\skyla\\Desktop");
File[] listOfFiles = folder.listFiles();
List<String> files = new ArrayList<>();
List<String> directories = new ArrayList<>();
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isFile()) {
files.add(listOfFiles[i].getName());
} else if (listOfFiles[i].isDirectory()) {
directories.add(listOfFiles[i].getName());
}
}
System.out.println("List of files :\n---------------");
for (String fName : files)
System.out.println(fName);
System.out.println("\nList of directories :\n---------------------");
for (String dName : directories)
System.out.println(dName);
}
}
Since you only need to check whether the file extension is ".txt", you can test the name with String.endsWith(".txt") and add the file only when the test is false:
if (listOfFiles[i].isFile()) {
    // skip text files, keep everything else
    if (!listOfFiles[i].getName().endsWith(".txt")) {
        files.add(listOfFiles[i].getName());
    }
}
Why not use the piece of code below? You can also wrap it in a function that checks whether a file is a text file by returning true if mimeType is "text/plain". Note that Files.probeContentType throws IOException and returns null when the type cannot be determined.
Path path = FileSystems.getDefault().getPath("myFolder", "myFile");
String mimeType = Files.probeContentType(path); // may be null
Good luck
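Both suggestions can be combined into one small helper class. This is a rough sketch, not a tested solution: TextFileFilter and its method names are invented for illustration, and the MIME check depends on what the platform's file type detector reports.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative only.
final class TextFileFilter {

    // True if the name ends in ".txt", ignoring case.
    static boolean hasTxtExtension(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".txt");
    }

    // True if the platform reports the file as text/plain.
    // probeContentType may return null; equals() on the literal handles that.
    static boolean isPlainText(Path file) {
        try {
            return "text/plain".equals(Files.probeContentType(file));
        } catch (IOException e) {
            return false; // treat unreadable files as "not text"
        }
    }

    // Keeps only the names that do not look like text files.
    static List<String> withoutTextFiles(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> !hasTxtExtension(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withoutTextFiles(
                List.of("notes.txt", "photo.png", "README.TXT", "report.pdf")));
    }
}
```

The extension check is cheap and predictable; the MIME check is only as good as the detector installed on the machine, so it is probably best used as a secondary test.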

Remove certain characters from a directory's name in Java

So I wrote a little program to use on a Unix machine. It has to get the names of all the files and folders in the directory where it is stored and then remove certain characters from them. These characters (or a single character) are defined by the user. Use case: I put the program in a directory containing various useless files and directories named, for example, "NaCl2!!!!!!!!!", "H2O!", "O2" and "Lithium!!!!!", and I "ask" it to get rid of all the bangs in all the directories' names, so that the result is this:
ls
NaCl2 H2O O2 Lithium Unreal3.zip
OK, I guess you get it. Here's the code; it doesn't compile:
DirRename.java:18: error: method renameTo in class File cannot be applied to given types;
tempDir.renameTo(name);
I guess this error is caused by a substantial problem in my code. Is there a way to get it working? Can you tell me, please?
import java.io.*; import java.util.Scanner;
class DirRename {
public static void main(String[] s) {
//DECLARING
String name, curDir, annoyngChar;
Scanner scan = new Scanner(System.in);
//WORKING
curDir = System.getProperty("user.dir");
File dir = new File(curDir);
File[] listOfFiles = dir.listFiles();
System.out.print("Type a character (or a line of them) that you want to remove from directories' names:");
annoyngChar = scan.nextLine();
System.out.println("\nAll directories will get rid of " + annoyngChar + " in their names.");
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isDirectory() ) {
File tempDir = listOfFiles[i];
name = tempDir.getName().replaceAll(annoyngChar, "");
tempDir.renameTo(name);
}
}
}
}
Need to say, the program is unfinished, I am sorry for that.
File.renameTo(File dest) takes a File parameter, not a String. So, you need to create a File instance with the correct path (using the new name), and pass that instance to renameTo.
Try this (not tested):
import java.io.*; import java.util.Scanner;
class DirRename {
public static void main(String[] s) {
//DECLARING
String name, curDir, annoyngChar;
Scanner scan = new Scanner(System.in);
//WORKING
curDir = System.getProperty("user.dir");
File dir = new File(curDir);
File[] listOfFiles = dir.listFiles();
System.out.print("Type a character (or a line of them) that you want to remove from directories' names:");
annoyngChar = scan.nextLine();
System.out.println("\nAll directories will get rid of " + annoyngChar + " in their names.");
for (int i = 0; i < listOfFiles.length; i++) {
if (listOfFiles[i].isDirectory() ) {
File f = listOfFiles[i];
String oldName = f.getName();
name = oldName.replace(annoyngChar, ""); // replace() is literal; replaceAll() would treat the input as a regex
if (!oldName.equals(name)) {
File newF = new File(dir, name);
f.renameTo(newF);
}
}
}
}
}
Alternatively, you can use Files.move to rename the file. The documentation has an example of how to rename a file while keeping it in the same directory.
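A sketch of the Files.move variant might look like this. The class DirRenameNio and its methods are hypothetical names chosen for this example. The directory entries are collected before any renaming so the listing is not modified while it is being read, and main does nothing unless a directory and the unwanted text are passed explicitly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class for illustration; not taken from the original answer.
final class DirRenameNio {

    // Removes every literal occurrence of unwanted; no regex is involved.
    static String strippedName(String name, String unwanted) {
        return name.replace(unwanted, "");
    }

    // Renames each subdirectory of dir whose name contains unwanted.
    static void renameDirectories(Path dir, String unwanted) throws IOException {
        List<Path> directories;
        try (Stream<Path> entries = Files.list(dir)) {
            directories = entries.filter(Files::isDirectory).collect(Collectors.toList());
        }
        for (Path source : directories) {
            String oldName = source.getFileName().toString();
            String newName = strippedName(oldName, unwanted);
            // skip unchanged names and names that would become empty
            if (!newName.equals(oldName) && !newName.isEmpty()) {
                // throws FileAlreadyExistsException if the target already exists
                Files.move(source, source.resolveSibling(newName));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: DirRenameNio <directory> <text-to-remove>");
            return;
        }
        renameDirectories(Path.of(args[0]), args[1]);
    }
}
```

Unlike File.renameTo, which only returns false on failure, Files.move reports the reason through an exception.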

Having a variable from another class as the file name (GUI)

One class of my GUI has a variable for the file name. I want to pass this to another class so that I can process a file without having to hard code the file's name every time. The program compiles fine but I can't seem to run it correctly.
public void run() {
WordsCounter2 fileName = new WordsCounter2();
essayName = fileName.getFileList();
File f = new File(essayName);
//other code
WordsCounter2 is the class that houses the variable fileName, I'm calling it from this class and assigning it as the file's name, but this doesn't work. Could someone help?
if (rVal == JFileChooser.APPROVE_OPTION) {
File[] selectedFile = fileChooser.getSelectedFiles();
fileList = "nothing";
if (selectedFile.length > 0)
fileList = selectedFile[0].getName();
for (int i = 1; i < selectedFile.length; i++) {
fileList += ", " + selectedFile[i].getName();
}
statusBar.setText("You chose " + fileList);
}
else {
statusBar.setText("You didn't choose a file.");
}
fileList isn't empty because I have a label on the GUI that lists whatever file I chose.
Here's my new edit: now the exception occurs at the last line with the scanner and throws a NPE. Can you help?
public void run() {
WordsCounter2 pathNamesList = new WordsCounter2();
essayName = pathNamesList.getPathNamesList();
essayTitle = new String[essayName.size()];
essayTitle = essayName.toArray(essayTitle);
for (int i = 0; i < essayTitle.length; i++) {
f = new File(essayTitle[i]);
}
try {
Scanner scanner = new Scanner(f);
Your code is failing because File does not accept a comma-separated list of file names; it needs a single path per File object. See here: https://docs.oracle.com/javase/7/docs/api/java/io/File.html
You'll have to keep the complete paths in an array (use getAbsolutePath() rather than getName(), which drops the directory) and create one File per path. Do the work inside the loop; otherwise only the last File survives, and f stays null if the list is empty:
for (int i = 0; i < fileList.length; i++) {
    File f = new File(fileList[i]);
    // process f here, e.g. new Scanner(f)
}
where fileList is a String array holding the list of pathnames.
In case you're trying to write some content to these files as well, this should be helpful: Trying to Write Multiple Files at Once - Java
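As a rough illustration of collecting full paths instead of a comma-separated label, consider the sketch below. SelectedFiles and absolutePaths are hypothetical names, and the array in main merely stands in for fileChooser.getSelectedFiles().

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; in the real GUI the array would come from JFileChooser.
final class SelectedFiles {

    // Converts the selection into full path strings.
    // getName() would drop the directory part, so the file might not be found later.
    static List<String> absolutePaths(File[] selected) {
        List<String> paths = new ArrayList<>();
        for (File file : selected) {
            paths.add(file.getAbsolutePath());
        }
        return paths;
    }

    public static void main(String[] args) {
        // stand-in for fileChooser.getSelectedFiles()
        File[] selected = { new File("essay1.txt"), new File("essay2.txt") };
        for (String path : absolutePaths(selected)) {
            File f = new File(path); // one File per path, used inside the loop
            System.out.println(f.getName() + " exists: " + f.exists());
        }
    }
}
```

The display string for the status bar can still be built separately from the names; only the list of paths needs to be handed to the processing class.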

how can i take the names of the files that are present in a file and store it in a buffer using java [duplicate]

I have the files below in a folder:
DLY-20150721-BOOST_UVERSE-ADJ.xls
DLY-20150721-BOOST_UVERSE-ADJ-BOOST-DISCONNECT.txt
DLY-20150721-BOOST_UVERSE-ADJ-ERR.txt
DLY-20150721-BOOST_UVERSE-ADJ-RR20150721181623+0530.xls
DLY-20150721-BOOST_UVERSE-ADJ-BOOST-DISCONNECT-RR20150721181623+0530.txt
DLY-20150721-BOOST_UVERSE-ADJ-ERR-RR20150721181623+0530.txt
I'm getting the filename 'DLY-20150721-BOOST_UVERSE-ADJ.xls' from a source.
So by referring to that file name I want to pick the associated .txt file names, like
DLY-20150721-BOOST_UVERSE-ADJ-BOOST-DISCONNECT.txt
DLY-20150721-BOOST_UVERSE-ADJ-ERR.txt
and store them in a buffer, but I have no idea how to do it.
I guess I can pick the files using a regex, but how do I pick the whole file names and store them in a buffer?
I am a beginner. I don't know how to store files in a buffer. However, to find files using 'DLY-20150721-BOOST_UVERSE-ADJ.xls', try:
import java.io.File;
public class X {
    public static void main(String[] args) {
        File folder = new File("FOLDER_PATH_HERE");
        File[] files = folder.listFiles();
        if (files == null) {
            return; // folder does not exist or is not a directory
        }
        String filename1 = "DLY-20150721-BOOST_UVERSE-ADJ.xls"; // File to search for
        String filename2 = removeExtension(filename1); // filename without extension
        for (int i = 0; i < files.length; i++) {
            String name = files[i].getName();
            // check for "-RR" rather than "RR": the wanted "-ERR.txt" name also contains "RR"
            if (name.startsWith(filename2)
                    && getExtension(name).equals(".txt")
                    && !name.contains("-RR")) {
                // Store file in a buffer
            }
        }
    }
    // Returns the extension including the dot, or "" if there is none
    public static String getExtension(String fileName) {
        int lastDot = fileName.lastIndexOf('.');
        return lastDot == -1 ? "" : fileName.substring(lastDot);
    }
    // Returns the name without its extension
    public static String removeExtension(String fileName) {
        int lastDot = fileName.lastIndexOf('.');
        return lastDot == -1 ? fileName : fileName.substring(0, lastDot);
    }
}
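If "buffer" simply means an in-memory collection, the matching names could be gathered in a List. The sketch below makes that assumption; AssociatedFiles and associatedTextFiles are made-up names, and the rule that re-run files are marked by "-RR" is inferred from the example names in the question.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; works on plain names so it can be tried without a real folder.
final class AssociatedFiles {

    // Returns the .txt names that belong to sourceName, skipping "-RR..." variants.
    static List<String> associatedTextFiles(List<String> names, String sourceName) {
        int dot = sourceName.lastIndexOf('.');
        String base = dot == -1 ? sourceName : sourceName.substring(0, dot);
        return names.stream()
                .filter(n -> n.startsWith(base + "-")) // same prefix plus a suffix
                .filter(n -> n.endsWith(".txt"))       // text files only
                .filter(n -> !n.contains("-RR"))       // assumption: "-RR" marks the variants to skip
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "DLY-20150721-BOOST_UVERSE-ADJ.xls",
                "DLY-20150721-BOOST_UVERSE-ADJ-BOOST-DISCONNECT.txt",
                "DLY-20150721-BOOST_UVERSE-ADJ-ERR.txt",
                "DLY-20150721-BOOST_UVERSE-ADJ-RR20150721181623+0530.xls",
                "DLY-20150721-BOOST_UVERSE-ADJ-BOOST-DISCONNECT-RR20150721181623+0530.txt",
                "DLY-20150721-BOOST_UVERSE-ADJ-ERR-RR20150721181623+0530.txt");
        List<String> buffer = associatedTextFiles(names, "DLY-20150721-BOOST_UVERSE-ADJ.xls");
        buffer.forEach(System.out::println);
    }
}
```

With the six names from the question, this keeps only the BOOST-DISCONNECT.txt and ERR.txt entries. If a single string is needed instead of a list, String.join("\n", buffer) would produce one.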

Java iterative reading of Files

At the moment I'm having a problem writing a tool for my company. I have 384 XML files that I have to read and parse with a SAX parser into txt files.
What I have so far is the parsing of all XML files into one txt file of 43 MB. With a BufferedReader and line.startsWith I want to extract all relevant information from the text file.
Edit: Done
(So my problem is how to solve this more efficiently. I have an idea (but unfortunately not in code, as you might think), but I don't know if it's possible: I want to iterate through a directory, find the XML file I want, then parse it and create a new txt file with the parsed content. Once that is done for all 384 XML files, I want to do the same thing for the 384 txt files: read them with a BufferedReader to get my relevant information. It's important to read them one at a time. Another problem is the directory path, which is a bit complex: "C:\Users\xxx\Documents\Data\ProjectName\A1\1\1SLin\wanted.xml". Each file has its own directory. The variable part is A1; it ranges from A to P and from 1 to 24. Alternatively, I have all the relevant files with their absolute paths in an ArrayList, so it's also okay to iterate over this list if that's easier.)
Edit:
I came to a solution. The code below contains a method that searches the directories and a main method that parses each XML file in the resulting list into a file in the same directory with the same name but a different extension.
public List<File> searchFile(File dir, String find) {
File[] files = dir.listFiles();
List<File> matches = new ArrayList<File>();
if (files != null) {
for (int i = 0; i < files.length; i++) {
if (files[i].isDirectory()) {
matches.addAll(searchFile(files[i], find));
} else if (files[i].getName().equalsIgnoreCase(find)) {
matches.add(files[i]);
}
}
}
Collections.sort(matches);
return matches;
}
public static void main(String[] args) throws IOException {
Import_Files im = new Import_Files();
File dir = new File("C:\\Users\\xxx\\Desktop\\MS-Daten\\");
String name = "snp_result_5815.xml";
List<File> matches = im.searchFile(dir, name);
System.out.println(matches);
for (int i=0; i<matches.size(); i++) {
String j = String.valueOf(i);
String xml_name = matches.get(i).getAbsolutePath();
File f = new File(matches.get(i).getAbsolutePath().replaceFirst("(?i)\\.xml$", ".txt")); // anchored regex: only the extension is replaced
System.setOut(new PrintStream(new FileOutputStream(f)));
System.out.println("\nstarting File: "+ i + "\n");
xml_parse myReader = new xml_parse(xml_name);
myReader.setContentHandler(new MyContentHandler());
myReader.setErrorHandler(new MyErrorHandler());
myReader.run();
}
}
The searchFolder method below takes a path and a file-name pattern, searches the path and all subdirectories, and passes every matching file to the processFile method.
public static void main(String[] args) {
String path = "c:\\temp";
Pattern filePattern = Pattern.compile("(?i).*\\.xml$");
searchFolder(path, filePattern);
}
public static void searchFolder(String searchPath, Pattern filePattern){
    File[] items = new File(searchPath).listFiles();
    if (items == null) {
        return; // not a directory, or it could not be read
    }
    for (File item : items) {
        if (item.isDirectory()) {
            // recursively search subdirectories
            searchFolder(item.getAbsolutePath(), filePattern);
        } else if (item.isFile() && filePattern.matcher(item.getName()).matches()) {
            processFile(item);
        }
    }
}
public static void processFile(File aFile){
String filename = aFile.getAbsolutePath();
String txtFilename = filename.substring(0, filename.lastIndexOf(".")) + ".txt";
//Do your xml file parsing and write to txtFilename
}
The complexity of the path makes no difference, just specify the root path to search (looks like C:\Users\xxx\Documents\Data\ProjectName in your case) and it will find all the files.
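The same traversal can probably be written more compactly with Files.walk from java.nio. This is only a sketch under that assumption: XmlWalker, findXmlFiles and txtSibling are illustrative names, and the actual SAX parsing is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class for illustration; the parsing step is intentionally omitted.
final class XmlWalker {

    // Collects every regular file under root whose name ends in ".xml" (any case).
    static List<Path> findXmlFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Returns a path in the same directory with the extension replaced by ".txt".
    static Path txtSibling(Path xmlFile) {
        String name = xmlFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot == -1 ? name : name.substring(0, dot);
        return xmlFile.resolveSibling(base + ".txt");
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path xml : findXmlFiles(root)) {
            // parse xml here and write the result to txtSibling(xml)
            System.out.println(xml + " -> " + txtSibling(xml));
        }
    }
}
```

Because each XML file is handled inside the loop, the files are processed one at a time, which matches the requirement in the question.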
