I am working on a desktop application for Windows using Java. My application needs to search for all .php files. To do this, I use a recursive method:
import java.io.File;
public class Copier {
    public static void find(String source, String rep) {
        File src = new File(rep);
        if (src != null && src.exists() && src.isDirectory()) {
            String[] tab = src.list();
            if (tab != null) {
                for (String s : tab) {
                    File srcc = new File(rep + "\\" + s);
                    if (srcc.isFile()) {
                        if (srcc.getName().matches(".*" + source + "$")) {
                            System.out.println(s);
                        }
                    } else {
                        find(source, srcc.getAbsolutePath());
                    }
                }
            } else {
                //System.out.println(" list is null");
            }
        }
    }
    public static void main(String[] args) {
        try {
            find(".java", "C:\\");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Is it possible to do this with an iterative algorithm?
Of course. Use breadth-first search with a queue. You start from C:\ and at every step you pop the folder at the head of the queue and push all of its subfolders onto the end of the queue.
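In Java (java.io.File plus java.util.ArrayDeque), that looks like this:
Deque<File> queue = new ArrayDeque<>();
queue.add(new File("C:\\"));
while (!queue.isEmpty()) {
    File topFolder = queue.poll();
    // listFiles returns null for folders that cannot be read, e.g. access denied
    File[] subFolders = topFolder.listFiles(File::isDirectory);
    if (subFolders != null) {
        for (File subFolder : subFolders) {
            queue.add(subFolder);
        }
    }
}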
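A complete, self-contained version of this breadth-first search might look like the sketch below. The class name BfsFileSearch is made up for illustration, and matching uses a plain endsWith instead of a regular expression (an assumption: in the question's regex, the "." of ".php" matches any character). It does not guard against symbolic-link loops.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BfsFileSearch {
    // Breadth-first search: folders wait in a FIFO queue, files are checked as they are listed.
    static List<String> find(String extension, File root) {
        List<String> results = new ArrayList<>();
        Deque<File> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            File folder = queue.poll();             // take the folder at the head of the queue
            File[] children = folder.listFiles();   // null if not a folder or not readable
            if (children != null) {
                for (File child : children) {
                    if (child.isDirectory()) {
                        queue.add(child);           // visit subfolders later, in FIFO order
                    } else if (child.getName().endsWith(extension)) {
                        results.add(child.getAbsolutePath());
                    }
                }
            }
        }
        return results;
    }
}
```

Usage: BfsFileSearch.find(".php", new File("C:\\")). Replacing add/poll with push/pop (LIFO) would give the depth-first order of the recursive version instead.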
I can't see why you want to get rid of recursion, although what you are looking for is certainly possible.
But a good way to get a faster program could be to use a file filter when you list the children of a directory: one for directories and one for matching files (the latter should use a java.util.regex.Pattern).
Update:
The overload of File.list to use is File.list(FilenameFilter). For the pattern, you could use something like a local variable (compiled outside your loop, or a data member if you use recursion):
Pattern p = Pattern.compile(".*" + Pattern.quote(source)); // quote: the "." in ".php" is a literal dot
boolean found = p.matcher(s).matches(); // s is the file name string from src.list()
Oh, and by the way, don't turn every entry into a File (srcc)! Work with strings and build as few objects as you can.
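A sketch of the filter-based idea, assuming Java 8+ and a hypothetical FilteredSearch class: the Pattern is compiled once in the constructor, File.list(FilenameFilter) returns only the matching names as plain strings, and File.listFiles(FileFilter) returns only the subdirectories to recurse into.

```java
import java.io.File;
import java.util.List;
import java.util.regex.Pattern;

final class FilteredSearch {
    private final Pattern pattern;   // compiled once and reused for every directory

    FilteredSearch(String suffix) {
        // Pattern.quote makes the "." in ".php" literal; matches() requires the whole name to match
        this.pattern = Pattern.compile(".*" + Pattern.quote(suffix));
    }

    void find(File dir, List<String> results) {
        // Filter 1: matching names, returned as strings (no File object per entry)
        String[] names = dir.list((parent, name) -> pattern.matcher(name).matches());
        if (names != null) {
            for (String name : names) {
                results.add(dir.getPath() + File.separator + name);
            }
        }
        // Filter 2: subdirectories only, to recurse into
        File[] subdirs = dir.listFiles(File::isDirectory);
        if (subdirs != null) {
            for (File sub : subdirs) {
                find(sub, results);
            }
        }
    }
}
```

The name filter does not check isFile(), so a directory whose name ends in ".php" would also be reported; add that check if it matters for you.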
You can always use an explicit stack (or a queue) in place of recursion. In this case, I think it makes the code a little easier to read. Often you'll get better performance from an iterative implementation than from a recursive one, though in this case both run at nearly the same speed (at least on my machine).
public static List<String> find(final String source, final String directory) {
    List<String> results = new LinkedList<String>();
    Stack<String> stack = new Stack<String>();
    stack.add(directory);
    String rep;
    while (!stack.isEmpty()) {
        rep = stack.pop();
        File src = new File(rep);
        if (src != null && src.exists() && src.isDirectory()) {
            String[] tab = src.list();
            if (tab != null) {
                for (String s : tab) {
                    File srcc = new File(rep + File.separatorChar + s);
                    if (srcc.isFile()) {
                        if (srcc.getName().matches(".*" + source + "$")) {
                            // System.out.println(s);
                            results.add(s);
                        }
                    } else {
                        stack.add(srcc.getAbsolutePath());
                    }
                }
            } else {
                // System.out.println(" list is null");
            }
        }
    }
    return results;
}
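Since Java 8, the JDK can also do the traversal for you. This is a different technique from the explicit stack above (Files.walk visits the tree depth-first internally); the sketch below, with a hypothetical WalkSearch class, returns file names just like the iterative find.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WalkSearch {
    static List<String> find(String extension, Path root) throws IOException {
        // Files.walk keeps directory handles open, so the stream must be closed
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())   // file name only, like list()
                    .filter(name -> name.endsWith(extension))
                    .collect(Collectors.toList());
        }
    }
}
```

One caveat: if Files.walk reaches a directory it may not read (common when starting at C:\), the stream throws an UncheckedIOException. Files.walkFileTree with a FileVisitor whose visitFileFailed returns FileVisitResult.CONTINUE lets you skip such directories instead.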
I want to create a Java application to identify duplicates. So far I can find duplicates only by name, but I also need size, file type, and maybe content. This is my code so far, using a HashMap:
public static void find(Map<String, List<String>> lists, File dir) {
    for (File f : dir.listFiles()) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            String hash = f.getName() + f.length();
            List<String> list = lists.get(hash);
            if (list == null) {
                list = new LinkedList<String>();
                lists.put(hash, list);
            }
            list.add(f.getAbsolutePath());
        }
    }
}
I used MessageDigest, checked some files, and found the duplicates according to all the criteria I listed in the title and description. Thank you all.
private static MessageDigest messageDigest;
static {
    try {
        messageDigest = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("cannot initialize SHA-512 hash function", e);
    }
}
And this is the result after integrating it into the duplicate search code:
public static void find(Map<String, List<String>> lists, File dir) {
    for (File f : dir.listFiles()) {
        if (f.isDirectory()) {
            find(lists, f);
        } else {
            try {
                // Read the whole file; a single FileInputStream.read() call may return fewer bytes
                byte[] fileData = java.nio.file.Files.readAllBytes(f.toPath());
                // Create a unique hash id for the current file
                String hash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                List<String> list = lists.get(hash);
                if (list == null) {
                    list = new LinkedList<String>();
                }
                // Add the path to the list
                list.add(f.getAbsolutePath());
                // Add the updated list to the hash table
                lists.put(hash, list);
            } catch (IOException e) {
                throw new RuntimeException("cannot read file " + f.getAbsolutePath(), e);
            }
        }
    }
}
Considering 2 files equal if they have the same extension and the same file size is simply a matter of creating an object that represents this 'equality'. So, you'd make something like:
public class FileEquality {
    private final String fileExtension;
    private final long fileSize;
    // constructor, toString, equals, hashCode, and getters here.
}
(and fill in all the missing boilerplate: constructor, toString, equals, hashCode, and getters; see Project Lombok's @Value to make this easy if you like). You can get the file extension from a file name by using fileName.lastIndexOf('.') and fileName.substring(lastIndex). With Lombok, all you'd have to write is:
@lombok.Value public class FileEquality {
    String fileExtension;
    long fileSize;
}
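If you'd rather not add Lombok, a Java 16+ record gives you the same generated constructor, accessors, equals, hashCode, and toString. The sketch below (the of factory method is a hypothetical helper) also applies the lastIndexOf/substring extraction described above; note that substring(lastIndex) keeps the dot, and a name without a dot gets an empty extension.

```java
import java.io.File;

// equals/hashCode compare both components, so instances work directly as HashMap keys
record FileEquality(String fileExtension, long fileSize) {

    static FileEquality of(File file) {
        String name = file.getName();
        int lastIndex = name.lastIndexOf('.');
        // e.g. "foo.txt" -> ".txt"; "README" -> ""
        String extension = lastIndex >= 0 ? name.substring(lastIndex) : "";
        return new FileEquality(extension, file.length());
    }
}
```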
Then use FileEquality objects as keys in your hashmap instead of strings. However, just because you have, say, 'foo.txt' and 'bar.txt' that both happen to be 500 bytes in size doesn't mean these 2 files are duplicates. So, you want content involved too, but, if you extend your FileEquality class to include the content of the file, then 2 things come up:
If you're checking content anyway, what do the size and file extension matter? If the contents of foo.txt and bar.jpg are precisely the same, they are duplicates, no? Why bother. You can convey the content as a byte[], but note that writing proper hashCode() and equals() implementations (which are required if you want to use this object as a key in a hashmap) becomes a little trickier. Fortunately, Lombok's @Value will get it right, so I suggest you use that.
This implies the entirety of the file content is in your JVM's process memory. Unless you're only checking very small files, you'll simply run out of memory. You can abstract this away somewhat by storing not the file's entire content but a hash of the content. Google around for how to calculate the SHA-256 hash of a file in Java. Put this hash value in your FileEquality and now you avoid the memory issue. It is theoretically possible to have 2 files with different contents which nevertheless hash to the exact same SHA-256 value, but the chances of that are astronomically small, and more to the point, SHA-256 is designed such that it is computationally infeasible to deliberately craft 2 such files to mess with your application. Therefore, I suggest you just trust the hash :)
Note, of course, that hashing an entire file requires reading the entire file, so if you run your duplicate finder on a directory containing, say, 500GB worth of files, your application will have to read at least all 500GB, which will take some time.
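A minimal sketch of both ideas, under a few assumptions: SHA-256 as the hash, Java 8+ NIO, and a hypothetical DuplicateFinder class. Each file is hashed in fixed-size chunks, so memory use does not grow with file size, and files are grouped by size first, so only files whose size matches another file's are read and hashed at all.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DuplicateFinder {

    // Hashes the file in 8 KB chunks, so memory use stays constant whatever the file size.
    static String sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);       // only the bytes actually read
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));  // two hex digits per byte, leading zeros kept
        }
        return hex.toString();
    }

    // Returns groups of two or more files under root with identical content.
    static List<List<Path>> findDuplicates(Path root) throws IOException {
        Map<Long, List<Path>> bySize;
        try (Stream<Path> paths = Files.walk(root)) {
            bySize = paths.filter(Files::isRegularFile)
                    .collect(Collectors.groupingBy(p -> p.toFile().length()));
        }
        List<List<Path>> duplicates = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() > 1) {             // a file with a unique size has no duplicate
                Map<String, List<Path>> byHash = new HashMap<>();
                for (Path file : sameSize) {
                    byHash.computeIfAbsent(sha256(file), k -> new ArrayList<>()).add(file);
                }
                for (List<Path> group : byHash.values()) {
                    if (group.size() > 1) {
                        duplicates.add(group);
                    }
                }
            }
        }
        return duplicates;
    }
}
```

Grouping by size first reduces the cost of the 500GB scenario in practice: a file with a unique size can never be a duplicate, so it is never read.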
I made this application long ago, and I found some of its source code for you if you want to learn from it.
This method works by comparing the bytes of both files.
public static boolean checkBinaryEquality(File file1, File file2) {
    if (file1.length() != file2.length()) return false;
    try (FileInputStream f1 = new FileInputStream(file1); FileInputStream f2 = new FileInputStream(file2)) {
        byte[] bus1 = new byte[1024],
               bus2 = new byte[1024];
        // compare the files chunk by chunk; if any byte differs, they are not equal
        int n1;
        while ((n1 = f1.readNBytes(bus1, 0, bus1.length)) > 0) {
            // readNBytes (Java 9+) fills the buffer unless end-of-file is reached, so both
            // streams stay aligned and stale bytes from an earlier chunk are never compared
            int n2 = f2.readNBytes(bus2, 0, bus2.length);
            if (n1 != n2) return false;
            for (int i = 0; i < n1; i++)
                if (bus1[i] != bus2[i])
                    return false;
        }
        // passed
        return true;
    } catch (IOException exp) {
        // problems occurred, so let's consider them not equal
        return false;
    }
}
Combine this method with name and extension checking and you are ready to go.
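On Java 12 or later, the byte-by-byte comparison can be delegated to Files.mismatch, which returns -1L when two files have identical content (otherwise the position of the first difference). A sketch, with a hypothetical BinaryEquality class:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class BinaryEquality {
    static boolean sameContent(Path a, Path b) throws IOException {
        // The size check is a cheap early exit; mismatch reads both files only when needed
        return Files.size(a) == Files.size(b) && Files.mismatch(a, b) == -1L;
    }
}
```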
Copy-paste example:
Create a class that extends File:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;
public class MyFile extends File {
    private static final long serialVersionUID = 1L;
    public MyFile(final String pathname) {
        super(pathname);
    }
    @Override
    public boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || this.getClass() != obj.getClass()) {
            return false;
        }
        final MyFile other = (MyFile) obj;
        if (!Arrays.equals(this.getContent(), other.getContent())) {
            return false;
        }
        if (this.getName() == null) {
            if (other.getName() != null) {
                return false;
            }
        } else if (!this.getName().equals(other.getName())) {
            return false;
        }
        if (this.length() != other.length()) {
            return false;
        }
        return true;
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = prime;
        result = (prime * result) + Arrays.hashCode(this.getContent());
        result = (prime * result) + ((this.getName() == null) ? 0 : this.getName().hashCode());
        result = (prime * result) + (int) (this.length() ^ (this.length() >>> 32));
        return result;
    }
    private byte[] getContent() {
        try (final FileInputStream fis = new FileInputStream(this)) {
            return fis.readAllBytes();
        } catch (final IOException e) {
            e.printStackTrace();
            return new byte[] {};
        }
    }
}
Read the base directory:
import java.io.File;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Vector;
public class FileTest {
    public FileTest() {
        super();
    }
    public static void main(final String[] args) {
        final Map<MyFile, List<MyFile>> duplicates = new HashMap<>();
        FileTest.handleDirectory(duplicates, new File("[path to base directory]"));
        final Iterator<Entry<MyFile, List<MyFile>>> iterator = duplicates.entrySet().iterator();
        while (iterator.hasNext()) {
            final Entry<MyFile, List<MyFile>> next = iterator.next();
            if (next.getValue().size() == 0) {
                iterator.remove();
            } else {
                System.out.println(next.getKey().getName() + " - " + next.getKey().getAbsolutePath());
                for (final MyFile file : next.getValue()) {
                    System.out.println(" ->" + file.getName() + " - " + file.getAbsolutePath());
                }
            }
        }
    }
    private static void handleDirectory(final Map<MyFile, List<MyFile>> duplicates, final File directory) {
        final File dir = directory;
        if (dir.isDirectory()) {
            final File[] files = dir.listFiles();
            for (final File file : files) {
                if (file.isDirectory()) {
                    FileTest.handleDirectory(duplicates, file);
                    continue;
                }
                final MyFile myFile = new MyFile(file.getAbsolutePath());
                if (!duplicates.containsKey(myFile)) {
                    duplicates.put(myFile, new Vector<>());
                } else {
                    duplicates.get(myFile).add(myFile);
                }
            }
        }
    }
}
I'm building a small application in Java, with small game mechanics but nothing serious. I have a class whose purpose is to fetch data from a file. But when I declare the two classes to read from it, the program just ignores everything and continues. As a result, when I try to access the respective lists, I get a NullPointerException. The code of the method that fetches the data is below:
public void getData(int l, player tmp, level le) {
    String[] dataPlayer;
    String[] dataLevel;
    try {
        //FileReader f = new FileReader(this.levelPath.concat(Integer.toString(l)));
        File f = new File(this.levelPath.concat(Integer.toString(l)));
        BufferedReader buff = new BufferedReader(new FileReader(f));
        System.out.println("Reached");
        boolean eof = false;
        while (!eof) {
            String b = buff.readLine();
            if (b == null)
                eof = true;
            else {
                if (b.contains("player")) {
                    dataPlayer = b.split("-");
                    for (int i = 0; i < dataPlayer.length; i++) {
                        if (i == 0)
                            continue;
                        items it = new items(dataPlayer[i]);
                        tmp.setInventory1(it);
                    }
                } else if (b.contains("level")) {
                    dataLevel = b.split("-");
                    for (int i = 0; i < dataLevel.length; i++) {
                        if (i == 0)
                            continue;
                        items it = new items(dataLevel[i]);
                        le.setSpecific(it);
                    }
                }
            }
        }
    } catch (IOException i) {
        i.getMessage();
    }
}
Contents of the file "levelData1":
player-hat
player-flashlight
level-flower
level-rock
player-adz
The problem in this particular case was the path: it needed to be absolute, like /home/toomlg4u/IdeaProjects/javaProject/src/Data/levelData.
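A relative path is resolved against the JVM's working directory (the user.dir system property), and an IDE often sets that to the project root rather than src/. The hypothetical helper below is a debugging sketch: it reports where a relative path actually points and fails with a clear message instead of a later NullPointerException.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathCheck {
    static Path resolveOrFail(String path) {
        // Same resolution FileReader uses for a relative path: against the working directory
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        if (!Files.isRegularFile(absolute)) {
            throw new IllegalArgumentException("No such file: " + absolute
                    + " (working directory is " + System.getProperty("user.dir") + ")");
        }
        return absolute;
    }
}
```

If the data file ships with the code under src/, loading it from the classpath with getClass().getResourceAsStream(...) is another way to avoid hard-coding an absolute path.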
You're doing a lot of things inside that try/catch that may not throw an IOException. If you get any other exception, it's not going to be caught. Depending on what other exception handling you have in place, that may cause weird behavior. For debugging, you could catch all exceptions, and see if you're getting something else.
If you want to keep your loop, you can refactor your code to look like this:
public void getData(int l, player tmp, level le) {
    try (BufferedReader buff = new BufferedReader(new FileReader(new File(this.levelPath + l)))) {
        String b;
        while ((b = buff.readLine()) != null) {
            if (b.contains("player")) {
                String[] dataPlayer = b.split("-");
                items it = new items(dataPlayer[1]); //because you know that you will have an array with only 2 elements
                tmp.setInventory1(it);
            } else if (b.contains("level")) {
                String[] dataLevel = b.split("-");
                items it = new items(dataLevel[1]); //because you know that you will have an array with only 2 elements
                le.setSpecific(it);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
It is a little better than what you have, and easier to debug and to read. I advise you to read about try-with-resources.
As a rule of thumb, every stream you open yourself, you have to close. If you didn't open it yourself, don't close it.
This is how a decent Java program could look:
private Stream<Items> asStreamOfItems(String line) {
    return Stream.of(line.split("-")).skip(1).map(Items::new);
}
public void parseFile(String pathToTheFile) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(pathToTheFile));
    List<Items> players = lines.stream().filter(line -> line.contains("player")).flatMap(this::asStreamOfItems).collect(Collectors.toList());
    List<Items> levels = lines.stream().filter(line -> line.contains("level")).flatMap(this::asStreamOfItems).collect(Collectors.toList());
    // ...
}
In this case all your weird errors will vanish.
After you edited the post, I saw your file content. In this case the code should look like this:
class Items {
    private final String name;
    public Items(String name) {
        this.name = name;
    }
    public String getName() {
        return name;
    }
    public static Items parse(String line) {
        return new Items(line.split("-")[1]);
    }
}
public void parseFile(String pathToTheFile) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(pathToTheFile));
    List<Items> players = lines.stream().filter(line -> line.contains("player")).map(Items::parse).collect(Collectors.toList());
    List<Items> levels = lines.stream().filter(line -> line.contains("level")).map(Items::parse).collect(Collectors.toList());
    // ...
}
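Another option, sketched below under the assumption that every line has the form prefix-item as in levelData1: group lines by the text before the first "-" instead of using contains. That avoids misclassifying a line like level-playerhat, which contains "player". LevelData is a hypothetical name; in real code the input would come from Files.readAllLines(Paths.get(pathToTheFile)).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LevelData {
    // Groups "prefix-item" lines by prefix, keeping the order in which prefixes first appear.
    static Map<String, List<String>> parse(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(line -> line.contains("-"))     // skip blank or malformed lines
                .map(line -> line.split("-", 2))        // limit 2: split at the first '-' only
                .collect(Collectors.groupingBy(parts -> parts[0], LinkedHashMap::new,
                        Collectors.mapping(parts -> parts[1], Collectors.toList())));
    }
}
```

For the levelData1 content shown in the question, this yields {player=[hat, flashlight, adz], level=[flower, rock]}.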
By the way, you broke a lot of Java and general programming rules, like:
Using continue is bad practice. It should be used only in extreme cases, because it makes the code difficult to read.
Class names in Java should be in UpperCamelCase notation (Player, Level, Items rather than player, level, items).
One method should have only one responsibility.
Don't mutate objects inside a method (example: tmp.setInventory1(it);); that's a very, very bad practice.
When you work with streams, use try-with-resources or try/catch/finally to close the stream after you finish reading.
Before jumping into writing code, explore the Java I/O APIs to look for better ways to read from files.
I'm implementing external merge sort in Java.
Given a file, I split it into smaller ones, sort the smaller portions, and finally merge the sorted (smaller) files.
The last step is what I'm having trouble with.
I have a list of files, and at each step I want to take the minimum among the first lines of all files and then remove that line from its file.
So it is supposed to be something like this:
public static void mergeSortedFiles(List<File> sorted, File output) throws IOException {
    BufferedWriter wf = new BufferedWriter(new FileWriter(output));
    String curLine = "";
    while (!sorted.isEmpty()) {
        curLine = findMinLine(sorted);
        wf.write(curLine);
    }
}
public static String findMinLine(List<File> sorted) throws IOException {
    List<BufferedReader> brs = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i++) {
        brs.add(new BufferedReader(new FileReader(sorted.get(i))));
    }
    List<String> lines = new ArrayList<>();
    for (BufferedReader br : brs) {
        lines.add(br.readLine());
    }
    Collections.sort(lines);
    return lines.get(0);
}
I'm not sure how to update the files. Can anyone help with that?
Thanks for helping!
You can create a Comparable wrapper around each file and then place the wrappers in a heap (for example a PriorityQueue).
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Iterator;
import java.util.NoSuchElementException;
// Deserializer is not shown in the answer; this is the minimal shape the code below assumes
interface Deserializer<T> {
    T deserialize(String line);
}
public class ComparableFile<T extends Comparable<T>> implements Comparable<ComparableFile<T>> {
    private final Deserializer<T> deserializer;
    private final Iterator<String> lines;
    private T buffered;
    public ComparableFile(File file, Deserializer<T> deserializer) {
        this.deserializer = deserializer;
        try {
            this.lines = Files.newBufferedReader(file.toPath()).lines().iterator();
        } catch (IOException e) {
            // deal with it differently if you want, I'm just providing a working example
            // and wanted to use the constructor in a lambda function
            throw new UncheckedIOException(e);
        }
    }
    @Override
    public int compareTo(ComparableFile<T> that) {
        T mine = peek();
        T theirs = that.peek();
        if (mine == null) return theirs == null ? 0 : -1;
        if (theirs == null) return 1;
        return mine.compareTo(theirs);
    }
    public T pop() {
        T tmp = peek();
        if (tmp != null) {
            buffered = null;
            return tmp;
        }
        throw new NoSuchElementException();
    }
    public boolean isEmpty() {
        return peek() == null;
    }
    private T peek() {
        if (buffered != null) return buffered;
        if (!lines.hasNext()) return null;
        return buffered = deserializer.deserialize(lines.next());
    }
}
Then, you can merge them this way:
import java.io.File;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;
import java.util.stream.Collectors;
public class MergeFiles<T extends Comparable<T>> {
    private final PriorityQueue<ComparableFile<T>> files;
    public MergeFiles(List<File> files, Deserializer<T> deserializer) {
        this.files = new PriorityQueue<>(files.stream()
                .map(file -> new ComparableFile<>(file, deserializer))
                .filter(comparableFile -> !comparableFile.isEmpty())
                .collect(Collectors.toList()));
    }
    public Iterator<T> getSortedElements() {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !files.isEmpty();
            }
            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ComparableFile<T> head = files.poll();
                T next = head.pop();
                if (!head.isEmpty()) files.add(head);
                return next;
            }
        };
    }
}
And here's some code to demonstrate it works:
public static void main(String[] args) throws IOException {
    List<File> files = Arrays.asList(
            newTempFile(Arrays.asList("hello", "world")),
            newTempFile(Arrays.asList("english", "java", "programming")),
            newTempFile(Arrays.asList("american", "scala", "stackoverflow"))
    );
    Iterator<String> sortedElements = new MergeFiles<>(files, line -> line).getSortedElements();
    while (sortedElements.hasNext()) {
        System.out.println(sortedElements.next());
    }
}
private static File newTempFile(List<String> words) throws IOException {
    File tempFile = File.createTempFile("sorted-", ".txt");
    Files.write(tempFile.toPath(), words);
    tempFile.deleteOnExit();
    return tempFile;
}
Output:
american
english
hello
java
programming
scala
stackoverflow
world
So what you want to do is swap two lines in a text file? You can do it by using a RandomAccessFile, but this will be horribly slow, since every time you swap two lines you have to wait for the next I/O operation.
So I highly recommend reading the lines into memory (onto the heap) and doing the merge there:
List<String> lines1 = Files.readAllLines(yourFile1);
List<String> lines2 = Files.readAllLines(yourFile2);
// both lists are already sorted, so only the merge step of merge sort is needed here;
// merge(...) is a placeholder for that step
List<String> merged = merge(lines1, lines2);
try (FileWriter writer = new FileWriter(yourOutputFile)) {
    for (String str : merged) {
        writer.write(str + System.lineSeparator());
    }
}
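The snippet leaves the actual merge open. Because each input list is already sorted, only the merge step of merge sort is needed; a minimal sketch (TwoWayMerge is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

final class TwoWayMerge {
    // Repeatedly take the smaller head of the two sorted lists.
    static List<String> merge(List<String> a, List<String> b) {
        List<String> merged = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).compareTo(b.get(j)) <= 0) {
                merged.add(a.get(i++));
            } else {
                merged.add(b.get(j++));
            }
        }
        // One list is exhausted; copy whatever remains of the other.
        merged.addAll(a.subList(i, a.size()));
        merged.addAll(b.subList(j, b.size()));
        return merged;
    }
}
```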
The standard merge technique between a fixed number of files (say, 2) is:
have a variable for the value of the ordering key of the current record of each file (for java, make that variable Comparable).
start the process by reading the first record of each file (and fill in the corresponding variable)
loop (until end-of-file on both) through a code block that says essentially
if (key_1.compareTo(key_2) == 0) { process both files; then read both files }
else if (key_1.compareTo(key_2) < 0) { process file 1; then read file 1 }
else { process file 2; then read file 2 }
Note how this code does essentially nothing more than determine the file with the lowest key, and process that.
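The loop above, written out in Java for two line-based files, might look like the following sketch. It assumes each line is one record whose key is the whole line, and it uses null for end-of-file. On equal keys it writes the record from file 1 first and the one from file 2 on the next pass, which produces the same output as processing both at once. TwoFileMerge is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

final class TwoFileMerge {
    static void merge(BufferedReader in1, BufferedReader in2, Writer out) throws IOException {
        // Start by reading the first record of each file.
        String key1 = in1.readLine();
        String key2 = in2.readLine();
        // Loop until end-of-file on both.
        while (key1 != null || key2 != null) {
            // A null key means that file is exhausted, so always take from the other one.
            if (key2 == null || (key1 != null && key1.compareTo(key2) <= 0)) {
                out.write(key1);                 // process file 1 ...
                out.write(System.lineSeparator());
                key1 = in1.readLine();           // ... then read file 1
            } else {
                out.write(key2);                 // process file 2 ...
                out.write(System.lineSeparator());
                key2 = in2.readLine();           // ... then read file 2
            }
        }
    }
}
```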
If your number of files is variable, then your number of key variables is variable too, and "determining the file with the lowest current key" cannot be done as above. Instead, have as many current_key_value objects as there are files, and store them all in a TreeSet. The first element of the TreeSet is then the lowest current key value of all the files, and if you make sure that you maintain a link between each key variable and its file number, you just process that file: delete the just-processed key value from the TreeSet, read a new record from the processed file, and add its key value to the TreeSet. (Make these objects compare by key first and by file number second; otherwise the TreeSet treats equal keys from different files as duplicates and silently drops one of them.)
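A sketch of this TreeSet variant for any number of files, under the same one-line-per-record assumption. The nested Head record (hypothetical, Java 16+) keeps the link between a key and its file number, and the comparator breaks ties by file number, so equal lines from different files both stay in the set. KWayMerge is also a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class KWayMerge {
    // One entry per file: its current line plus the index of the file it came from.
    private record Head(String line, int file) {}

    static void merge(List<BufferedReader> inputs, Writer out) throws IOException {
        // Order by line, then by file index, so equal lines stay distinct entries.
        TreeSet<Head> heads = new TreeSet<>(
                Comparator.comparing(Head::line).thenComparingInt(Head::file));
        for (int i = 0; i < inputs.size(); i++) {
            String first = inputs.get(i).readLine();
            if (first != null) {
                heads.add(new Head(first, i));
            }
        }
        while (!heads.isEmpty()) {
            Head lowest = heads.pollFirst();                        // lowest current key overall
            out.write(lowest.line());
            out.write(System.lineSeparator());
            String next = inputs.get(lowest.file()).readLine();     // refill from that same file
            if (next != null) {
                heads.add(new Head(next, lowest.file()));
            }
        }
    }
}
```

A PriorityQueue with the same comparator would work just as well here; the TreeSet is kept to match the description above.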
I want to count the total number of variables in a text file in Java. For this purpose I use this code:
try {
    BufferedReader reader = new BufferedReader(new FileReader(fn));
    String line = reader.readLine();
    while (line != null) {
        Scanner fs = new Scanner(reader);
        while (fs.hasNext()) {
            String s = fs.next();
            if (s.startsWith("int")) {
                s1 = ";";
                while (!(s1.equals(s2))) {
                    Scanner fd = new Scanner(reader);
                    while (fd.hasNext()) {
                        c = fd.next();
                        if (c.contains(","))
                            cint++;
                        else
                            cint++;
                        if (c.startsWith(";"))
                            break;
                    }
                    s2 = c;
                }
            }
            if (s.startsWith("short")) {
                cshort++;
            }
            if (s.startsWith("byte")) {
                cbyte++;
            }
            if (s.startsWith("long")) {
                clong++;
            }
            if (s.startsWith("float")) {
                cfloat++;
            }
            if (s.startsWith("boolean")) {
                cboolean++;
            }
            if (s.startsWith("double")) {
                cdouble++;
            }
            if (s.startsWith("char")) {
                cchar++;
            }
            if (s.startsWith("abstract")) {
                cabstract++;
            }
            if (s.startsWith("continue")) {
                ccontinue++;
            }
            if (s.startsWith("switch")) {
                cswitch++;
            }
            if (s.startsWith("assert")) {
                cassert++;
            }
            if (s.startsWith("default")) {
                cdefault++;
            }
            if (s.startsWith("goto")) {
                cgoto++;
            }
            if (s.startsWith("package")) {
                cpackage++;
            }
            if (s.startsWith("synchronized")) {
                csync++;
            }
            if (s.startsWith("do")) {
                cdo++;
            }
            if (s.startsWith("if")) {
                cif++;
            }
            if (s.startsWith("private")) {
                cprivate++;
            }
            if (s.startsWith("this")) {
                cthis++;
            }
            if (s.startsWith("break")) {
                cbreak++;
            }
            if (s.startsWith("implements")) {
                cimplements++;
            }
            if (s.startsWith("protected")) {
                cprotected++;
            }
            if (s.startsWith("catch")) {
                ccatch++;
            }
            if (s.startsWith("extends")) {
                cextends++;
            }
            if (s.startsWith("try")) {
                ctry++;
            }
            if (s.startsWith("final")) {
                cfinal++;
            }
            if (s.startsWith("interface")) {
                cinterface++;
            }
            if (s.startsWith("static")) {
                cstatic++;
            }
            if (s.startsWith("void")) {
                cvoid++;
            }
            if (s.startsWith("instanceof")) {
                cinstanceof++;
            }
            if (s.startsWith("class")) {
                cclass++;
            }
            if (s.startsWith("finally")) {
                cfinally++;
            }
            if (s.startsWith("strictfp")) {
                cstrictfp++;
            }
            if (s.startsWith("volatile")) {
                cvolatile++;
            }
            if (s.startsWith("const")) {
                cconst++;
            }
            if (s.startsWith("native")) {
                cnative++;
            }
            if (s.startsWith("super")) {
                csuper++;
            }
            if (s.startsWith("while")) {
                cwhile++;
            }
            if (s.startsWith("for")) {
                cfor++;
            }
        }
        line = reader.readLine();
    }
} catch (Exception ex) {
    System.out.println(ex.getMessage());
}
insert();
The problem is that it gives the wrong number of int variables.
Can anyone please help me with this?
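Try using else if instead of a million separate ifs. Once a token has matched one keyword, there is no need to test it against all the others. (Be careful with startsWith and overlapping prefixes, though: "double" also starts with "do" and "finally" also starts with "final", so with else if the longer keyword must be tested first.) For example: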
if (s.startsWith("short")) {
    cshort++;
}
else if (s.startsWith("byte")) {
    cbyte++;
}
That will also cut down on the number of comparisons your program makes at run time. If you have access to the file, you could put each variable on a separate line to make it easier to debug and to read in.
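As a sketch of the same idea without the startsWith pitfalls: split the text into identifier-like tokens and look each whole token up in a set, so "double" is never counted as "do" and "internal" never as "int". The keyword subset and the KeywordCounter name are assumptions. Note that this still counts keyword occurrences, not variables: words in comments and string literals are counted too.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class KeywordCounter {
    // A subset of the keywords from the question; extend as needed.
    private static final Set<String> KEYWORDS = Set.of(
            "int", "short", "byte", "long", "float", "double", "boolean", "char",
            "do", "if", "for", "while", "final", "finally", "class", "interface");

    static Map<String, Integer> count(String source) {
        Map<String, Integer> counts = new TreeMap<>();
        // Split on anything that cannot be part of an identifier, then compare whole tokens.
        for (String token : source.split("[^A-Za-z0-9_$]+")) {
            if (KEYWORDS.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```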
Let's just look at the code that attempts to count int variables:
if (s.startsWith("int")) {
    s1 = ";";
    while (!(s1.equals(s2))) {
        Scanner fd = new Scanner(reader);
        while (fd.hasNext()) {
            c = fd.next();
            if (c.contains(","))
                cint++;
            else
                cint++;
            if (c.startsWith(";"))
                break;
        }
        s2 = c;
    }
}
Why is that going to give incorrect counts?
You are treating every word that starts with "int" as an int keyword. But it isn't. What about internal or international ...
What about int in comments? Comments should be ignored.
What about int[] myarray;? That's not an int variable; it is an int[] variable.
What about return (int) someVariable; ? That is a typecast, not a declaration.
What about public int someMethod() { ... } ? The return type is not a variable.
What about public void method(int a, char b) { ... } ? That's declaring an int variable ... but your code will (I think) incorrectly count it as two int variables.
And so on.
Your approach would be best described as crude one-pass pattern matching, implemented directly as code. Basically, this approach to "analysing" source code is doomed to fail.
What you really need to do is to parse the Java source code properly using a parser that recognizes the Java grammar. You could:
write your own Java parser, or
make use of an existing Java parser, or
look for an existing Java grammar that is suitable for input to ANTLR or JavaCC or some other Java parser generator system (PGS).
Once you've done that, you could either walk an AST data structure emitted by the parser, or embed your "variable counting" code into the grammar in the PGS input file.
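If a full JDK is available, one existing Java parser is javac itself, exposed through the Compiler Tree API (com.sun.source.*, in the jdk.compiler module). The sketch below (IntVariableCounter is a hypothetical name) only parses the source, with no type checking, and counts declarations whose type is the primitive int. Fields, parameters, and locals are all counted, one per declared name, while int[] declarations, (int) casts, method return types, and anything inside comments are not.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.PrimitiveTypeTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.List;
import javax.lang.model.type.TypeKind;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class IntVariableCounter {
    static long countIntVariables(String source) throws IOException {
        // Wrap the source text as an in-memory compilation unit.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();  // null on a plain JRE
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        long[] count = {0};
        for (CompilationUnitTree unit : task.parse()) {   // syntax tree only, no type checking
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitVariable(VariableTree node, Void unused) {
                    // Fields, parameters and local variables are all VariableTree nodes.
                    Tree type = node.getType();
                    if (type != null && type.getKind() == Tree.Kind.PRIMITIVE_TYPE
                            && ((PrimitiveTypeTree) type).getPrimitiveTypeKind() == TypeKind.INT) {
                        count[0]++;
                    }
                    return super.visitVariable(node, unused);
                }
            }.scan(unit, null);
        }
        return count[0];
    }
}
```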
There is another approach (from the linked Q&A). Compile the source file, use Class.forName() to load it, and then use reflection to find Field objects for the static and instance variables. At a pinch, you could even count the parameters in the method signatures via the Method objects. But local variables are not exposed by reflection ...
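A sketch of the reflection route (ReflectionCounter is a hypothetical name). It works on an already compiled and loaded class, counts int fields and int method parameters, and, as noted above, cannot see local variables.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

final class ReflectionCounter {
    // Counts int fields (static and instance) declared directly in the class.
    static int countIntFields(Class<?> type) {
        int count = 0;
        for (Field field : type.getDeclaredFields()) {
            if (field.getType() == int.class && !field.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    // Counts int parameters across the methods declared directly in the class.
    static int countIntParameters(Class<?> type) {
        int count = 0;
        for (Method method : type.getDeclaredMethods()) {
            for (Class<?> parameterType : method.getParameterTypes()) {
                if (parameterType == int.class) {
                    count++;
                }
            }
        }
        return count;
    }
}
```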
I have made a class to extract subtrees using Tregex. I used some code snippets from TregexPattern.java, as I don't want the program to rely on the console commands.
In general, given the tree for a sentence, I want to extract certain subtrees (no user interaction).
What I have done so far is the following:
package edu.stanford.nlp.trees.tregex;
import edu.stanford.nlp.ling.StringLabelFactory;
import edu.stanford.nlp.trees.*;
import java.io.*;
import java.util.*;
public abstract class Test {
    abstract TregexMatcher matcher(Tree root, Tree tree, Map<String, Tree> namesToNodes, VariableStrings variableStrings);
    public TregexMatcher matcher(Tree t) {
        return matcher(t, t, new HashMap<String, Tree>(), new VariableStrings());
    }
    public static void main(String[] args) throws ParseException, IOException {
        String encoding = "UTF-8";
        TregexPattern p = TregexPattern.compile("NP < NN & <<DT"); //"/^MWV/" or "NP < (NP=np < NNS)"
        TreeReader r = new PennTreeReader(new StringReader("(VP (VP (VBZ Try) (NP (NP (DT this) (NN wine)) (CC and) (NP (DT these) (NNS snails)))) (PUNCT .))"), new LabeledScoredTreeFactory(new StringLabelFactory()));
        Tree t = r.readTree();
        treebank = new MemoryTreebank();
        treebank.add(t);
        TRegexTreeVisitor vis = new TRegexTreeVisitor(p, encoding);
        treebank.apply(vis); // line 26
        if (TRegexTreeVisitor.printMatches) {
            System.out.println("There were " + vis.numMatches() + " matches in total.");
        }
    }
    private static Treebank treebank; // used by main method, must be accessible
    static class TRegexTreeVisitor implements TreeVisitor {
        private static boolean printNumMatchesToStdOut = false;
        static boolean printNonMatchingTrees = false;
        static boolean printSubtreeCode = false;
        static boolean printTree = false;
        static boolean printWholeTree = false;
        static boolean printMatches = true;
        static boolean printFilename = false;
        static boolean oneMatchPerRootNode = false;
        static boolean reportTreeNumbers = false;
        static TreePrint tp;
        PrintWriter pw;
        int treeNumber = 0;
        TregexPattern p;
        //String[] handles;
        int numMatches;
        TRegexTreeVisitor(TregexPattern p, String encoding) {
            this.p = p;
            //this.handles = handles;
            try {
                pw = new PrintWriter(new OutputStreamWriter(System.out, encoding), true);
            } catch (UnsupportedEncodingException e) {
                System.err.println("Error -- encoding " + encoding + " is unsupported. Using ASCII print writer instead.");
                pw = new PrintWriter(System.out, true);
            }
            // tp.setPrintWriter(pw);
        }
        public void visitTree(Tree t) {
            treeNumber++;
            if (printTree) {
                pw.print(treeNumber + ":");
                pw.println("Next tree read:");
                tp.printTree(t, pw);
            }
            TregexMatcher match = p.matcher(t);
            if (printNonMatchingTrees) {
                if (match.find()) {
                    numMatches++;
                } else {
                    tp.printTree(t, pw);
                }
                return;
            }
            Tree lastMatchingRootNode = null;
            while (match.find()) {
                if (oneMatchPerRootNode) {
                    if (lastMatchingRootNode == match.getMatch()) {
                        continue;
                    } else {
                        lastMatchingRootNode = match.getMatch();
                    }
                }
                numMatches++;
                if (printFilename && treebank instanceof DiskTreebank) {
                    DiskTreebank dtb = (DiskTreebank) treebank;
                    pw.print("# ");
                    pw.println(dtb.getCurrentFile());
                }
                if (printSubtreeCode) {
                    pw.println(treeNumber + ":" + match.getMatch().nodeNumber(t));
                }
                if (printMatches) {
                    if (reportTreeNumbers) {
                        pw.print(treeNumber + ": ");
                    }
                    if (printTree) {
                        pw.println("Found a full match:");
                    }
                    if (printWholeTree) {
                        tp.printTree(t, pw);
                    } else {
                        tp.printTree(match.getMatch(), pw); // line 108
                    }
                    // pw.println(); // TreePrint already puts a blank line in
                } // end if (printMatches)
            } // end while match.find()
        } // end visitTree
        public int numMatches() {
            return numMatches;
        }
    } // end class TRegexTreeVisitor
}
But it gives the following error:
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.trees.tregex.Test$TRegexTreeVisitor.visitTree(Test.java:108)
at edu.stanford.nlp.trees.MemoryTreebank.apply(MemoryTreebank.java:376)
at edu.stanford.nlp.trees.tregex.Test.main(Test.java:26)
Java Result: 1
Any modifications or ideas?
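NullPointerException is usually an indicator of a bug in the code. Here, the likely culprit is tp: the static TreePrint field in TRegexTreeVisitor is never assigned (the tp.setPrintWriter(pw) line is commented out, and nothing else initializes it), so tp.printTree(match.getMatch(), pw) at line 108 dereferences null. Initialize it before applying the visitor, e.g. TRegexTreeVisitor.tp = new TreePrint("penn");.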
I had the same task in the past. The sentence was parsed with a dependency parser.
I decided to put the resulting parse tree in XML (DOM) and perform XPath queries over it.
To improve performance, you don't need to serialize the XML to a String; just keep the whole XML structure in memory as a DOM (e.g. http://www.ibm.com/developerworks/xml/library/x-domjava/).
Using XPath for querying tree-like data structure gave me the following benefits:
Load/Save/Transfer results of sentence parsing easily.
Robust syntax/capabilities of XPath.
Many people know XPath (everyone can customize your query).
XML and XPath are cross platform.
Plenty of stable implementations of XPath and XML/DOM libraries.
Ability to use XSLT.
Integration with existing XML-based pipeline XSLT+XPath -> XSD -> Do actions (e.g. users have specified their email address and action what to do with it somewhere inside of free-text complaint).
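A minimal sketch of the XPath approach using only the JDK (javax.xml). The XML encoding of the question's tree is an assumption: each constituent label becomes an element name and each word becomes text, so labels that are not valid XML names (e.g. "PRP$") would need mapping first. The query //NP[NN and .//DT] approximates the Tregex pattern NP < NN & << DT (an NP with an NN child and a DT somewhere below it). TreeXPath is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TreeXPath {
    // Parses the XML once into a DOM, then evaluates an XPath expression against it.
    static NodeList query(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
    }
}
```

For the question's sentence encoded this way, the query returns a single node: the (NP (DT this) (NN wine)) subtree. In a real pipeline you would keep the Document and reuse it for many queries rather than reparsing.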