Extracting Sub-Trees Using Stanford Tregex - java

I have made a class to extract subtrees using Tregex. I reused some code snippets from "TregexPattern.java", as I don't want the program to go through the console commands.
In general, given the tree for a sentence, I want to extract a certain subtree (no user interaction).
What I have done so far is the following:
package edu.stanford.nlp.trees.tregex;
import edu.stanford.nlp.ling.StringLabelFactory;
import edu.stanford.nlp.trees.*;
import java.io.*;
import java.util.*;
public abstract class Test {
abstract TregexMatcher matcher(Tree root, Tree tree, Map<String, Tree> namesToNodes, VariableStrings variableStrings);
public TregexMatcher matcher(Tree t) {
return matcher(t, t, new HashMap<String, Tree>(), new VariableStrings());
}
public static void main(String[] args) throws ParseException, IOException {
String encoding = "UTF-8";
TregexPattern p = TregexPattern.compile("NP < NN & <<DT"); //"/^MWV/" or "NP < (NP=np < NNS)"
TreeReader r = new PennTreeReader(new StringReader("(VP (VP (VBZ Try) (NP (NP (DT this) (NN wine)) (CC and) (NP (DT these) (NNS snails)))) (PUNCT .))"), new LabeledScoredTreeFactory(new StringLabelFactory()));
Tree t = r.readTree();
treebank = new MemoryTreebank();
treebank.add(t);
TRegexTreeVisitor vis = new TRegexTreeVisitor(p, encoding);
treebank.apply(vis); //line 26
if (TRegexTreeVisitor.printMatches) {
System.out.println("There were " + vis.numMatches() + " matches in total.");
}
}
private static Treebank treebank; // used by main method, must be accessible
static class TRegexTreeVisitor implements TreeVisitor {
private static boolean printNumMatchesToStdOut = false;
static boolean printNonMatchingTrees = false;
static boolean printSubtreeCode = false;
static boolean printTree = false;
static boolean printWholeTree = false;
static boolean printMatches = true;
static boolean printFilename = false;
static boolean oneMatchPerRootNode = false;
static boolean reportTreeNumbers = false;
static TreePrint tp;
PrintWriter pw;
int treeNumber = 0;
TregexPattern p;
//String[] handles;
int numMatches;
TRegexTreeVisitor(TregexPattern p, String encoding) {
this.p = p;
//this.handles = handles;
try {
pw = new PrintWriter(new OutputStreamWriter(System.out, encoding), true);
} catch (UnsupportedEncodingException e) {
System.err.println("Error -- encoding " + encoding + " is unsupported. Using ASCII print writer instead.");
pw = new PrintWriter(System.out, true);
}
// tp.setPrintWriter(pw);
}
public void visitTree(Tree t) {
treeNumber++;
if (printTree) {
pw.print(treeNumber + ":");
pw.println("Next tree read:");
tp.printTree(t, pw);
}
TregexMatcher match = p.matcher(t);
if (printNonMatchingTrees) {
if (match.find()) {
numMatches++;
} else {
tp.printTree(t, pw);
}
return;
}
Tree lastMatchingRootNode = null;
while (match.find()) {
if (oneMatchPerRootNode) {
if (lastMatchingRootNode == match.getMatch()) {
continue;
} else {
lastMatchingRootNode = match.getMatch();
}
}
numMatches++;
if (printFilename && treebank instanceof DiskTreebank) {
DiskTreebank dtb = (DiskTreebank) treebank;
pw.print("# ");
pw.println(dtb.getCurrentFile());
}
if (printSubtreeCode) {
pw.println(treeNumber + ":" + match.getMatch().nodeNumber(t));
}
if (printMatches) {
if (reportTreeNumbers) {
pw.print(treeNumber + ": ");
}
if (printTree) {
pw.println("Found a full match:");
}
if (printWholeTree) {
tp.printTree(t, pw);
} else {
tp.printTree(match.getMatch(), pw); //line 108
}
// pw.println(); // TreePrint already puts a blank line in
} // end if (printMatches)
} // end while match.find()
} // end visitTree
public int numMatches() {
return numMatches;
}
} // end class TRegexTreeVisitor
}
but it gives the following error:
Exception in thread "main" java.lang.NullPointerException
at edu.stanford.nlp.trees.tregex.Test$TRegexTreeVisitor.visitTree(Test.java:108)
at edu.stanford.nlp.trees.MemoryTreebank.apply(MemoryTreebank.java:376)
at edu.stanford.nlp.trees.tregex.Test.main(Test.java:26)
Java Result: 1
Any modifications or ideas?

A NullPointerException is usually an indicator of a bug in the software. In the code above, the static TreePrint tp is declared but never assigned, so the call tp.printTree(match.getMatch(), pw) at line 108 dereferences null; initializing it (e.g. tp = new TreePrint("penn")) before applying the visitor should make the NPE go away.
I had the same task in the past. The sentence was parsed with a dependency parser.
I decided to put the resulting parse tree into an XML DOM and perform XPath queries over it.
For performance you don't need to put the XML in a String; just keep the whole XML structure as a DOM in memory (e.g. http://www.ibm.com/developerworks/xml/library/x-domjava/).
Using XPath for querying tree-like data structure gave me the following benefits:
Load/Save/Transfer results of sentence parsing easily.
Robust syntax/capabilities of XPath.
Many people know XPath (everyone can customize your query).
XML and XPath are cross platform.
Plenty of stable implementations of XPath and XML/DOM libraries.
Ability to use XSLT.
Integration with an existing XML-based pipeline: XSLT+XPath -> XSD -> actions (e.g. users have specified their email address, and what to do with it, somewhere inside a free-text complaint).
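For illustration, here is a minimal sketch of the approach. The XML shape, the node/label element names, and the XPathTreeQuery class are made up for the example (the original post does not specify a schema); it encodes a small constituent subtree as DOM nodes and runs an XPath query roughly equivalent to the Tregex pattern "NP < NN":
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class XPathTreeQuery {
public static void main(String[] args) throws Exception {
// Hypothetical XML encoding of the subtree "(NP (DT this) (NN wine))"
String xml = "<node label=\"NP\">"
+ "<node label=\"DT\">this</node>"
+ "<node label=\"NN\">wine</node>"
+ "</node>";
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
XPath xpath = XPathFactory.newInstance().newXPath();
// Roughly "NP < NN": select NP nodes that have an NN child
NodeList hits = (NodeList) xpath.evaluate(
"//node[@label='NP'][node/@label='NN']", doc, XPathConstants.NODESET);
System.out.println("There were " + hits.getLength() + " matches in total.");
}
}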

ObjectOutputStream.writeObject() freezing when trying to write object between classes

I am a beginner programmer, so please excuse any technically incorrect statements or incorrect use of terminology.
I am trying to make a program that reduces CNF SAT in DIMACS format to 3SAT, then 3SAT to 3-graph-coloring, and then 3-graph-coloring back to SAT. The idea is to make it circular, so that the output of one reduction can be piped straight into the input of another; i.e., if you reduce a CNF to 3SAT, the program should automatically reduce the 3SAT instance to graph coloring afterwards if the user specifies it.
I have chosen to represent CNFs in a LinkedHashMap in a class called CNFHandler. The LinkedHashMap maps each File (the DIMACS-formatted CNF file) to the corresponding CNF object (which contains an ArrayList of Clause objects).
In my CNFHandler class, I have a reduce method, and it is in this method that I am trying to initiate my piping functionality:
package CNFHandler;
import SAT_to_3SAT_Reducer.Reducer;
import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
public class CNFHandler {
private Map<File, CNF> allCNFs = new LinkedHashMap<>();
private CNFReader reader;
private Reducer reducer = new Reducer();
// PIPES
private Optional<ObjectInputStream> inputPipe;
private Optional<ObjectOutputStream> outputPipe;
// Instantiate Pipes
public void setInputPipe(ObjectInputStream inputStream) {
this.inputPipe = Optional.of(inputStream);
}
public void setOutputPipe(ObjectOutputStream outputStream) {
this.outputPipe = Optional.of(outputStream);
}
//...
// Skipping lines for brevity
//...
public void reduce(String filePath) {
File path = new File(filePath);
addCNF(filePath);
CNF result = reducer.reduce(allCNFs.get(path));
if (!outputPipe.isPresent()) {
System.out.println(result.toDIMACS());
} else {
try {
outputPipe.get().writeObject(result);
outputPipe.get().close();
} catch (Exception e){
e.printStackTrace();
}
}
}
}
When I try to run "writeObject" (within the try block in the reduce() method), the program doesn't seem to get past that point. I've tried using breakpoints in IntelliJ to see what's going on, but the best I could figure out was as follows:
A Native method called waitForReferencePendingList() seems to be stuck waiting for something, and that's why it won't go past the writeObject method
IntelliJ tells me "Connected to the target VM, address: '127.0.0.1:51236', transport: 'socket'" but I'm not sure why because I'm not using Sockets anywhere in my program
Here is the code for my Main method where I instantiate the ObjectOutputStreams :
import CNFHandler.CNFHandler;
import GraphHandler.GraphHandler;
import java.io.*;
public class Main {
public static void main(String[] args) {
try {
String inFile = "short_cnf.cnf";
PipedOutputStream _S_3S_OUT_PIPE_STREAM = new PipedOutputStream();
PipedInputStream _S_3S_IN_PIPE_STREAM = new PipedInputStream();
_S_3S_IN_PIPE_STREAM.connect(_S_3S_OUT_PIPE_STREAM);
ObjectOutputStream _S_3S_OUT_OBJECT_STREAM = new ObjectOutputStream(_S_3S_OUT_PIPE_STREAM);
ObjectInputStream _S_3S_IN_OBJECT_STREAM = new ObjectInputStream(_S_3S_IN_PIPE_STREAM);
CNFHandler handler = new CNFHandler();
handler.setOutputPipe(_S_3S_OUT_OBJECT_STREAM);
handler.reduce(inFile);
PipedOutputStream _3S_G_OUT = new PipedOutputStream();
PipedInputStream _3S_G_IN = new PipedInputStream();
_3S_G_IN.connect(_3S_G_OUT);
ObjectOutputStream _3S_G_OUT_STREAM = new ObjectOutputStream(_3S_G_OUT);
ObjectInputStream _3S_G_IN_STREAM = new ObjectInputStream(_3S_G_IN);
GraphHandler graphHandler = new GraphHandler();
graphHandler.setInputPipe(_S_3S_IN_OBJECT_STREAM);
graphHandler.setOutputPipe(_3S_G_OUT_STREAM);
graphHandler.reduce();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The other weird thing is that writeObject() seems to work if I use a different kind of object: for example, if I instantiate a String in the same place writeObject() is being called in reduce(), or if I instantiate a new CNF object there, it WILL write the object. But I can't do it this way, because I have to pass along the values of the object as well (the clauses, etc.), so I don't know what to do.
This is my CNF class, in brief:
package CNFHandler;
import java.io.Serializable;
import java.util.*;
public class CNF implements Serializable {
protected int numVars;
protected int numClauses;
protected String fileName;
// store all variables with no duplicates
protected Set<String> allLiterals = new HashSet<>();
protected ArrayList<Clause> clauses = new ArrayList<>();
/*
for printing to DIMACS: keep track of the max # of
literals that are needed to print a clause
for example if all clauses in the CNF file contain
2 literals, and only one contains 3 literals
then the literalsize will be 3 to ensure things
are printed with proper spacing
*/
protected int literalSize = -20;
/*
keep track of the label referring to the highest #ed literal
just in case they are not stored in order -- this way when we perform
reductions we can just add literals to the end and be sure we are not
duplicating any
*/
protected int highestLiteral = -10;
public CNF(String fileName) {
this.fileName = fileName;
}
protected void addClause(String[] inputs) {
try {
Clause clauseToAdd = new Clause();
// add literals to the hashset, excluding dashes that indicate negative literals
for (int i = 0; i < inputs.length - 1; i++) {
// removing whitespace from the input
String toAdd = inputs[i].replaceAll("\\s+", "");
// in case the variable is false (has a dash before the int), remove the dash to standardize storage
String moddedToAdd = inputs[i].replaceAll("[-]*", "");
/*
if an unknown variable is in the stream, reject it.
(we're basically checking here if the variable set is full,
and if it is and the variable we're trying to add is new,
then it can't be added)
*/
if ((!allLiterals.contains(moddedToAdd)) && (allLiterals.size() == numVars) && (moddedToAdd.trim().length() > 0)) {
throw new FailedCNFException();
}
// add the original input (not the regex'd one, i.e. the one that still carries its dash if it was input as negative)
clauseToAdd.addLiteral(toAdd);
if (!allLiterals.contains(moddedToAdd) && !moddedToAdd.equalsIgnoreCase("")) {
allLiterals.add(moddedToAdd);
/*
change the highestLiteral value if the literal being added is "bigger" than the others that have been seen
*/
if(highestLiteral < Integer.parseInt(moddedToAdd)) {
highestLiteral = Integer.parseInt(moddedToAdd);
}
}
}
if (clauseToAdd.getNumberOfLiterals() > literalSize) {
literalSize = clauseToAdd.getNumberOfLiterals();
}
clauses.add(clauseToAdd);
} catch (FailedCNFException e) {
System.out.println("The number of variables that have been introduced is too many!");
}
}
public void makeClause(String[] inputs) {
try {
if (inputs[inputs.length - 1].equals("0")) {
addClause(inputs);
} else throw new FailedCNFException();
} catch (FailedCNFException f) {
System.out.println("There is no 0 at the end of this line: ");
for (String s : inputs ) {
System.out.print(s + " ");
}
System.out.println();
}
}
public void initializeClauses (String[] inputs) {
setNumVars(inputs[2]);
setNumClauses(inputs[3]);
}
public String toDIMACS () {
String toReturn = "p cnf " + getNumVars() + " " + getNumClauses() + "\n";
for(int i = 0; i < clauses.size()-1; i++){
Clause c = clauses.get(i);
toReturn += c.toDIMACS(literalSize) + "\n";
}
toReturn += clauses.get(clauses.size()-1).toDIMACS(literalSize);
return toReturn;
}
/*
Override tostring method to print clauses in human-readable format
*/
@Override
public String toString () {
if(highestLiteral != -10) {
String toReturn = "(";
for (int i = 0; i < clauses.size() - 1; i++) {
Clause c = clauses.get(i);
toReturn += c + "&&";
}
toReturn += clauses.get(clauses.size() - 1).toString() + ")";
return toReturn;
} else {
return "Add some clauses!";
}
}
public String toString (boolean addFile) {
String toReturn = "";
if (addFile) {
toReturn += "src/test/ExampleCNFs/" + fileName + ".cnf: \n";
}
toReturn += "(";
for(int i = 0; i < clauses.size()-1; i++){
Clause c = clauses.get(i);
toReturn += c + "&&";
}
toReturn += clauses.get(clauses.size()-1).toString() + ")";
return toReturn;
}
//=============================================================================
// HELPER FUNCTIONS
//=============================================================================
public void setNumVars(String vars) {
numVars = Integer.parseInt(vars);
}
public void setNumClauses(String clauses) {
numClauses = Integer.parseInt(clauses);
}
public Clause getClause(int index) {
return clauses.get(index);
}
public void addLiteral(int newLiteral) {
allLiterals.add(String.valueOf(newLiteral));
}
public void addLiterals(Set<String> newLiterals) {
allLiterals.addAll(newLiterals);
}
public void addClauses(ArrayList<Clause> toAdd, int maxLiterals) {
clauses.addAll(toAdd);
numClauses += toAdd.size();
// update literalsize if need be
if (maxLiterals > literalSize) {
literalSize = maxLiterals;
}
}
//=============================================================================
// GETTERS AND SETTERS
//=============================================================================
public void setNumVars(int numVars) {
this.numVars = numVars;
}
public void setNumClauses(int numClauses) {
this.numClauses = numClauses;
}
public int getNumVars() {
return numVars;
}
public int getNumClauses() {
return numClauses;
}
public ArrayList<Clause> getClauses() {
return clauses;
}
public Set<String> getAllLiterals() {
return allLiterals;
}
//
// LITERAL SIZE REPRESENTS THE MAXIMUM NUMBER OF LITERALS A CLAUSE CAN CONTAIN
//
public int getLiteralSize() {
return literalSize;
}
public void setLiteralSize(int literalSize) {
this.literalSize = literalSize;
}
public String getFilePath() {
return "src/test/ExampleCNFs/" + fileName + ".cnf";
}
public String getFileName() {
return fileName;
}
public void setFileName(String fileName) {
this.fileName = fileName;
}
//
// HIGHEST LITERAL REPRESENTS THE HIGHEST NUMBER USED TO REPRESENT A LITERAL
// IN THE DIMACS CNF FORMAT
//
public int getHighestLiteral() {
return highestLiteral;
}
public void setHighestLiteral(int highestLiteral) {
this.highestLiteral = highestLiteral;
}
public void setHighestLiteral(String highestLiteral) {
this.highestLiteral = Integer.parseInt(highestLiteral);
}
}
Can someone give me some insight as to what's going on here, please? Thank you very much.
First of all, neither of the symptoms is actually relevant to your question:
A Native method called waitForReferencePendingList() seems to be stuck waiting for something.
You appear to have found an internal thread that is dealing with the processing of Reference objects following a garbage collection. It is normal for it to be waiting there.
IntelliJ tells me "Connected to the target VM, address: '127.0.0.1:51236', transport: 'socket'"
That is Intellij saying that it has connected to the debug agent in the JVM that is running your application. Again, this is normal.
If you are trying to find the cause of a problem via a debugger, you need to find the application thread that is stuck. Then drill down to the point where it is actually stuck and look at the corresponding source code to figure out what it is doing. In this case, you need to look at the standard Java SE library source code for your platform. Randomly looking for clues rarely works ...
Now to your actual problem.
Without a stacktrace or a minimal reproducible example, it is not possible to say with certainty what is happening.
However, I suspect that writeObject is simply stuck waiting for something to read from the "other end" of the pipeline. It looks like you have set up a PipedInputStream / PipedOutputStream pair. This has only a limited amount of buffering. If the "writer" writes too much to the output stream, it will block until the "reader" has read some data from the input stream.
The other weird thing is that the writeObject() method seems to work if I use a different kind of object ...
The other kind of object probably has a smaller serialization which fits into the available buffer space.
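If that is the cause, the usual cure is to make sure something is draining the pipe while you write. Here is a minimal, self-contained sketch (the String payload is a stand-in for your CNF result); it reads from the ObjectInputStream on a second thread, and also shows the PipedInputStream constructor that takes a buffer size, which merely postpones the blocking point rather than removing it:
import java.io.*;
public class PipeDemo {
public static void main(String[] args) throws Exception {
PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out, 1 << 20); // 1 MiB pipe buffer instead of the small default
ObjectOutputStream objOut = new ObjectOutputStream(out);
// Drain the pipe on a separate thread so writeObject() never blocks
// waiting for buffer space.
Thread reader = new Thread(() -> {
try (ObjectInputStream objIn = new ObjectInputStream(in)) {
Object received = objIn.readObject();
System.out.println("Read: " + received);
} catch (Exception e) {
e.printStackTrace();
}
});
reader.start();
objOut.writeObject("some serializable payload"); // stand-in for the CNF result
objOut.close();
reader.join();
}
}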

Unit Testing for JMenuItem

I am new to JUnit testing. I am trying to test a method that exports a report. Basically, this method pops up a save dialog to select where to save the file, and also gets the report from another class. I am not sure what I need to test here or even how to test it. I have added my JMenuItem and my action event as well. Any ideas or help would be greatly appreciated.
Here is my JMenuItem:
JMenuItem jMenuFileexportProjectReport = new JMenuItem(exportProjectReportAction);
Here is my Action event for the JMenuItem:
public Action exportProjectReportAction =
new AbstractAction(Local.getString("Export Project Report")) {
public void actionPerformed(ActionEvent e) {
reportExportAction(e);
}
};
Here is my method to export the report:
public void reportExportAction(ActionEvent e) {
JFileChooser chooser = new JFileChooser();
chooser.setFileHidingEnabled(false);
chooser.setDialogTitle(Local.getString("Export Project Report"));
chooser.setAcceptAllFileFilterUsed(false);
chooser.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);
chooser.addChoosableFileFilter(
new AllFilesFilter(AllFilesFilter.XHTML));
chooser.addChoosableFileFilter(new AllFilesFilter(AllFilesFilter.HTML));
String lastSel = (String) Context.get("LAST_SELECTED_EXPORT_FILE");
if (lastSel != null) {
chooser.setCurrentDirectory(new File(lastSel));
}
ProjectExportDialog dlg =
new ProjectExportDialog(
App.getFrame(),
Local.getString("Export Project Report"),
chooser);
String enc = (String) Context.get("EXPORT_FILE_ENCODING");
if (enc != null) {
dlg.encCB.setSelectedItem(enc);
}
Dimension dlgSize = new Dimension(550, 500);
dlg.setSize(dlgSize);
Dimension frmSize = App.getFrame().getSize();
Point loc = App.getFrame().getLocation();
dlg.setLocation(
(frmSize.width - dlgSize.width) / 2 + loc.x,
(frmSize.height - dlgSize.height) / 2 + loc.y);
dlg.setVisible(true);
if (dlg.CANCELLED) {
return;
}
Context.put(
"LAST_SELECTED_EXPORT_FILE",
chooser.getSelectedFile().getPath());
int ei = dlg.encCB.getSelectedIndex();
enc = null;
if (ei == 1) {
enc = "UTF-8";
}
boolean nument = (ei == 2);
File f = chooser.getSelectedFile();
boolean xhtml =
chooser.getFileFilter().getDescription().indexOf("XHTML") > -1;
CurrentProject.save();
ReportExporter.export(CurrentProject.get(), chooser.getSelectedFile(), enc, xhtml,
nument);
}
Class that creates an HTML report:
public class ReportExporter {
static boolean _chunked = false;
static boolean _num = false;
static boolean _xhtml = false;
static boolean _copyImages = false;
static File output = null;
static String _charset = null;
static boolean _titlesAsHeaders = false;
static boolean _navigation = false;
static String charsetString = "\n";
public static void export(Project prj, File f, String charset, boolean xhtml, boolean chunked) {
_chunked = chunked;
_charset = charset;
_xhtml = xhtml;
if (f.isDirectory()) {
output = new File(f.getPath() + "/Project Report.html");
}
else {
output = f;
}
NoteList nl = CurrentStorage.get().openNoteList(prj);
Vector notes = (Vector) nl.getAllNotes();
//Creates Labels for the HTML output for each section.
String notesLabelHTML = "Notes";
String tasksLabelHTML = "Tasks";
String eventsLabHTML = "Events";
//NotesVectorSorter.sort(notes);
Collections.sort(notes);
Writer fw;
if (output.getName().indexOf(".htm") == -1) {
String dir = output.getPath();
String ext = ".html";
String nfile = dir + ext;
output = new File(nfile);
}
try {
if (charset != null) {
fw = new OutputStreamWriter(new FileOutputStream(output),
charset);
charsetString = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
+ charset + "\" />";
}
else
fw = new FileWriter(output);
}
catch (Exception ex) {
new ExceptionDialog(ex, "Failed to write to " + output, "");
return;
}
//Writes the title and the notes section of the HTMl Report
write(fw, "<html>\n<head>\n" + charsetString + "<title>"
+ prj.getTitle()
+ "</title>\n</head>\n<body>\n<h1 class=\"projecttitle\">"
+ prj.getTitle() + "</h1>\n" +"\n<br>\n"
+ "</title>\n</head>\n<body>\n<h2 class=\"projecttitle\">"
+ notesLabelHTML + "</h2>\n" );
generateChunks(fw, notes);
//Writes the Task section of the HTML Report
write(fw, "\n<hr></hr><a" +"</title>\n</head>\n<body>\n<h2 class=\"projecttitle\">" + "\n<br>\n"
+ tasksLabelHTML + "</h2>\n" );
//writes the Events section of the HTML Report
write(fw, "\n<hr></hr><a" +"</title>\n</head>\n<body>\n<h2 class=\"projecttitle\">" + "\n<br>\n"
+ eventsLabHTML + "</h2>\n" );
//Writes the ending of the report with the data and time
write(fw, "\n<hr></hr><a "
+ "\n<br></br>\n" + new Date().toString()
+ "\n</body>\n</html>");
try {
fw.flush();
fw.close();
}
catch (Exception ex) {
new ExceptionDialog(ex, "Failed to write to " + output, "");
}
}
public static String getNoteHTML(Note note) {
String text = "";
StringWriter sw = new StringWriter();
AltHTMLWriter writer = new AltHTMLWriter(sw,
(HTMLDocument) CurrentStorage.get().openNote(note), _charset,
_num);
try {
writer.write();
sw.flush();
sw.close();
}
catch (Exception ex) {
new ExceptionDialog(ex);
}
text = sw.toString();
if (_xhtml) {
text = HTMLFileExport.convertToXHTML(text);
}
text = Pattern
.compile("<body(.*?)>", java.util.regex.Pattern.DOTALL
+ java.util.regex.Pattern.CASE_INSENSITIVE).split(text)[1];
text = Pattern
.compile("</body>", java.util.regex.Pattern.DOTALL
+ java.util.regex.Pattern.CASE_INSENSITIVE).split(text)[0];
text = "<div class=\"note\">" + text + "</div>";
if (_titlesAsHeaders) {
text = "\n\n<div class=\"date\">"
+ note.getDate().getFullDateString()
+ ":</div>\n<h1 class=\"title\">" + note.getTitle()
+ "</h1>\n" + text;
}
return text;
}
private static String generateNav(Note prev, Note next) {
String s = "<hr></hr><div class=\"navigation\"><table border=\"0\" width=\"100%\" cellpadding=\"2\"><tr><td width=\"33%\">";
if (prev != null) {
s += "<div class=\"navitem\"><a href=\"" + prev.getId() + ".html\">"
+ Local.getString("Previous") + "</a><br></br>"
+ prev.getDate().getMediumDateString() + " "
+ prev.getTitle() + "</div>";
}
else {
s += " ";
s += "</td><td width=\"34%\" align=\"center\"><a href=\""
+ output.getName()
+ "\">Up</a></td><td width=\"33%\" align=\"right\">";
}
if (next != null) {
s += "<div class=\"navitem\"><a href=\"" + next.getId() + ".html\">"
+ Local.getString("Next") + "</a><br></br>"
+ next.getDate().getMediumDateString() + " "
+ next.getTitle() + "</div>";
}
else {
s += " ";
}
s += "</td></tr></table></div>\n";
return s;
}
private static void generateChunks(Writer w, Vector notes) {
Object[] n = notes.toArray();
for (int i = 0; i < n.length; i++) {
Note note = (Note) n[i];
CalendarDate d = note.getDate();
if (_chunked) {
File f = new File(output.getParentFile().getPath() + "/"
+ note.getId()
+ ".html");
Writer fw = null;
try {
if (_charset != null) {
fw = new OutputStreamWriter(new FileOutputStream(f),
_charset);
}
else {
fw = new FileWriter(f);
}
String s = "<html>\n<head>\n"+charsetString+"<title>" + note.getTitle()
+ "</title>\n</head>\n<body>\n" + getNoteHTML(note);
if (_navigation) {
Note nprev = null;
if (i > 0) {
nprev = (Note) n[i - 1];
}
Note nnext = null;
if (i < n.length - 1) {
nnext = (Note) n[i + 1];
}
s += generateNav(nprev, nnext);
}
s += "\n</body>\n</html>";
fw.write(s);
fw.flush();
fw.close();
}
catch (Exception ex) {
new ExceptionDialog(ex, "Failed to write to " + output, "");
}
}
else {
write(w, "<a name=\"" + "\">" + note.getDate() +"</a>\n" + getNoteHTML(note) + "</a>\n");
}
}
}
private static void write(Writer w, String s) {
try {
w.write(s);
}
catch (Exception ex) {
new ExceptionDialog(ex, "Failed to write to " + output, "");
}
}
As has already been mentioned, you should basically move the application logic out of the GUI code.
Unit testing usually doesn't work like "write working code, then add tests to it". First you should think about
what is the testable part of the logic and what isn't
how will you separate them
how will you test the testable part
Directly testing GUI code usually does not work (except with some rare, well-designed UI frameworks), since your test would have to deal with a lot of technical problems (framework initialization, instantiation of UI objects, triggering events properly, etc.). So you should create a more abstract layer of the app. In the bigger picture, this leads to the well-known Model-View-Controller pattern (the de-facto standard design pattern for user interfaces since Smalltalk), where the model layer is independent of the UI framework and therefore easily testable.
So, as a kind of case study, let's walk through the above concepts.
First let's check what is testable, what isn't, and what stops you from writing tests:
chooser.setDialogTitle(Local.getString("Export Project Report"));
This single line deals with both rendering and i18n, not really a good idea. Furthermore, the use of static methods is a top blocker antipattern for writing tests.
String lastSel = (String) Context.get("LAST_SELECTED_EXPORT_FILE");
Static method call here again, also it would be hard to determine the responsibility of that Context class.
Local.getString("Export Project Report")
also a static call, and kind of duplicate code too
...and so on (static calls everywhere). Now let's see what kind of more abstract model we can create for this. First let's start with
some textual description of the requirements:
There is a title (used both in the JFileChooser and the ProjectExportDialog) to be internationalized by key "Export Project Report"
there is a previously selected directory (lastSel) whose value we take for granted at the beginning
the encoding (enc) is similar to lastSel, a nullable value too
the dialog location (positioning) contains some arithmetic, we should test it
if the user selects a file, then we should store it as the last selected directory
there is an opened (current) project too, which we will save at the end
there is something you call "nument", I don't understand what it is, but it should be true if the user selects the 2nd entry from dlg.encCB
Untestable parts:
chooser... calls: UI-specific configuration of the JFileChooser; it contains no control structures or calculations, so we won't test it
Now we are going to design a testable model class. While doing so, we will keep two principles in mind:
we are going to put as much logic into the model as we can
we don't want to rewrite your entire app this time, so instead of getting rid of all the static calls, we abstract them away as simply as possible.
So now let's create some sort of abstract model for this (summary after the code):
public class ProjectExportModel {
// see the reasoning below
public static ProjectExportModel create() {
final String lastSel = (String) Context.get("LAST_SELECTED_EXPORT_FILE");
return new ProjectExportModel(Local::getString,
lastSel == null ? null : new File(lastSel),
file -> Context.put("LAST_SELECTED_EXPORT_FILE", file.getPath()));
}
private final Function<String, String> i18n;
private final File lastSelectedExportFile;
private final Consumer<File> lastSelectedFileSaver;
private String encoding;
private boolean nument;
private boolean xhtml;
public ProjectExportModel(final Function<String, String> i18n, final File lastSelectedExportFile,
final Consumer<File> lastSelectedFileSaver) {
this.i18n = i18n;
this.lastSelectedExportFile = lastSelectedExportFile;
this.lastSelectedFileSaver = lastSelectedFileSaver;
}
/**
* Called after a file has been selected from the JFileChooser
*
* Things to test:
* - lastSelectedFileSaver.accept(file) should be called - you may use a
* mocking library (or a plain lambda) to test
* - the xhtml flag should be changed - testing is easy
*
*/
public void fileSelected(final File file) {
// TODO
}
/**
* At this point we break a bit the concept of the UI-independent model layer, since Point and Dimension
* are UI-framework-related classes. But these 2 classes are easy to instantiate and easy to assert on the
* returned value, so good-enough solution this time.
*/
public Point getDialogLocation(final Dimension frameSize, final Point frameLocation) {
return null; // TODO implement the positioning
}
public String getFrameTitle() {
// TODO test if it calls and returns i18n.apply("Export Project Report") - a plain lambda works here too
return null;
}
/**
* Two things to be tested here:
* - if CurrentProject.save() is called
* - if ReportExporter.export(...) is called with the right parameters
*
* You are quite stuck here, since static methods cannot be mocked. Instead it would be better to change your APIs to make
* these instance methods, since in the current way it is untestable. After changing these to instance methods, you should add
* 2 more parameters to the constructor: a Project instance and a ReportExporter instance.
* You can use mockito or easymock for mocking.
*/
public void save() {
}
/**
* You may call it from the view layer after calling fileSelected().
*
* To be tested:
* - the proper change of the encoding member
* - the proper change of the nument member
*/
public void selectedEncodingChanged(final int selectedIndex) {
// TODO implement the change of the encoding and nument members
}
}
Summary:
this class is easy to instantiate and test
in the tests you will use its explicit constructor to create instances
for "production" usage, you will have to create a View class, which handles the swing-related code, accepts a ProjectExportModel instance as its parameter, and calls it methods, so you wire the tested model into the untestable UI-related code, while keeping the latter one minimal. Also, in this case you will create the model instance with ProjectExportModel.create() , since that methods wires the further dependencies in a way that it will more or less nicely interact with the other static methods of your app. This is a good technique for extracting testable parts while you don't necessarily have to remove all static methods from the app, we have just separated them away.

One whole exception gets split into 2 maps when using Hadoop to catch exceptions from raw logs

I want to use hadoop to fetch and parse exceptions from raw logs.
I have encountered a problem: some exceptions (spanning multiple lines) end up in 2 different splits, and thus 2 different mappers.
I have an idea to avoid this problem: I could override the getSplits() method so that every split carries a little redundant data. But I think this solution would come at too high a cost for me.
So does anyone have a better solution for this problem?
I would go for a preprocessing job in which you tag the exceptions with XML tags. Then you can use an XmlInputFormat to process the files. (This is only the start of a solution; based on your feedback we can make things more concrete.)
This link provides a tutorial for writing your own XmlInputFormat, which you can customize to look for 'exception' characteristics. The main point of the tutorial is this sentence:
In the event that a record spans an InputSplit boundary, the record
reader will take care of this so we will not have to worry about this.
I will copy-paste the information from the website, since it might go offline in the future, which could be very frustrating for people reviewing this later:
The inputformat:
package org.undercloud.mapreduce.example3;
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class XmlInputFormat extends FileInputFormat {
public RecordReader getRecordReader(InputSplit input, JobConf job, Reporter reporter)
throws IOException {
reporter.setStatus(input.toString());
return new XmlRecordReader(job, (FileSplit)input);
}
}
The record reader:
NOTE: The logic for reading past the end of the split is in readUntilMatch function which reads past the end of the split if the there is an open tag. This is really what you are looking for I think!
package org.undercloud.mapreduce.example3;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class XmlRecordReader implements RecordReader {
private String startTagS = "<exception>"; // NOTE: the tag literals were eaten by the original page's HTML rendering; the <exception> tag from this thread is assumed here
private String endTagS = "</exception>";
private byte[] startTag;
private byte[] endTag;
private long start;
private long end;
private FSDataInputStream fsin;
private DataOutputBuffer buffer = new DataOutputBuffer();
private LineRecordReader lineReader;
private LongWritable lineKey;
private Text lineValue;
public XmlRecordReader(JobConf job, FileSplit split) throws IOException {
lineReader = new LineRecordReader(job, split);
lineKey = lineReader.createKey();
lineValue = lineReader.createValue();
startTag = startTagS.getBytes();
endTag = endTagS.getBytes();
// Open the file and seek to the start of the split
start = split.getStart();
end = start + split.getLength();
Path file = split.getPath();
FileSystem fs = file.getFileSystem(job);
fsin = fs.open(split.getPath());
fsin.seek(start);
}
public boolean next(Text key, XmlContent value) throws IOException {
// Get the next line
if (fsin.getPos() < end) {
if (readUntilMatch(startTag, false)) {
try {
buffer.write(startTag);
if (readUntilMatch(endTag, true)) {
key.set(Long.toString(fsin.getPos()));
value.bufferData = buffer.getData();
value.offsetData = 0;
value.lenghtData = buffer.getLength();
return true;
}
}
finally {
buffer.reset();
}
}
}
return false;
}
private boolean readUntilMatch(byte[] match, boolean withinBlock) throws IOException {
int i = 0;
while (true) {
int b = fsin.read(); // -1 means end of file
if (b == -1) return false;
// otherwise, save to buffer:
if (withinBlock) buffer.write(b);
if (b == match[i]) {
i++;
if (i >= match.length) return true;
} else i = 0;
// see if we’ve passed the stop point:
if(!withinBlock && i == 0 && fsin.getPos() >= end) return false;
}
}
public Text createKey() {
return new Text("");
}
public XmlContent createValue() {
return new XmlContent();
}
public long getPos() throws IOException {
return lineReader.getPos();
}
public void close() throws IOException {
lineReader.close();
}
public float getProgress() throws IOException {
return lineReader.getProgress();
}
}
And finally the writable:
package org.undercloud.mapreduce.example3;
import java.io.*;
import org.apache.hadoop.io.*;
public class XmlContent implements Writable{
public byte[] bufferData;
public int offsetData;
public int lenghtData;
public XmlContent(byte[] bufferData, int offsetData, int lenghtData) {
this.bufferData = bufferData;
this.offsetData = offsetData;
this.lenghtData = lenghtData;
}
public XmlContent(){
this(null,0,0);
}
public void write(DataOutput out) throws IOException {
// write the length first so readFields() knows how many bytes to read back
out.writeInt(lenghtData);
out.write(bufferData, offsetData, lenghtData);
}
public void readFields(DataInput in) throws IOException {
// mirror of write(): length first, then exactly that many bytes
lenghtData = in.readInt();
offsetData = 0; // the deserialized copy starts at offset 0
bufferData = new byte[lenghtData];
in.readFully(bufferData);
}
public String toString() {
return Integer.toString(offsetData) + ", "
+ Integer.toString(lenghtData) + ", "
+ new String(bufferData, offsetData, lenghtData); // decode the bytes instead of printing the array reference
}
}
This looks like a really useful tutorial addressing the issue of records spanning multiple splits. Let me know if you are able to adapt this example to your problem.
The TextInputFormat and NLineInputFormat classes might be helpful. TextInputFormat splits the file by line, so if the exception ends with a newline (and contains none within it), this should work. If the exceptions contain a fixed number of lines, NLineInputFormat should be what you want, as you can set the number of lines to take.
Unfortunately, if an exception can contain a variable number of newlines, this won't work.
In that case I recommend looking at Mahout's XmlInputFormat. It crosses split boundaries, so it will work for most cases: just run a pre-processor to put the exceptions inside <exception></exception> tags, and specify those as the start/end tags.
Example pre-processor, using a regex to identify exceptions:
String input; //code this to the input string
String regex; //make this equal to the exception regex
BufferedWriter bw; //make this go to the file where output will be stored
String toProcess = input;
Pattern p = Pattern.compile(regex); // compile the pattern once, outside the loop
boolean continueLoop = true;
while (continueLoop) {
Matcher m = p.matcher(toProcess);
if (m.find()) {
bw.write("<exception>" + toProcess.substring(m.start(), m.end()) + "</exception>");
toProcess = toProcess.substring(m.end());
} else {
continueLoop = false;
}
}
Thanks for all your solutions; I think they will be useful for me.
Note especially the comment above:
"In the event that a record spans an InputSplit boundary, the record
reader will take care of this so we will not have to worry about
this."
Then I looked into the source code to see how LineRecordReader reads the data from a split, and found that LineRecordReader already has some logic for reading records that span an InputSplit boundary, because line records at the bottom of a split are regularly cut into 2 different splits by the block size limit.
So I think what I need to do is extend the amount of data that LineRecordReader reads across the split boundary.
My current solution: override the method "nextKeyValue()" in LineRecordReader.
public boolean nextKeyValue() throws IOException {
if (key == null) {
key = new LongWritable();
}
key.set(pos);
if (value == null) {
value = new Text();
}
int newSize = 0;
while (pos < end) {
newSize = in.readLine(value, maxLineLength,
Math.max((int)Math.min(Integer.MAX_VALUE, end-pos),
maxLineLength));
change the line "while (pos < end)" to "while (pos < end + {param})"
where {param} is the size of the redundant data that the record reader may read across the split boundary.

recursion to iteration

I am working on a desktop application for Windows using Java. In my application, there is a requirement to find all .php files. To do this, I use a recursive method.
import java.io.File;
public class Copier {
public static void find(String source,String rep) {
File src = new File(rep);
if (src!= null && src.exists() && src.isDirectory()) {
String[] tab = src.list();
if (tab != null) {
for(String s : tab) {
File srcc = new File(rep+"\\"+s);
if (srcc.isFile()) {
if (srcc.getName().matches(".*"+source+"$")) {
System.out.println(s);
}
} else {
find(source,srcc.getAbsolutePath());
}
}
} else {
//System.out.println(" list is null");
}
}
}
public static void main(String[] args) {
try {
find(".java", "C:\\");
} catch (Exception e) {
e.printStackTrace();
}
}
}
Is it possible to do this with an iterative algorithm?
Of course: use breadth-first search with a queue. You start from C:\ and at every step you pop the front folder from the queue and push all of its subfolders onto the end of the queue.
Pseudocode follows:
queue.push("C:\");
while (!queue.empty()) {
String topFolder = queue.pop();
foreach (subFolder of topFolder) {
queue.push(subFolder);
}
}
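For reference, here is one way the pseudocode could be fleshed out in Java (a sketch; the BfsFind class and the suffix check are mine, standing in for the asker's regex match):
import java.io.File;
import java.util.ArrayDeque;
import java.util.Queue;
public class BfsFind {
public static void find(String suffix, String root) {
Queue<File> queue = new ArrayDeque<>();
queue.add(new File(root));
while (!queue.isEmpty()) {
File dir = queue.remove();
File[] entries = dir.listFiles();
if (entries == null) continue; // not a directory, or an I/O error
for (File entry : entries) {
if (entry.isDirectory()) {
queue.add(entry); // push subfolders to the end of the queue
} else if (entry.getName().endsWith(suffix)) {
System.out.println(entry); // found a match
}
}
}
}
public static void main(String[] args) {
find(".java", "C:\\");
}
}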
I can't see why you want to get rid of the recursion, although theoretically what you are looking for is possible.
But a good way to get a faster program could be to use a FilenameFilter when you list the children of a directory: one for directories and one for matching files (the latter should use a java.util.regex.Pattern).
- updated
You can find the doc for the relevant overload of File.list here. And for the pattern, you could use something like a local variable (outside your loop, or a data member if you keep the recursion):
Pattern p = Pattern.compile(".*" + source + ".*");
boolean found = p.matcher(srcc.getName()).matches();
Oh, and by the way, don't convert srcc into a File! Work with strings and build as few objects as you can.
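For example, the matching-file filter could be a one-liner using the File.list(FilenameFilter) overload (a sketch reusing src and source from the question):
String[] matchingFiles = src.list((dir, name) -> name.matches(".*" + source + "$"));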
You can always use an explicit stack (or queue) in place of recursion. In this case, I think it makes the code a little easier to read. Often you'll get better performance from an iterative implementation than a recursive one, though in this case they both run at nearly the same speed (at least on my machine).
public static List<String> find(final String source, final String directory)
{
List<String> results = new LinkedList<String>();
Stack<String> stack = new Stack<String>();
stack.add(directory);
String rep;
while (!stack.isEmpty()) {
rep = stack.pop();
File src = new File(rep);
if (src != null && src.exists() && src.isDirectory()) {
String[] tab = src.list();
if (tab != null) {
for (String s : tab) {
File srcc = new File(rep + File.separatorChar + s);
if (srcc.isFile()) {
if (srcc.getName().matches(".*" + source + "$")) {
// System.out.println(s);
results.add(s);
}
} else {
stack.add(srcc.getAbsolutePath());
}
}
} else {
// System.out.println(" list is null");
}
}
}
return results;
}

Regex working in the test program but not in the WebSphinx crawler

Here is my code for regex matching, which worked for a webpage:
import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTestHarness {
public static void main(String[] args) {
File aFile = new File("/home/darshan/Desktop/test.txt");
FileInputStream inFile = null;
try {
inFile = new FileInputStream(aFile);
} catch (FileNotFoundException e) {
e.printStackTrace(System.err);
System.exit(1);
}
BufferedInputStream in = new BufferedInputStream(inFile);
DataInputStream data = new DataInputStream(in);
String string = new String();
try {
while (data.read() != -1) {
string += data.readLine();
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Pattern pattern = Pattern
.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>");
Matcher matcher = pattern.matcher(string);
boolean found = false;
while (matcher.find()) {
System.out.println("Name: " + matcher.group(1) );
found = true;
}
if(!found){
System.out.println("Pattern Not found");
}
}
}
But the same code doesn't work in the crawler code I'm testing the regex in. My crawler code is (I'm using WebSphinx):
// Our own Crawler class extends the WebSphinx Crawler
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import websphinx.Crawler;
import websphinx.Link;
import websphinx.Page;
public class MyCrawler extends Crawler {
MyCrawler() {
super(); // Do what the parent crawler would do
}
// We could choose not to visit a link based on certain circumstances
// For this test we don't follow any links beyond the root
public boolean shouldVisit(Link l) {
// String host = l.getHost();
return false; // don't crawl past the root page
}
}
// What to do when we visit the page
public void visit(Page page) {
System.out.println("Visiting: " + page.getTitle());
String content = page.getContent();
System.out.println(content);
Pattern pattern = Pattern.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>");
Matcher matcher = pattern.matcher(content);
boolean found = false;
while (matcher.find()) {
System.out.println("Name: " + matcher.group(1) );
found = true;
}
if(!found){
System.out.println("Pattern Not found");
}
}
}
This is my code for running the crawler:
import java.net.MalformedURLException;
import websphinx.Crawler;
import websphinx.Link;
public class WebSphinxTest {
public static void main(String[] args) throws MalformedURLException, InterruptedException {
System.out.println("Testing Websphinx. . .");
// Make an instance of own our crawler
Crawler crawler = new MyCrawler();
// Create a "Link" object and set it as the crawler's root
Link link = new Link("http://justeat.in/restaurant/spices/5633/indian-tandoor-chinese-and-seafood/sarjapur-road/bangalore");
crawler.setRoot(link);
// Start running the crawler!
System.out.println("Starting crawler. . .");
crawler.run(); // Blocking function, could implement a thread, etc.
}
}
A little detail about the crawler code: shouldVisit(Link link) decides whether to visit a link or not, and visit(Page page) decides what to do when we get the page.
In the above example, test.txt and content contain the same String.
In your RegexTestHarness you're reading lines from a file and concatenating them without line breaks before doing your matching (readLine() returns the contents of the line without the line terminator!).
So in the input to your MyCrawler class, there probably are line break characters. And since the regex meta-char . by default does not match line break characters, the pattern fails in MyCrawler.
To fix this, prepend (?s) to all your patterns that contain a . meta-char. So:
Pattern.compile("<div class=\"rest_title\">.*?<h1>(.*?)</h1>")
would become:
Pattern.compile("(?s)<div class=\"rest_title\">.*?<h1>(.*?)</h1>")
The DOT-ALL flag, (?s), will cause the . to match any character, including line break chars.
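Equivalently, the same flag can be passed to Pattern.compile() instead of being embedded in the pattern string:
Pattern pattern = Pattern.compile(
"<div class=\"rest_title\">.*?<h1>(.*?)</h1>",
Pattern.DOTALL); // same effect as the inline (?s)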
