I have Java 9 source code and I need to extract the fields of some classes using ANTLR 4. This is my listener:
private static class FieldListener extends Java9BaseListener {
@Override
public void enterFieldDeclaration(Java9Parser.FieldDeclarationContext ctx) {
for (ParseTree subTree : ctx.children) {
System.out.println(subTree.getText());
}
String fieldName = ??????;
}
}
And this is the code
//Now, let's do some testing. First, we construct the lexer:
Java9Lexer java9Lexer = new Java9Lexer(CharStreams.fromString(classContent));
//Then, we instantiate the parser:
CommonTokenStream tokens = new CommonTokenStream(java9Lexer);
Java9Parser parser = new Java9Parser(tokens);
ParseTree tree = parser.compilationUnit();
//And then, the walker and the listener:
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(new FieldListener(), tree);
However, I can only iterate over all of a field's tokens; I can't get one specific token, the field name. Could anyone tell me how to get it?
There's no magic bullet here. Static semantic analysis is not easy because you have to run around a tree, but it is straightforward. You'll have to extract the field names by walking the tree from the variableDeclaratorList. If you print out a parse tree for some input, you can see that a variableDeclaratorList contains a list of variableDeclarator nodes, each variableDeclarator contains a variableDeclaratorId, each variableDeclaratorId contains an identifier with optional dims, and each identifier is a subtree of tokens: JavaLetter followed by more JavaLetterOrDigit.
How should you compute the attributes of the parse tree? That's up to you. Usually, people compute synthesized attributes with the ANTLR listener framework. You can define a mapping from each node to a data structure containing its attributes, so that when you want the attributes for nodes lower in the tree, you can do a lookup (in this case, maybe resulting in a list of strings for the variableDeclaratorList?).
You can access a particular node using the accessor function for that child, so you don't have to compute attributes from immediate descendants. Look at the generated parser code to see what is available, e.g., FieldDeclarationContext.variableDeclaratorList(). Or, if you like, you could call a function to derive the attributes for this specific variableDeclaratorList rather than use the ANTLR listener framework.
The ANTLR listener has no way to short-circuit the walk, so if you write your own function to walk the variableDeclaratorList, you may prefer ParseTreeVisitor.visit() over a listener driven by ParseTreeWalker.walk().
I have a 3-level nested Java POJO that looks like this in the schema file:
struct FPathSegment {
originIata:ushort;
destinationIata:ushort;
}
table FPathConnection {
segments:[FPathSegment];
}
table FPath {
connections:[FPathConnection];
}
When I try to serialize a Java POJO to its FlatBuffers equivalent, I get a "nested serialization is not allowed" error almost every time I try to use a common FlatBufferBuilder to build the entire object graph.
Nothing in the docs says whether I should use a single builder for the entire graph or a separate one for every table/struct. If separate, how do you import the child objects into the parent?
There are all these methods to create/start/add various vectors, but no explanation of which builders go in there. Painfully complicated.
Here is my Java code where I attempt to serialize my Java POJO into Flatbuffers equivalent:
private FPath convert(Path path) {
FlatBufferBuilder bld = new FlatBufferBuilder(1024);
// build the Flatbuffer object
FPath.startFPath(bld);
FPath.startConnectionsVector(bld, path.getConnections().size());
for(Path.PathConnection connection : path.getConnections()) {
FPathConnection.startFPathConnection(bld);
for(Path.PathSegment segment : connection.getSegments()) {
FPathSegment.createFPathSegment(bld,
stringCache.getPointer(segment.getOriginIata()),
stringCache.getPointer(segment.getDestinationIata()));
}
FPathConnection.endFPathConnection(bld);
}
FPath.endFPath(bld);
return FPath.getRootAsFPath(bld.dataBuffer());
}
Every start() method throws a "FlatBuffers: object serialization must not be nested" exception, and I can't figure out the right way to do this.
You use a single FlatBufferBuilder, but you must finish serializing children before starting the parents.
In your case, that requires you to move FPath.startFPath to the end, and FPath.startConnectionsVector to just before that. This means you need to store the offsets for each FPathConnection in a temp array.
This will make the nesting error go away.
The reason for this inconvenience is to allow the serialization process to proceed without any temporary data structures.
Say I have an object that I've created to further simplify reading an XML document using the DOM parser. In order to "step into" a node or element, I'd like to use a single line to go from the start of the document to my target data, buried somewhere within the document, while bypassing the extra "fluff" of the DOM parser (such as doc.getElementsByTagName("data").item(0) when there is only one item inside the "data" element).
For the sake of this question, let's just assume there are no duplicate element tags and I know where I need to navigate to to get the data I need from the document, of which the data is a simple string. The idea is to set the simplified reader up so that it can be used for other data in other locations in the document, as well, without having to write new methods all the time. Below is some example code I've tried:
public class SimplifiedReader {
Document doc;
Element ele;
public SimplifiedReader(Document doc) {
this.doc = doc;
ele = doc.getDocumentElement();
}
public SimplifiedReader fromRoot() {
ele = doc.getDocumentElement();
return this;
}
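public SimplifiedReader withEle(String elementName) {
// item(0) returns a Node, so it must be cast before it can be stored as an Element
ele = (Element) ele.getElementsByTagName(elementName).item(0);
return this;
}
public String getTheData(String elementName) {
// read the text of the named element below the current position
return ele.getElementsByTagName(elementName).item(0).getTextContent();
}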
}
Example XML File:
<?xml version="1.0" encoding="UTF-8"?>
<fileData>
<subData>
<targetData>Hello World!</targetData>
<otherData>FooBar!</otherData>
</subData>
</fileData>
This results in me being able to navigate the XML file, and retrieve the Strings "Hello World!" and "FooBar!" using this code:
SimplifiedReader sr = new SimplifiedReader(doc);
String hwData = sr.withEle("fileData").withEle("subData").getTheData("targetData");
String fbData = sr.getTheData("otherData");
Or, if I had to go to another thread to get the data "FooBar!", I would just do:
String fbData = sr.fromRoot().withEle("fileData2").withEle("subData2").getTheData("otherData");
Is there a better/more correct way to do this? Edit: Note: This question is more about the method of returning an object from a method inside of it (return this;) in order to reduce the amount of code written to access specific data stored within a tree format and not so much about how to read an XML file. (I originally thought this was the Singleton Pattern until William corrected me... thank you William).
Thanks in advance for any help.
I don't see any trace of the Singleton pattern here. It mostly resembles the Builder pattern, but it isn't that either. It just implements a fluent API.
Your approach seems very nice and practical.
I would perhaps advise not using fromRoot() but instead constructing a new instance each time. The instance is quite lightweight since all the heavyweight stuff resides in the Document instance it wraps.
You could even go immutable all the way, returning a new instance from withEle(). This buys you many cool properties, like the freedom to share the object around, each code path being free to use it as a starting point to fetch something specific relative to it, share it across threads, etc. The underlying Document is mutable, but usually this doesn't create real-life problems when the code is all about reading.
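To make the immutable idea concrete, here is a minimal sketch using only the JDK's DOM classes. The class name ImmutableReader and its methods of(), child(), text() and parse() are made up for illustration and are not part of the question's code; unlike getElementsByTagName, child() only looks at direct children.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical immutable variant of the question's SimplifiedReader. */
final class ImmutableReader {
    private final Element current;

    private ImmutableReader(Element current) {
        this.current = current;
    }

    /** Starts at the document's root element. */
    static ImmutableReader of(Document doc) {
        return new ImmutableReader(doc.getDocumentElement());
    }

    /** Returns a NEW reader on the first direct child element with this tag name. */
    ImmutableReader child(String name) {
        for (Node n = current.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return new ImmutableReader((Element) n);
            }
        }
        throw new IllegalArgumentException(
                "No child <" + name + "> under <" + current.getNodeName() + ">");
    }

    /** Text content of the element this reader points at. */
    String text() {
        return current.getTextContent();
    }

    /** Parses an XML string; only here to keep the sketch self-contained. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Because child() never changes the reader it is called on, a reader positioned on subData can be kept and reused: sub.child("targetData").text() and sub.child("otherData").text() both start from the same place, and no fromRoot() reset is needed.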
Is there a better/more correct way to do this?
Yes, there are many better ways to extract values from XML.
One would be to use XPath, for example with XMLBeam.
import java.io.IOException;
import org.xmlbeam.XBProjector;
import org.xmlbeam.annotation.XBDocURL;
import org.xmlbeam.annotation.XBRead;
public class App {
public static void main(String[] args) throws IOException {
FileDate fileDate = new XBProjector().io().fromURLAnnotation(FileDate.class);
System.out.println(fileDate.getTargetDate());
// Hello World!
System.out.println(fileDate.getOtherDate());
// FooBar!
}
@XBDocURL("resource://filedate.xml")
public interface FileDate {
@XBRead("/fileData/subData/targetData")
String getTargetDate();
@XBRead("/fileData/subData/otherData")
String getOtherDate();
}
}
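If adding a library is not an option, the same XPath expressions can also be evaluated with the javax.xml.xpath package that ships with the JDK. The following is only a rough sketch; the class XPathLookup and its read() method are hypothetical names, and the XML is passed in as a string to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: evaluates one XPath expression against an XML string. */
final class XPathLookup {
    /** Returns the string value of the first node the expression selects. */
    static String read(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // evaluate(String, InputSource) parses the source and returns the result as text
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }
}
```

With the example file from the question, read(xml, "/fileData/subData/targetData") would give the text of the targetData element. This version re-parses the XML on every call, so for repeated lookups it would be better to parse once into a Document and evaluate against that.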
I'm trying to read in a CSV file from HDFS, parse it with Cascading, and then use the resulting tuple stream to form the basis of regex expressions in another tuple stream using RegexParser. As far as I can tell, the only way to do this would be to write a custom Function of my own, and I was wondering if anybody knew how to use the Java API to do this instead.
Pointers on how to write my own function to do this inside the cascading framework would be welcome, too.
I'm running Cascading 2.5.1
The best resource for this question is the Palo Alto Cascading example tutorial. It's in Java and provides examples of a lot of use cases, including writing custom functions.
https://github.com/Cascading/CoPA/wiki
And yes, writing a function that allows an input regex that references other argument inputs is your best option.
public class SampleFunction extends BaseOperation implements Function
{
public void operate( FlowProcess flowProcess, FunctionCall functionCall )
{
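// Field 0 carries the regex, field 1 the text to apply it to.
TupleEntry arguments = functionCall.getArguments();
String regex = arguments.getString( 0 );
String input = arguments.getString( 1 );
// someRegexOperation is a placeholder for your own regex logic.
String parsed = someRegexOperation( regex, input );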
Tuple result = new Tuple();
result.add( parsed );
functionCall.getOutputCollector().add( result );
}
}
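What someRegexOperation does is up to you. As one possible reading, here is a plain-Java sketch (no Cascading types involved) that takes the regex from one field and applies it to the text from another. The class name RegexFromField and the method firstMatch are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the regex step inside operate(). */
final class RegexFromField {
    /**
     * Returns the first match of regex in input: capture group 1 if the
     * pattern defines one, otherwise the whole match. Returns null when
     * nothing matches.
     */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.groupCount() >= 1 ? m.group(1) : m.group();
    }
}
```

Since the pattern comes from the data, it is compiled on every call here. If the same few regexes repeat across many tuples, caching the compiled Pattern objects in a map keyed by the regex string would likely be worthwhile.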
Okay, to clarify, I have an XML/RDF file that describes data with a natural categorical tree structure (like folders and files). The data is not structured as a tree; rather, there is information that explains how to rebuild the tree (namely the nested set values of each node). I am starting with no knowledge other than the assumption that some statement in the file has a RootTree property whose object is the URI of the statement describing the root node of the tree.
Obtaining that object is easy, I simply use:
// Obtain the node describing the root of the Pearltree.
mRootProp = mModel.createProperty(Pearltree.RDF.PearlTreeNS, "rootTree");
NodeIterator roots = mModel.listObjectsOfProperty(mRootProp);
Now, I am further able to list all statements which have the property pt:parentTree and the object roots.nextNode():
StmtIterator sit = mModel.listStatements(null, RDF.ParentTree, rootNode);
This gives me a list of all such statements. These statements are part of elements that look like this in the RDF/XML file (note these have a different parentTree value but appear in the same context):
<pt:RootPearl rdf:about="http://www.pearltrees.com/dcow/pearltrees-videos/id5296268#rootPearl">
<dcterms:title><![CDATA[Pearltrees videos]]></dcterms:title>
<pt:parentTree rdf:resource="http://www.pearltrees.com/dcow/pearltrees-videos/id5296268" />
<pt:inTreeSinceDate>2012-06-11T20:25:55</pt:inTreeSinceDate>
<pt:leftPos>1</pt:leftPos>
<pt:rightPos>8</pt:rightPos>
</pt:RootPearl>
<pt:PagePearl rdf:about="http://www.pearltrees.com/dcow/pearltrees-videos/id5296268#pearl46838293">
<dcterms:title><![CDATA[why Pearltrees?]]></dcterms:title>
<dcterms:identifier>http://www.youtube.com/watch?v%3di4rDqMMFx8g</dcterms:identifier>
<pt:parentTree rdf:resource="http://www.pearltrees.com/dcow/pearltrees-videos/id5296268" />
<pt:inTreeSinceDate>2012-06-11T20:25:55</pt:inTreeSinceDate>
<pt:leftPos>2</pt:leftPos>
<pt:rightPos>3</pt:rightPos>
</pt:PagePearl>
...
Now, what I would like to do is obtain a reference to all statements with subject sit.nextStatement()'s subject. In this example:
"http://www.pearltrees.com/dcow/pearltrees-videos/id5296268#rootPearl"
and
"http://www.pearltrees.com/dcow/pearltrees-videos/id5296268#pearl46838293"
My goal is to obtain the content of each element including its rightPos and leftPos so I can reconstruct the tree.
You can simplify your code somewhat as follows:
mRootProp = mModel.createProperty(Pearltree.RDF.PearlTreeNS, "rootTree");
Resource root = mModel.listResourcesWithProperty( mRootProp ).next();
This assumes you know you have exactly one root per model. If that might not be true, modify the code accordingly.
The method:
getSubject()
of a Statement will return the Subject as a Resource. You can then use the
getProperty(Property p)
method of the returned Resource to obtain the Statements that include the property in question.
So, in my case, I use:
Resource r;
Statement title, id, lpos, rpos;
while(sit.hasNext()) {
r = sit.nextStatement().getSubject();
title = r.getProperty(DCTerms.title);
id = r.getProperty(DCTerms.identifier);
lpos = r.getProperty(PearlTree.RDF.leftPos);
rpos = r.getProperty(PearlTree.RDF.rightPos);
...
}
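Once the title, leftPos and rightPos of every pearl have been read this way, the tree can be rebuilt from the nested set values alone. Below is a rough sketch of one way to do that in plain Java, independent of Jena; the NestedSetTree and Node classes are invented for the example, and it assumes the intervals are well formed (properly nested, one root).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper that rebuilds a tree from nested set (leftPos/rightPos) values. */
final class NestedSetTree {
    /** One node: a title, its nested set bounds, and the children found for it. */
    static final class Node {
        final String title;
        final int left;
        final int right;
        final List<Node> children = new ArrayList<>();

        Node(String title, int left, int right) {
            this.title = title;
            this.left = left;
            this.right = right;
        }
    }

    /** Links the nodes together and returns the root, or null for an empty list. */
    static Node build(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt(n -> n.left));

        Deque<Node> open = new ArrayDeque<>(); // ancestors whose interval is still open
        Node root = null;
        for (Node n : sorted) {
            // Drop every candidate parent whose interval ended before this node starts.
            while (!open.isEmpty() && open.peek().right < n.left) {
                open.pop();
            }
            if (open.isEmpty()) {
                if (root == null) {
                    root = n; // assumption: a single root; any later top-level node is ignored
                }
            } else {
                open.peek().children.add(n); // innermost open interval is the parent
            }
            open.push(n);
        }
        return root;
    }
}
```

With the two pearls from the question, (1, 8) for "Pearltrees videos" and (2, 3) for "why Pearltrees?", the second interval lies inside the first, so "why Pearltrees?" ends up as a child of the root.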
I'm trying to use Antlr for some text IDE-like functions -- specifically parsing a file to identify the points for code folding, and for applying syntax highlighting.
First question - is Antlr suitable for this requirement, or is it overkill? This could be achieved using regex and/or a hand-rolled parser ... but it seems that Antlr is out there to do this work for me.
I've had a look through the ... and the excellent tutorial resource here.
I've managed to get a Java grammar built (using the standard grammar), and get everything parsed neatly into a tree. However, I'd have expected to see elements nested within the tree. In actual fact, everything is a child of the very top element.
Eg. Given:
package com.example
public class Foo {
String myString = "Hello World"
// etc
}
I'd have expected the tree node for Foo to be a child of the node for the package declaration. Likewise, myString would be a child of Foo.
Instead, I'm finding that Foo and myString (and everything else for that matter) are all children of package.
Here's the relevant excerpt doing the parsing:
public void init() throws Exception {
CharStream c = new ANTLRFileStream(
"src/com/inversion/parser/antlr/Test.code");
Lexer lexer = new JavaLexer(c);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens);
parser.setTreeAdaptor(adaptor);
compilationUnit_return result = parser.compilationUnit();
}
static final TreeAdaptor adaptor = new CommonTreeAdaptor() {
public Object create(Token payload) {
if (payload != null)
{
System.out.println("Create " + JavaParser.tokenNames[payload.getType()] + ": L" + payload.getLine() + ":C" + payload.getCharPositionInLine() + " " + payload.getText());
}
return new CommonTree(payload);
}
};
Examining result.getTree() returns a CommonTree instance, whose children are the result of the parsing.
Expected value (perhaps incorrectly)
package com.example (4 tokens)
|
+-- public class Foo (3 tokens)
|
+--- String myString = "Hello World" (4 tokens)
+--- Comment "// etc"
(or something similar)
Actual value (All values are children of the root node of result.getTree() )
package
com
.
example
public
class
Foo
String
myString
=
"Hello World"
Is my understanding of how this should be working correct?
I'm a complete noob at Antlr so far, and I'm finding the learning curve quite steep.
The Java-6 grammar at the top of the file sharing section on antlr.org does not include tree building. You'll need to do two things. First, tell ANTLR you want to build an AST:
options {
output=AST;
}
Second, you need to tell it what the tree should look like by either using the tree operators or by using the rewrite rules. See the documentation on tree construction. I usually end up doing a combination of both.
To build a tree, you should set output=AST (abstract syntax tree).
As far as I know, in ANTLR only one token can be the root of a tree, so you can't get exactly what you're looking for, but you can get close.
Check out:
http://www.antlr.org/wiki/display/ANTLR3/Tree+construction