Way to view a parsed file output? - java

I am Vinod, and I am interested in using ANTLR v3.3 to generate a C parser in a Java project and to produce the parsed tree in some viewable form. I got help writing the grammar from this tutorial.
ANTLR generates lexer and parser files for the grammar, but I don't quite see how these generated files are viewed. For example, in a few examples from the above article, the author generates output using ASTFrame. I found only an interpreter option in ANTLRWorks, which shows some tree, but it gives errors if there are many predicates.
Any good reference book or article would be really helpful.

There's only one book you need:
The Definitive ANTLR Reference: Building Domain-Specific Languages.
After that, many more excellent books exist (w.r.t. DSL creation), but this is the book for getting started with ANTLR.

As you saw, ANTLRWorks will print both parse trees and the AST, but it won't work with predicates and the C target. While not a nice picture like ANTLRWorks produces, the following function will print a text version of the AST when you pass it the root of your tree.
void printNodes(pANTLR3_BASE_TREE thisNode, int level)
{
    ANTLR3_UINT32 numChildren = thisNode->getChildCount(thisNode);
    pANTLR3_BASE_TREE loopNode;

    for (ANTLR3_UINT32 i = 0; i < numChildren; i++)
    {
        // Need to cast since getChild returns a generic pointer
        loopNode = (pANTLR3_BASE_TREE)thisNode->getChild(thisNode, i);

        // Print this node, indented by its depth in the tree
        pANTLR3_STRING thisText = loopNode->getText(loopNode);
        for (int j = 0; j < level; j++)
            printf(" ");
        printf("%s\n", thisText->chars);

        // Recurse into this node's children, if any
        if (loopNode->getChildCount(loopNode) > 0)
            printNodes(loopNode, level + 2);
    }
}
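For the Java target, the same indent-and-recurse idea can be sketched as follows. The `Node` class here is a minimal stand-in invented so the example runs standalone; with the ANTLR Java runtime you would walk a `CommonTree` via its `getChildCount()`/`getChild(i)`/`getText()` methods, which have the same shape.

```java
import java.util.*;

public class TreePrinter {
    // Minimal stand-in for an AST node; ANTLR's CommonTree exposes the
    // same getChildCount()/getChild(i)/getText() shape.
    static class Node {
        final String text;
        final List<Node> children = new ArrayList<Node>();
        Node(String text, Node... kids) {
            this.text = text;
            children.addAll(Arrays.asList(kids));
        }
    }

    // Render each child of 'node' indented by 'level' spaces, recursing
    // two spaces deeper per tree level, just like the C version.
    static String render(Node node, int level) {
        StringBuilder sb = new StringBuilder();
        for (Node child : node.children) {
            for (int j = 0; j < level; j++) sb.append(' ');
            sb.append(child.text).append('\n');
            if (!child.children.isEmpty()) sb.append(render(child, level + 2));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Node("", new Node("VAR_DECL", new Node("int"), new Node("x")));
        System.out.print(render(root, 0));
    }
}
```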


How to correlate similar messages using NLP

I have a couple of tweets that need to be processed. I am trying to find occurrences of messages that imply some harm to a person. How do I go about achieving this via NLP?
I bought my son a toy gun
I shot my neighbor with a gun
I don't like this gun
I would love to own this gun
This gun is a very good buy
Feel like shooting myself with a gun
In the above sentences, the 2nd and 6th are the ones I would like to find.
If the problem is restricted only to guns and shooting, then you could use a dependency parser (like the Stanford Parser) to find verbs and their (prepositional) objects, starting with the verb and tracing its dependants in the parse tree. For example, in both 2 and 6 these would be "shoot, with, gun".
Then you can use a list of (near) synonyms for "shoot" ("kill", "murder", "wound", etc) and "gun" ("weapon", "rifle", etc) to check if they occur in this pattern (verb - preposition - noun) in each sentence.
There will be other ways to express the same idea, e.g. "I bought a gun to shoot my neighbor", where the dependency relation is different, and you'd need to detect these types of dependencies too.
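The synonym-matching idea above can be sketched in a few lines of Java. The synonym sets and the `indicatesHarm` helper are invented for illustration; in a real system the triples would come from a dependency parser such as the Stanford Parser, and the synonym lists might come from WordNet.

```java
import java.util.*;

public class HarmPatternCheck {
    // Hand-picked synonym sets for illustration only; a real system might
    // pull these from WordNet instead.
    static final Set<String> SHOOT_SYNONYMS =
            new HashSet<String>(Arrays.asList("shoot", "kill", "murder", "wound"));
    static final Set<String> GUN_SYNONYMS =
            new HashSet<String>(Arrays.asList("gun", "weapon", "rifle", "pistol"));

    // Check a (verb, preposition, object) triple, as a dependency parser
    // might extract it, against the two synonym sets.
    static boolean indicatesHarm(String verb, String prep, String object) {
        return SHOOT_SYNONYMS.contains(verb) && GUN_SYNONYMS.contains(object);
    }

    public static void main(String[] args) {
        // Triples a parser might produce for sentences 2 and 1 respectively.
        System.out.println(indicatesHarm("shoot", "with", "gun"));  // true
        System.out.println(indicatesHarm("buy", "for", "gun"));     // false
    }
}
```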
All of vpekar's suggestions are good. Here is some Python code that will at least parse the sentences and check whether they contain verbs from a user-defined set of harm words. Note: most 'harm words' probably have multiple senses, many of which could have nothing to do with harm. This approach does not attempt to disambiguate word sense.
(This code assumes you have NLTK and Stanford CoreNLP)
import os
import subprocess
from xml.dom import minidom
from nltk.corpus import wordnet as wn

def StanfordCoreNLP_Plain(inFile):
    # Create the startup info so the Java program runs in the background
    # (for Windows computers)
    startupinfo = None
    if os.name == 'nt':
        startupinfo = subprocess.STARTUPINFO()
        startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
    # Execute the Stanford parser from the command line
    cmd = ['java', '-Xmx1g', '-cp',
           'stanford-corenlp-1.3.5.jar;stanford-corenlp-1.3.5-models.jar;xom.jar;joda-time.jar',
           'edu.stanford.nlp.pipeline.StanfordCoreNLP',
           '-annotators', 'tokenize,ssplit,pos', '-file', inFile]
    output = subprocess.Popen(cmd, stdout=subprocess.PIPE, startupinfo=startupinfo).communicate()
    outFile = file(inFile[(str(inFile).rfind('\\')) + 1:] + '.xml')
    xmldoc = minidom.parse(outFile)
    itemlist = xmldoc.getElementsByTagName('sentence')
    Document = []
    # Get the data out of the XML document and into Python lists
    for item in itemlist:
        SentNum = item.getAttribute('id')
        sentList = []
        tokens = item.getElementsByTagName('token')
        for d in tokens:
            word = d.getElementsByTagName('word')[0].firstChild.data
            pos = d.getElementsByTagName('POS')[0].firstChild.data
            sentList.append([str(pos.strip()), str(word.strip())])
        Document.append(sentList)
    return Document

def FindHarmSentence(Document):
    # Loop through sentences in the document, looking for verbs in the
    # harm-words set.
    VerbTags = ['VBN', 'VB', 'VBZ', 'VBD', 'VBG', 'VBP', 'V']
    HarmWords = ("shoot", "kill")
    ReturnSentences = []
    for Sentence in Document:
        for word in Sentence:
            if word[0] in VerbTags:
                try:
                    wordRoot = wn.morphy(word[1], wn.VERB)
                    if wordRoot in HarmWords:
                        print "This message could indicate harm:", str(Sentence)
                        ReturnSentences.append(Sentence)
                except: pass
    return ReturnSentences

# Assuming your input is a string, we need to put the strings in some file.
Sentences = "I bought my son a toy gun. I shot my neighbor with a gun. I don't like this gun. I would love to own this gun. This gun is a very good buy. Feel like shooting myself with a gun."
ProcessFile = "ProcFile.txt"
OpenProcessFile = open(ProcessFile, 'w')
OpenProcessFile.write(Sentences)
OpenProcessFile.close()

# Sentence split, tokenize, and part-of-speech tag the data using Stanford Core NLP
Document = StanfordCoreNLP_Plain(ProcessFile)

# Find sentences in the document with harm words
HarmSentences = FindHarmSentence(Document)
This outputs the following:
This message could indicate harm: [['PRP', 'I'], ['VBD', 'shot'], ['PRP$', 'my'], ['NN', 'neighbor'], ['IN', 'with'], ['DT', 'a'], ['NN', 'gun'], ['.', '.']]
This message could indicate harm: [['NNP', 'Feel'], ['IN', 'like'], ['VBG', 'shooting'], ['PRP', 'myself'], ['IN', 'with'], ['DT', 'a'], ['NN', 'gun'], ['.', '.']]
I would have a look at SenticNet
http://sentic.net/sentics
It provides an open source knowledge base and parser that assigns emotional value to text fragments. Using the library, you could train it to recognize statements that you're interested in.

Find Universe Metadata Information in BO SDK R4

I am new to BO. I need to find the universe name and the corresponding metadata information (table names, column names, join conditions, etc.). I am unable to find the proper way to start; I have looked at the Data Access SDK and the Semantic SDK.
Can anyone please provide me with sample code or a procedure for getting started?
I have googled a lot, but I am unable to find any sample examples.
I looked into this link, but that code will work only on an R2 server:
http://www.forumtopics.com/busobj/viewtopic.php?t=67088
Help is highly appreciated.
Assuming you're talking about IDT based universes, you'll need to code some Java. The JavaDoc for the API is available here.
In a nutshell, you do something like this:
SlContext context = SlContext.create();
LocalResourceService service = context.getService(LocalResourceService.class);
String blxFile = service.retrieve("universe.unx", "output directory");
RelationalBusinessLayer businessLayer = (RelationalBusinessLayer) service.load(blxFile);
RootFolder rootFolder = businessLayer.getRootFolder();
Once you have a hook on the rootFolder, you can use the getChildren() method to drill into the folder structure and access the various subfolders/business objects available.
You may also want to check the CmsResourceService class to access universes stored on the repository.
To get the information you are after will require a two-part solution. Part 1: use the Rebean SDK, looking at WebI reports for the universe and the object names used within them.
Part 2: break out your favorite COM programming tool (since I try to avoid COM, I use the Excel macro editor) and access the BusinessObjects Designer library. The main code snippets that I currently have are:
' boUniv is assumed to already reference an open universe
Dim boUniv As Designer.Universe
Dim tbl As Designer.Table

For Each tbl In boUniv.Tables
    Debug.Print tbl.Name
Next tbl
This prints all of the tables in a universe.
You will need to combine the 2 parts on your own for a dependency list between WebI reports and Universes.

How to read xml file with attributes in java?

I am aware of the SO question Failing to get element values using Element.getAttribute(), but because I am a Java beginner, I have additional questions. What I am trying to build is a simple application which will read an XML file and then compare it against a "golden master." My problem is:
I have lots of different XML files, which differ in their attributes.
The XML files are relatively big (around 810 lines per file, which is hard to check by human eye).
Example of file:
<DocumentIdentification v="Unique_ID"/>
<DocumentVersion v="1"/>
<DocumentType v="P81"/>
<SenderIdentification v="TEST-001--123456" codingScheme="A01"/>
<CreationDateTime v="2012-10-15T13:00:00Z"/>
<InArea v="10STS-TST------W" codingScheme="A01"/>
<OutArea v="10YWT-AYXOP01--8" codingScheme="A01"/>
<TimeSeries>
<Period>
<TimeInterval v="2012-10-14T22:00Z/2012-10-15T22:00Z"/>
<Resolution v="PT15M"/>
<Interval>
<Pos v="1"/>
<Qty v="500"/>
</Interval>
<Interval>
<Pos v="2"/>
<Qty v="500"/>
</Interval>
<Interval>
<Pos v="3"/>
<Qty v="452"/>
</Interval>
...
...
<Interval>
<Pos v="96"/>
<Qty v="891"/>
</Interval>
</Period>
</TimeSeries>
Applying the solution from the question mentioned above does not get me much further... I realised that I can cast the attributes to a NamedNodeMap, but I don't know how to iterate through it programmatically.
Yes, I know it sounds much like "do my homework", but what I really need is at least a small kick in the butt to move me in the correct direction. Thanks for the help.
The methods item(int index) and getLength() let you iterate through the attributes:
NamedNodeMap map = getItFromSomeWhere();
for (int i = 0; i < map.getLength(); i++) {
    Node node = map.item(i);
    // node is the ith attribute node in the named map
}
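Putting the pieces together, here is a self-contained sketch that parses a fragment of the posted file and lists every attribute via NamedNodeMap. The fragment is wrapped in an invented `<Document>` root element, since the posted excerpt has none and would not otherwise be well-formed.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class AttributeDump {
    // Parse an XML string and return "element.attribute = value" lines
    // for every attribute in the document.
    static List<String> dumpAttributes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        List<String> lines = new ArrayList<String>();
        collect(doc.getDocumentElement(), lines);
        return lines;
    }

    static void collect(Node node, List<String> lines) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                lines.add(node.getNodeName() + "." + a.getNodeName()
                        + " = " + a.getNodeValue());
            }
        }
        // Recurse into child elements
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            collect(kids.item(i), lines);
        }
    }

    public static void main(String[] args) throws Exception {
        // A fragment of the posted file, wrapped in an invented root element.
        String xml = "<Document>"
                + "<DocumentIdentification v=\"Unique_ID\"/>"
                + "<SenderIdentification v=\"TEST-001--123456\" codingScheme=\"A01\"/>"
                + "</Document>";
        for (String line : dumpAttributes(xml)) {
            System.out.println(line);
        }
    }
}
```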

How to write a Ruby-regex pattern in Java (includes recursive named-grouping)?

Well... I have a file containing a TINTIN script. I have already managed to grab all actions and substitutions from it and show them, properly ordered, on a website using Ruby, which helps me keep an overview.
Example TINTIN-script
#substitution {You tell {([a-zA-Z,\-\ ]*)}, %*$}
{<279>[<269> $sysdate[1]<279>, <269>$systime<279> |<219> Tell <279>] <269>to <219>%2<279> : <219>%3}
{4}
#substitution {{([a-zA-Z,\-\ ]*)} tells you, %*$}
{<279>[<269> $sysdate[1]<279>, <269>$systime<279> |<119> Tell <279>] <269>from <119>%2<279> : <119>%3}
{2}
#action {Your muscles suddenly relax, and your nimbleness is gone.}
{
#if {$sw_keepaon}
{
aon;
};
} {5}
#action {xxxxx}
{
#if {$sw_keepfamiliar}
{
familiar $familiar;
};
} {5}
To grab them in my Ruby app, I read my script file into a variable 'input' and then use the following pattern to scan it:
pattern = /(?<braces>{([^{}]|\g<braces>)*}){0}^#(?<type>action|substitution)\s*(?<b1>\g<braces>)\s*(?<b2>\g<braces>)\s*(?<b3>\g<braces>)/im
input = ""
File.open("/home/igambin/lmud/lmud.tt") { |file| input = file.read }
input.scan(pattern) { |prio, type, pattern, code|
## here i usually create objects, but for simplicity only output now
puts "Type : #{type}"
puts "Pattern : #{pattern}"
puts "Priority: #{prio}"
puts "Code :\n#{code}"
puts
}
Now my idea was to use the NetBeans Platform to write a module, not only to keep an overview but also to assist in editing the TINTIN script file. So, opening the file in an editor window, I still need to parse the file and have all 'actions' and 'substitutions' grabbed and displayed in an eTable, in which I could double-click an item to open a modification window.
I've set up the module and got everything ready so far; I just can't figure out how to translate the Ruby regex pattern I've written into a working Java regex pattern. It seems named-group capturing, and especially the recursive application of these groups, is not supported in Java. Without that, I seem to be unable to find a working solution...
Here's the ruby pattern again...
pattern = /(?<braces>{([^{}]|\g<braces>)*}){0}^#(?<type>action|substitution)\s*(?<b1>\g<braces>)\s*(?<b2>\g<braces>)\s*(?<b3>\g<braces>)/im
Can anyone help me to create a java pattern that matches the same?
Many thanks in advance for tips/hints/ideas and especially for solutions or (close-to-solution comments)!
Your text format seems pretty simple; it's possible you don't really need recursive matching. This Java-compatible regex matches your sample data correctly, as far as I can tell:
(?s)#(substitution|action)\s*\{(.*?)\}\s*\{(.*?)\}\s*\{(\d+)\}
Would that work for you? If you run Java 7, you can even name the groups. ;)
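For reference, here is the suggested pattern dropped into Java's Pattern/Matcher API, run against a trimmed-down stand-in for the script in the question (the sample text here is invented, not the full file):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TintinScan {
    // The regex suggested above: type, pattern block, code block, priority.
    static final Pattern TRIGGER = Pattern.compile(
            "(?s)#(substitution|action)\\s*\\{(.*?)\\}\\s*\\{(.*?)\\}\\s*\\{(\\d+)\\}");

    public static void main(String[] args) {
        // A trimmed-down stand-in for the TINTIN script in the question.
        String input =
                "#substitution {You tell %*$}\n" +
                "{<279>Tell to %2}\n" +
                "{4}\n" +
                "#action {Your muscles suddenly relax.}\n" +
                "{\n" +
                "    #if {$sw_keepaon}\n" +
                "    {\n" +
                "        aon;\n" +
                "    };\n" +
                "} {5}\n";
        Matcher m = TRIGGER.matcher(input);
        while (m.find()) {
            System.out.println("Type    : " + m.group(1));
            System.out.println("Pattern : " + m.group(2));
            System.out.println("Priority: " + m.group(4));
        }
    }
}
```

Note that the reluctant `(.*?)` in the third group still swallows the nested braces in the #action body, because the match can only close where a `{digits}` block follows.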
Can anyone help me to create a java pattern that matches the same?
No, no one can: Java's regex engine does not support recursive patterns (as Ruby 1.9 does).

Understanding trees in ANTLR

I'm trying to use Antlr for some text IDE-like functions -- specifically parsing a file to identify the points for code folding, and for applying syntax highlighting.
First question - is Antlr suitable for this requirement, or is it overkill? This could be achieved using regex and/or a hand-rolled parser ... but it seems that Antlr is out there to do this work for me.
I've had a look through the ... and the excellent tutorial resource here.
I've managed to get a Java grammar built (using the standard grammar), and get everything parsed neatly into a tree. However, I'd have expected to see elements nested within the tree. In actual fact, everything is a child of the very top element.
Eg. Given:
package com.example;

public class Foo {
    String myString = "Hello World";
    // etc
}
I'd have expected the tree node for Foo to be a child of the node for the package declaration. Likewise, myString would be a child of Foo.
Instead, I'm finding that Foo and myString (and everything else for that matter) are all children of package.
Here's the relevant excerpt doing the parsing:
public void init() throws Exception {
    CharStream c = new ANTLRFileStream("src/com/inversion/parser/antlr/Test.code");
    Lexer lexer = new JavaLexer(c);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    JavaParser parser = new JavaParser(tokens);
    parser.setTreeAdaptor(adaptor);
    compilationUnit_return result = parser.compilationUnit();
}

static final TreeAdaptor adaptor = new CommonTreeAdaptor() {
    public Object create(Token payload) {
        if (payload != null) {
            System.out.println("Create " + JavaParser.tokenNames[payload.getType()]
                    + ": L" + payload.getLine()
                    + ":C" + payload.getCharPositionInLine()
                    + " " + payload.getText());
        }
        return new CommonTree(payload);
    }
};
Examining result.getTree() returns a CommonTree instance, whose children are the result of the parsing.
Expected value (perhaps incorrectly)
package com.example (4 tokens)
|
+-- public class Foo (3 tokens)
|
+--- String myString = "Hello World" (4 tokens)
+--- Comment "// etc"
(or something similar)
Actual value (All values are children of the root node of result.getTree() )
package
com
.
example
public
class
Foo
String
myString
=
"Hello World"
Is my understanding of how this should be working correct?
I'm a complete noob at Antlr so far, and I'm finding the learning curve quite steep.
The Java-6 grammar at the top of the file sharing section on antlr.org does not include tree building. You'll need to do two things. First, tell ANTLR you want to build an AST:
options {
output=AST;
}
Second, you need to tell it what the tree should look like by either using the tree operators or by using the rewrite rules. See the documentation on tree construction. I usually end up doing a combination of both.
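For illustration, here is a rough sketch of the rewrite-rule style in an ANTLR v3 grammar. The rule and token names below are invented for the example, not taken from the Java-6 grammar:

```antlr
options { output = AST; }

tokens { VAR_DECL; }

// The '->' rewrite builds the tree: VAR_DECL becomes the root, with the
// matched type, name, and initializer as its children; '=' and ';' are dropped.
varDecl
    : type ID '=' expr ';' -> ^(VAR_DECL type ID expr)
    ;
```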
To build a tree, you should set output=AST (abstract syntax tree).
As far as I know, in ANTLR only one token can be the root of a tree, so you can't get exactly what you're looking for, but you can get close.
Check out:
http://www.antlr.org/wiki/display/ANTLR3/Tree+construction
