I am trying out the new NN Dependency Parser from Stanford. According to the demo they have provided, this is how the parsing is done:
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.parser.nndep.DependencyParser;
...
GrammaticalStructure gs = null;
DocumentPreprocessor tokenizer = new DocumentPreprocessor(new StringReader(sentence));
for (List<HasWord> sent : tokenizer) {
    List<TaggedWord> tagged = tagger.tagSentence(sent);
    gs = parser.predict(tagged);

    // Print typed dependencies
    System.out.println("Grammatical structure: " + gs);
}
Now, what I want is for this object gs, which is of class GrammaticalStructure, to be cast to a Tree object from edu.stanford.nlp.trees.Tree.
I naively tried a simple cast:
Tree t = (Tree) gs;
but this is not possible (the IDE gives an error: Cannot cast from GrammaticalStructure to Tree).
How do I do this?
You should be able to get the Tree using gs.root().
According to the documentation, that method returns a Tree (actually, a TreeGraphNode) which represents the grammatical structure.
You could print that tree in a human-friendly way with gs.root().pennPrint().
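For example, a minimal sketch reusing the gs variable from the question's loop:

// root() returns a TreeGraphNode, which is a subclass of Tree.
Tree t = gs.root();
t.pennPrint();  // print the tree in Penn Treebank format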
I have a scikit-learn model that I'm using in my Java app via JPMML. I'm trying to set the InputFields using the name of the column that was used during training, but "inField.getName().getValue()" is obfuscated to "x{#}". Is there any way I could map "x{#}" back to the original feature/attribute name?
Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();
for (InputField inField : patternEvaluator.getInputFields()) {
    int value = activeFeatures.contains(inField.getName().getValue()) ? 1 : 0;
    FieldValue inputFieldValue = inField.prepare(value);
    arguments.put(inField.getName(), inputFieldValue);
}
Map<FieldName, ?> results = patternEvaluator.evaluate(arguments);
Here's how I'm generating the model:
import os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn2pmml import PMMLPipeline, sklearn2pmml
data = pd.read_csv('/pydata/training.csv')
X = data[data.keys()[:-1]].as_matrix()
y = data['classname'].as_matrix()
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=0)
estimators = [("read", RandomForestClassifier(n_jobs=5,n_estimators=200, max_features='auto'))]
pipe = PMMLPipeline(estimators)
pipe.fit(X_train,y_train)
pipe.active_fields = np.array(data.columns)
sklearn2pmml(pipe, "/pydata/model.pmml", with_repr = True)
Thanks
Does the PMML document contain actual field names at all? Open it in a text editor and see what the values of the /PMML/DataDictionary/DataField@name attributes are.
Your question indicates that the conversion from Scikit-Learn to PMML was incomplete, because it didn't include information about active field (aka input field) names. In that case they are assumed to be x1, x2, ..., xn.
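As a quick check from the Java side (a small sketch reusing the patternEvaluator from the question), you can print the input field names the model actually declares; if the conversion lost them, this prints x1, x2, and so on:

// Sketch: list the model's declared input field names.
for (InputField inField : patternEvaluator.getInputFields()) {
    System.out.println(inField.getName().getValue());
}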
Your pipeline only includes the estimator; that is why the names are lost. You have to include all the preprocessing steps as well in order to get them into the PMML.
Let's assume you do not do any preprocessing at all; then the following is probably what you need (I do not repeat the parts of your code that are still required for this snippet):
from sklearn_pandas import DataFrameMapper

nones = [(d, None) for d in data.columns[:-1]]  # feature columns only, not the target
mapper = DataFrameMapper(nones, df_out=True)
lm = PMMLPipeline([("mapper", mapper)] + estimators)  # reuse the estimator steps from above
lm.fit(X_train, y_train)
sklearn2pmml(lm, "ScikitLearnNew.pmml", with_repr=True)
In case you do require some preprocessing of your data, instead of None you can use any other transformer (e.g. LabelBinarizer). But the preprocessing has to happen inside the pipeline in order to be included in the PMML.
I am having issues with ANTLR 4, using the visitor classes.
I am trying to write the following code:
import bla.gen.InputLexer;
import bla.gen.InputParser;
import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
public class Main {
    public static void main(String[] args) throws Exception {
        InputLexer lexer = new InputLexer(new ANTLRFileStream("pl_example.lp"));
        InputParser parser = new InputParser(new CommonTokenStream(lexer));
        parser.setBuildParseTree(true);
        ParseTree tree = parser.prog();
        ParserVisitor visitor = new ParserVisitor();
        visitor.visit(tree);
    }
}
I try to mimic the code found in the book example here:
https://pragprog.com/titles/tpantlr2/source_code
(I have no access to the book, just the examples).
But I get an error because the method parser.prog() does not exist...
I use ANTLR 4.5.
Do you know how to generate a ParseTree with this version?
The parser method that returns the parse tree has the same name as the entry parse rule you chose. If you used a different name for the entry parse rule, the generated method will have that name instead.
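For example, a minimal sketch, assuming a hypothetical grammar whose entry rule is named program instead of prog:

// If the grammar declares:  program : stat+ ;
// then the generated parser exposes a method with the same name:
InputParser parser = new InputParser(new CommonTokenStream(lexer));
ParseTree tree = parser.program();  // method name mirrors the entry rule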
The problem is that you deleted the initial rule from your grammar. In the book's LabeledExpr.g4, the initial rule is prog, which matches one or more stat rules (stat+):
prog: stat+ ;
Without it, the parser cannot find the initial rule from which to walk the tree.
I'm writing a program in Java that uses the OWL API version 3.1.0. I have a String that represents an axiom in the Manchester OWL Syntax, and I would like to convert this string into an OWLAxiom object, because I need to add the resulting axiom to an ontology using the method addAxiom(OWLOntology owl, OWLAxiom axiom) (a method of OWLOntologyManager). How can I do that?
How about something like the following Java code? Note that I'm parsing a complete, but small, ontology. If you're actually expecting just some Manchester text that won't be parsable as a complete ontology, you may need to prepend some standard prefix to everything. That's more of a concern for the particular application though. You'll also need to make sure that you're getting the kinds of axioms that you're interested in. There will, necessarily, be declaration axioms (e.g., that Person is a class), but you're more likely interested in TBox and ABox axioms, so I've added some notes about how you can get those.
One point to note is that if you're only trying to add the axioms to an existing ontology, that's what the OWLParser methods do, although the Javadoc doesn't make this particularly clear (in my opinion). The documentation about OWLParser says that
An OWLParser parses an ontology document into an OWL API object representation of an ontology.
and that's not strictly true. If the ontology argument to parse() already has content, and parse() doesn't remove it, then the ontology argument ends up being an object representation of a superset of the ontology document (it's the ontology document plus the prior content). Fortunately, though, this is exactly what you want in your case: you want to read a snippet of text and add it to an existing ontology.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.coode.owlapi.manchesterowlsyntax.ManchesterOWLSyntaxParserFactory;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.OWLParser;
import org.semanticweb.owlapi.io.StreamDocumentSource;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyManager;
public class ReadManchesterString {
    public static void main(String[] args) throws OWLOntologyCreationException, IOException {
        // Get a manager and create an empty ontology, and a parser that
        // can read Manchester syntax.
        final OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        final OWLOntology ontology = manager.createOntology();
        final OWLParser parser = new ManchesterOWLSyntaxParserFactory().createParser( manager );

        // A small OWL ontology in the Manchester syntax.
        final String content = "" +
                "Prefix: so: <http://stackoverflow.com/q/21005908/1281433/>\n" +
                "Class: so:Person\n" +
                "Class: so:Young\n" +
                "\n" +
                "Class: so:Teenager\n" +
                "  SubClassOf: (so:Person and so:Young)\n" +
                "";

        // Create an input stream from the ontology, and use the parser to read its
        // contents into the ontology.
        try ( final InputStream in = new ByteArrayInputStream( content.getBytes() ) ) {
            parser.parse( new StreamDocumentSource( in ), ontology );
        }

        // Iterate over the axioms of the ontology. There are more than just the subclass
        // axiom, because the class declarations are also axioms. All in all, there are
        // four: the subclass axiom and three declarations of named classes.
        System.out.println( "== All Axioms: ==" );
        for ( final OWLAxiom axiom : ontology.getAxioms() ) {
            System.out.println( axiom );
        }

        // You can iterate over more specific axiom types, though. For instance,
        // you could just iterate over the TBox axioms, in which case you'll just
        // get the one subclass axiom. You could also iterate over
        // ontology.getABoxAxioms() to get ABox axioms.
        System.out.println( "== TBox Axioms: ==" );
        for ( final OWLAxiom axiom : ontology.getTBoxAxioms( false ) ) {
            System.out.println( axiom );
        }
    }
}
The output is:
== All Axioms: ==
SubClassOf(<http://stackoverflow.com/q/21005908/1281433/Teenager> ObjectIntersectionOf(<http://stackoverflow.com/q/21005908/1281433/Person> <http://stackoverflow.com/q/21005908/1281433/Young>))
Declaration(Class(<http://stackoverflow.com/q/21005908/1281433/Person>))
Declaration(Class(<http://stackoverflow.com/q/21005908/1281433/Young>))
Declaration(Class(<http://stackoverflow.com/q/21005908/1281433/Teenager>))
== TBox Axioms: ==
SubClassOf(<http://stackoverflow.com/q/21005908/1281433/Teenager> ObjectIntersectionOf(<http://stackoverflow.com/q/21005908/1281433/Person> <http://stackoverflow.com/q/21005908/1281433/Young>))
I have a small problem with the output of ANTLR.
I've a really small grammar which looks like this:
test : states;
states : '.states' state+;
state : stateID=ID {
          System.out.println("state: " + $stateID.text);}
      | stateID=ID '{' state* '}' {
          System.out.println("SubState: " + $stateID.text);};
And what I want to parse looks like this:
a{
    b
    c{
        d
    }
}
Well, the problem is that the first token I get is 'b', followed by 'd' and then 'c'.
But my intention is to parse the input into my data structure, and I need to know each token's parent.
What I know from this order is that c is the parent of d, but what about b?
If I rewrite the example in this form:
a{
    c{
        d
    }
    b
}
Everything is fine. So is there a way to know who the parent of b is, without the constraint of writing it as in the last example?
In ANTLR 4, using grammar actions is no longer recommended. The parser may visit and test different rules and alternatives in unexpected orders, so unless you're adding error-handling code, it's better to let the parse run normally and then inspect the result.
So you let the parser create its tree, and then write a custom listener that emits your println calls at each step. For example, suppose you're working with a grammar called Foo, so that ANTLR auto-generates a FooBaseListener class.
So first you'd make something like:
public class PrintingFooListener extends FooBaseListener {
    @Override
    public void enterState(FooParser.StateContext ctx) {
        // It is possible to get all sorts of token/subrule/text
        // information from the ctx input, especially if you labeled
        // the parser/lexer rules.
        System.out.println("I entered State");
    }
}
Then use the ParseTreeWalker utility class to navigate through the parse tree with your visitor in-tow:
// Assume lexing, etc. already done before this point
ParserRuleContext tree = parser.myMainRule(); // Do parse
ParseTreeWalker walker = new ParseTreeWalker(); // Premade utility class
PrintingFooListener listener = new PrintingFooListener(); // Your customized subclass
walker.walk(listener, tree);
I have some data coming in from a RabbitMQ. The data is formatted as triples, so a message from the queue could look something like this:
:Tom foaf:knows :Anna
where : is the standard namespace of the ontology into which I want to import the data, but other prefixes from imports are also possible. The triples consist of subject, property/predicate and object and I know in each message which is which.
On the receiving side, I have a Java program with an OWLOntology object that represents the ontology where the newly arriving triples should be stored temporarily for reasoning and other stuff.
I kind of managed to get the triples into a Jena OntModel but that's where it ends. I tried to use OWLRDFConsumer but I could not find anything about how to apply it.
My function looks something like this:
public void addTriple(RDFTriple triple) {
    //OntModel model = ModelFactory.createOntologyModel();
    String subject = triple.getSubject().toString();
    subject = subject.substring(1, subject.length() - 1);
    Resource s = ResourceFactory.createResource(subject);

    String predicate = triple.getPredicate().toString();
    predicate = predicate.substring(1, predicate.length() - 1);
    Property p = ResourceFactory.createProperty(predicate);

    String object = triple.getObject().toString();
    object = object.substring(1, object.length() - 1);
    RDFNode o = ResourceFactory.createResource(object);

    Statement statement = ResourceFactory.createStatement(s, p, o);
    //model.add(statement);
    System.out.println(statement.toString());
}
I did the substring operations because the RDFTriple class adds <> around the arguments of the triple and the constructor of Statement fails as a consequence.
If anybody could point me to an example that would be great. Maybe there's a much better way that I haven't thought of to achieve the same thing?
It seems like the OWLRDFConsumer is generally used to connect the RDF parsers with OWL-aware processors. The following code seems to work, though, as I've noted in the comments, there are a couple of places where I needed an argument and put in the only available thing I could.
The following code: creates an ontology; declares two named individuals, Tom and Anna; declares an object property, likes; and declares a data property, age. Once these are declared, we print the ontology just to make sure that it's what we expect.

Then it creates an OWLRDFConsumer. The consumer constructor needs an ontology, an AnonymousNodeChecker, and an OWLOntologyLoaderConfiguration. For the configuration, I just used one created by the no-argument constructor, and I think that's OK. For the node checker, the only convenient implementer is the TurtleParser, so I created one of those, passing null as the Reader. I think this will be OK, since the parser won't be called to read anything.

Then the consumer's handle(IRI,IRI,IRI) and handle(IRI,IRI,OWLLiteral) methods are used to process triples one at a time. We add the triples
:Tom :likes :Anna
:Tom :age 35
and then print out the ontology again to ensure that the assertions got added. Since you've already been getting the RDFTriples, you should be able to pull out the arguments that handle() needs. Before processing the triples, the ontology contained:
<NamedIndividual rdf:about="http://example.org/Tom"/>
and afterward this:
<NamedIndividual rdf:about="http://example.org/Tom">
    <example:age rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">35</example:age>
    <example:likes rdf:resource="http://example.org/Anna"/>
</NamedIndividual>
Here's the code:
import java.io.Reader;
import org.coode.owlapi.rdfxml.parser.OWLRDFConsumer;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.IRI;
import org.semanticweb.owlapi.model.OWLDataFactory;
import org.semanticweb.owlapi.model.OWLDataProperty;
import org.semanticweb.owlapi.model.OWLEntity;
import org.semanticweb.owlapi.model.OWLNamedIndividual;
import org.semanticweb.owlapi.model.OWLObjectProperty;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyLoaderConfiguration;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.model.OWLOntologyStorageException;
import uk.ac.manchester.cs.owl.owlapi.turtle.parser.TurtleParser;
public class ExampleOWLRDFConsumer {
    public static void main(String[] args) throws OWLOntologyCreationException, OWLOntologyStorageException {
        // Create an ontology.
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLDataFactory factory = manager.getOWLDataFactory();
        OWLOntology ontology = manager.createOntology();

        // Create some named individuals, an object property, and a data property.
        String ns = "http://example.org/";
        OWLNamedIndividual tom = factory.getOWLNamedIndividual( IRI.create( ns+"Tom" ));
        OWLObjectProperty likes = factory.getOWLObjectProperty( IRI.create( ns+"likes" ));
        OWLDataProperty age = factory.getOWLDataProperty( IRI.create( ns+"age" ));
        OWLNamedIndividual anna = factory.getOWLNamedIndividual( IRI.create( ns+"Anna" ));

        // Add the declaration axioms to the ontology so that the triples involving
        // these entities are understood (otherwise the triples will be ignored).
        for ( OWLEntity entity : new OWLEntity[] { tom, likes, age, anna } ) {
            manager.addAxiom( ontology, factory.getOWLDeclarationAxiom( entity ));
        }

        // Print the ontology to see that the entities are declared.
        // The important result is
        //   <NamedIndividual rdf:about="http://example.org/Tom"/>
        // with no properties.
        manager.saveOntology( ontology, System.out );

        // Create an OWLRDFConsumer for the ontology. TurtleParser implements AnonymousNodeChecker, so
        // it was a candidate for use here (but I make no guarantees about whether it's appropriate to
        // do this). Since it won't be reading anything, we pass it a null Reader, and this doesn't
        // *seem* to cause any problem. Hopefully the default OWLOntologyLoaderConfiguration is OK, too.
        OWLRDFConsumer consumer = new OWLRDFConsumer( ontology, new TurtleParser((Reader) null), new OWLOntologyLoaderConfiguration() );

        // The consumer handles (IRI,IRI,IRI) and (IRI,IRI,OWLLiteral) triples.
        consumer.handle( tom.getIRI(), likes.getIRI(), anna.getIRI() );
        consumer.handle( tom.getIRI(), age.getIRI(), factory.getOWLLiteral( 35 ));

        // Print the ontology to see the new object and data property assertions. The important
        // content is still Tom:
        //   <NamedIndividual rdf:about="http://example.org/Tom">
        //     <example:age rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">35</example:age>
        //     <example:likes rdf:resource="http://example.org/Anna"/>
        //   </NamedIndividual>
        manager.saveOntology( ontology, System.out );
    }
}
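To route the question's incoming triples through this consumer, here is a hedged sketch (reusing the question's substring trick to strip the angle brackets that RDFTriple components print with, and assuming a consumer built as above):

// Hypothetical adapter from the question's RDFTriple to the consumer.
// Assumes all three components are IRIs; a literal object would need
// the (IRI, IRI, OWLLiteral) overload instead.
public void addTriple(RDFTriple triple, OWLRDFConsumer consumer) {
    consumer.handle( toIri( triple.getSubject().toString() ),
                     toIri( triple.getPredicate().toString() ),
                     toIri( triple.getObject().toString() ));
}

private static IRI toIri( String bracketed ) {
    // RDFTriple components print as <http://...>, so drop the angle brackets.
    return IRI.create( bracketed.substring( 1, bracketed.length() - 1 ));
}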
In ONT-API, which is an extended Jena-based implementation of OWL-API, it is quite simple:
OWLOntologyManager manager = OntManagers.createONT();
OWLOntology ontology = manager.createOntology(IRI.create("http://example.com#test"));
((Ontology) ontology).asGraphModel()
        .createResource("http://example.com#clazz1")
        .addProperty(RDF.type, OWL.Class);
ontology.axioms(AxiomType.DECLARATION).forEach(System.out::println);
For more information, see the ONT-API wiki and examples.