I need to perform coreference resolution using Stanford CoreNLP. I downloaded version 3.3.0, and the code I used is shown below. There is an error at getCorefMentions() because the method cannot be found, and I don't know which jar files I need to include to remove this error.
package unlpro;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.io.IOException;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefChain.CorefMention;
import edu.stanford.nlp.dcoref.CorefCluster;
import edu.stanford.nlp.dcoref.Document;
import edu.stanford.nlp.dcoref.Mention;
/**
*
* @author Soundri
*/
public class NewClass {
public static void main(String[] args) throws IOException
{
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
; // The path for a file that includes a list of demonyms
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// read some text in the text variable
// String text ="Ram and Lakshman went to the market he purchased";
String text ="The Revolutionary War occurred during the 1700s and it was the first war in the United States";
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for(CoreMap sentence: sentences) {
// traversing the words in the current sentence
// a CoreLabel is a CoreMap with additional token-specific methods
for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
// this is the text of the token
String word = token.get(TextAnnotation.class);
// this is the POS tag of the token
String pos = token.get(PartOfSpeechAnnotation.class);
// this is the NER label of the token
String ne = token.get(NamedEntityTagAnnotation.class);
}
// this is the parse tree of the current sentence
Tree tree = sentence.get(TreeAnnotation.class);
// this is the Stanford dependency graph of the current sentence
SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
System.out.println("Dependencies "+dependencies);
}
// This is the coreference link graph
// Each chain stores a set of mentions that link to each other,
// along with a method for getting the most representative mention
// Both sentence and token offsets start at 1!
Map <Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
System.out.println("Graph "+graph);
// for(int i=1;i<graph.size();i++)
// {
// System.out.println(graph.get(i));
// }
for(Map.Entry<Integer, CorefChain> entry : graph.entrySet()) {
CorefChain c = entry.getValue();
//this is because it prints out a lot of self references which aren't that useful
if(c.getCorefMentions().size() <= 1)
continue;
CorefMention cm = c.getRepresentativeMention();
String clust = "";
List<CoreLabel> tks = document.get(SentencesAnnotation.class).get(cm.sentNum-1).get(TokensAnnotation.class);
for(int i = cm.startIndex-1; i < cm.endIndex-1; i++)
clust += tks.get(i).get(TextAnnotation.class) + " ";
clust = clust.trim();
System.out.println("representative mention: \"" + clust + "\" is mentioned by:");
for(CorefMention m : c.getCorefMentions()){
String clust2 = "";
tks = document.get(SentencesAnnotation.class).get(m.sentNum-1).get(TokensAnnotation.class);
for(int i = m.startIndex-1; i < m.endIndex-1; i++)
clust2 += tks.get(i).get(TextAnnotation.class) + " ";
clust2 = clust2.trim();
//don't need the self mention
if(clust.equals(clust2))
continue;
System.out.println("\t" + clust2);
}
}
}
}
I included all the jar files from that package. I am new to this, so I don't know whether some changes need to be made in the code.
The problem is that you are calling getCorefMentions() on a variable whose class doesn't declare such a method.
The javadoc for CorefChain lists the methods that class actually declares.
I don't claim to understand either Stanford NLP, or what you are trying to do with it, but one possible fix might be to replace
c.getCorefMentions()
with
c.getCoreMentionMap().values()
I'll leave it to you to decide if that makes sense ...
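Another option, if that method isn't in your jar either: the CorefChain javadoc also lists getMentionsInTextualOrder(), which returns the chain's mentions as a list. A minimal sketch, assuming that method exists in the build on your classpath (check it against the javadoc of your version):
for (CorefMention m : c.getMentionsInTextualOrder()) {
    // mentionSpan is the surface text of the mention; sentNum, startIndex and endIndex are 1-based
    System.out.println("\t" + m.mentionSpan + " (sentence " + m.sentNum + ")");
}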
I am trying to run my Java project from cmd and I get this error:
**Exception in thread "main" java.lang.ClassCastException: class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')
at jodd.util.UnsafeUtil.getChars(UnsafeUtil.java:67)
at jodd.json.JsonParser.parse(JsonParser.java:201)
at IndexTester.main(IndexTester.java:78)**
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Map;
import jodd.json.JsonParser;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class IndexTester {
public static void main(String[] args) throws IOException, ParseException {
if (args.length != 3) {
System.err.println("Incorrect number of arguments! Usage:");
System.err.println("");
System.err.println("java IndexTester should_clear_index path_to_data path_to_index ");
System.err.println("\tif should_clear_index is \"1\", the index will be rebuilt. Otherwise, it will try and use an existing index.");
System.err.println("\tpath_to_index should point to an empty directory somewhere.");
System.exit(-1);
}
String shouldClearIndex = args[0];
String inputPath = args[1]; // where to find the file containing the JSON to index
String idxDirPath = args[2]; // where to put/find the Lucene index we want to search
File inputFile = new File(inputPath);
// set up analyzer:
StandardAnalyzer analyzer = new StandardAnalyzer();
// set up the index
File idxDir = new File(idxDirPath);
Directory dir = FSDirectory.open(idxDir.toPath());
if (shouldClearIndex.compareTo("1") == 0) {
System.out.println("Rebuilding index...");
IndexWriterConfig idxConfig = new IndexWriterConfig(analyzer);
idxConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
IndexWriter idxWriter = new IndexWriter(dir, idxConfig);
// Now, populate the index:
int idx = 0;
JsonParser jParser = new JsonParser();
for (String line : Files.readAllLines(inputFile.toPath(), StandardCharsets.UTF_8)) {
// On large amounts of data, this can take a while
if (idx % 10000 == 0) {
System.out.println(idx);
}
idx++;
// each line of the input file is a serialized JSON object
Map j = jParser.parse(line);
// simple types (strings, numbers, etc.) are handled like so:
String title = (String)j.get("title");
// complex types (lists or dicts) get turned into instances of
// java.util.Map and java.util.List.
String ab = (String)j.get("abstract");
// Look at the docs for TextField to see about other types- Lucene can index numbers, dates, etc.
Field tiField = new Field("title", title, TextField.TYPE_STORED);
// The TYPE_STORED directive tells Lucene to actually store the original token in the index. This is handy
// for all sorts of reasons!
// set up any additional fields here
Document thisDoc = new Document();
thisDoc.add(tiField);
// add our document to the index
idxWriter.addDocument(thisDoc);
}
System.out.println("Done!");
System.out.println(idx + " documents indexed.");
idxWriter.close();
}
do {
// Open up the index for querying:
DirectoryReader reader = DirectoryReader.open(dir);
// Tell me about the index (comment in/out as needed- this may be useful for debugging):
// LeafReader slowC = SlowCompositeReaderWrapper.wrap(reader);
// Terms idxTerms = slowC.terms("title"); // change to a different field as needed
// TermsEnum tEnum = idxTerms.iterator(null);
// System.out.println("Terms in the index for the title field:");
// while (tEnum.next() != null) {
// String s = tEnum.term().utf8ToString();
// System.out.println(s + "\t" + tEnum.docFreq());
// }
// Now search
IndexSearcher searcher = new IndexSearcher(reader);
// Things to note re: QueryParser:
// 1. The first argument is the "default" field to search-
// if nothing else is specified, in the query, this is what
// will be searched.
// 2. You always want to make sure to use the same Analyzer for your
// query as you did when you built the index!
//
// Other query parser classes will behave similarly, but may have different argument ordering.
QueryParser qParser = new QueryParser("title", analyzer);
System.out.print("Query: ");
String queryText = System.console().readLine();
if (queryText.compareTo("") != 0) {
Query q = qParser.parse(queryText);
TopDocs results = searcher.search(q, 10);
System.out.println("Got " + results.totalHits + " hits!");
for (ScoreDoc d : results.scoreDocs) {
System.out.println(d.doc + "\t" + d.score);
Document res = reader.document(d.doc);
System.out.println(res.getField("title").stringValue());
}
}
} while (true); // keep querying until user hits ctrl-C
}
}
This is my code and this is my .txt file:
[https://openeclass.uom.gr/modules/document/file.php/DAI148/%CE%94%CE%B9%CE%AC%CE%BB%CE%B5%CE%BE%CE%B7%2006%20-%20%CE%94%CE%B9%CE%B1%CE%B2%CE%B1%CE%B8%CE%BC%CE%B9%CF%83%CE%BC%CE%AD%CE%BD%CE%B7%20%CE%91%CE%BD%CE%AC%CE%BA%CF%84%CE%B7%CF%83%CE%B7%2C%20%CE%9C%CE%BF%CE%BD%CF%84%CE%AD%CE%BB%CE%BF%20%CE%94%CE%B9%CE%B1%CE%BD%CF%85%CF%83%CE%BC%CE%B1%CF%84%CE%B9%CE%BA%CE%BF%CF%8D%20%CE%A7%CF%8E%CF%81%CE%BF%CF%85/Using%20Lucene/data.txt.zip]
Please switch to the latest Jodd JSON v6.
There is probably an issue with the UnsafeUtil.getChars. What you can do is the following:
jParser.parse(line.toCharArray());
i.e. to skip using the UnsafeUtil.getChars().
The new version of Jodd is not using the Unsafe class anymore.
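In the indexing loop above, that means the parse call would change along these lines (a minimal sketch; everything else in the loop stays the same):
// hand the parser a char[] so UnsafeUtil.getChars() is never called
Map j = jParser.parse(line.toCharArray());
String title = (String) j.get("title");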
How can I use Stanford CoreNLP to generate the dependencies of a Chinese sentence? It only seems to work well with English.
public class DemoChinese {
public static void main(String[] args) {
Properties props = PropertiesUtils.asProperties("props", "StanfordCoreNLP-chinese.properties");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation("我喜欢吃苹果");
pipeline.annotate(document);
List<CoreMap> sentence = document.get(SentencesAnnotation.class);
@SuppressWarnings("deprecation")
// Produce a dependency of this sentence.
SemanticGraph dp= sentence.get(0).get(SemanticGraphCoreAnnotations
.CollapsedCCProcessedDependenciesAnnotation.class);
String s = dp.typedDependencies().toString();
System.out.println(s);
}
}
Setting up the Properties as you did doesn't work. This is maybe confusing, but the StanfordCoreNLP constructor needs a "real" properties list; it won't process a props key by expanding it out with its contents. (Doing things as you did appears in some examples. I initially assumed that it used to work and there was a regression, but it doesn't seem to have worked in any of 3.6, 3.7, or 3.8, so maybe those examples never worked.) Also, in the example below, I get the dependencies in the non-deprecated way.
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import edu.stanford.nlp.coref.CorefCoreAnnotations;
import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
/**
* @author Christopher Manning
*/
public class StanfordCoreNlpDemoChinese {
private StanfordCoreNlpDemoChinese() { } // static main
public static void main(String[] args) throws IOException {
// set up optional output files
PrintWriter out;
if (args.length > 1) {
out = new PrintWriter(args[1]);
} else {
out = new PrintWriter(System.out);
}
Properties props = new Properties();
props.load(IOUtils.readerFromString("StanfordCoreNLP-chinese.properties"));
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document;
if (args.length > 0) {
document = new Annotation(IOUtils.slurpFileNoExceptions(args[0]));
} else {
document = new Annotation("我喜欢吃苹果");
}
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
int sentNo = 1;
for (CoreMap sentence : sentences) {
out.println("Sentence #" + sentNo + " tokens are:");
for (CoreMap token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
out.println(token.toShorterString("Text", "CharacterOffsetBegin", "CharacterOffsetEnd", "Index", "PartOfSpeech", "NamedEntityTag"));
}
out.println("Sentence #" + sentNo + " basic dependencies are:");
out.println(sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class).toString(SemanticGraph.OutputFormat.LIST));
sentNo++;
}
// Access coreference.
out.println("Coreference information");
Map<Integer, CorefChain> corefChains =
document.get(CorefCoreAnnotations.CorefChainAnnotation.class);
if (corefChains == null) { return; }
for (Map.Entry<Integer,CorefChain> entry: corefChains.entrySet()) {
out.println("Chain " + entry.getKey());
for (CorefChain.CorefMention m : entry.getValue().getMentionsInTextualOrder()) {
// We need to subtract one since the indices count from 1 but the Lists start from 0
List<CoreLabel> tokens = sentences.get(m.sentNum - 1).get(CoreAnnotations.TokensAnnotation.class);
// We subtract two for end: one for 0-based indexing, and one because we want last token of mention not one following.
out.println(" " + m + ":[" + tokens.get(m.startIndex - 1).beginPosition() + ", " +
tokens.get(m.endIndex - 2).endPosition() + ')');
}
}
out.println();
IOUtils.closeIgnoringExceptions(out);
}
}
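The essential difference from the original attempt is just how the properties are loaded; a minimal side-by-side sketch (the properties file name is the one shipped in the Chinese models jar):
// does NOT work: "props" is treated as an ordinary key and the referenced file is never expanded
Properties bad = PropertiesUtils.asProperties("props", "StanfordCoreNLP-chinese.properties");
// works: load the Chinese properties file into a real Properties object
Properties good = new Properties();
good.load(IOUtils.readerFromString("StanfordCoreNLP-chinese.properties"));
StanfordCoreNLP pipeline = new StanfordCoreNLP(good);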
I mainly use Python and am new to Java. However, I am trying to write a Java program and make it work in Python via the Py4j Python package. The following program is what I adapted from an example. I encountered a compile error. Could you shed some light? I am pretty sure it is a basic error. Thanks.
> compile error: incompatible types: SimpleMatrix cannot be converted to String: return senti_score.
> intended input in Python:
app = CoreNLPSentiScore()
app.findSentiment("I like this book")
intended output: matrix: Type = dense , numRows = 5 , numCols = 1
0.016
0.037
0.132
0.618
0.196
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.ArrayCoreMap;
import edu.stanford.nlp.util.CoreMap;
import py4j.GatewayServer;
import org.ejml.simple.SimpleMatrix;
public class CoreNLPSentiScore {
static StanfordCoreNLP pipeline;
public static void init() {
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
pipeline = new StanfordCoreNLP(props);
}
public static void main(String[] args) {
CoreNLPSentiScore app = new CoreNLPSentiScore();
// app is now the gateway.entry_point
GatewayServer server = new GatewayServer(app);
server.start();
}
//public static void main(String tweet) {
//public static String findSentiment(String tweet) {
public String findSentiment(String tweet) {
//String SentiReturn = "2";
//String[] SentiClass ={"very negative", "negative", "neutral", "positive", "very positive"};
//Sentiment is an integer, ranging from 0 to 4.
//0 is very negative, 1 negative, 2 neutral, 3 positive and 4 very positive.
//int sentiment = 2;
SimpleMatrix senti_score = new SimpleMatrix();
if (tweet != null && tweet.length() > 0) {
Annotation annotation = pipeline.process(tweet);
List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
if (sentences != null && sentences.size() > 0) {
ArrayCoreMap sentence = (ArrayCoreMap) sentences.get(0);
//Tree tree = sentence.get(SentimentAnnotatedTree.class);
Tree tree = sentence.get(SentimentAnnotatedTree.class);
senti_score = RNNCoreAnnotations.getPredictions(tree);
//SentiReturn = SentiClass[sentiment];
}
}
//System.out.println(senti_score);
return senti_score;
//System.out.println(senti_score);
}
}
Java is an object-oriented language, but it is not like Python where everything is treated as an object.
In the program above, the method findSentiment returns a SimpleMatrix, but in the method declaration its return type is String.
Solution: return senti_score.toString() (SimpleMatrix inherits a toString() that prints the matrix, and you can override it if you need different formatting), or change the declared return type to SimpleMatrix.
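A minimal sketch of the method with the declared String return type kept and the matrix stringified before returning (changing the signature to return SimpleMatrix would work just as well, as long as the Python side expects that):
public String findSentiment(String tweet) {
    if (tweet == null || tweet.isEmpty()) {
        return "";
    }
    Annotation annotation = pipeline.process(tweet);
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    if (sentences == null || sentences.isEmpty()) {
        return "";
    }
    Tree tree = sentences.get(0).get(SentimentAnnotatedTree.class);
    SimpleMatrix sentiScore = RNNCoreAnnotations.getPredictions(tree);
    // SimpleMatrix.toString() produces the "Type = dense , numRows = 5 , numCols = 1" block shown above
    return sentiScore.toString();
}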
I have to extract some integers from a tag in some HTML code.
For example if I have:
< tag blabla="title"><a href="/test/tt123> TEST 1 < tag >
I did that by removing all the characters and leaving only the digits, and it worked until the title contained another digit, so I got "1231".
str.replaceAll("[^\\d.]", "");
How can I extract only the "123" integer? Thanks for your help!
Jsoup is a good API to play around with HTML. Using it you could do something like:
String html = "<tag blabla=\"title\"><a href=\"/test/tt123\"> TEST 1 <tag>";
Document doc = Jsoup.parseBodyFragment(html);
String value = doc.select("a").get(0).attr("href").replaceAll("[^\\d.]", "");
System.out.println(value);
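With the example fragment above this prints 123, since attr("href") returns only "/test/tt123" and the replaceAll then strips everything but the digits.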
You could do this (an approach that removes all duplicate digits from any number):
int[] foo = new int[str.length()];
for(int i = 0; i < str.length(); i++) {
foo[i] = Character.getNumericValue(str.charAt(i));
}
Set<Integer> set = new HashSet<Integer>();
for(int i = 0; i < foo.length; i++){
set.add(foo[i]);
}
Now you have a set where all duplicate digits from any string are removed. I only just saw your last comment, so this answer might not be very useful to you. What you could do is also take the first three digits from the foo array, which will give you 123.
First use XPath to parse out only the href value, then apply your replaceAll to achieve what you desired.
And you don't have to download any additional frameworks or libraries for this to work.
Here's a quick demo class on how this works:
package com.example.test;
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
public class Test {
public static void main(String[]args){
String xml = "<tag blabla=\"title\"><a href=\"/test/tt123\"> TEST 1 </a></tag>";
XPath xPath = XPathFactory.newInstance().newXPath();
InputSource source = new InputSource(new StringReader(xml));
String hrefValue = null;
try {
hrefValue = (String) xPath.evaluate("//@href", source, XPathConstants.STRING);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
String numbers = hrefValue.replaceAll("[^\\d.]", "");
System.out.println(numbers);
}
}
I have written the following Java program. It splits the given sentence and tags each word with its part of speech using the Stanford POS tagger. I have mapped each part-of-speech tag to a number in a hash table, pos_tag_numb.
I get the correct part of speech for each word; however, when I try to get the tag number from the hash table, I get a null value.
import java.util.StringTokenizer;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
class maindemo
{
public static void main(String [] args) throws IOException {
//if(args.length<1) {
//System.err.println("Usage: java SentiWordNetDemoCode <pathToSentiWordNetFile>");
//return;
//}
String pathToSWN = "D:\\Acad !!\\Project_idrbt\\home\\swn\\www\\admin\\dump\\SentiWordNet_3.0.0_20130122.txt";
MaxentTagger tagger = new MaxentTagger("D:\\Acad !!\\Project_idrbt\\stanford-postagger-2014-01-04\\models\\english-left3words-distsim.tagger");
//hashing each pos tag to a number
Hashtable<String,Integer> pos_tag_numb = new Hashtable<String,Integer>();
pos_tag_numb.put("JJ",2);
pos_tag_numb.put("JJR",2);
pos_tag_numb.put("JJS",2);
pos_tag_numb.put("RB",5);
pos_tag_numb.put("RBR",5);
pos_tag_numb.put("RBS",5);
pos_tag_numb.put("WRB",5);
SentiWordNetDemoCode sentiwordnet = new SentiWordNetDemoCode(pathToSWN);
String review="very good little bad";
String[] tokens=review.split(" ");
int ti=0;
for(String s: tokens)
{
String taggedstring=tagger.tagString(s);
String[] word_pos_pair=taggedstring.split("_");
String pos=new String(word_pos_pair[1]);
System.out.println(word_pos_pair[0]+" "+ pos_tag_numb.get( pos ) );
}
}
}
tagger.tagString(s) gives output of the form WORD_POSTAG, e.g. very_RB, good_JJ
If I add the line System.out.println("tag is "+pos); right after pos is assigned, the output is
tag is RB
very null
tag is JJ
good null
tag is RB
little null
tag is JJ
bad null
Finally, I solved my own problem. I guess the method tagger.tagString() returns a string with some trailing whitespace, so I just added one statement
taggedstring= taggedstring.trim();
before I split the string, i.e. before the statement
String[] word_pos_pair=taggedstring.split("_");
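Putting it together, the tagging loop with the trim() applied looks roughly like this (a minimal sketch; tagger and pos_tag_numb are the objects set up above):
for (String s : tokens) {
    // tagString() appends trailing whitespace, so trim before splitting on "_"
    String taggedstring = tagger.tagString(s).trim();
    String[] word_pos_pair = taggedstring.split("_");
    String pos = word_pos_pair[1];
    System.out.println(word_pos_pair[0] + " " + pos_tag_numb.get(pos));
}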