I have written the following Java program. It splits the given sentence and tags each word with its part of speech using the Stanford POS tagger. I have mapped each part-of-speech tag to a number in a Hashtable called pos_tag_numb.
I get the correct part of speech for each word; however, when I try to look up the tag number in the hash table, I get a null value.
import java.util.StringTokenizer;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;
class maindemo
{
public static void main(String [] args) throws IOException {
//if(args.length<1) {
//System.err.println("Usage: java SentiWordNetDemoCode <pathToSentiWordNetFile>");
//return;
//}
String pathToSWN = "D:\\Acad !!\\Project_idrbt\\home\\swn\\www\\admin\\dump\\SentiWordNet_3.0.0_20130122.txt";
MaxentTagger tagger = new MaxentTagger("D:\\Acad !!\\Project_idrbt\\stanford-postagger-2014-01-04\\models\\english-left3words-distsim.tagger");
//hashing each pos tag to a number
Hashtable<String,Integer> pos_tag_numb = new Hashtable<String,Integer>();
pos_tag_numb.put("JJ",2);
pos_tag_numb.put("JJR",2);
pos_tag_numb.put("JJS",2);
pos_tag_numb.put("RB",5);
pos_tag_numb.put("RBR",5);
pos_tag_numb.put("RBS",5);
pos_tag_numb.put("WRB",5);
SentiWordNetDemoCode sentiwordnet = new SentiWordNetDemoCode(pathToSWN);
String review="very good little bad";
String[] tokens=review.split(" ");
int ti=0;
for(String s: tokens)
{
String taggedstring=tagger.tagString(s);
String[] word_pos_pair=taggedstring.split("_");
String pos=new String(word_pos_pair[1]);
System.out.println(word_pos_pair[0]+" "+ pos_tag_numb.get( pos ) );
}
}
}
tagger.tagString(s) gives output in the form WORD_POSTAG, e.g. very_RB, good_JJ.
If I add the line System.out.println("tag is " + pos); right after the POS tag is extracted, the output is:
tag is RB
very null
tag is JJ
good null
tag is RB
little null
tag is JJ
bad null
Finally, I solved my own problem. The method tagger.tagString() returns a string with trailing whitespace, so the key "RB " never matches "RB" in the hash table. I just added one statement
taggedstring = taggedstring.trim();
before splitting the string, i.e. before the statement
String[] word_pos_pair = taggedstring.split("_");
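Putting it together, a minimal sketch of the corrected loop (same tagger and pos_tag_numb as above; a sketch of the fix, not the full program):
for (String s : tokens) {
    String taggedstring = tagger.tagString(s);         // e.g. "very_RB " with trailing whitespace
    taggedstring = taggedstring.trim();                // strip the trailing space before splitting
    String[] word_pos_pair = taggedstring.split("_");
    String pos = word_pos_pair[1];                     // now "RB" instead of "RB "
    System.out.println(word_pos_pair[0] + " " + pos_tag_numb.get(pos));
}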
I am trying to run my Java project from cmd and I am getting this error:
Exception in thread "main" java.lang.ClassCastException: class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')
at jodd.util.UnsafeUtil.getChars(UnsafeUtil.java:67)
at jodd.json.JsonParser.parse(JsonParser.java:201)
at IndexTester.main(IndexTester.java:78)
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Map;
import jodd.json.JsonParser;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SlowCompositeReaderWrapper;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public class IndexTester {
public static void main(String[] args) throws IOException, ParseException {
if (args.length != 3) {
System.err.println("Incorrect number of arguments! Usage:");
System.err.println("");
System.err.println("java IndexTester should_clear_index path_to_data path_to_index ");
System.err.println("\tif should_clear_index is \"1\", the index will be rebuilt. Otherwise, it will try and use an existing index.");
System.err.println("\tpath_to_index should point to an empty directory somewhere.");
System.exit(-1);
}
String shouldClearIndex = args[0];
String inputPath = args[1]; // where to find the file containing the JSON to index
String idxDirPath = args[2]; // where to put/find the Lucene index we want to search
File inputFile = new File(inputPath);
// set up analyzer:
StandardAnalyzer analyzer = new StandardAnalyzer();
// set up the index
File idxDir = new File(idxDirPath);
Directory dir = FSDirectory.open(idxDir.toPath());
if (shouldClearIndex.compareTo("1") == 0) {
System.out.println("Rebuilding index...");
IndexWriterConfig idxConfig = new IndexWriterConfig(analyzer);
idxConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
IndexWriter idxWriter = new IndexWriter(dir, idxConfig);
// Now, populate the index:
int idx = 0;
JsonParser jParser = new JsonParser();
for (String line : Files.readAllLines(inputFile.toPath(), StandardCharsets.UTF_8)) {
// On large amounts of data, this can take a while
if (idx % 10000 == 0) {
System.out.println(idx);
}
idx++;
// each line of the input file is a serialized JSON object
Map j = jParser.parse(line);
// simple types (strings, numbers, etc.) are handled like so:
String title = (String)j.get("title");
// complex types (lists or dicts) get turned into instances of
// java.util.Map and java.util.List.
String ab = (String)j.get("abstract");
// Look at the docs for TextField to see about other types- Lucene can index numbers, dates, etc.
Field tiField = new Field("title", title, TextField.TYPE_STORED);
// The TYPE_STORED directive tells Lucene to actually store the original token in the index. This is handy
// for all sorts of reasons!
// set up any additional fields here
Document thisDoc = new Document();
thisDoc.add(tiField);
// add our document to the index
idxWriter.addDocument(thisDoc);
}
System.out.println("Done!");
System.out.println(idx + " documents indexed.");
idxWriter.close();
}
do {
// Open up the index for querying:
DirectoryReader reader = DirectoryReader.open(dir);
// Tell me about the index (comment in/out as needed- this may be useful for debugging):
// LeafReader slowC = SlowCompositeReaderWrapper.wrap(reader);
// Terms idxTerms = slowC.terms("title"); // change to a different field as needed
// TermsEnum tEnum = idxTerms.iterator(null);
// System.out.println("Terms in the index for the title field:");
// while (tEnum.next() != null) {
// String s = tEnum.term().utf8ToString();
// System.out.println(s + "\t" + tEnum.docFreq());
// }
// Now search
IndexSearcher searcher = new IndexSearcher(reader);
// Things to note re: QueryParser:
// 1. The first argument is the "default" field to search-
// if nothing else is specified, in the query, this is what
// will be searched.
// 2. You always want to make sure to use the same Analyzer for your
// query as you did when you built the index!
//
// Other query parser classes will behave similarly, but may have different argument ordering.
QueryParser qParser = new QueryParser("title", analyzer);
System.out.print("Query: ");
String queryText = System.console().readLine();
if (queryText.compareTo("") != 0) {
Query q = qParser.parse(queryText);
TopDocs results = searcher.search(q, 10);
System.out.println("Got " + results.totalHits + " hits!");
for (ScoreDoc d : results.scoreDocs) {
System.out.println(d.doc + "\t" + d.score);
Document res = reader.document(d.doc);
System.out.println(res.getField("title").stringValue());
}
}
} while (true); // keep querying until user hits ctrl-C
}
}
This is my code, and this is my .txt file:
[https://openeclass.uom.gr/modules/document/file.php/DAI148/%CE%94%CE%B9%CE%AC%CE%BB%CE%B5%CE%BE%CE%B7%2006%20-%20%CE%94%CE%B9%CE%B1%CE%B2%CE%B1%CE%B8%CE%BC%CE%B9%CF%83%CE%BC%CE%AD%CE%BD%CE%B7%20%CE%91%CE%BD%CE%AC%CE%BA%CF%84%CE%B7%CF%83%CE%B7%2C%20%CE%9C%CE%BF%CE%BD%CF%84%CE%AD%CE%BB%CE%BF%20%CE%94%CE%B9%CE%B1%CE%BD%CF%85%CF%83%CE%BC%CE%B1%CF%84%CE%B9%CE%BA%CE%BF%CF%8D%20%CE%A7%CF%8E%CF%81%CE%BF%CF%85/Using%20Lucene/data.txt.zip]
Please switch to the latest Jodd JSON v6.
There is probably an issue with UnsafeUtil.getChars. What you can do in the meantime is the following:
jParser.parse(line.toCharArray());
i.e. skip the code path that uses UnsafeUtil.getChars().
The new version of Jodd does not use the Unsafe class anymore.
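In the IndexTester code above, that means changing the line that currently reads Map j = jParser.parse(line);. A minimal sketch of the workaround (same jParser and line variables as in the loop above):
// Pass a char[] instead of a String so JsonParser does not go through
// UnsafeUtil.getChars(), which is what throws the ClassCastException on newer JDKs.
Map j = jParser.parse(line.toCharArray());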
When I call a Java UDF inside Hive, it returns a different result from the one returned in the IDE (which is sorted in the same order as the string provided). The UDF is meant to remove duplicates from a string and return the output in the same order, but in lowercase.
E.g.:
Sony+sony+E2312+xperia+sony => sony+e2312+xperia
While executing inside the IDE, it returns the correct value (as above), but when called from the Hive console, it returns:
e2312+sony+xperia
UDF code:
package com.javaudf.hive;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
public class processDeviceModel extends UDF {
private Text result = new Text();
private String delimiters = "[-+\\s*_=|;:]";
public Text evaluate(Text input) {
String dmModel = new String();
if (input == null || input.getLength() == 0)
return null;
else
dmModel = input.toString().trim().toLowerCase();
String[] parts = dmModel.split(delimiters);
Set<String> uniqueparts = new LinkedHashSet<String>();
for(int i = 0; i < parts.length; i++){
uniqueparts.add(parts[i]);
}
String str = StringUtils.join(uniqueparts, '+');
result.set(str);
return result;
}
}
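As a sanity check, a hypothetical local test of the UDF outside Hive (assuming the class is on the classpath; the test class and its name are illustrative, not part of the original code) should reproduce the order-preserving behaviour seen in the IDE:
import org.apache.hadoop.io.Text;
import com.javaudf.hive.processDeviceModel;

public class ProcessDeviceModelTest {
    public static void main(String[] args) {
        processDeviceModel udf = new processDeviceModel();
        Text out = udf.evaluate(new Text("Sony+sony+E2312+xperia+sony"));
        System.out.println(out); // expected: sony+e2312+xperia (LinkedHashSet keeps insertion order)
    }
}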
I have a test scenario where the actual result is a fixed percentage, for example 135.68%, but I want to compare the actual result with an expected result that is a range, for example 130.00 to 136.00. If the actual result falls within the range of 130.00 to 136.00, the test should pass; otherwise it should fail.
I am using the following Assert statement, but with no luck:
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.io.File;
import java.io.FileInputStream;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import jxl.Sheet;
import jxl.Workbook;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.Select;
import org.testng.Assert;
import org.testng.annotations.Test;
import org.testng.asserts.Assertion;
import org.testng.asserts.SoftAssert;
@Test
public class TC20042 extends BaseClass{
private SoftAssert m_assert = new SoftAssert();
public void registration() throws Exception, InterruptedException {
FileInputStream fi=new FileInputStream("C:\\File\\Book2.xls");
Workbook w=Workbook.getWorkbook(fi);
Sheet s=w.getSheet(2);
try
{
for (int i = 0; i < s.getRows(); i++)
{
//Read data from excel sheet
String s1 = s.getCell(0,i).getContents();
String xPath = "//*[#id='mainForm']/div/div[3]/div/div/table/tbody/tr[1]/td[6]";
String text = driver.findElement(By.xpath(xPath)).getText();
double percentage = Double.parseDouble(text);
assertThat(percentage).isBetween(130.0, 136.0);
private Object assertThat(double percentage) {
// TODO Auto-generated method stub
return null;
}
}
}
catch(Exception e)
{
System.out.println(e);
}
}
}
public static void assertEquals(double actual, double expected, double delta, java.lang.String message)
Asserts that two doubles are equal concerning a delta. If they are not, an AssertionError, with the given message, is thrown. If the expected value is infinity then the delta value is ignored.
assertEquals(actual, 133.00, 3.00, "ERROR"); // 133.00 ± 3.00 covers the 130.00 to 136.00 range
Simply parse and check:
String expectedPercentage = driver.findElement(By.xpath("//*[@id='app-banner']/div[1]/div/h2")).getText().replace("%", "");
double percentage = Double.parseDouble(expectedPercentage);
// Do this parsing if range is in string format. Otherwise skip this block
String range = "130.00 to 136.00";
String[] numbers = range.replaceAll(" ", "").split("to");
double min = Double.parseDouble(numbers[0]);
double max = Double.parseDouble(numbers[1]);
Assert.assertTrue(min <= percentage && percentage <= max, expectedPercentage + " is not within the range 130.00 to 136.00");
I used the regular TestNG Assert and it worked for me, as below:
String xPath = "//*[#id='mainForm']/div/div[3]/div/div/table/tbody/tr[1]/td[6]";
String text = driver.findElement(By.xpath(xPath)).getText();
text = text.replace("%", "");
double percentage = Double.parseDouble(text);
Assert.assertTrue(percentage > 130.67 && percentage < 135.69);
Parse the text and assert the result:
String xPath = "//*[#id='app-banner']/div[1]/div/h2";
String text = driver.findElement(By.xpath(xPath)).getText();
double percentage = Double.parseDouble(text.replace("%", ""));
assertThat(percentage).isBetween(130.0, 136.0);
This solution uses isBetween(Double, Double) from AssertJ as a static import. Alternatively, you can use the TestNG assertion I mentioned in my comment.
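For completeness, a minimal self-contained sketch of the AssertJ variant (assuming the AssertJ dependency is on the classpath; the hard-coded percentage value stands in for the parsed page text):
import static org.assertj.core.api.Assertions.assertThat;

public class RangeCheckSketch {
    public static void main(String[] args) {
        double percentage = 135.68;                      // hypothetical parsed value from the page
        assertThat(percentage).isBetween(130.0, 136.0);  // throws AssertionError outside 130.0..136.0
    }
}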
I need to perform coreference resolution using Stanford CoreNLP. I downloaded version 3.3.0. The code I used is shown below. There is an error at getCorefMentions(), as the method is not found. I don't know what jar files I need to include to remove this error.
package unlpro;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;
import java.io.IOException;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefChain.CorefMention;
import edu.stanford.nlp.dcoref.CorefCluster;
import edu.stanford.nlp.dcoref.Document;
import edu.stanford.nlp.dcoref.Mention;
/**
*
* @author Soundri
*/
public class NewClass {
public static void main(String[] args) throws IOException
{
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
; // The path for a file that includes a list of demonyms
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// read some text in the text variable
// String text ="Ram and Lakshman went to the market he purchased";
String text ="The Revolutionary War occurred during the 1700s and it was the first war in the United States";
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate(document);
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for(CoreMap sentence: sentences) {
// traversing the words in the current sentence
// a CoreLabel is a CoreMap with additional token-specific methods
for (CoreLabel token: sentence.get(TokensAnnotation.class)) {
// this is the text of the token
String word = token.get(TextAnnotation.class);
// this is the POS tag of the token
String pos = token.get(PartOfSpeechAnnotation.class);
// this is the NER label of the token
String ne = token.get(NamedEntityTagAnnotation.class);
}
// this is the parse tree of the current sentence
Tree tree = sentence.get(TreeAnnotation.class);
// this is the Stanford dependency graph of the current sentence
SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
System.out.println("Dependencies "+dependencies);
}
// This is the coreference link graph
// Each chain stores a set of mentions that link to each other,
// along with a method for getting the most representative mention
// Both sentence and token offsets start at 1!
Map <Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
System.out.println("Graph "+graph);
// for(int i=1;i<graph.size();i++)
// {
// System.out.println(graph.get(i));
// }
for(Map.Entry<Integer, CorefChain> entry : graph.entrySet()) {
CorefChain c = entry.getValue();
//this is because it prints out a lot of self references which aren't that useful
if(c.getCorefMentions().size() <= 1)
continue;
CorefMention cm = c.getRepresentativeMention();
String clust = "";
List<CoreLabel> tks = document.get(SentencesAnnotation.class).get(cm.sentNum-1).get(TokensAnnotation.class);
for(int i = cm.startIndex-1; i < cm.endIndex-1; i++)
clust += tks.get(i).get(TextAnnotation.class) + " ";
clust = clust.trim();
System.out.println("representative mention: \"" + clust + "\" is mentioned by:");
for(CorefMention m : c.getCorefMentions()){
String clust2 = "";
tks = document.get(SentencesAnnotation.class).get(m.sentNum-1).get(TokensAnnotation.class);
for(int i = m.startIndex-1; i < m.endIndex-1; i++)
clust2 += tks.get(i).get(TextAnnotation.class) + " ";
clust2 = clust2.trim();
//don't need the self mention
if(clust.equals(clust2))
continue;
System.out.println("\t" + clust2);
}
}
}
}
I included all the jar files in that package. I am new to this, and I don't know if some changes need to be made in the code.
The problem is that you are calling getCorefMentions() on a variable whose class doesn't declare such a method.
See the javadoc for CorefChain for the methods it actually declares.
I don't claim to understand either Stanford NLP, or what you are trying to do with it, but one possible fix might be to replace
c.getCorefMentions()
with
c.getMentionsInTextualOrder()
(or iterate over the sets returned by c.getMentionMap().values()).
I'll leave it to you to decide if that makes sense ...
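For illustration, a hedged sketch of how the mention loop might look with getMentionsInTextualOrder() (assuming the CoreNLP 3.3.0 dcoref CorefChain API; graph is the map from the question's code):
for (Map.Entry<Integer, CorefChain> entry : graph.entrySet()) {
    CorefChain c = entry.getValue();
    // getMentionsInTextualOrder() returns a List<CorefMention> for this chain
    if (c.getMentionsInTextualOrder().size() <= 1)
        continue;
    CorefMention representative = c.getRepresentativeMention();
    System.out.println("representative mention: \"" + representative.mentionSpan + "\" is mentioned by:");
    for (CorefMention m : c.getMentionsInTextualOrder()) {
        if (m.mentionID == representative.mentionID)
            continue; // skip the self mention
        System.out.println("\t" + m.mentionSpan);
    }
}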
{"smscresponse":{"calluid":"3333","to":"0000","event":"ABC"}}
I am using
split("{")[1]
to get "calluid":"3333","to":"0000","event":"ABC", but I am getting an
Illegal repetition {
error. What I want is the calluid value. How can I get it?
Thanks in advance...
You could escape the { character, something like...
String text = "{\"smscresponse\":
{\"calluid\":\"3333\",\"to\":\"0000\",\"event\":\"ABC\"}}";
String[] split = text.split("\\{");
System.out.println(split.length);
System.out.println(split[2]);
Which outputs...
3
"calluid":"3333","to":"0000","event":"ABC"}}
To get "3333", you could do something like...
split = split[2].split(":|,"); // Split on : or ,
System.out.println(split[1]);
Which outputs
"3333"
Now, if you really wanted to be clever, you could try something like...
String[] split = text.split("\\{|:|,|\\}");
for (String part : split) {
System.out.println(part);
}
Which outputs
// Note, this is an empty line
"smscresponse"
// Note, this is an empty line
"calluid"
"3333"
"to"
"0000"
"event"
"ABC"
Updated...
A slightly better solution might be...
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(text);
while (m.find()) {
System.out.println(m.group());
}
Which outputs
"smscresponse"
"calluid"
"3333"
"to"
"0000"
"event"
"ABC"
Try to split using input.split("[{]");
String abc = "{\"smscresponse\":{\"calluid\":\"3333\",\"to\":\"0000\",\"event\":\"ABC\"}}";
String[] splittedValue = abc.split("[{]");
for(String value : splittedValue)
System.out.println(""+value);
String s = "{\"smscresponse\":{\"calluid\":\"3333\",\"to\":\"0000\",\"event\":\"ABC\"}}";
System.out.println(s.split("\\{")[2].split("}")[0]);
Don't worry about "\". This will work for your dynamically generated data.
EDIT : This will get you "calluid"
System.out.println(s.split("\\{")[2].split("}")[0].split(",")[0]);
Create a JSON object from the given string and parse it to fetch the value. Use the json-simple library (org.json.simple):
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;
public class JsonSimpleExample {
public static void main(String[] args) {
JSONParser parser = new JSONParser();
try {
// parse the raw string, then walk down to the nested "smscresponse" object
JSONObject jsonObj = (JSONObject) parser.parse("{\"smscresponse\":{\"calluid\":\"3333\",\"to\":\"0000\",\"event\":\"ABC\"}}");
JSONObject smscresponse = (JSONObject) jsonObj.get("smscresponse");
String calluid = (String) smscresponse.get("calluid");
System.out.println(calluid); // prints 3333
} catch (ParseException e) {
e.printStackTrace();
}
}
}