I am using PorterStemmer in Java to get the base form of a verb, but I found a problem with the verbs "goes" and "gambles". Instead of stemming them to "go" and "gamble", it stems them to "goe" and "gambl". Is there a better tool that can handle verbs ending in -es and -ed and retrieve the base form? P.S. JAWS with WordNet for Java does this too.
Here is my code:
public class verb
{
    public static void main(String[] args)
    {
        PorterStemmer ps = new PorterStemmer();
        ps.setCurrent("gambles");
        ps.stem();
        System.out.println(ps.getCurrent());
    }
}
Here is the output in console:
gambl
Take a few minutes to read this tutorial from the Stanford NLP group:
https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
You will find that a stemmer does not actually work the way you might expect. It is a crude heuristic, so it does not always give you a complete base form of a verb; it simply chops the ending off. Since you care about getting the complete base form of a verb, lemmatization is a better fit for your case.
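If you want to stay in Java, lemmatization is available out of the box in Stanford CoreNLP. Here is a minimal sketch, assuming the CoreNLP jar and its models are on the classpath (the package names are from recent CoreNLP versions and may differ in older ones):
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class Lemmatize {
    public static void main(String[] args) {
        // Build a pipeline that tokenizes, splits sentences, POS-tags, and lemmatizes
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation("He goes home and gambles.");
        pipeline.annotate(document);

        // Print the lemma (dictionary base form) of every token
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                System.out.println(token.word() + " -> " + token.get(CoreAnnotations.LemmaAnnotation.class));
            }
        }
    }
}
With this setup "goes" comes back as "go" and "gambles" as "gamble", which is the behavior you were expecting from the stemmer.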
I am trying to add rules to my Oracle dictionary programmatically in ADF and JDeveloper:
Rule rule = ruleset.getRuleTable().add();
rule.setName(aliasRule);
rule.setAlias(aliasRule);
rule.setPriority(property);
rule.setAdvancedMode(true);
rule.setDescription(description);
return rule;
then:
diccionaryRules.validate(exceptions, warnings);
I have three warnings with the same message:
RUL-05717: The identifier "Header.Teachers.Courses" is not valid here.
In my Oracle.rules file I have three view objects connected by links through private key ids:
HeaderVVO
TeachersVVO
CoursesVVO
And the path is correct: Header.Teachers.Courses.
I created an expression for the path Header.Teachers with:
Expression ePath = simpleTest.getExpressionTable().get(0);
ePath.setValue("Header.Teachers");
// Here comes some validation
List<SDKWarning> warnings = new ArrayList<SDKWarning>();
List<SDKException> exceptions = new ArrayList<SDKException>();
ePath.validate(exceptions, warnings);
This does not produce any warnings, but this:
ePath.setValue("Header.Teachers.Courses");
gives the above warning.
I don't know why I get these warnings.
You should presume that most of the people trying to answer this question (myself included), while having a good understanding of ADF, don't know much about Oracle Rules.
That being said, this looks like a problem on the Rules side rather than in ADF. Since you are using view objects, you can probably test this integration logic from the Business Components Tester and inject your Rules logic through application module custom methods.
Bottom line: you are building a Rules client from Java; this is not directly related to ADF. If you can make your client work from a plain Java main(String[] args) method, it will work from ADF too.
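As a rough illustration of that last point, here is a minimal sketch, assuming the Oracle Rules SDK classes (RuleDictionary, SDKException, SDKWarning) are on the classpath; loadDictionary() is a hypothetical helper standing in for however you currently obtain your dictionary, and the validate() call is the same one you already use:
import java.util.ArrayList;
import java.util.List;

import oracle.rules.sdk2.dictionary.RuleDictionary;
import oracle.rules.sdk2.exception.SDKException;
import oracle.rules.sdk2.exception.SDKWarning;

public class RulesClientTest {

    public static void main(String[] args) throws Exception {
        RuleDictionary diccionaryRules = loadDictionary("Oracle.rules"); // hypothetical helper

        List<SDKWarning> warnings = new ArrayList<SDKWarning>();
        List<SDKException> exceptions = new ArrayList<SDKException>();
        diccionaryRules.validate(exceptions, warnings);

        // If the RUL-05717 warnings show up here too, the problem is on the Rules side, not in ADF
        for (SDKWarning warning : warnings) {
            System.out.println(warning);
        }
    }

    private static RuleDictionary loadDictionary(String path) throws Exception {
        // Placeholder: load and return the dictionary the same way your ADF code does
        throw new UnsupportedOperationException("load your dictionary here");
    }
}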
I have a couple of tweets which need to be processed. I am trying to find occurrences of messages that indicate harm to a person. How do I go about achieving this via NLP?
I bought my son a toy gun
I shot my neighbor with a gun
I don't like this gun
I would love to own this gun
This gun is a very good buy
Feel like shooting myself with a gun
In the above sentences, the 2nd and 6th are what I would like to find.
If the problem is restricted only to guns and shooting, then you could use a dependency parser (like the Stanford Parser) to find verbs and their (prepositional) objects, starting with the verb and tracing its dependants in the parse tree. For example, in both 2 and 6 these would be "shoot, with, gun".
Then you can use a list of (near) synonyms for "shoot" ("kill", "murder", "wound", etc) and "gun" ("weapon", "rifle", etc) to check if they occur in this pattern (verb - preposition - noun) in each sentence.
There will be other ways to express the same idea, e.g. "I bought a gun to shoot my neighbor", where the dependency relation is different, and you'd need to detect these types of dependencies too.
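To make that concrete, here is a rough Java sketch of the idea using Stanford CoreNLP's dependency parser. The word lists are just examples, the check is deliberately simplified (it only asks whether a harm verb governs some edge and a weapon word appears as a dependent, rather than tracing the exact verb-preposition-object chain), and it assumes a recent CoreNLP version with the CoreDocument API:
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;

public class HarmDetector {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Example word lists; extend with (near) synonyms as described above
        List<String> harmVerbs = Arrays.asList("shoot", "kill", "murder", "wound");
        List<String> weapons = Arrays.asList("gun", "weapon", "rifle");

        CoreDocument doc = new CoreDocument("I shot my neighbor with a gun.");
        pipeline.annotate(doc);

        for (CoreSentence sentence : doc.sentences()) {
            SemanticGraph graph = sentence.dependencyParse();
            boolean hasHarmVerb = false;
            boolean hasWeapon = false;
            // Walk the dependency edges and compare governor/dependent lemmas against the lists
            for (SemanticGraphEdge edge : graph.edgeIterable()) {
                if (harmVerbs.contains(edge.getGovernor().lemma())) hasHarmVerb = true;
                if (weapons.contains(edge.getDependent().lemma())) hasWeapon = true;
            }
            if (hasHarmVerb && hasWeapon) {
                System.out.println("Possible harm: " + sentence.text());
            }
        }
    }
}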
All of vpekar's suggestions are good. Here is some Python code that will at least parse the sentences and check whether they contain verbs from a user-defined set of harm words. Note: most 'harm words' probably have multiple senses, many of which could have nothing to do with harm. This approach does not attempt to disambiguate word sense.
(This code assumes you have NLTK and Stanford CoreNLP)
import os
import subprocess
from xml.dom import minidom
from nltk.corpus import wordnet as wn

def StanfordCoreNLP_Plain(inFile):
    #Create the startup info so the java program runs in the background (for windows computers)
    startupinfo = None
    if os.name == 'nt':
        startupinfo = subprocess.STARTUPINFO()
        startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
    #Execute the stanford parser from the command line
    cmd = ['java', '-Xmx1g','-cp', 'stanford-corenlp-1.3.5.jar;stanford-corenlp-1.3.5-models.jar;xom.jar;joda-time.jar', 'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-annotators', 'tokenize,ssplit,pos', '-file', inFile]
    output = subprocess.Popen(cmd, stdout=subprocess.PIPE, startupinfo=startupinfo).communicate()
    outFile = file(inFile[(str(inFile).rfind('\\'))+1:] + '.xml')
    xmldoc = minidom.parse(outFile)
    itemlist = xmldoc.getElementsByTagName('sentence')
    Document = []
    #Get the data out of the xml document and into python lists
    for item in itemlist:
        SentNum = item.getAttribute('id')
        sentList = []
        tokens = item.getElementsByTagName('token')
        for d in tokens:
            word = d.getElementsByTagName('word')[0].firstChild.data
            pos = d.getElementsByTagName('POS')[0].firstChild.data
            sentList.append([str(pos.strip()), str(word.strip())])
        Document.append(sentList)
    return Document

def FindHarmSentence(Document):
    #Loop through sentences in the document. Look for verbs in the Harm Words Set.
    VerbTags = ['VBN', 'VB', 'VBZ', 'VBD', 'VBG', 'VBP', 'V']
    HarmWords = ("shoot", "kill")
    ReturnSentences = []
    for Sentence in Document:
        for word in Sentence:
            if word[0] in VerbTags:
                try:
                    wordRoot = wn.morphy(word[1],wn.VERB)
                    if wordRoot in HarmWords:
                        print "This message could indicate harm:" , str(Sentence)
                        ReturnSentences.append(Sentence)
                except: pass
    return ReturnSentences

#Assuming your input is a string, we need to put the strings in some file.
Sentences = "I bought my son a toy gun. I shot my neighbor with a gun. I don't like this gun. I would love to own this gun. This gun is a very good buy. Feel like shooting myself with a gun."
ProcessFile = "ProcFile.txt"
OpenProcessFile = open(ProcessFile, 'w')
OpenProcessFile.write(Sentences)
OpenProcessFile.close()

#Sentence split, tokenize, and part of speech tag the data using Stanford Core NLP
Document = StanfordCoreNLP_Plain(ProcessFile)

#Find sentences in the document with harm words
HarmSentences = FindHarmSentence(Document)
This outputs the following:
This message could indicate harm: [['PRP', 'I'], ['VBD', 'shot'], ['PRP$', 'my'], ['NN', 'neighbor'], ['IN', 'with'], ['DT', 'a'], ['NN', 'gun'], ['.', '.']]
This message could indicate harm: [['NNP', 'Feel'], ['IN', 'like'], ['VBG', 'shooting'], ['PRP', 'myself'], ['IN', 'with'], ['DT', 'a'], ['NN', 'gun'], ['.', '.']]
I would have a look at SenticNet
http://sentic.net/sentics
It provides an open source knowledge base and parser that assigns emotional value to text fragments. Using the library, you could train it to recognize statements that you're interested in.
Problem:
I have a servlet that generates reports, more specifically the table body of a report. It is a black box; we do not have access to the source code.
Nevertheless, it is working satisfactorily, and the servlet is not planned to be rewritten or replaced anytime soon.
We need to modify its response text in order to update a few links it generates to other reports. I was thinking of doing it with a filter that would find the anchor text and replace it using a regex.
Research:
I ran into this question that has a regex filter. It should be what I need, but then maybe not.
I am not trying to parse HTML in the strict sense of the term, and I am not working with the full spec of the language. What I have is a subset of HTML tags that composes a table body, with no nested tables, so the HTML subset generated by the servlet is not recursive.
I just need to find and replace the anchors' targets and add an attribute to the tag.
So the question is:
I need to modify the output of a servlet in order to change all links of the kind:
<a href="http://mypage.com/servlets/reports/?a=report&id=MyReport&filters=abcdefg">
into links like:
<a href="http://myOtherPage.com/webReports/report.xhtml?id=MyReport&filters=abcdefg" target="_parent">
Should I use the regex filter written by Jeremy Stein, or is there a better solution?
Assuming that the only part of the target A tags which vary is the query component of the href attribute, then this tested regex solution should do a pretty good job:
// TEST.java 20121024_0800
import java.util.regex.*;

public class TEST {
    public static String fixReportAnchorElements(String text) {
        Pattern re_report_anchor = Pattern.compile(
            "<a href=\"http://mypage\\.com/servlets/reports/\\?a=report&id=([^\"]+)\">",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = re_report_anchor.matcher(text);
        return m.replaceAll(
            "<a href=\"http://myOtherPage.com/webReports/report.xhtml?id=$1\" target=\"_parent\">"
        );
    }

    public static void main(String[] args) {
        String input =
            "test <a href=\"http://mypage.com/servlets/reports/?a=report&id=MyReport&filters=abcdefg\"> test";
        String output = fixReportAnchorElements(input);
        System.out.println(output);
    }
}
I used Jeremy Stein's classes (from the question linked above), with a few changes:
a) Make sure nothing further down the filter chain, nor the servlet itself, calls getOutputStream() on the wrapper object, or it will throw an IllegalStateException (check this answer by BalusC on the subject).
b) I wanted to make a single change on the page, so I did not put any filter config in web.xml.
b.2) In fact, I did not put anything in web.xml at all; I used the javax.servlet.annotation.WebFilter annotation on the class itself.
c) I set the Pattern and replace strings directly on the class:
Pattern searchPattern = Pattern.compile("<a (.*?) href=\".*?id=(.*?)[&|&]filtros=(.*?)\" (.*?)>(.*?)</a>");
String replaceString = "<a $1 href=\"/webReports/report.xhtml?idRel=$2&filtros=$3\" target=\"_parent\" $4>$5</a>";
Note the .*? lazy quantifiers, which match as little as possible to avoid matching more than wanted.
For testing the matching and the regex, I used this applet I found while researching the subject.
Hope this helps anyone with the same problem.
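For reference, here is a rough sketch of the overall shape of such a filter: a response wrapper plus @WebFilter that buffers the servlet's output and rewrites it before it reaches the client. The class name, URL pattern, and regex are illustrative, this is not Jeremy Stein's exact code, and it assumes the servlet writes through getWriter() (see point a above):
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

@WebFilter("/servlets/reports/*")
public class ReportLinkFilter implements Filter {

    // Wrapper that buffers the servlet's character output instead of sending it to the client
    static class BufferedResponseWrapper extends HttpServletResponseWrapper {
        private final CharArrayWriter buffer = new CharArrayWriter();

        BufferedResponseWrapper(HttpServletResponse response) {
            super(response);
        }

        @Override
        public PrintWriter getWriter() {
            return new PrintWriter(buffer);
        }

        String getBufferedOutput() {
            return buffer.toString();
        }
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        BufferedResponseWrapper wrapper = new BufferedResponseWrapper((HttpServletResponse) res);
        chain.doFilter(req, wrapper);

        // Rewrite the report links in the buffered output (pattern adapted from the one above)
        String rewritten = wrapper.getBufferedOutput().replaceAll(
                "<a (.*?) href=\".*?id=(.*?)(?:&|&)filtros=(.*?)\" (.*?)>(.*?)</a>",
                "<a $1 href=\"/webReports/report.xhtml?idRel=$2&filtros=$3\" target=\"_parent\" $4>$5</a>");

        res.getWriter().write(rewritten);
    }

    @Override
    public void init(FilterConfig config) {
    }

    @Override
    public void destroy() {
    }
}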
I've got a program that currently has a mass of code that I would like to design away. This code takes a number of text files and passes them through an interestingly written interpreter to produce a plain text file report that goes on to other systems. In theory this allows a non-programmer to modify the report without having to understand the inner workings of Java and the interpreter. In practice, any minor change likely necessitates going into the interpreter and tweaking it (and the domain-specific language isn't exactly friendly even to other programmers).
I would love to redesign this code. As a primarily web programmer, the first thing that came to mind when thinking of "a non-programmer being able to modify the report..." was to replace "report" with "web page" and say to myself, "aha! JSP." This would give people a nice What You See Is Almost What You Get approach, along with tag libraries and Java scriptlets (as undesirable as the latter may be), rather than awkwardly written DSL statements.
While it is possible to use jspc to compile a JSP into Java (another part of the application runs EJBs on a JBoss server, so jspc isn't too far away), the boilerplate code it generates tries to hook the output up to the PageContext from the ServletContext. It would involve tricking the code into thinking it was running inside a web container (not an impossibility, but a kludge) and then removing the headers.
Is there a different templating approach (or library) for Java that could be used to print to a text file? Every one that I've looked at so far appears to be either optimized for the web or tightly coupled to a particular application server (and designed for web work).
So you need a slimmed-down version of JSP.
See if this one (JSTP) works for you
http://jstp.sourceforge.net/manual.html
Give Apache Velocity a try. It is incredibly simple and does not assume it is running in the context of a web application.
This is totally subjective, but I would argue its syntax is easier for a non-programmer to understand than JSP and tag libraries.
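As a rough sketch of what non-web usage looks like (the template file name and the context key are made up for illustration):
import java.io.FileWriter;
import java.io.Writer;
import org.apache.velocity.Template;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.VelocityEngine;

public class ReportGenerator {
    public static void main(String[] args) throws Exception {
        // Plain VelocityEngine, no servlet container or web framework involved
        VelocityEngine engine = new VelocityEngine();
        engine.init();

        // report.vm is an illustrative template, e.g. a line like: Total sales: $total
        Template template = engine.getTemplate("report.vm");

        VelocityContext context = new VelocityContext();
        context.put("total", 42);

        // Merge the template with the data and write plain text to a file
        try (Writer writer = new FileWriter("report.txt")) {
            template.merge(context, writer);
        }
    }
}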
If you want to be a real trend setter in your company, you could create a Grails application to do it and use Groovy templating (maybe in combination with the Quartz plugin for scheduling). It might be a bit of a hard sell if there is a lot of existing code to be replaced, but I love it...
http://groovy.codehaus.org/Groovy+Templates
If you want the safe bet, then (the also excellent) Velocity has to be it:
http://velocity.apache.org/
You probably want to check out the Rythm template engine, which offers good performance (2 to 3 times faster than Velocity), an elegant syntax (similar to .NET Razor), and is designed specifically for Java programmers.
Template: generate a string of user names separated by "," from a list of users
#args List<User> users
#for (User user: users) {
#user.getName() #user_sep
}
Template: if-else demo
#args User user
#if (user.isAdmin()) {
<div id="admin-panel">...</div>
} else {
<div id="user-panel">...</div>
}
Invoke template using template file
// pass render args by name
Map<String, Object> renderArgs = ...
String s = Rythm.render("/path/to/my/template.txt", renderArgs);
// or pass render arguments by position
String s = Rythm.render("/path/to/my/template.txt", "arg1", 2, true, ...);
Invoke template using inline text
User user = ...;
String s = Rythm.render("#args User user;Hello #user.getName()", user);
Invoke template with String interpolation mode
User user = ...;
String s = Rythm.render("Hello #name", user.getName());
ToString mode
public class Address {
    public String unitNo;
    public String streetNo;
    ...
    public String toString() {
        return Rythm.toString("#_.unitNo #_.streetNo #_.street, #_.suburb, #_.state, #_.postCode", this);
    }
}
Auto ToString mode (follows Apache Commons Lang's ReflectionToStringBuilder, but is faster)
public class Address {
    public String unitNo;
    public String streetNo;
    ...
    public String toString() {
        return Rythm.toString(this);
    }
}
Documentation can be found at http://www.playframework.org/modules/rythm. A full demo app running on GAE: http://play-rythm-demo.appspot.com.
Note: the demo and documentation were created for the play-rythm plugin for the Play! Framework, but most of the content also applies to the pure Rythm template engine.
Source code:
Rythm template engine: https://github.com/greenlaw110/rythm/
Play Rythm Plugin: https://github.com/greenlaw110/play-rythm
Well... I have a file containing a TinTin++ script. I have already managed to grab all actions and substitutions from it to show them properly ordered on a website using Ruby, which helps me keep an overview.
Example TINTIN-script
#substitution {You tell {([a-zA-Z,\-\ ]*)}, %*$}
{<279>[<269> $sysdate[1]<279>, <269>$systime<279> |<219> Tell <279>] <269>to <219>%2<279> : <219>%3}
{4}
#substitution {{([a-zA-Z,\-\ ]*)} tells you, %*$}
{<279>[<269> $sysdate[1]<279>, <269>$systime<279> |<119> Tell <279>] <269>from <119>%2<279> : <119>%3}
{2}
#action {Your muscles suddenly relax, and your nimbleness is gone.}
{
#if {$sw_keepaon}
{
aon;
};
} {5}
#action {xxxxx}
{
#if {$sw_keepfamiliar}
{
familiar $familiar;
};
} {5}
To grab them in my Ruby app I read my script file into a variable 'input' and then use the following pattern to scan 'input':
pattern = /(?<braces>{([^{}]|\g<braces>)*}){0}^#(?<type>action|substitution)\s*(?<b1>\g<braces>)\s*(?<b2>\g<braces>)\s*(?<b3>\g<braces>)/im
input = ""
File.open("/home/igambin/lmud/lmud.tt") { |file| input = file.read }
input.scan(pattern) { |prio, type, pattern, code|
  ## here i usually create objects, but for simplicity only output now
  puts "Type : #{type}"
  puts "Pattern : #{pattern}"
  puts "Priority: #{prio}"
  puts "Code :\n#{code}"
  puts
}
Now my idea was to use the NetBeans Platform to write a module not only to keep an overview but also to assist in editing the TinTin script file. When opening the file in an editor window I still need to parse the TinTin file, grab all 'actions' and 'substitutions' from it, and display them in an eTable, in which I could double-click an item to open a modification window.
I've set up the module and got everything ready so far; I just can't figure out how to translate the Ruby regex pattern I've written into a working Java regex pattern. It seems named-group capturing, and especially the recursive application of these groups, is not supported in Java. Without that I seem unable to find a working solution...
Here's the ruby pattern again...
pattern = /(?<braces>{([^{}]|\g<braces>)*}){0}^#(?<type>action|substitution)\s*(?<b1>\g<braces>)\s*(?<b2>\g<braces>)\s*(?<b3>\g<braces>)/im
Can anyone help me to create a java pattern that matches the same?
Many thanks in advance for tips/hints/ideas and especially for solutions (or close-to-solution comments)!
Your text format seems pretty simple; it's possible you don't really need recursive matching. This Java-compatible regex matches your sample data correctly, as far as I can tell:
(?s)#(substitution|action)\s*\{(.*?)\}\s*\{(.*?)\}\s*\{(\d+)\}
Would that work for you? If you run Java 7, you can even name the groups. ;)
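For illustration, here is a minimal Java sketch that applies that regex with named groups (Java 7+); the file path is made up:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TintinScanner {
    public static void main(String[] args) throws Exception {
        String input = new String(Files.readAllBytes(Paths.get("lmud.tt")), StandardCharsets.UTF_8);

        // Same non-recursive idea as the regex above, with named groups
        Pattern p = Pattern.compile(
            "(?s)#(?<type>substitution|action)\\s*\\{(?<pat>.*?)\\}\\s*\\{(?<code>.*?)\\}\\s*\\{(?<prio>\\d+)\\}");

        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println("Type    : " + m.group("type"));
            System.out.println("Pattern : " + m.group("pat"));
            System.out.println("Priority: " + m.group("prio"));
            System.out.println("Code    :\n" + m.group("code"));
            System.out.println();
        }
    }
}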
Can anyone help me to create a java pattern that matches the same?
No, no one can: Java's regex engine does not support recursive patterns (as Ruby 1.9 does).