OpenNLP categorizer Version 1.8 - java

I'm trying to build a categorizer with version 1.8 of OpenNLP, but with the code below I keep getting a NullPointerException. What am I doing wrong?
public class test
{
public static void main(String[] args) throws IOException
{
InputStream is = new FileInputStream("D:/training.txt");
DoccatModel m = new DoccatModel(is);
Tokenizer tokenizer = WhitespaceTokenizer.INSTANCE;
String tweet = "testing sentence";
String[] tokens = tokenizer.tokenize(tweet);
DocumentCategorizerME myCategorizer = new DocumentCategorizerME(m);
double[] outcomes = myCategorizer.categorize(tokens);
String category = myCategorizer.getBestCategory(outcomes);
}
}

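You should have a look at the following tutorial. It uses OpenNLP version 1.7.2, so it may be a more recent example to work with. Note that the DoccatModel(InputStream) constructor loads an already trained model rather than raw training data, so a training file like yours first has to go through DocumentCategorizerME.train(...), which is what the tutorial demonstrates.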
https://www.tutorialkart.com/opennlp/training-of-document-categorizer-using-naive-bayes-algorithm-in-opennlp/
Hope it helps.

Related

Null pointer exception when trying to access method (java) [duplicate]

I am developing some code in Java but am facing a problem with an NPE.
It happens when I try to use a method in another class.
Here is my code:
public class MyClass{
private Attributes attr = null;
public static void main(String[] args) throws ParseException, SAXException, IOException, ParserConfigurationException {
MyClass main = new MyClass();
Options options = main.parseCommandLine();
CommandLineParser parser = new BasicParser();
CommandLine cl = parser.parse(options, args);
main.configureAttributes(main, cl);
}
private void configureAttributes(MyClass main, CommandLine cl) throws ParserConfigurationException, SAXException, IOException {
String[] attributes = cl.getOptionValues("a");
Integer tag = null;
for(int i = 0; i < attributes.length ; i++) {
String[] parts = attributes[i].split("=");
String tagName = parts[0];
String value = parts[1];
tag = Integer.parseInt(tagName,16);
this.attr.setString(tag, DICT.vrOf(tag), value);
}
}
}
But I am getting an NPE on the last line of the loop: this.attr.setString(tag, DICT.vrOf(tag), value);
I have all the right imports. It is a Maven project, so I also added the dependency to my pom.xml.
Thanks a lot for your help!
V.
Looks like you forgot to initialize your attr field. According to your source, there is only one assignment to this field: private Attributes attr = null;. So you need to initialize the field somewhere in your code before it is used.
And where are you initializing Attributes attr?
You need something like Attributes attr = new Attributes(); in the field declaration, because right now your variable holds nothing but null.
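To make the point above concrete, here is a minimal sketch of the same loop with the field initialized at its declaration. It is an illustration only: TagHolder is a hypothetical stand-in for MyClass, and a plain Map replaces the Attributes type so that the example needs no external library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the question's MyClass; a Map replaces the Attributes type. */
final class TagHolder {
    // Initialized at declaration, so the field is never null when a method runs.
    private final Map<Integer, String> attr = new LinkedHashMap<>();

    /** Parses "hexTag=value" pairs, mirroring the loop in the question. */
    void configure(String[] attributes) {
        for (String attribute : attributes) {
            String[] parts = attribute.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected tag=value but got: " + attribute);
            }
            int tag = Integer.parseInt(parts[0], 16);
            attr.put(tag, parts[1]); // safe: attr was created together with the object
        }
    }

    String valueOf(int tag) {
        return attr.get(tag);
    }
}
```

Declaring the field final also lets the compiler reject any code path that would leave it unassigned.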

Implementing save/open with RichTextFX?

Here is my code:
private void save(File file) {
StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle> doc = textarea.getDocument();
// Use the Codec to save the document in a binary format
textarea.getStyleCodecs().ifPresent(codecs -> {
Codec<StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle>> codec
= ReadOnlyStyledDocument.codec(codecs._1, codecs._2, textarea.getSegOps());
try {
FileOutputStream fos = new FileOutputStream(file);
DataOutputStream dos = new DataOutputStream(fos);
codec.encode(dos, doc);
fos.close();
} catch (IOException fnfe) {
fnfe.printStackTrace();
}
});
}
I am trying to implement the save/load functionality from the demo on the RichTextFX GitHub.
I am getting errors in the following lines:
StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle> doc = textarea.getDocument();
error: incompatible types:
StyledDocument<Collection<String>,StyledText<Collection<String>>,Collection<String>>
cannot be converted to
StyledDocument<ParStyle,Either<StyledText<TextStyle>,LinkedImage<TextStyle>>,TextStyle>
and
= ReadOnlyStyledDocument.codec(codecs._1, codecs._2, textarea.getSegOps());
error: incompatible types: inferred type does not conform to equality
constraint(s) inferred: ParStyle
equality constraints(s): ParStyle,Collection<String>
I have added all the required .java files and imported them into my main code. I thought it would be relatively trivial to implement this demo but it has been nothing but headaches.
If this cannot be resolved, does anyone know an alternative way to save the text with formatting from RichTextFX?
Thank you
This question is quite old, but since I ran into the same problem, I figured a solution might be useful to others as well.
The demo your code comes from uses the custom types ParStyle and TextStyle to define how style information is stored.
The error messages essentially tell you that the way your text area stores style information (a Collection<String>, according to the compiler output) is not compatible with the way the demo does it.
If you want to store the style in a String, which I did as well, you need to implement serialization and deserialization yourself.
You can do that, for example (I used an InlineCssTextArea), in the following way:
public class SerializeManager {
public static final String PAR_REGEX = "#!par!#";
public static final String PAR_CONTENT_REGEX = "#!pcr!#";
public static final String SEG_REGEX = "#!seg!#";
public static final String SEG_CONTENT_REGEX = "#!scr!#";
public static String serialized(InlineCssTextArea textArea) {
StringBuilder builder = new StringBuilder();
textArea.getDocument().getParagraphs().forEach(par -> {
builder.append(par.getParagraphStyle());
builder.append(PAR_CONTENT_REGEX);
par.getStyledSegments().forEach(seg -> builder
.append(
seg.getSegment()
.replaceAll(PAR_REGEX, "")
.replaceAll(PAR_CONTENT_REGEX, "")
.replaceAll(SEG_REGEX, "")
.replaceAll(SEG_CONTENT_REGEX, "")
)
.append(SEG_CONTENT_REGEX)
.append(seg.getStyle())
.append(SEG_REGEX)
);
builder.append(PAR_REGEX);
});
String textAreaSerialized = builder.toString();
return textAreaSerialized;
}
public static InlineCssTextArea fromSerialized(String string) {
InlineCssTextArea textArea = new InlineCssTextArea();
ReadOnlyStyledDocumentBuilder<String, String, String> builder = new ReadOnlyStyledDocumentBuilder<>(
SegmentOps.styledTextOps(),
""
);
if (string.contains(PAR_REGEX)) {
String[] parsSerialized = string.split(PAR_REGEX);
for (int i = 0; i < parsSerialized.length; i++) {
String par = parsSerialized[i];
String[] parContent = par.split(PAR_CONTENT_REGEX);
String parStyle = parContent[0];
List<String> segments = new ArrayList<>();
StyleSpansBuilder<String> spansBuilder = new StyleSpansBuilder<>();
String styleSegments = parContent[1];
Arrays.stream(styleSegments.split(SEG_REGEX)).forEach(seg -> {
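// Limit -1 keeps trailing empty strings, so an empty segment with an empty style still yields segContent[0]
String[] segContent = seg.split(SEG_CONTENT_REGEX, -1);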
segments.add(segContent[0]);
if (segContent.length > 1) {
spansBuilder.add(segContent[1], segContent[0].length());
} else {
spansBuilder.add("", segContent[0].length());
}
});
StyleSpans<String> spans = spansBuilder.create();
builder.addParagraph(segments, spans, parStyle);
}
textArea.append(builder.build());
}
return textArea;
}
}
You can then serialize the InlineCssTextArea, write the resulting String to a file, and later load and deserialize it.
As you can see in the code, I made up some delimiter strings (the ..._REGEX constants) that are stripped from the text during serialization (we don't want our serializer to be injectable, do we ;)).
You can change these to whatever you like; just note that they will be removed if they occur in the text of the TextArea, so they should be something users won't miss.
Also note that this solution serializes the text, its style, and the paragraph style, but NOT inserted images or parameters of the TextArea (such as width and height).
This issue on GitHub really helped me, by the way.
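The delimiter scheme can be tried out without RichTextFX. The following is only a sketch under assumptions: Segment and DelimiterCodec are hypothetical names, a segment is reduced to a text plus an inline CSS string, and paragraph handling is left out. It uses Pattern.quote so the delimiters are always treated literally, and a split limit of -1 so an empty style is not dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for one styled segment: plain text plus an inline CSS string. */
final class Segment {
    final String text;
    final String style;

    Segment(String text, String style) {
        this.text = text;
        this.style = style;
    }
}

/** Sketch of the delimiter-based round trip, reduced to a single paragraph. */
final class DelimiterCodec {
    static final String SEG_END = "#!seg!#";   // terminates one segment
    static final String SEG_SPLIT = "#!scr!#"; // separates text from style

    static String encode(List<Segment> segments) {
        StringBuilder out = new StringBuilder();
        for (Segment s : segments) {
            // Strip delimiters from user text so it cannot break the format.
            String clean = s.text.replace(SEG_END, "").replace(SEG_SPLIT, "");
            out.append(clean).append(SEG_SPLIT).append(s.style).append(SEG_END);
        }
        return out.toString();
    }

    static List<Segment> decode(String encoded) {
        List<Segment> result = new ArrayList<>();
        if (encoded.isEmpty()) {
            return result;
        }
        for (String raw : encoded.split(Pattern.quote(SEG_END))) {
            // Limit -1 keeps a trailing empty style instead of discarding it.
            String[] parts = raw.split(Pattern.quote(SEG_SPLIT), -1);
            result.add(new Segment(parts[0], parts.length > 1 ? parts[1] : ""));
        }
        return result;
    }
}
```

Like the original, the sketch assumes that style strings never contain a delimiter; only the text is sanitized.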

Antlr4 grammar requires me to use setInterpreter

When I set up a grammar with ANTLR 4 and generated the parser, I saw the following line throughout the generated code:
_errHandler.sync(this);
Which in turn, does
getInterpreter()
and then calls methods on it. By default this returns null, and thus parsing throws NPEs.
I cobbled together something that gets around this:
myparser.setInterpreter(new ParserATNSimulator(myparser, myparser.getATN(), mylexer.getInterpreter().decisionToDFA,
new PredictionContextCache()));
But I'm certain that is wrong. The odd thing is that I don't see any examples that address this requirement, so I'm wondering what I have done wrong that this even needs to be done.
Interestingly, TestRig works fine without the setInterpreter line. Here's what I'm doing:
PelLexer pl = new PelLexer(CharStreams.fromString(s));
CommonTokenStream tokens = new CommonTokenStream(pl);
SecureRandom r = new SecureRandom();
String clsName = Parser.class.getPackage().getName() + ".eval.Eval" + Math.abs(r.nextLong());
PelParser pp = new PelParser(tokens, clsName);
pp.setBuildParseTree(false);
// pp.setInterpreter(new ParserATNSimulator(pp, pp.getATN(), pl.getInterpreter().decisionToDFA, new PredictionContextCache()));
pp.addErrorListener(new PELErrorListener());
pp.blockStatements();
byte[] clzData = pp.getClassBytes();
PELClassLoader pcl = AccessController.doPrivileged(new PrivilegedAction<PELClassLoader>() {
@Override
public PELClassLoader run() {
return new PELClassLoader(Thread.currentThread().getContextClassLoader());
}
});
pcl.addClass(clsName, clzData);
Class<Evaluable> c = (Class<Evaluable>) pcl.loadClass(clsName);
return c.newInstance();
Here's the answer.
When you add a constructor to your parser, you DON'T want to call
super(tokens);
You want to call
this(tokens);
because the default constructor generated for your parser is the one that creates the interpreter:
public PelParser(TokenStream input) {
super(input);
_interp = new ParserATNSimulator(this,_ATN,_decisionToDFA,_sharedContextCache);
}
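The effect of the two constructor calls can be reproduced without ANTLR. The sketch below is purely illustrative: BaseParser, ChainedParser and BypassingParser are hypothetical stand-ins, and a plain Object plays the role of _interp.

```java
/** Hypothetical stand-in for the ANTLR base parser class. */
class BaseParser {
    protected Object interp; // plays the role of _interp; null until a subclass sets it

    BaseParser(String input) {
        // The base constructor does not create the interpreter.
    }
}

/** Extra constructor chains to the "generated" one via this(...). */
class ChainedParser extends BaseParser {
    String className;

    // Mirrors the generated constructor: the only place where interp is set.
    ChainedParser(String input) {
        super(input);
        interp = new Object();
    }

    ChainedParser(String input, String className) {
        this(input); // runs the generated constructor, so interp is set
        this.className = className;
    }
}

/** Extra constructor calls super(...) directly and skips the setup. */
class BypassingParser extends BaseParser {
    String className;

    BypassingParser(String input) {
        super(input);
        interp = new Object();
    }

    BypassingParser(String input, String className) {
        super(input); // bypasses the constructor above, so interp stays null
        this.className = className;
    }
}
```

With these classes, new ChainedParser("tokens", "Eval1").interp is non-null, while new BypassingParser("tokens", "Eval1").interp is null, which corresponds to the getInterpreter() result that triggered the NPEs.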

Cannot access java.lang.String in script run in Rhino

I have problems accessing Java classes in JavaScript. Calling a code snippet
var String = Java.type("java.lang.String");
from Java via javax.script.ScriptEngine yields the following error:
Exception in thread "main" javax.script.ScriptException: sun.org.mozilla.javascript.internal.EcmaError: ReferenceError: "Java" is not defined. (path/to/string.js#1) in path/to/string.js at line number 1
at com.sun.script.javascript.RhinoScriptEngine.eval(RhinoScriptEngine.java:156)
at main.JsTest.main(JsTest.java:55)
Using non-Java classes in the script works fine, e.g. var value = a + b, where a and b are defined in a javax.script.ScriptContext.
This is the Java class that executes the script.
JsTest.java
public class JsTest
{
public static void main(String[] args) throws Exception
{
ScriptEngineManager sem = new ScriptEngineManager();
ScriptEngine se = sem.getEngineByExtension("js");
String script = "path/to/string.js";
File scriptFile = new File(script);
FileReader fr = new FileReader(scriptFile);
se.put(ScriptEngine.FILENAME, script);
ScriptContext sc = new SimpleScriptContext();
se.eval(fr, sc);
}
}
I have no idea where your Java.type is coming from, but the official documentation uses Packages.java or just java.
So your line should probably look like
var String = Packages.java.lang.String;
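Some background that may explain the error: as far as I know, Java.type belongs to Nashorn, the JavaScript engine introduced with Java 8, while the sun.org.mozilla.javascript.internal frames in the stack trace point to the Rhino engine bundled with Java 6 and 7. The sketch below picks the access style from the engine's reported name. JavaAccessSnippet is a hypothetical helper, and matching on the substring "nashorn" is an assumption about how the engine names itself.

```java
import java.util.Locale;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

final class JavaAccessSnippet {
    /** Returns a script line that binds java.lang.String to JString for the given engine name. */
    static String forEngine(String engineName) {
        boolean nashorn = engineName != null
                && engineName.toLowerCase(Locale.ROOT).contains("nashorn");
        return nashorn
                ? "var JString = Java.type('java.lang.String');" // Nashorn-style access
                : "var JString = Packages.java.lang.String;";    // Rhino-style access
    }

    public static void main(String[] args) throws Exception {
        ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("js");
        if (engine == null) {
            // Newer JDKs ship without a JavaScript engine.
            System.out.println("No JavaScript engine available on this JVM");
            return;
        }
        String name = engine.getFactory().getEngineName();
        Object result = engine.eval(forEngine(name) + " JString.valueOf(42);");
        System.out.println(name + " evaluated to: " + result);
    }
}
```

The variable is named JString rather than String so that the script does not shadow JavaScript's built-in String constructor.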

How to use Apache Commons CLI to parse the property file and --help option?

I have a property file which is like this -
hostName=machineA.domain.host.com
emailFrom=tester@host.com
emailTo=world@host.com
emailCc=hello@host.com
And now I am reading the above property file from my Java program as shown below, parsing it manually for now:
public class FileReaderTask {
private static String hostName;
private static String emailFrom;
private static String emailTo;
private static String emailCc;
private static final String configFileName = "config.properties";
private static final Properties prop = new Properties();
public static void main(String[] args) throws IOException {
readConfig(args);
// use the above variables here
System.out.println(hostName);
System.out.println(emailFrom);
System.out.println(emailTo);
System.out.println(emailCc);
}
private static void readConfig(String[] args) throws FileNotFoundException, IOException {
if (!TestUtils.isEmpty(args) && args.length != 0) {
prop.load(new FileInputStream(args[0]));
} else {
prop.load(FileReaderTask.class.getClassLoader().getResourceAsStream(configFileName));
}
StringBuilder sb = new StringBuilder();
for (String arg : args) {
sb.append(arg).append("\n");
}
String commandlineProperties = sb.toString();
if (!commandlineProperties.isEmpty()) {
// read, and overwrite, properties from the commandline...
prop.load(new StringReader(commandlineProperties));
}
hostName = prop.getProperty("hostName").trim();
emailFrom = prop.getProperty("emailFrom").trim();
emailTo = prop.getProperty("emailTo").trim();
emailCc = prop.getProperty("emailCc").trim();
}
}
Most of the time, I will be running my above program through command line as a runnable jar like this -
java -jar abc.jar config.properties
java -jar abc.jar config.properties hostName=machineB.domain.host.com
My questions are:
Is there any way to add a --help option to abc.jar that explains how to run the jar file, what each property means, and how to use it? I have seen --help in most C++ executables and Unix tools, but I am not sure how to do the same thing in Java.
Do I need a command-line parser like Commons CLI to achieve this, and should I use Commons CLI to parse the file as well instead of parsing it manually? If yes, can anyone provide an example of how I would do that in my scenario?
If you plan to add other options in the future, commons-cli is a good fit: it makes adding new options easy, whereas manual parsing quickly becomes complicated.
Take a look at the official examples, they provide a good overview of what the library can do.
Your specific case would probably lead to something like the following:
// create Options object
Options options = new Options();
Option help = new Option( "h", "help", false, "print this message" );
options.addOption(help);
CommandLineParser parser = new PosixParser();
CommandLine cmd = parser.parse( options, args);
if(cmd.hasOption("help") || cmd.getArgList().isEmpty()) {
// automatically generate the help statement
HelpFormatter formatter = new HelpFormatter();
formatter.printHelp( "cli-test [options] <property-file>", options );
return;
}
// do your thing...
System.out.println("Had properties " + cmd.getArgList());
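The leftover arguments can then be fed into the property loading from the question. What follows is a hedged, library-free sketch: LeftoverProperties and layer are hypothetical names, and a plain List<String> stands in for the values returned by cmd.getArgList() after the property file argument.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Properties;

final class LeftoverProperties {
    /**
     * Loads the base properties first, then applies key=value overrides on top,
     * so that a later value replaces an earlier one with the same key.
     */
    static Properties layer(Reader baseFile, List<String> overrides) throws IOException {
        Properties props = new Properties();
        props.load(baseFile);
        // Each override becomes one line of an in-memory properties document.
        props.load(new StringReader(String.join("\n", overrides)));
        return props;
    }
}
```

With Commons CLI this could be called as LeftoverProperties.layer(new FileReader(rest.get(0)), rest.subList(1, rest.size())), where rest is the leftover argument list and its first element is the property file path.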
