Mallet HierarchicalLDATUI throws NullPointerException for certain files


In the past few days, I have started using Mallet. I am specifically interested in running a hierarchical topic model, like HLDA or HPAM. When importing the sample data files and running them using the cc.mallet.topics.tui.HierarchicalLDATUI class, I get results, no problems.
When running the same on the Wikipedia article on WW2, after importing I get the following error:
$ bin/mallet run cc.mallet.topics.tui.HierarchicalLDATUI --input ww2.mallet
Exception in thread "main" java.lang.NullPointerException
at cc.mallet.topics.HierarchicalLDA$NCRPNode.dropPath(HierarchicalLDA.java:637)
at cc.mallet.topics.HierarchicalLDA.samplePath(HierarchicalLDA.java:164)
at cc.mallet.topics.HierarchicalLDA.estimate(HierarchicalLDA.java:133)
at cc.mallet.topics.tui.HierarchicalLDATUI.main(HierarchicalLDATUI.java:109)
I imported the data like this:
$ bin/mallet import-dir --input ww2Wiki --output ww2.mallet --keep-sequence TRUE --skip-html TRUE --remove-stopwords TRUE
To make your lives easier, here's the code at which the error occurs in HierarchicalLDA.java (lines 627-640)
public void dropPath() {
NCRPNode node = this;
node.customers--;
if (node.customers == 0) {
node.parent.remove(node);
}
for (int l = 1; l < numLevels; l++) {
node = node.parent;
node.customers--;
if (node.customers == 0) {
node.parent.remove(node); //line 637 (producing the error)
}
}
}
Seemingly, the error occurs when the NCRP implementation tries to remove a node from its parent and that parent is null. I do not know why this happens with certain files but not with others.
To check whether the problem lies with the file itself, I ran the same file through cc.mallet.topics.HierarchicalPAM: it works there, and HPAM produces reasonable results. Other files work in the HLDA implementation, so I do not think it is the code itself.
At this point I am clueless what to do. Did anyone encounter and solve this problem before?
Thanks!
PS: I feel like I have to point this out for the Java community. This is not my code, it is an open source software, which I compiled on my computer. I am missing both time and overview to read through the whole code to track down the error.

It took a while but I found the answer to the problem and it seems too simple.
HLDATUI treats each file as one document, so if only one file is imported there are not enough documents and the program crashes. That means one has to import more than one file.
The solution to my personal situation is that I will write a program, which will split the .xml file I want to run HLDATUI on into multiple smaller files, which then can be imported and analyzed.
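A minimal sketch of such a splitter, under stated assumptions: it cuts a plain-text file into numbered files of a fixed number of lines each, so that import-dir sees several documents. The class name DocumentSplitter, the part-NNNN.txt naming, and the line-based chunking are all hypothetical choices made for illustration; a real .xml source would probably be split on element boundaries instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DocumentSplitter {

    /**
     * Writes the lines of input into numbered files of at most linesPerFile
     * lines each inside outputDir and returns the files written, in order.
     * Line-based chunking is an assumption, not something Mallet requires.
     */
    public static List<Path> split(Path input, Path outputDir, int linesPerFile) throws IOException {
        if (linesPerFile <= 0) {
            throw new IllegalArgumentException("linesPerFile must be positive");
        }
        Files.createDirectories(outputDir);
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        List<Path> written = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerFile) {
            int end = Math.min(start + linesPerFile, lines.size());
            // Hypothetical naming scheme: part-0000.txt, part-0001.txt, ...
            Path part = outputDir.resolve(String.format("part-%04d.txt", written.size()));
            Files.write(part, lines.subList(start, end), StandardCharsets.UTF_8);
            written.add(part);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java DocumentSplitter <input> <outputDir> <linesPerFile>");
            return;
        }
        List<Path> parts = split(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("Wrote " + parts.size() + " files to " + args[1]);
    }
}
```

The output directory can then be passed to bin/mallet import-dir --input in place of the single-file directory.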

Related

ProcessBuilder/Runtime.exec() with Weka Command Line Demonstrating Peculiar Behavior

Below is basically an MCVE of my full problem, which is much messier. What you need to know is that the following line runs when directly put in terminal:
java -classpath /path/to/weka.jar weka.filters.MultiFilter \
-F "weka.filters.unsupervised.attribute.ClusterMembership -I first" \
-i /path/to/in.arff
This is relatively straightforward. Basically, all I am doing is trying to cluster the data from in.arff using all of the default settings for the ClusterMembership filter, but I want to ignore the first attribute. I have the MultiFilter there because in my actual project, there are other filters, so I need this to stay. Like previously mentioned, this works fine. However, when I try to run the same line with ProcessBuilder, I get a "quote parse error", and it seems like the whole structure of nesting quotes breaks down. One way of demonstrating this is trying to get the following to work:
List<String> args = new ArrayList<String>();
args.add("java");
args.add("-cp");
args.add("/path/to/weka.jar");
args.add("weka.filters.MultiFilter");
args.add("-F");
args.add("\"weka.filters.unsupervised.attribute.ClusterMembership");
args.add("-I");
args.add("first\"");
args.add("-i");
args.add("/path/to/in.arff");
ProcessBuilder pb = new ProcessBuilder(args);
// ... Run the process below
At first glance, you might think this is identical to the above line (that's certainly what my naive self thought). In fact, if I just print args out with spaces in between each one, the resulting strings are identical and run perfectly if directly copy and pasted to the terminal. However, for whatever reason, the program won't work as I got the message (from Weka) Quote parse error. I tried googling and found this question about how ProcessBuilder adds extra quotes to the command line (this led me to try numerous combinations of escape sequences, all of which did not work), and read this article about how ProcessBuilder/Runtime.exec() work (I tried both ProcessBuilder and Runtime.exec(), and ultimately the same problem persisted), but couldn't find anything relevant to what I needed. Weka already had bad documentation, and then their Wikispace page went down a couple weeks ago due to Wikispaces shutting down, so I have found very little info on the Weka side.
My question then is this: Is there a way to get something like the second example I put above to run such that I can group arguments together for much larger commands? I understand it may require some funky escape sequences (or maybe not?), or perhaps something else I have not considered. Any help here is much appreciated.
Edit: I updated the question to hopefully give more insight into what my problem is.

You don't need to group arguments together. It doesn't even work, as you've already noted. Take a look at what happens when I call my Java program like this:
java -jar Test.jar -i -s "-t 500"
This is my "program":
public class Test {
public static void main(String[] args) {
for( String arg : args ) {
System.out.println(arg);
}
}
}
And this is the output:
-i
-s
-t 500
The quotes are not included in the arguments; the shell only uses them to group words into a single argument. ProcessBuilder does no shell parsing, so when you pass it strings that contain literal quote characters, as you did, those quote characters arrive as part of the argument values and the pieces stay separate arguments, which confuses Weka's parser.
The quotes are only necessary when you have nested components, e.g. FilteredClassifier. Maybe my answer on another Weka question can help you with those nested components. (I recently changed the links to their wiki to point to the Google cache until they established a new wiki.)
Since you didn't specify what case exactly caused you to think about grouping, you could try to get a working command line for Weka and then use that one as input for a program like mine. You can then see how you would need to pass them to a ProcessBuilder.
For your example I'd guess the following will work:
List<String> args = new ArrayList<String>();
args.add("java");
args.add("-cp");
args.add("/path/to/weka.jar");
args.add("weka.filters.MultiFilter");
args.add("-F");
args.add("weka.filters.unsupervised.attribute.ClusterMembership -I first");
args.add("-i");
args.add("/path/to/in.arff");
ProcessBuilder pb = new ProcessBuilder(args);
Additional details
What happens inside Weka is basically the following: The options from the arguments are first processed by weka.filters.Filter, then all non-general filter options are processed by weka.filters.MultiFilter, which contains the following code in setOptions(...):
filters = new Vector<Filter>();
while ((tmpStr = Utils.getOption("F", options)).length() != 0) {
options2 = Utils.splitOptions(tmpStr);
filter = options2[0];
options2[0] = "";
filters.add((Filter) Utils.forName(Filter.class, filter, options2));
}
Here, tmpStr is the value for the -F option and will be processed by Utils.splitOptions(tmpStr) (source code). There, all the quoting and unquoting magic happens, so that the next component receives an options array that looks just as it would if it were a first-level component.
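To see how a working terminal command maps onto a ProcessBuilder argument list without starting a process, the grouping a shell performs can be imitated in a few lines. This is a deliberately simplified sketch: the class ShellLikeSplit and its split method are hypothetical helpers, they handle only double quotes (no escapes, single quotes, or variable expansion), and they are neither what a real shell nor what Weka's Utils.splitOptions does in full.

```java
import java.util.ArrayList;
import java.util.List;

public class ShellLikeSplit {

    /**
     * Splits a command line on whitespace, keeping double-quoted spans
     * together as one argument and dropping the quote characters.
     * Simplification: no escapes, no single quotes, no expansion.
     */
    public static List<String> split(String commandLine) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (char c : commandLine.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true; // "" still produces an (empty) argument
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    args.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            args.add(current.toString());
        }
        return args;
    }

    public static void main(String[] unused) {
        List<String> args = split("java -cp /path/to/weka.jar weka.filters.MultiFilter "
                + "-F \"weka.filters.unsupervised.attribute.ClusterMembership -I first\" "
                + "-i /path/to/in.arff");
        for (String arg : args) {
            System.out.println("[" + arg + "]");
        }
        // The list can be handed to ProcessBuilder as-is; nothing is started here.
        ProcessBuilder pb = new ProcessBuilder(args);
        System.out.println(pb.command().size() + " arguments");
    }
}
```

Each bracketed line printed by main is one element to add to the ProcessBuilder list; the nested filter specification shows up as a single element containing spaces and no quote characters.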

Btrace not returning Anything

So I am introducing myself to btrace but currently I am getting no output out of it. With this script :
package com.sun.btrace.samples;
import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;
@BTrace
public class AllLines {
@OnMethod(
clazz="/.*/",
location=@Location(value=Kind.LINE, line=-1)
)
public static void online(@ProbeClassName String pcn, @ProbeMethodName String pmn, int line) {
print(Strings.strcat(pcn, "."));
print(Strings.strcat(pmn, ":"));
println(line);
}
}
This comes straight from the samples directory; I only changed clazz to "/.*/" out of desperation to get something printed out. No luck.
The pid I am pointing btrace at belongs to a simple Java program I developed just for testing purposes, which calls a certain method in a loop. I am running it through Eclipse.
Any ideas what I could be missing?
Thanks!
Update: Turned on debug mode to find out it is hanging at "debug: checking port availability: 2020 ". Any ideas ?

Are the classes you are trying to trace compiled with javac -g or at least javac -g:lines? You need to do that in order to be able to access the line number information in the bytecode.
Additionally - enabling line tracing for all methods of all classes is a really Bad Idea(tm). You will cause a huge number of class retransformations and reloadings and with a bit of bad luck you can shoot your application down (due to memory problems).

Exception while Skeleton Tracking using openNI on pre-recorded ONI file

I am trying to run the sample openNI Skeleton Tracking application (UserTracker.java application) on a pre-recorded .oni file. I have edited the SamplesConfig.xml file to direct the input from the ONI file and not a Kinect (I don't actually have one). However, I get the following Exception. Can anybody help me here?
org.OpenNI.StatusException: Function was not implemented!
at org.OpenNI.WrapperUtils.throwOnError(WrapperUtils.java:30)
at org.OpenNI.Context.initFromXmlEx(Context.java:371)
at org.OpenNI.Context.createFromXmlFile(Context.java:36)
at UserTracker.<init>(UserTracker.java:149)
at UserTrackerApplication.main(UserTrackerApplication.java:67)
Any help will be appreciated. Thanks!
EDIT: I found a solution here, this has removed the earlier exception that I was getting, but now I get the following!
org.OpenNI.StatusException: This operation is invalid!
Does anybody know why this is happening?

I had a similar problem: I wanted to read data from a .oni file that I had generated, and I was getting the same error. The problem is solved now, and maybe you have solved it too, but I think it's important to share the information with others who might come to this post. I found some clues in other posts, by the way.
So here is the solution. The NiUserTracker sample can be used with an .oni file so I checked the code and they do the following:
xn::Player g_Player; //Global variable
// This goes in the main or another function
if (argc > 1)
{
nRetVal = g_Context.Init();
CHECK_RC(nRetVal, "Init");
nRetVal = g_Context.OpenFileRecording(argv[1], g_Player);
if (nRetVal != XN_STATUS_OK)
{
printf("Can't open recording %s: %s\n", argv[1], xnGetStatusString(nRetVal));
return 1;
}
}
This is C++ code; I work with C++. As you can see, they don't initialize the Kinect via the XML file when they want to open a recorded .oni file. They just initialize the context with the Init() method and then open the file with the OpenFileRecording method.
If you want to open a .oni file there's no need to modify your XML. This way you can write an application that lets you choose whether to use a .oni file or the Kinect.
I hope this helps someone.
cheers.

Any sure fire way to check file existence on Linux NFS? [duplicate]

I am working on a Java program that needs to check whether files exist.
Simple enough: the code calls File.exists() to check for file existence. The problem is that it reports false positives, meaning the file does not actually exist but exists() returns true. No exception was caught (at least no exception like "Stale NFS handle"). The program even managed to read the file through an InputStream, getting 0 bytes as expected, and still no exception. The target directory is on a Linux NFS mount, and I am 100% sure that the file being looked for never existed.
I know there are known bugs (a kind of API limitation) in java.io.File.exists(). So I added a workaround that checks file existence using the Linux command ls. Instead of calling File.exists(), the Java code now runs ls on the target file. If the exit code is 0, the file exists; otherwise, it does not.
The issue seems to occur less often with this trick, but it still pops up. Again, no error was captured anywhere (stdout this time). That means the problem is serious enough that even a native Linux command does not fix it 100% of the time.
So there are a couple of questions:
I believe Java's well-known issue with File.exists() is about reporting false negatives, where a file is reported not to exist but in fact does. As the API does not throw IOException for File.exists(), it chooses to swallow the exception when calls to the OS's underlying native functions fail, e.g. on an NFS timeout. But this does not explain the false positives I am seeing, given that the file never existed. Any thoughts on this one?
My understanding of the ls exit code is that 0 means OK, which is equivalent to the file existing. Is this understanding wrong? The man page of ls is not very clear about the meaning of the exit code: Exit status is 0 if OK, 1 if minor problems, 2 if serious trouble.
All right, back to the subject. Is there any surefire way to check file existence with Java on Linux, before JDK7 with NIO2 is officially released?

Here is a JUnit test that shows the problem, along with some Java code that actually tries to read the file.
The problem happens, for example, when using Samba on OS X Mavericks. A possible reason is explained by this statement in
http://appleinsider.com/articles/13/06/11/apple-shifts-from-afp-file-sharing-to-smb2-in-os-x-109-mavericks
It aggressively caches file and folder properties and uses opportunistic locking to enable better caching of data.
Please find below a checkExists method that attempts to read a few bytes, forcing a true file access to avoid the caching misbehaviour.
JUnit test:
/**
* test file exists function on Network drive replace the testfile name and ssh computer
* with your actual environment
* @throws Exception
*/
@Test
public void testFileExistsOnNetworkDrive() throws Exception {
String testFileName="/Volumes/bitplan/tmp/testFileExists.txt";
File testFile=new File(testFileName);
testFile.delete();
for (int i=0;i<10;i++) {
Thread.sleep(50);
System.out.println(""+i+":"+OCRJob.checkExists(testFile));
switch (i) {
case 3:
// FileUtils.writeStringToFile(testFile, "here we go");
Runtime.getRuntime().exec("/usr/bin/ssh phobos /usr/bin/touch "+testFileName);
break;
}
}
}
checkExists source code:
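/**
 * Check whether the given file exists by actually opening and reading it.
 * @param f the file to check
 * @return true if the file could be opened and read
 */
public static boolean checkExists(File f) {
    // try-with-resources (Java 7+) closes the stream even if read() throws
    try (InputStream is = new FileInputStream(f)) {
        byte[] buffer = new byte[4];
        is.read(buffer); // force a real file access; the byte count is not checked
        return true;
    } catch (java.io.IOException e) {
        // covers FileNotFoundException as well as read failures
        return false;
    }
}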

JDK7 was released a few months ago. There are exists and notExists methods in the Files class but they return a boolean rather than throwing an exception. If you really want an exception then use FileSystems.getDefault().provider().checkAccess(path) and it will throw an exception if the file does not exist.
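The two NIO.2 styles mentioned above could be sketched roughly as follows. The class name Nio2ExistenceCheck and its two helper methods are hypothetical; only Files.exists, Files.notExists and FileSystemProvider.checkAccess come from the JDK. Note that exists and notExists can both return false when existence cannot be determined, which the sketch reports as "unknown".

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Nio2ExistenceCheck {

    /** Returns "exists", "missing" or "unknown" using the boolean-returning methods. */
    public static String status(Path path) {
        if (Files.exists(path)) {
            return "exists";
        }
        if (Files.notExists(path)) {
            return "missing";
        }
        // Both calls returned false: existence could not be determined.
        return "unknown";
    }

    /** Throws an IOException (e.g. NoSuchFileException) if the file cannot be accessed. */
    public static void requireExists(Path path) throws IOException {
        // With no access modes given, checkAccess only checks existence.
        FileSystems.getDefault().provider().checkAccess(path);
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : "no-such-file.txt");
        System.out.println(path + ": " + status(path));
        try {
            requireExists(path);
            System.out.println("checkAccess succeeded");
        } catch (IOException e) {
            System.out.println("checkAccess failed: " + e);
        }
    }
}
```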

If you need to be robust, try to read the file and fail gracefully if the file is not there (or there is a permission or other problem). This applies to any language, not just Java.
The only safe way to tell whether the file exists and can be read is to actually read data from it, regardless of the file system, local or remote. The reason is a race condition that can occur right after checkAccess(path) succeeds: you check, then open the file, and find that it suddenly does not exist. Some other thread (or another remote client) may have removed it or acquired an exclusive lock. So don't bother checking access; just try to read the file. Spending time running ls only widens the window in which the race can occur.
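A minimal sketch of the "just read it" approach, assuming Java 8 or later for Optional. ReadOrFail and tryRead are hypothetical names; the point is that the open and the read happen in one step and every I/O failure, including a missing file, ends up in the same catch block.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class ReadOrFail {

    /**
     * Tries to read the whole file. Returns the content, or an empty Optional
     * if the file is missing, unreadable, or any other I/O error occurs.
     */
    public static Optional<byte[]> tryRead(Path path) {
        try {
            return Optional.of(Files.readAllBytes(path));
        } catch (IOException e) {
            // NoSuchFileException, AccessDeniedException, etc. all land here.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : "no-such-file.txt");
        Optional<byte[]> content = tryRead(path);
        if (content.isPresent()) {
            System.out.println("Read " + content.get().length + " bytes from " + path);
        } else {
            System.out.println("Could not read " + path + ", continuing without it");
        }
    }
}
```

There is no separate existence check, so there is no gap between checking and opening in which another process could remove the file.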

JUnit tests fail when creating new Files

We have several JUnit tests that rely on creating new files and reading them. However there are issues with the files not being created properly. But this fault comes and goes.
This is the code:
@Test
public void test_3() throws Exception {
// Deletes files in tmp test dir
File tempDir = new File(TEST_ROOT, "tmp.dir");
if (tempDir.exists()) {
for (File f : tempDir.listFiles()) {
f.delete();
}
} else {
tempDir.mkdir();
}
File file_1 = new File(tempDir, "file1");
FileWriter out_1 = new FileWriter(file_1);
out_1.append("# File 1");
out_1.close();
File file_2 = new File(tempDir, "file2");
FileWriter out_2 = new FileWriter(file_2);
out_2.append("# File 2");
out_2.close();
File file_3 = new File(tempDir, "fileXXX");
FileWriter out_3 = new FileWriter(file_3);
out_3.append("# File 3");
out_3.close();
....
The failure is that the second file, file_2, sometimes never gets created. Then, when we try to write to it, a FileNotFoundException is thrown.
If we run only this testcase, everything works fine.
If we run this testfile with some ~40 testcases, it can both fail and work depending on the current lunar cycle.
If we run the entire testsuite, consisting of some 10*40 testcases, it always fails.
We have tried
adding sleeps (5 sec) after new File, which changed nothing
adding a while loop that waits until file_2.exists() is true, but the loop never stopped
catching SecurityException, IOException and even Throwable when we do the new File(..), but caught nothing.
At one point we got all files to be created, but file_2 was created before file_1 and a test that checked creation time failed.
We've also tried adding file_1.createNewFile() and it always returns true.
So what is going on? How can we make tests that depend on actual files and always be sure they exist?
This has been tested with both Java 1.5 and 1.6, on Windows 7 as well as Linux. The only difference that can be observed is that sometimes a similar earlier testcase fails, and sometimes it is file_1 that isn't created instead.
Update
We tried a new variation:
File file_2 = new File(tempDir, "file2");
while (!file_2.canRead()) {
Thread.sleep(500);
try {
file_2.createNewFile();
} catch (IOException e) {
e.printStackTrace();
}
}
This results in a lot of exceptions of the type:
java.io.IOException: Access is denied
at java.io.WinNTFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:883)
... but eventually it works, the file is created.

Are there multiple instances of your program running at once?
Check for any extra instances of javaw.exe running. If multiple programs have handles to the same file at once, things can get very wonky very quickly.
Do you have antivirus software or anything else running that could be getting in the way of file creation/deletion, by handle?

Don't hardcode your file names, use random names. It's the only way to abstract yourself from the various external situations that can occur (multiple access to the same file, permissions, file system error, locking problems, etc...).
One thing for sure: using sleep() or retrying is guaranteed to cause weird errors at some point in the future, avoid doing that.

I did some googling, and this lucene bug and this board question seem to indicate that there could be an issue with file locking and other processes using the file.
Since we are running this on ClearCase it seems plausible that ClearCase does some indexing or something similar when the files are being created. Adding loops that repeat until the file is readable solved the issue, so we are going with that. Very ugly solution though.

Try File#createTempFile, this at least guarantees you that there are no other files by the same name that would still hold a lock.
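A small sketch of what that could look like in a test fixture, assuming Java 7 or later for try-with-resources (the question mentions 1.5 and 1.6, where a try/finally block would be needed instead). TempFileFixture, writeTempFile and the "test-" prefix are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class TempFileFixture {

    /**
     * Creates a uniquely named file in the default temp directory, writes
     * the given content to it, and returns it.
     */
    public static File writeTempFile(String content) throws IOException {
        // createTempFile picks a name that did not exist before the call.
        File file = File.createTempFile("test-", ".txt");
        file.deleteOnExit(); // best-effort cleanup when the JVM exits
        try (FileWriter out = new FileWriter(file)) {
            out.write(content);
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File first = writeTempFile("# File 1");
        File second = writeTempFile("# File 2");
        System.out.println(first + " exists: " + first.exists());
        System.out.println(second + " exists: " + second.exists());
    }
}
```

Because every call gets a fresh name, leftover handles on files from earlier tests cannot collide with the new file.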
