Unable to understand the HLDA Output in MALLET - java

Below is a snippet of my code:
HierarchicalLDA hlda = new HierarchicalLDA();
hlda.initialize(instances, instances, 5, new Randoms());
hlda.estimate(1000);
hlda.printState(new PrintWriter(new File("Data.txt")));
I am unable to understand the meaning of both the console output and what is printed in the "Data.txt" file. I have already scoured the MALLET site but haven't found anything helpful. Any help or suggestion would be greatly appreciated.
Thanks in advance!

In hLDA, each document samples a path through a tree of topics, and each token is assigned to one "level" of that path. The printState method writes one line per token: the IDs of the tree nodes on the document's path (leaf first, root last, as the loop below shows), followed by information about the word: its numeric ID, the string for that ID, and its level in the path.
node = documentLeaves[doc];
for (level = numLevels - 1; level >= 0; level--) {
path.append(node.nodeID + " ");
node = node.parent;
}
for (token = 0; token < seqLen; token++) {
type = fs.getIndexAtPosition(token);
level = docLevels[token];
// The "" just tells java we're not trying to add a string and an int
out.println(path + "" + type + " " + alphabet.lookupObject(type) + " " + level + " ");
}
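To read Data.txt back in, each line can be split on whitespace using the layout described above. The following is a minimal sketch, not part of MALLET: the class HldaStateLine and its method names are hypothetical, the sample line is made up, and it assumes that words contain no whitespace and that level 0 is the root (which is what the loop above implies).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a MALLET class): parses one line written by printState.
final class HldaStateLine {
    final List<Integer> path; // node IDs on the document's path, leaf first, root last
    final int type;           // numeric ID of the word
    final String word;        // string form of the word
    final int level;          // level of the path this token is assigned to

    private HldaStateLine(List<Integer> path, int type, String word, int level) {
        this.path = path;
        this.type = type;
        this.word = word;
        this.level = level;
    }

    // Assumed layout: <node IDs...> <type> <word> <level>, separated by spaces.
    static HldaStateLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 4) {
            throw new IllegalArgumentException("Too few fields: " + line);
        }
        int n = f.length;
        List<Integer> path = new ArrayList<>();
        for (int i = 0; i < n - 3; i++) {
            path.add(Integer.parseInt(f[i]));
        }
        return new HldaStateLine(path, Integer.parseInt(f[n - 3]), f[n - 2],
                Integer.parseInt(f[n - 1]));
    }

    // Node ID of the topic that generated this token.
    // The path is stored leaf first, so level L sits at index (size - 1 - L).
    int topicNode() {
        return path.get(path.size() - 1 - level);
    }

    public static void main(String[] args) {
        // Made-up sample line for a 3-level tree.
        HldaStateLine t = HldaStateLine.parse("7 3 0 42 topic 1 ");
        System.out.println("word=" + t.word + " level=" + t.level
                + " topicNode=" + t.topicNode());
    }
}
```

Grouping the parsed lines by topicNode() gives the words assigned to each node of the tree, which is usually the most readable view of the result.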


NPE in a do/while loop due to EOF...catching the EOF earlier to avoid the NPE [duplicate]

I have written this program to compare 2 files. They are 500mb to 2.8gb in size and are created every 6 hours. I have 2 files from 2 sources (NMD and XMP). They are broken up into lines of text that have fields separated by the pipe(|) character. Each line is a single record and may be up to 65,000 characters long. The data is about TV shows and movies, showing times and descriptive content. I have determined that any particular show or movie has a minimum of 3 pieces of data that will uniquely identify that show or movie. IE: CallSign, ProgramId and StartLong. The two sources for this data are systems called NMD and XMP hence that acronym added to various variables. So my goal is to compare a file created by NMD and one created by XMP and confirm that everything that NMD produces is also produced by XMP and that the data in each matched record is the same.
What I am trying to accomplish here is this:
1. Read the NMD file record by record for the 3 unique data fields.
2. Read the XMP file record by record and look for a match for the current record in the NMD file.
3. The NMD file should iterate one record at a time. Each NMD record should then be searched for in the entire XMP file, record by record.
4. Write a log entry in one of 2 files indicating success or failure and what that data was.
I have an NPE issue when I reach the end of the testdataXMP.txt file. I assume the same thing will happen for testdataNMD.txt. I'm trying to break out of the loop right after the readLine, since epgsRecordNMD or epgsRecordXMP will be null if the reader has just reached the end of the file. The original NPE came from calling split on null data at the end of the file. Now, according to the debugger, I'm getting an NPE here:
if (epgsRecordXMP.equals(null)) {
break;
}
Am I doing this wrong? If I'm really at the end of the file, the readLine ought to return null right?
I tried it this way too, but in my limited experience the two are effectively the same thing. It also threw an NPE.
if (epgsRecordXMP.equals(null)) break;
Here's the code...
public static void main(String[] args) throws java.io.IOException {
String epgsRecordNMD = null;
String epgsRecordXMP = null;
BufferedWriter logSuccessWriter = null;
BufferedWriter logFailureWriter = null;
BufferedReader readXMP = null;
BufferedReader readNMD = null;
int successCount = 0;
readNMD = new BufferedReader(new FileReader("d:testdataNMD.txt"));
readXMP = new BufferedReader(new FileReader("d:testdataXMP.txt"));
do {
epgsRecordNMD = readNMD.readLine();
if (epgsRecordNMD.equals(null)) {
break;
}
String[] epgsSplitNMD = epgsRecordNMD.split("\\|");
String epgsCallSignNMD = epgsSplitNMD[0];
String epgsProgramIdNMD = epgsSplitNMD[2];
String epgsStartLongNMD = epgsSplitNMD[9];
System.out.println("epgsCallsignNMD: " + epgsCallSignNMD + " epgsProgramIdNMD: " + epgsProgramIdNMD + " epgsStartLongNMD: " + epgsStartLongNMD );
do {
epgsRecordXMP = readXMP.readLine();
if (epgsRecordXMP.equals(null)) {
break;
}
String[] epgsSplitXMP = epgsRecordXMP.split("\\|");
String epgsCallSignXMP = epgsSplitXMP[0];
String epgsProgramIdXMP = epgsSplitXMP[2];
String epgsStartLongXMP = epgsSplitXMP[9];
System.out.println("epgsCallsignXMP: " + epgsCallSignXMP + " epgsProgramIdXMP: " + epgsProgramIdXMP + " epgsStartLongXMP: " + epgsStartLongXMP);
if (epgsCallSignXMP.equals(epgsCallSignNMD) && epgsProgramIdXMP.equals(epgsProgramIdNMD) && epgsStartLongXMP.equals(epgsStartLongNMD)) {
logSuccessWriter = new BufferedWriter (new FileWriter("d:success.log", true));
logSuccessWriter.write("NMD match found in XMP " + "epgsCallsignNMD: " + epgsCallSignNMD + " epgsProgramIdNMD: " + epgsProgramIdNMD + " epgsStartLongNMD: " + epgsStartLongNMD);
logSuccessWriter.write("\n");
successCount++;
logSuccessWriter.write("Successful matches: " + successCount);
logSuccessWriter.write("\n");
logSuccessWriter.close();
System.out.println ("Match found");
System.out.println ("Successful matches: " + successCount);
}
} while (epgsRecordXMP != null);
readXMP.close();
if (successCount == 0) {
logFailureWriter = new BufferedWriter (new FileWriter("d:failure.log", true));
logFailureWriter.write("NMD match not found in XMP" + "epgsCallsignNMD: " + epgsCallSignNMD + " epgsProgramIdNMD: " + epgsProgramIdNMD + " epgsStartLongNMD: " + epgsStartLongNMD);
logFailureWriter.write("\n");
logFailureWriter.close();
System.out.println ("Match NOT found");
}
} while (epgsRecordNMD != null);
readNMD.close();
}
}
Don't do this:
if (epgsRecordXMP.equals(null)) {
break;
}
To check whether epgsRecordXMP is null, compare the reference with ==:
if (epgsRecordXMP == null) {
break;
}
To sum up: your app throws the NPE because it calls the equals method on epgsRecordXMP when the variable itself is null.
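The usual way to avoid the problem altogether is to put the null check in the loop condition, so the loop body never sees a null line. Here is a minimal sketch of that idiom. The class and method names are made up for illustration, and a StringReader stands in for the real files so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: reads pipe-separated records until end of input.
final class ReadUntilEof {

    // readLine() returns null at end of stream, so the loop stops
    // before split() is ever called on a null reference.
    static List<String[]> readRecords(BufferedReader reader) throws IOException {
        List<String[]> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            records.add(line.split("\\|"));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data; with real files, wrap a FileReader instead.
        String data = "KABC|x|P1\nKXYZ|y|P2";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            for (String[] fields : readRecords(reader)) {
                System.out.println(fields[0] + " " + fields[2]);
            }
        }
    }
}
```

The try-with-resources block also closes the reader even if an exception is thrown partway through.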

Retrieve Connected Components Graphstream

I am using GraphStream in a project. I want to retrieve the list of connected components, but I can only get their count or, at best, their IDs.
I have tried this code, but it doesn't print any components:
ConnectedComponents cc = new ConnectedComponents();
cc.init(graph);
System.out.println("List of Connected Components :");
for(ConnectedComponent conn : cc) {
System.out.println("Component " + conn.id + " :");
System.out.println("--------------");
for(Node n : conn.getEachNode()) {
Object[] attr = n.getAttribute("xy");
Double x = (Double) attr[0];
Double y = (Double) attr[1];
System.out.println(x + " , " + y);
}
}
The nodes have an attribute "xy" which contains the coordinates stored as Double[].
What did I do wrong? And how can I fix it?
ConnectedComponents was rewritten in a commit on 2015-12-15. Before that, there was a problem with retrieving the content of components.
If you are not using the git version of GraphStream, you should give it a try.

Display Stanford NER confidence score

I'm extracting named entities from news articles with the Stanford NER CRFClassifier. In order to implement active learning, I would like to know the confidence score of each class for every labelled entity.
Example of the output I want:
LOCATION(0.20) PERSON(0.10) ORGANIZATION(0.60) MISC(0.10)
Here is my code for extracting named entities from a text:
AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(classifier_path);
String annnotatedText = classifier.classifyWithInlineXML(text);
Is there a workaround to get those values along with the annotations?
I worked it out myself. The CRFClassifier documentation says:
Probabilities assigned by the CRF can be interrogated using either the
printProbsDocument() or getCliqueTrees() methods.
The first method is not useful to me because it only prints the values to the console, and I want to access the data programmatically. So I read how that method is implemented and reproduced part of its behaviour like this:
List<CoreLabel> classifiedLabels = classifier.classify(sentences);
CRFCliqueTree<String> cliqueTree = classifier.getCliqueTree(classifiedLabels);
for (int i = 0; i < cliqueTree.length(); i++) {
CoreLabel wi = classifiedLabels.get(i);
for (Iterator<String> iter = classifier.classIndex.iterator(); iter.hasNext();) {
String label = iter.next();
int index = classifier.classIndex.indexOf(label);
double prob = cliqueTree.prob(i, index);
System.out.println("\t" + label + "(" + prob + ")");
}
String tag = StringUtils.getNotNullString(wi.get(CoreAnnotations.AnswerAnnotation.class));
System.out.println("Class : " + tag);
}

How to extract data from certain facebook pages for a specified date range(from and to)?

I am stuck on this date range query. I need to extract data from particular Facebook pages for a specified date range. I can do this with the since and until fields individually, but how do I use the two fields together?
Here is my code:
public static String getFacebookPostes(Facebook facebook, String searchPost)
throws FacebookException {
String searchResult = "Item : " + searchPost + "\n";
StringBuffer searchMessage = new StringBuffer();
ResponseList<Post> results = facebook.searchPosts(searchPost, new Reading().since("2014-04-02"));
String userId="";
for (Post post : results) {
System.out.println(post.getMessage());
searchMessage.append(post.getMessage() + "\n");
for (int j = 0; j < post.getComments().size(); j++) {
searchMessage.append(post.getComments().get(j).getFrom()
.getName()
+ ", ");
searchMessage.append(post.getComments().get(j).getMessage()
+ ", ");
searchMessage.append(post.getComments().get(j).getCreatedTime()
+ ", ");
searchMessage.append(post.getComments().get(j).getLikeCount()
+ "\n");
userId=post.getComments().get(j).getFrom().getId();
User user = facebook.getUser(userId);
//System.out.println("ROCK");
System.out.println(user);
}
}
Any guidance is appreciated. Thanks in advance.
PS : I am using facebook4j-core-2.0.2.jar and eclipse kepler.
According to http://facebook4j.org/en/code-examples.html you can use all the date formats described in http://www.php.net/manual/en/datetime.formats.date.php
From my understanding the code would then look like this:
ResponseList<Post> results = facebook.searchPosts(searchPost, new Reading().since("2014/04/02").until("2014/04/08"));

uncaught syntax error

I am using the following code for local storage.
for(int i=0; i< files.length; i++)
{
System.out.println("base = " + files[i].getName() + "\n i=" +i + "\n");
AudioFile f = AudioFileIO.read(files[i]);
Tag tag = f.getTag();
//AudioHeader h = f.getAudioHeader();
int l = f.getAudioHeader().getTrackLength();
String s1 = tag.getFirst(FieldKey.ALBUM);
out.print("writeToStorage("+s1+","+s1+");");
}
I am getting this error: "Uncaught SyntaxError: Unexpected identifier".
I'm guessing you meant Java rather than JavaScript?
Your unexpected identifier is out in out.print; you need System. in front of it.
The reason is that out is not defined in your code. It is a static variable of the System class, which is why you write System.out.
Alternatively, you could assign System.out to a variable named out as shorthand, although I don't tend to. That would also let you switch to a different type of output stream without having to refactor your code much.
Have you added the following?
import static java.lang.System.out;
You probably need to output quotes in the last line to surround the s1 values, so that the generated JavaScript receives string literals instead of bare identifiers.
"writeToStorage("+s1+","+s1+");"
->
"writeToStorage('"+s1+"','"+s1+"');"
Btw for the same reason you have to fix the other line too:
"base = " + files[i].getName() + "...
->
"base = '" + files[i].getName() + "'...

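Adding quotes by hand breaks again as soon as a value contains a quote itself (an album title such as Don't Stop, for example). A minimal sketch of a quoting helper follows. The names JsCallBuilder, jsQuote and writeToStorageCall are made up for illustration, and only backslashes, single quotes and line breaks are escaped, so this is not a complete JavaScript string encoder.

```java
// Hypothetical helper: builds the generated JavaScript call with quoted arguments.
final class JsCallBuilder {

    // Wraps s in single quotes and escapes the characters that would
    // otherwise end or break a single-quoted JavaScript string literal.
    static String jsQuote(String s) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.append("'").toString();
    }

    // Produces the same call as the corrected line above, with escaping applied.
    static String writeToStorageCall(String key, String value) {
        return "writeToStorage(" + jsQuote(key) + "," + jsQuote(value) + ");";
    }

    public static void main(String[] args) {
        System.out.println(writeToStorageCall("Abbey Road", "Abbey Road"));
        System.out.println(writeToStorageCall("Don't Stop", "Don't Stop"));
    }
}
```

In the loop from the question, the last line would then become out.print(JsCallBuilder.writeToStorageCall(s1, s1)); with out defined as discussed above.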