Stanford Parser questions - java

I am writing a project that uses NLP (natural language processing), with the Stanford parser.
I create a thread pool that takes sentences and runs the parser on them.
When I create one thread it all works fine, but when I create more, I get errors.
The "test" procedure finds words that are connected to each other.
If I make the method synchronized, it should behave as if there were only one thread, but I still get errors.
The errors come from this code:
public synchronized String test(String s,LexicalizedParser lp )
{
if (s.isEmpty()) return "";
if (s.length()>80) return "";
System.out.println(s);
String[] sent = s.split(" ");
Tree parse = (Tree) lp.apply(Arrays.asList(sent));
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
Collection tdl = gs.typedDependenciesCollapsed();
List list = new ArrayList(tdl);
//for (int i=0;i<list.size();i++)
//System.out.println(list.get(1).toString());
//remove scops and numbers like sbj(screen-4,good-6)->screen good
Pattern p = Pattern.compile(".*\\((.*?)\\-\\d+,(.*?)\\-\\d+\\).*");
if (list.size()>2){
// Split input with the pattern
Matcher m = p.matcher(list.get(1).toString());
//check if the result have more than 1 groups
if (m.find()&& m.groupCount()>1){
if (m.groupCount()>1)
{
System.out.println(list);
return m.group(1)+m.group(2);
}}
}
return "";
}
The errors that I get are:
at blogsOpinions.ParserText.(ParserText.java:47)
at blogsOpinions.ThreadPoolTest$1.run(ThreadPoolTest.java:50)
at blogsOpinions.ThreadPool$PooledThread.run(ThreadPoolTest.java:196)
Recovering using fall through strategy: will construct an (X ...) tree.
Exception in thread "PooledThread-21" java.lang.ClassCastException: java.lang.String cannot be cast to edu.stanford.nlp.ling.HasWord
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.apply(LexicalizedParser.java:289)
at blogsOpinions.ParserText.test(ParserText.java:174)
at blogsOpinions.ParserText.insertDb(ParserText.java:76)
at blogsOpinions.ParserText.(ParserText.java:47)
at blogsOpinions.ThreadPoolTest$1.run(ThreadPoolTest.java:50)
at blogsOpinions.ThreadPool$PooledThread.run(ThreadPoolTest.java:196)
Also, how can I get the description of the subject? For example, for "the screen is very good" I want to get "screen good" from the list I get, without hard-coding an index such as list.get(1).

You can't call LexicalizedParser.parse on a List of Strings; it expects a list of HasWord objects. It's much easier to call the apply method on your input string. This will also run a proper tokenizer on your input (instead of your simple split on spaces).
To get relations such as subjectness out of the returned Tree, call its dependencies member.

Hm, I witnessed the same stack trace. Turned out I was loading two instances of the LexicalizedParser in the same JVM. This seemed to be the problem. When I made sure only one instance is created, I was able to call lp.apply(Arrays.asList(sent)) just fine.

Trying to add substrings from newLines in a large file to a list

I downloaded my extended listening history from Spotify and I am trying to write a program that turns the data into a list of artists, without duplicates, that I can easily make sense of. The file is huge because it has data on every stream I have played since 2016 (307790 lines of text in total). This is what 2 lines of the file look like:
{"ts":"2016-10-30T18:12:51Z","username":"edgymemes69endmylifepls","platform":"Android OS 6.0.1 API 23 (HTC, 2PQ93)","ms_played":0,"conn_country":"US","ip_addr_decrypted":"68.199.250.233","user_agent_decrypted":"unknown","master_metadata_track_name":"Devil's Daughter (Holy War)","master_metadata_album_artist_name":"Ozzy Osbourne","master_metadata_album_album_name":"No Rest for the Wicked (Expanded Edition)","spotify_track_uri":"spotify:track:0pieqCWDpThDCd7gSkzx9w","episode_name":null,"episode_show_name":null,"spotify_episode_uri":null,"reason_start":"fwdbtn","reason_end":"fwdbtn","shuffle":true,"skipped":null,"offline":false,"offline_timestamp":0,"incognito_mode":false},
{"ts":"2021-03-26T18:15:15Z","username":"edgymemes69endmylifepls","platform":"Android OS 11 API 30 (samsung, SM-F700U1)","ms_played":254120,"conn_country":"US","ip_addr_decrypted":"67.82.66.3","user_agent_decrypted":"unknown","master_metadata_track_name":"Opportunist","master_metadata_album_artist_name":"Sworn In","master_metadata_album_album_name":"Start/End","spotify_track_uri":"spotify:track:3tA4jL0JFwFZRK9Q1WcfSZ","episode_name":null,"episode_show_name":null,"spotify_episode_uri":null,"reason_start":"fwdbtn","reason_end":"trackdone","shuffle":true,"skipped":null,"offline":false,"offline_timestamp":1616782259928,"incognito_mode":false},
It is formatted in the actual text file so that each stream is on its own line. NetBeans is telling me the exception is happening at line 19 and it only fails when I am looking for a substring bounded by the indexOf function. My code is below. I have no idea why this isn't working, any ideas?
import java.util.*;
public class MainClass {
public static void main(String args[]){
File dat = new File("SpotifyListeningData.txt");
List<String> list = new ArrayList<String>();
Scanner swag = null;
try {
swag = new Scanner(dat);
}
catch(Exception e) {
System.out.println("pranked");
}
while (swag.hasNextLine())
if (swag.nextLine().length() > 1)
if (list.contains(swag.nextLine().substring(swag.nextLine().indexOf("artist_name"), swag.nextLine().indexOf("master_metadata_album_album"))))
System.out.print("");
else
try {list.add(swag.nextLine().substring(swag.nextLine().indexOf("artist_name"), swag.nextLine().indexOf("master_metadata_album_album")));}
catch(Exception e) {}
System.out.println(list);
}
}
Find a JSON parser you like.
Create a class with the fields you care about, annotated according to the parser's specs.
Read the file into a collection of objects. Most parsers will stream the contents, so you're not storing a massive string.
You can then load the data into objects and store them as you see fit. For your purposes, a TreeSet is probably what you want.
Your code will throw a lot of exceptions simply because you don't use braces. Please use braces in every block, whether it is an if, an else, or a loop. It's good practice and prevents unnecessary bugs.
However, every time scanner.nextLine() is called, it reads the next line from the file, so you need to avoid calling it repeatedly when you mean to work on the same line.
The best way to deal with this is to write a class whose fields match the JSON on each line of the file, map the JSON to that class, and read the desired field value from it.
Your approach is risky and depends on the exact structure of the data, even on whitespace. However, I fixed some lines in your code so that it works for your purpose, although I don't recommend manipulating strings this way.
while (swag.hasNextLine()) {
String swagNextLine = swag.nextLine();
if (swagNextLine.length() > 1) {
String toBeAdded = swagNextLine.substring(swagNextLine.indexOf("artist_name") + "artist_name".length() + 2
, swagNextLine.indexOf("master_metadata_album_album") - 2);
if (list.contains(toBeAdded)) {
System.out.print("Match");
} else {
try {
list.add(toBeAdded);
} catch (Exception e) {
System.out.println("Add to list failed");
}
}
System.out.println(list);
}
}
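As a rough sketch of the "read each line once and keep unique artists" idea from the answers above: the class name ArtistExtractor and the method uniqueArtists are made up for illustration, and a regular expression stands in for a real JSON parser only to keep the example free of external libraries. It assumes each record sits on one line and that the artist field is either a quoted string or null.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArtistExtractor {
    // Matches the artist field when its value is a quoted string; null values do not match.
    // Simplification: a real JSON parser would also decode escape sequences.
    private static final Pattern ARTIST = Pattern.compile(
            "\"master_metadata_album_artist_name\":\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the distinct artist names found in the given lines, sorted alphabetically. */
    public static Set<String> uniqueArtists(Iterable<String> lines) {
        Set<String> artists = new TreeSet<>();
        for (String line : lines) {          // each line is looked at exactly once
            Matcher m = ARTIST.matcher(line);
            if (m.find()) {
                artists.add(m.group(1));     // a Set silently ignores duplicates
            }
        }
        return artists;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical records in the same shape as the export.
        List<String> sample = List.of(
                "{\"ms_played\":0,\"master_metadata_album_artist_name\":\"Ozzy Osbourne\",\"shuffle\":true},",
                "{\"ms_played\":254120,\"master_metadata_album_artist_name\":\"Sworn In\",\"shuffle\":true},",
                "{\"ms_played\":10,\"master_metadata_album_artist_name\":\"Ozzy Osbourne\",\"shuffle\":false},",
                "{\"ms_played\":5,\"master_metadata_album_artist_name\":null,\"shuffle\":false},");
        System.out.println(uniqueArtists(sample)); // [Ozzy Osbourne, Sworn In]
    }
}
```

With the real file, the lines could come from Files.readAllLines(Path.of("SpotifyListeningData.txt")). A TreeSet replaces the list.contains check, which rescans the whole list for every line.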

Write elements of a map to a CSV correctly in a simplified way in Java 8

I have a countries Map with the following design:
England=24
Spain=21
Italy=10
etc
Then, I have a different citiesMap with the following design:
London=10
Manchester=5
Madrid=7
Barcelona=4
Roma=3
etc
Currently, I am printing these results on screen:
System.out.println("\nCountries:");
Map<String, Long> countryMap = countTotalResults(orderDataList, OrderData::getCountry);
writeResultInCsv(countryMap);
countryMap.entrySet().stream().forEach(System.out::println);
System.out.println("\nCities:\n");
Map<String, Long> citiesMap = countTotalResults(orderDataList, OrderData::getCity);
writeResultInCsv(citiesMap);
citiesMap.entrySet().stream().forEach(System.out::println);
I want to write each line of my 2 maps in the same CSV file. I have the following code:
public void writeResultInCsv(Map<String, Long> resultMap) throws Exception {
File csvOutputFile = new File(RUTA_FICHERO_RESULTADO);
try (PrintWriter pw = new PrintWriter(csvOutputFile)) {
resultMap.entrySet().stream()
.map(this::convertToCSV)
.forEach(pw::println);
}
}
public String convertToCSV(String[] data) {
return Stream.of(data)
.map(this::escapeSpecialCharacters)
.collect(Collectors.joining("="));
}
public String escapeSpecialCharacters(String data) {
String escapedData = data.replaceAll("\\R", " ");
if (data.contains(",") || data.contains("\"") || data.contains("'")) {
data = data.replace("\"", "\"\"");
escapedData = "\"" + data + "\"";
}
return escapedData;
}
But I get compilation error in writeResultInCsv method, in the following line:
.map(this::convertToCSV)
This is the compilation error I get:
reason: Incompatible types: Entry is not convertible to String[]
How can I write the following result to a CSV file in Java 8 in a simple way?
This is the result and design that I want my CSV file to have:
Countries:
England=24
Spain=21
Italy=10
etc
Cities:
London=10
Manchester=5
Madrid=7
Barcelona=4
Roma=3
etc
Your resultMap.entrySet() is a Set<Map.Entry<String, Long>>. You then turn that into a Stream<Map.Entry<String, Long>>, and then run .map on this. Thus, the mapper you provide there needs to map objects of type Map.Entry<String, Long> to whatever you like, but you pass it the convertToCSV method, which maps string arrays.
Your code tries to join on comma (Collectors.joining(",")), but your desired output contains zero commas.
It feels like one of two things is going on:
you copy/pasted this code from someplace or it was provided to you and you have no idea what any of it does. I would advise tearing this code into pieces: Take each individual piece, experiment with it until you understand it, then put it back together again and now you know what you're looking at. At that point you would know that having Collectors.joining(",") in this makes no sense whatsoever, and that you're trying to map an entry of String, Long using a mapping function that maps string arrays - which obviously doesn't work.
You would know all this stuff but you haven't bothered to actually look at your code. That seems a bit surprising, so I don't think this is it. But if it is - the code you have is so unrelated to the job you want to do, that you might as well remove your code entirely and turn this question into: "I have this. I want this. How do I do it?"
NB: A text file listing key=value pairs is not usually called a CSV file.
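One way the desired output might be produced is sketched below; it maps each Map.Entry directly to a key=value line instead of going through a String[]. The class name MapSectionWriter and the methods writeSection and render are invented for this example, and it writes to a StringWriter so the result can be inspected. Note also that new PrintWriter(File) truncates an existing file, so calling the question's writeResultInCsv once per map would leave only the second map in the file; sharing one writer avoids that.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapSectionWriter {
    /** Writes a header line followed by one key=value line per map entry. */
    public static void writeSection(PrintWriter pw, String header, Map<String, Long> map) {
        pw.println(header);
        map.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())   // Entry -> "key=value"
                .forEach(pw::println);
    }

    /** Renders both sections through a single writer and returns the text. */
    public static String render(Map<String, Long> countries, Map<String, Long> cities) {
        StringWriter out = new StringWriter();
        try (PrintWriter pw = new PrintWriter(out)) {
            writeSection(pw, "Countries:", countries);
            writeSection(pw, "Cities:", cities);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, Long> countries = new LinkedHashMap<>();
        countries.put("England", 24L);
        countries.put("Spain", 21L);
        Map<String, Long> cities = new LinkedHashMap<>();
        cities.put("London", 10L);
        cities.put("Madrid", 7L);
        System.out.print(render(countries, cities));
    }
}
```

To write to disk instead, the same writeSection calls could be made on a PrintWriter opened once on the output file.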

Returning an ArrayList and iterating through the returned list

I'm trying to return an ArrayList (which contains strings) from the method getNumbers:
public ArrayList<String> getNumbers(){
return (numeros);
}
Then, using a searcher, I'm trying to compare a variable m (which contains the text to look for) against the returned list.
public class NumberSearcher {
Reader reader = new KeyboardReader();
public NumberSearcher(ArrayList<Contacto> contactos){
String m = reader.read();
for(int i = 0; i<contactos.size();i++){
if(contactos.get(i).getPhoneNumbers().contains(m)){
contactos.get(i).display();
}
}
}
}
I have succeeded in creating a searcher in this very same style, but only with methods that return a single String.
The problem is that it's not working. If there is a match it should display the contact information, but it seems it isn't comparing properly, because nothing happens.
It's difficult to understand what you're asking here. Your getNumbers method doesn't get called from the second code block, so I don't see where that is relating to anything. It's also unclear what you mean the problem is. Can you try to give us a more detailed description of what is going wrong?
Anyways, I'll try to give you some general advice here, but without knowing the issue it's hard to say how much this will help.
Firstly, it is almost always recommended to declare your method's return type as the List interface, rather than a specific implementation (ArrayList, etc.). You can still return a specific implementation from within the method, but this way the client doesn't need to know what the underlying data structure is, and you stay flexible for future data structure changes.
public List<String> getNumbers(){
return (numeros);
}
Secondly, I would probably change the name 'getNumbers' to something slightly more precise - if I see a 'getNumbers' method I expect it to return some numeric entities, not a list of strings. If they are phone numbers then explicitly call it 'getPhoneNumbers'.
Though I'm not entirely sure I understand what you are asking, I think this may solve your issues:
for(int i = 0; i < contactos.size(); i++) {
Contacto next = contactos.get(i);
if(next.getEmails().contains(m)) {
next.display();
}
}
And as an afterthought, is there any specific reason you're only checking string containment? I would suggest that you check case-insensitive equality unless you really do want to find out if the string just contains the element.
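To make the difference concrete, here is a small sketch (the class NumberMatch and its methods are hypothetical, not from the question). One possible reason the searcher finds nothing, which the question does not confirm, is that List.contains only succeeds when an element is equal to the argument, whereas String.contains looks for a substring.

```java
import java.util.Arrays;
import java.util.List;

public class NumberMatch {
    /** True if any stored entry equals the query, ignoring case and surrounding whitespace. */
    public static boolean hasExactMatch(List<String> entries, String query) {
        String q = query.trim();
        return entries.stream().anyMatch(e -> e.trim().equalsIgnoreCase(q));
    }

    /** True if any stored entry contains the query as a substring (case-sensitive). */
    public static boolean hasPartialMatch(List<String> entries, String query) {
        return entries.stream().anyMatch(e -> e.contains(query));
    }

    public static void main(String[] args) {
        List<String> numbers = Arrays.asList("555-1234", "555-9876");
        System.out.println(numbers.contains("555"));              // false: no element equals "555"
        System.out.println(hasPartialMatch(numbers, "555"));      // true: substring of an element
        System.out.println(hasExactMatch(numbers, " 555-1234 ")); // true: equal after trimming
    }
}
```

Which of the two checks is right depends on whether a partial phone number typed by the user should count as a hit.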
Is this what you are looking for?
public class EmailSearcher {
Reader reader = new KeyboardReader();
public EmailSearcher(ArrayList<Contacto> contactos){
String m;
// "keyThatTerminates" is a placeholder for whatever input should end the search
while(!(m = reader.read()).equals("keyThatTerminates")) {
for(int i = 0; i<contactos.size();i++){
Contacto row = contactos.get(i);
if(row.getEmails().contains(m)){
row.display();
}
}
}
}
}

StreamingPathFilter trims spaces

I use the XOM library to parse and process .docx documents. MS Word stores text content in runs (<w:r>) inside the paragraph tags (<w:p>), and often breaks the text into several runs. Sometimes every word and every space between them is in a separate run. When I load a run containing only a space, the parser removes that space and handles it as an empty tag, as a result, the output contains the text without spaces. How could I force the parser to keep all the spaces? I would prefer keeping this parser, but if there is no solution, could you recommend an alternative one?
This is how I call the parser:
StreamingPathFilter filter = new StreamingPathFilter("/w:document/w:body/*:*", prefixes);
Builder builder = new Builder(filter.createNodeFactory(null, contentTransform));
builder.build(documentFile);
...
StreamingTransform contentTransform = new StreamingTransform() {
@Override
public Nodes transform(nu.xom.Element node){
<...process XML and output text...>
}
}
Meanwhile, I found the solution to this issue, thanks to the hint of Elliotte Rusty Harold on the XOM mailing list.
First, the StreamingPathFilter is in fact not part of the nu.xom package, it belongs to nux.xom.
Second, the issue was caused by StreamingPathFilter. When I changed the code to use the default Builder constructor, the missing spaces appeared in the output.
Just for documentation, the new code looks like the following:
Builder builder = new Builder();
nu.xom.Document doc = builder.build(documentFile);
context = XPathContext.makeNamespaceContext(doc.getRootElement());
Nodes nodes = doc.getRootElement().query("w:body/*", context);
for (int i = 0; i < nodes.size(); i++) {
transform((nu.xom.Element) nodes.get(i));
}
...
private void transform(nu.xom.Element node){
//process nodes
...
}

Select object dynamically

Here's the situation :
I have 3 objects all named **List and I have a method with a String parameter;
gameList = new StringBuffer();
appsList = new StringBuffer();
movieList = new StringBuffer();
public void fetchData(String category) {
URL url = null;
BufferedReader input;
gameList.delete(0, gameList.length());
Is there a way to do something like the following :
public void fetchData(String category) {
URL url = null;
BufferedReader input;
"category"List.delete(0, gameList.length());
, so I can choose which of the lists to be used based on the String parameter?
I suggest you create a HashMap<String, StringBuffer> and use that:
map = new HashMap<String, StringBuffer>();
map.put("game", new StringBuffer());
map.put("apps", new StringBuffer());
map.put("movie", new StringBuffer());
...
public void fetchData(String category) {
StringBuffer buffer = map.get(category);
if (buffer == null) {
// No such category. Throw an exception?
} else {
// Do whatever you need to
}
}
If the lists are fields of your object - yes, using reflection:
Field field = getClass().getDeclaredField(category + "List");
StringBuffer result = (StringBuffer) field.get(this);
But generally you should avoid reflection. And if your objects are fixed - i.e. they don't change, simply use an if-clause.
The logically simplest way taking your question as given would just be:
StringBuffer which;
if (category.equals("game"))
which=gameList;
else if (category.equals("apps"))
which=appsList;
else if (category.equals("movie"))
which=movieList;
else
... some kind of error handling ...
which.delete(0, which.length());
As Jon Skeet noted, if the list is big or dynamic you probably want to use a map rather than an if/else/if.
That said, I'd encourage you to use integer constants or an enum rather than a String. Like:
enum ListType {GAME, APP, MOVIE};
void deleteList(ListType category)
{
if (category==ListType.GAME)
... etc ...
In this simple example, if this is all you'd ever do with it, it wouldn't matter much. But I'm working on a system now that uses String tokens for this sort of thing all over the place, and it creates a lot of problems.
Suppose you call the function and by mistake you pass in "app" instead of "apps", or "Game" instead of "game". Or maybe you're thinking you added handling for "song" yesterday but in fact you went to lunch instead. This will successfully compile, and you won't have any clue that there's a problem until run-time. If the program does not throw an error on an invalid value but instead takes some default action, you could have a bug that's difficult to track down. But with an enum, if you mis-spell the name or try to use one that isn't defined, the compiler will immediately alert you to the error.
Suppose that some functions take special action for some of these options but not others. Like you find yourself writing
if (category.equals("app"))
getSpaceRequirements();
and that sort of thing. Then someone reading the program sees a reference to "app" here, a reference to "game" 20 lines later, etc. It could be difficult to determine what all the possible values are. Any given function might not explicitly reference them all. But with an enum, they're all neatly in one place.
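The enum suggestion can be combined with the map idea from the earlier answer. The following is only a sketch under assumptions: the class CategoryBuffers, the constant names, and the parse helper are invented here, and an EnumMap holds one StringBuffer per category.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

public class CategoryBuffers {
    public enum ListType { GAME, APPS, MOVIE }

    private final Map<ListType, StringBuffer> buffers = new EnumMap<>(ListType.class);

    public CategoryBuffers() {
        for (ListType type : ListType.values()) {
            buffers.put(type, new StringBuffer());   // one buffer per category
        }
    }

    public StringBuffer bufferFor(ListType type) {
        return buffers.get(type);
    }

    /** Empties the buffer for the given category, like gameList.delete(0, gameList.length()). */
    public void clear(ListType type) {
        StringBuffer buffer = buffers.get(type);
        buffer.delete(0, buffer.length());
    }

    /** Converts external text such as "game"; throws IllegalArgumentException for unknown names. */
    public static ListType parse(String category) {
        return ListType.valueOf(category.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        CategoryBuffers data = new CategoryBuffers();
        data.bufferFor(ListType.GAME).append("chess");
        data.clear(parse("game"));
        System.out.println(data.bufferFor(ListType.GAME).length()); // 0
    }
}
```

Inside the program a misspelled constant fails at compile time, and text that arrives from outside fails loudly in parse rather than falling through to a default.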
You could use a switch statement
StringBuffer buffer = null;
switch (category) {
case "game": buffer = gameList; break;
case "apps": buffer = appsList; break;
case "movie": buffer = movieList; break;
default: return; // unknown category
}
