In a project, I'm trying to query the tweets of a particular user handle, find the most common word in that user's tweets, and return the frequency of that word.
Below is my code:
public String mostPopularWord()
{
this.removeCommonEnglishWords();
this.sortAndRemoveEmpties();
Map<String, Integer> termsCount = new HashMap<>();
for(String term : terms)
{
Integer c = termsCount.get(term);
if(c==null)
c = new Integer(0);
c++;
termsCount.put(term, c);
}
Map.Entry<String,Integer> mostRepeated = null;
for(Map.Entry<String, Integer> curr: termsCount.entrySet())
{
if(mostRepeated == null || mostRepeated.getValue()<curr.getValue())
mostRepeated = curr;
}
//frequencyMax = termsCount.get(mostRepeated.getKey());
try
{
frequencyMax = termsCount.get(mostRepeated.getKey());
return mostRepeated.getKey();
}
catch (NullPointerException e)
{
System.out.println("Cannot find most popular word from the tweets.");
}
return "";
}
I think it would also help to show the code for the first two methods called in the method above. They are all in the same class, which defines the following fields:
private Twitter twitter;
private PrintStream consolePrint;
private List<Status> statuses;
private List<String> terms;
private String popularWord;
private int frequencyMax;
@SuppressWarnings("unchecked")
public void sortAndRemoveEmpties()
{
Collections.sort(terms);
terms.removeAll(Arrays.asList("", null));
}
private void removeCommonEnglishWords()
{
Scanner sc = null;
try
{
sc = new Scanner(new File("commonWords.txt"));
}
catch(Exception e)
{
System.out.println("The file is not found");
}
List<String> commonWords = new ArrayList<String>();
int count = 0;
while(sc.hasNextLine())
{
count++;
commonWords.add(sc.nextLine());
}
Iterator<String> termIt = terms.iterator();
while(termIt.hasNext())
{
String term = termIt.next();
for(String word : commonWords)
if(term.equalsIgnoreCase(word))
termIt.remove();
}
}
I apologise for the rather long code snippets. The frustrating thing is that even though my removeCommonEnglishWords() method is apparently right (discussed in another post), when I run mostPopularWord() it returns "the", which is clearly in my list of common English words and should have been eliminated from the List terms. What might I be doing wrong?
UPDATE 1:
Here is the link to the commonWords file:
https://drive.google.com/file/d/1VKNI-b883uQhfKLVg-L8QHgPTLNb22uS/view?usp=sharing
UPDATE 2: One thing I've noticed while debugging is that the loop
while(sc.hasNextLine())
in removeCommonEnglishWords() is skipped entirely. I don't understand why.
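One documented Scanner behaviour fits this symptom: when the underlying source throws an IOException (for example a MalformedInputException because the file's bytes don't match the charset being used), Scanner treats it as end of input, so hasNextLine() simply returns false; the swallowed exception is available from ioException(). The sketch below is one way to load the word list so that such a failure is reported instead of ignored. The class and method names are hypothetical, and UTF-8 is an assumption about how commonWords.txt is encoded. A relative path such as "commonWords.txt" is resolved against the working directory, so the sketch prints the absolute path when the file is missing.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CommonWordsLoader {

    // Hypothetical helper: collects one word per line, skipping blank lines.
    static List<String> readWords(Scanner sc) {
        List<String> words = new ArrayList<>();
        while (sc.hasNextLine()) {
            String line = sc.nextLine().trim();
            if (!line.isEmpty()) {
                words.add(line);
            }
        }
        // Scanner treats an IOException from its source (e.g. a
        // MalformedInputException when the bytes don't match the charset)
        // as end of input; ioException() returns the last one it swallowed.
        if (sc.ioException() != null) {
            throw new UncheckedIOException(sc.ioException());
        }
        return words;
    }

    // Hypothetical helper: opens the file with an explicit charset
    // (UTF-8 is an assumption) instead of the platform default.
    static List<String> loadCommonWords(File file) throws FileNotFoundException {
        try (Scanner sc = new Scanner(file, StandardCharsets.UTF_8.name())) {
            return readWords(sc);
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        File file = new File(args.length > 0 ? args[0] : "commonWords.txt");
        if (!file.isFile()) {
            // Relative paths are resolved against the working directory.
            System.out.println("Not found: " + file.getAbsolutePath());
            return;
        }
        List<String> words = loadCommonWords(file);
        System.out.println(words.size() + " common words loaded");
    }
}
```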
It can be simpler if you use streams, like so:
String mostPopularWord() {
return terms.stream()
.collect(Collectors.groupingBy(s -> s, Collectors.counting()))
.entrySet().stream()
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.findFirst()
.map(Map.Entry::getKey)
.orElse("");
}
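For comparison, here is a sketch of the same count without a sorting step: Map.merge replaces the get / null check / put sequence from the question, and Stream.max finds the top entry in a single pass while keeping its count (which the question stores in frequencyMax). The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCount {

    // Counts each term with Map.merge: an absent key starts at 1,
    // a present key has 1 added to it via Integer::sum.
    static Map<String, Integer> countTerms(List<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the entry with the highest count, or null for an empty map.
    // Ties are resolved arbitrarily, since HashMap order is unspecified.
    static Map.Entry<String, Integer> mostFrequent(Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElse(null);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("java", "tweet", "java", "stream", "java", "tweet");
        Map.Entry<String, Integer> top = mostFrequent(countTerms(terms));
        System.out.println(top.getKey() + " " + top.getValue()); // prints java 3
    }
}
```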
I tried your code. Here is what you need to do: replace the following part of removeCommonEnglishWords()
Iterator<String> termIt = terms.iterator();
while(termIt.hasNext())
{
String term = termIt.next();
for(String word : commonWords)
if(term.equalsIgnoreCase(word))
termIt.remove();
}
with this:
List<String> reducedTerms = new ArrayList<>();
for( String term : this.terms ) {
if( !commonWords.contains( term ) ) reducedTerms.add( term );
}
this.terms = reducedTerms;
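Note that commonWords.contains(term) is case-sensitive, whereas the original loop used equalsIgnoreCase, so "The" would survive a list that only contains "the". The original loop can also call termIt.remove() twice for the same term if the word list contains that word twice (for example "the" and "The"), and the iterator then throws IllegalStateException. A minimal sketch that keeps the case-insensitive match, assuming it is acceptable to normalise both sides with trim() and toLowerCase(Locale.ROOT); the method name removeCommonWords is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CommonWordFilter {

    // Hypothetical helper: removes every term matching a common word,
    // ignoring case and surrounding whitespace. Null terms are dropped too.
    static void removeCommonWords(List<String> terms, List<String> commonWords) {
        Set<String> stop = new HashSet<>();
        for (String word : commonWords) {
            stop.add(word.trim().toLowerCase(Locale.ROOT));
        }
        // removeIf tests each term exactly once, so no element is removed twice.
        terms.removeIf(term -> term == null
                || stop.contains(term.trim().toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        // Wrap in ArrayList: the list returned by Arrays.asList is fixed-size.
        List<String> terms = new ArrayList<>(Arrays.asList("The", "java", "THE", "stream", "of"));
        removeCommonWords(terms, Arrays.asList("the", "of", "The"));
        System.out.println(terms); // prints [java, stream]
    }
}
```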
Since you hadn't provided the whole class, I created one with some assumptions, but I think this code will work.
A slightly different approach using streams.
This uses the common stream idiom for counting word frequencies and stores the counts in a map.
It then does a single scan to find the largest count and returns either that word with its count, or the string "No words found" with a count of 0.
It also filters out the words in a Set<String> called ignore, so you need to create that too.
import java.util.Comparator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.stream.Collectors;
Set<String> ignore = Set.of("the", "of", "and", "a",
"to", "in", "is", "that", "it", "he", "was",
"you", "for", "on", "are", "as", "with",
"his", "they", "at", "be", "this", "have",
"via", "from", "or", "one", "had", "by",
"but", "not", "what", "all", "were", "we",
"RT", "I", "&", "when", "your", "can",
"said", "there", "use", "an", "each",
"which", "she", "do", "how", "their", "if",
"will", "up", "about", "out", "many",
"then", "them", "these", "so", "some",
"her", "would", "make", "him", "into",
"has", "two", "go", "see", "no", "way",
"could", "my", "than", "been", "who", "its",
"did", "get", "may", "…", "#", "??", "I'm",
"me", "u", "just", "our", "like");
Map.Entry<String, Long> entry = terms.stream()
.map(String::trim).filter(wd->!ignore.contains(wd)) // trim before filtering so padded words still match ignore
.collect(Collectors.groupingBy(a -> a,
Collectors.counting()))
.entrySet().stream()
.collect(Collectors.maxBy(Comparator
.comparing(Entry::getValue)))
.orElse(Map.entry("No words found", 0L));
System.out.println(entry.getKey() + " " + entry.getValue());
I have a JSON message whose ordering gets messed up after parsing it with JsonSlurper. I know the ordering isn't important, but after the message is parsed and flattened into single objects I need to put it back into ascending order, so I can build a JSONArray and present the message in the proper ascending order.
String test = """[
{
"AF": "test1",
"BE": "test2",
"CD": "test3",
"DC": "test4",
"EB": "test5",
"FA": "test5"
},
{
"AF": "test1",
"BE": "test2",
"CD": "test3",
"DC": "test4",
"EB": "test5",
"FA": "test5"
}
]"""
The parseText produces this:
def json = new groovy.json.JsonSlurper().parseText(test);
[{CD=test3, BE=test2, AF=test1, FA=test5, EB=test5, DC=test4}, {CD=test3,
BE=test2, AF=test1, FA=test5, EB=test5, DC=test4}]
After parsing the JSON message, I need to pass each flattened JSON object into a method, where it has to be sorted in ascending order by its map keys before being added to a JSONArray, like below.
def json = new groovy.json.JsonSlurper().parseText(test);
for( int c = 0; c < json?.size(); c++ )
doSomething(json[c]);
void doSomething( Object json ){
def jSort= json.????
JSONArray jsonArray = new JSONArray();
jsonArray.add(jSort);
}
You can just sort entries before adding them. The following uses collectEntries, which creates LinkedHashMap objects (thus preserving order):
def json = new groovy.json.JsonSlurper().parseText(test);
def sortedJson = json.collect{map -> map.entrySet().sort{it.key}
.collectEntries{[it.key, it.value]}}
sortedJson has this content, which seems to be sorted as required:
[[AF:test1, BE:test2, CD:test3, DC:test4, EB:test5, FA:test5],
[AF:test1, BE:test2, CD:test3, DC:test4, EB:test5, FA:test5]]
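If the sorting has to happen on the Java side instead, copying each map into a java.util.TreeMap gives the same ascending key order, because TreeMap keeps String keys in their natural order. This sketch assumes the parsed objects can be treated as Map<String, Object> (the objects JsonSlurper produces implement java.util.Map); the sample row is built by hand rather than parsed, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortByKey {

    // Copies each map into a TreeMap, which orders its keys by
    // String.compareTo, i.e. ascending order for keys like these.
    static List<Map<String, Object>> sortKeys(List<Map<String, Object>> rows) {
        List<Map<String, Object>> sorted = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            sorted.add(new TreeMap<>(row));
        }
        return sorted;
    }

    public static void main(String[] args) {
        // HashMap stands in for an unordered parsed object.
        Map<String, Object> row = new HashMap<>();
        row.put("CD", "test3");
        row.put("AF", "test1");
        row.put("FA", "test5");
        row.put("BE", "test2");
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        System.out.println(sortKeys(rows)); // prints [{AF=test1, BE=test2, CD=test3, FA=test5}]
    }
}
```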
I currently have this JSON:
[
{
"example": "12345678",
"test": "0",
"name": "tom",
"testdata": "",
"testtime": 1531209885613
},
{
"example": "12634346",
"test": "43223452234",
"name": "jerry",
"testdata": "pawenkls",
"testtime": 1531209888196
}
]
I am trying to iterate through the array to find the "testdata" value that matches a "testdata" value I have generated, which I am currently doing like so:
JsonArray entries = (JsonArray) new JsonParser().parse(blockchainJson);
JsonElement dataHash = ((JsonObject)entries.get(i)).get("dataHash");
Then I want to find the value of "example" in the same object as the "testdata" with the value "pawenkls".
How do I look up the "example" value in the same object as the "testdata" value I have found?
You need to loop through the objects in the array and compare each one's testdata field with your value. Then read the example field of the matching object.
String testData = "pawenkls";
JsonArray entries = (JsonArray) new JsonParser().parse(blockchainJson);
String example = null;
for(JsonElement dataHashElement : entries) {
JsonObject currentObject = dataHashElement.getAsJsonObject();
if(testData.equals(currentObject.get("testdata").getAsString())) {
example = currentObject.get("example").getAsString();
break;
}
}
System.out.println("example: "+example);
This prints out
example: 12634346
Here is a Java 8 version doing the same thing:
String testData = "pawenkls";
JsonObject[] objects = new Gson().fromJson(blockchainJson, JsonObject[].class);
Optional<JsonObject> object = Arrays.stream(objects)
.filter(o -> testData.equals(o.get("testdata").getAsString()))
.findFirst();
String example = null;
if(object.isPresent())
example = object.get().get("example").getAsString();
System.out.println("example: "+example);
I have a JSON document that looks like this:
{
"Message": "None",
"PDFS": [
[
"test.pdf",
"localhost/",
"777"
],
[
"retest.pdf",
"localhost\\",
"666"
]
],
"Success": true
}
I'm trying to access the individual strings within the arrays, but I'm having difficulty because getString requires a key rather than an index.
I've tried this to access the first string in each sub-array:
JSONArray pdfArray = resultJson.getJSONArray("PDFS");
for (int i = 0; i < pdfArray.length(); i++) {
JSONObject pdfObject = pdfArray.getJSONObject(i);
String fileName = pdfObject.getString(0);
}
Each element of PDFS is an array, not an object, so read it as an array:
JSONArray array = pdfArray.getJSONArray(i);
String fileName = array.getString(0);
import lemurproject.indri.*;
import java.io.*;
public class Indritest {
public static void main(String[] args) throws Exception {
String [] stopWordList = {"a", "an", "and", "are", "as", "at", "be",
"by","for", "from", "has", "he", "in", "is",
"it", "its", "of", "on", "that", "the", "to",
"was", "were", "will", "with"};
String myIndex = "C:/Program Files/lemur/lemur4.12/src/app/obj/myIndex5";
try {
IndexEnvironment envI = new IndexEnvironment();
envI.setStoreDocs(true);
// create an Indri repository
envI.setMemory(256000000);
envI.setStemmer("krovetz");
envI.setStopwords(stopWordList);
envI.setIndexedFields( new String[] {"article", "header", "p", "title", "link"});
envI.open(myIndex);
envI.create( myIndex );
// add xml files to the just created index i.e myIndex
// xml_data is a folder which contains the list of xml files to be added
File filesDir = new File("C:/NetbeanProg2/xml_data");
File[] files = filesDir.listFiles();
int noOffiles = files.length;
for (int i = 0; i < noOffiles; i++) {
System.out.println(files[i].getCanonicalPath() + "\t" + files[i].getCanonicalFile());
envI.addFile(files[i].getCanonicalPath(), "xml");
}
} catch (Exception e) {
System.out.println("issue is: " + e);
}
}
}
I found this code in a tutorial, but it isn't working. It throws this exception:
Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\Program Files\Indri\Indri 5.9\bin\indri_jni.dll: Can't find dependent libraries
In the myIndex variable I have provided the path of my IndexUI.jar file.
I am new to Indri and don't know much about its usage. I have downloaded Indri 5.9.
The issue was the version of Indri.