finding the most popular word in a person's tweets - java

In a project, I'm trying to query the tweets of a particular user's handle and find the most common word in the user's tweets and also return the frequency of that most common word.
Below is my code:
public String mostPopularWord()
{
this.removeCommonEnglishWords();
this.sortAndRemoveEmpties();
Map<String, Integer> termsCount = new HashMap<>();
for(String term : terms)
{
Integer c = termsCount.get(term);
if(c==null)
c = new Integer(0);
c++;
termsCount.put(term, c);
}
Map.Entry<String,Integer> mostRepeated = null;
for(Map.Entry<String, Integer> curr: termsCount.entrySet())
{
if(mostRepeated == null || mostRepeated.getValue()<curr.getValue())
mostRepeated = curr;
}
//frequencyMax = termsCount.get(mostRepeated.getKey());
try
{
frequencyMax = termsCount.get(mostRepeated.getKey());
return mostRepeated.getKey();
}
catch (NullPointerException e)
{
System.out.println("Cannot find most popular word from the tweets.");
}
return "";
}
I also think it would help to show the codes for the first two methods I call in the method above, as shown below. They are all in the same class, with the following defined:
private Twitter twitter;
private PrintStream consolePrint;
private List<Status> statuses;
private List<String> terms;
private String popularWord;
private int frequencyMax;
@SuppressWarnings("unchecked")
public void sortAndRemoveEmpties()
{
Collections.sort(terms);
terms.removeAll(Arrays.asList("", null));
}
private void removeCommonEnglishWords()
{
Scanner sc = null;
try
{
sc = new Scanner(new File("commonWords.txt"));
}
catch(Exception e)
{
System.out.println("The file is not found");
}
List<String> commonWords = new ArrayList<String>();
int count = 0;
while(sc.hasNextLine())
{
count++;
commonWords.add(sc.nextLine());
}
Iterator<String> termIt = terms.iterator();
while(termIt.hasNext())
{
String term = termIt.next();
for(String word : commonWords)
if(term.equalsIgnoreCase(word))
termIt.remove();
}
}
I apologise for the rather long code snippets. One frustrating thing is that even though my removeCommonEnglishWords() method is apparently right (discussed in another post), mostPopularWord() returns "the", which is clearly part of the common English words list that I have and that I meant to eliminate from the List terms. What might I be doing wrong?
UPDATE 1:
Here is the link to the commonWords file:
https://drive.google.com/file/d/1VKNI-b883uQhfKLVg-L8QHgPTLNb22uS/view?usp=sharing
UPDATE 2: One thing I've noticed while debugging is that the
while(sc.hasNext())
in removeCommonEnglishWords() is entirely skipped. I don't understand why, though.

It can be simpler if you use streams, like so:
String mostPopularWord() {
return terms.stream()
.collect(Collectors.groupingBy(s -> s, Collectors.counting()))
.entrySet().stream()
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.findFirst()
.map(Map.Entry::getKey)
.orElse("");
}

I tried your code. Here is what you will have to do. Replace the following part in removeCommonEnglishWords()
Iterator<String> termIt = terms.iterator();
while(termIt.hasNext())
{
String term = termIt.next();
for(String word : commonWords)
if(!term.equalsIgnoreCase(word))
reducedTerms.add( term );
}
with this:
List<String> reducedTerms = new ArrayList<>();
for( String term : this.terms ) {
if( !commonWords.contains( term ) ) reducedTerms.add( term );
}
this.terms = reducedTerms;
Since you hadn't provided the class, I created one with some assumptions, but I think this code will go through.
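Note that List.contains compares with equals, so the replacement above is case-sensitive, whereas the loop in the question used equalsIgnoreCase. Below is a minimal sketch of a case-insensitive variant, assuming it is acceptable to normalise the stop words to lower case first. The class and method names (StopWordFilter, removeCommonWords) are hypothetical and do not come from the question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordFilter {
    // Hypothetical helper: returns the terms that are not stop words,
    // ignoring case and skipping null or empty terms.
    static List<String> removeCommonWords(List<String> terms, List<String> commonWords) {
        // Normalise the stop words once so each lookup is a cheap hash probe.
        Set<String> stop = new HashSet<>();
        for (String word : commonWords) {
            stop.add(word.trim().toLowerCase(Locale.ROOT));
        }
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (term == null || term.isEmpty()) {
                continue;
            }
            if (!stop.contains(term.toLowerCase(Locale.ROOT))) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

Inside the class from the question this could be used as terms = StopWordFilter.removeCommonWords(terms, commonWords); the trim() also guards against stray whitespace at the end of lines read from the file.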

A slightly different approach using streams.
This uses the relatively common frequency-count idiom with streams and stores the counts in a map. It then does a simple scan to find the largest count and returns either that entry or a placeholder entry with the key "No words found".
It also filters out the words in a Set<String> called ignore, so you need to create that too.
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.stream.Collectors;
Set<String> ignore = Set.of("the", "of", "and", "a",
"to", "in", "is", "that", "it", "he", "was",
"you", "for", "on", "are", "as", "with",
"his", "they", "at", "be", "this", "have",
"via", "from", "or", "one", "had", "by",
"but", "not", "what", "all", "were", "we",
"RT", "I", "&", "when", "your", "can",
"said", "there", "use", "an", "each",
"which", "she", "do", "how", "their", "if",
"will", "up", "about", "out", "many",
"then", "them", "these", "so", "some",
"her", "would", "make", "him", "into",
"has", "two", "go", "see", "no", "way",
"could", "my", "than", "been", "who", "its",
"did", "get", "may", "…", "#", "??", "I'm",
"me", "u", "just", "our", "like");
Map.Entry<String, Long> entry = terms.stream()
// trim first, so that padded words such as "the " are also filtered out
.map(String::trim).filter(wd -> !ignore.contains(wd))
.collect(Collectors.groupingBy(a -> a,
Collectors.counting()))
.entrySet().stream()
.collect(Collectors.maxBy(Comparator
.comparing(Entry::getValue)))
.orElse(Map.entry("No words found", 0L));
System.out.println(entry.getKey() + " " + entry.getValue());

Related

Using streams to extract specific entries of a List of Maps in to a new Map

Given a org.bson.Document
{
"doneDate":"",
"todoEstimates":"",
"forecastDate":"",
"cardType":{
"projectData":[
{
"color":"#ffcd03",
"boardId":"30022"
},
{
"color":"#ffcd03",
"boardId":"1559427"
}
],
"cardFields":[
{
"fieldName":"id",
"fieldLabel":"Unique ID",
"fieldType":"Integer",
"itemType":"Long",
"isRequired":"NO",
"isReadOnly":"Yes",
"isDisabled":"NO",
"inputMethod":"System Generated",
"defaultValue":null,
"isUserType":"No"
},
{
"fieldName":"name",
"fieldLabel":"Title",
"fieldType":"Single-Line Text",
"itemType":"String",
"isRequired":"Yes",
"isReadOnly":"NO",
"isDisabled":"NO",
"inputMethod":"Manual Entry",
"defaultValue":null,
"isUserType":"No"
}
]
}
}
How do I extract the values of fieldName and fieldLabel via streams into the following?
{
"id": "Unique ID",
"name": "Title",
...
}
I tried the following but I get stuck at the part where I get value of the cardFields list.
document.entrySet().stream().filter(e -> e.getKey().equals("cardType"))
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue))
.entrySet().stream().filter(e -> e.getKey().equals("cardFields"))
.map(e -> (Map)e.getValue()).toList();
Here is a working solution with streams:
Map<String, Object> fields = ((List<Map<String, Object>>) ((Map<String, Object>) document.entrySet()
.stream()
.filter(entry -> entry.getKey().equals("cardType"))
.findFirst()
.orElseThrow(() -> new RuntimeException("card type not found"))
.getValue())
.entrySet()
.stream()
.filter(entry -> entry.getKey().equals("cardFields"))
.findFirst()
.orElseThrow(() -> new RuntimeException("card fields not found"))
.getValue())
.stream()
.collect(Collectors.toMap(el -> el.get("fieldName").toString(), element -> element.get("fieldLabel")));
Document result = new Document(fields);
System.out.println(result.toJson());
That's probably the worst code I have written: it is absolutely unreadable and you can't debug it. I would suggest that you do not use streams for this particular task; they aren't the right tool for it. So here is another working solution using Map.get(key):
Map<String, Object> cardType = (Map<String, Object>) document.get("cardType");
List<Map<String, Object>> cardFields = (List<Map<String, Object>>) cardType.get("cardFields");
Document result = new Document();
cardFields.forEach(cardField -> result.put((String) cardField.get("fieldName"), cardField.get("fieldLabel")));
System.out.println(result.toJson());
This is shorter, readable, you can debug it if needed and probably it's more performant. I'd say it's much better overall.
You may be able to parse your document like this:
Document cardType = document.get("cardType", Document.class);
final Class<? extends List> listOfMaps = new ArrayList<Map<String, String>>().getClass();
List<Map<String, String>> fields = cardType.get("cardFields", listOfMaps);
fields.forEach(f -> {
System.out.println(f.get("fieldName") + ": " + f.get("fieldLabel"));
// here you can construct your new object
});
If you don't mind casting a lot, you could try following:
List cardFields = (List) ((Map) document.get("cardType")).get("cardFields");
Map<String, String> map = (Map) cardFields.stream()
.collect(Collectors.toMap(cf -> ((Document) cf).getString("fieldName"),
cv -> ((Document) cv).getString("fieldLabel")));
System.out.println(map);
or you can omit the casting with the following:
List<Document> carFields = document.get("cardType", Document.class)
.getList("cardFields", Document.class);
Map<String, String> map = carFields.stream()
.collect(Collectors.toMap(k -> k.getString("fieldName"),
v -> v.getString("fieldLabel")));
System.out.println(map);
Here is the complete example running with Java 17:
import org.bson.Document;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class Bson {
private String json=
"""
{
"doneDate": "",
"todoEstimates": "",
"forecastDate": "",
"cardType": {
"projectData": [
{
"color": "#ffcd03",
"boardId": "30022"
},
{
"color": "#ffcd03",
"boardId": "1559427"
}
],
"cardFields": [
{
"fieldName": "id",
"fieldLabel": "Unique ID",
"fieldType": "Integer",
"itemType": "Long",
"isRequired": "NO",
"isReadOnly": "Yes",
"isDisabled": "NO",
"inputMethod": "System Generated",
"defaultValue": null,
"isUserType": "No"
},
{
"fieldName": "name",
"fieldLabel": "Title",
"fieldType": "Single-Line Text",
"itemType": "String",
"isRequired": "Yes",
"isReadOnly": "NO",
"isDisabled": "NO",
"inputMethod": "Manual Entry",
"defaultValue": null,
"isUserType": "No"
}
]
}
}
""";
public static void main(String[] args) {
Bson bson = new Bson();
Document document = Document.parse(bson.json);
List cardType = (List) ((Map) document.get("cardType")).get("cardFields");
Map<String, String> map = (Map) cardType.stream()
.collect(Collectors.toMap(cf -> ((Document) cf).getString("fieldName"),
cv -> ((Document) cv).getString("fieldLabel")));
System.out.println(map);
List<Document> carFields = document.get("cardType", Document.class).getList("cardFields", Document.class);
Map<String, String> map1 = carFields.stream()
.collect(Collectors.toMap(k -> k.getString("fieldName"), v -> v.getString("fieldLabel")));
System.out.println(map1);
}
}
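One thing to watch with the Collectors.toMap calls above: the two-argument form throws an IllegalStateException if two entries share the same fieldName, and it makes no promise about the order of the resulting map. Below is a minimal sketch of the four-argument form, which keeps the first label seen for a key and preserves list order. Plain Map<String, String> entries stand in for the nested Document objects so that the example runs without the MongoDB driver, and the class and method names are hypothetical. As far as I know, a null fieldLabel would still cause a NullPointerException in toMap, so this sketch assumes labels are non-null.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FieldLabelMapper {
    // Hypothetical helper: maps each fieldName to its fieldLabel.
    static Map<String, String> labelsByName(List<Map<String, String>> cardFields) {
        return cardFields.stream()
                .collect(Collectors.toMap(
                        f -> f.get("fieldName"),
                        f -> f.get("fieldLabel"),
                        (first, second) -> first,   // on duplicate keys keep the first label
                        LinkedHashMap::new));       // keep the order of the cardFields list
    }
}
```

With the two cardFields entries from the question this produces a map that prints as {id=Unique ID, name=Title}.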

Remove elements from List of objects by comparing with another array in Java

I have a list of subscriptions
subscriptions = [
{
code : "Heloo",
value:"some value",
reason : "some reason"
},
{
code : "Byeee",
value:"some byee",
reason : "some byee"
},
{
code : "World",
value:"some World",
reason : "some world"
}
]
I have another list of unsubscriptions:
unsubscribe : ["Heloo","World"]
I want to unsubscribe elements in the subscriptions by comparing these two arrays
Final Result :
subscriptions = [
{
code : "Byeee",
value:"some byee value",
reason : "some byee reason"
}
]
Below is my solution :
List<String> newList = new ArrayList<>();
for (Subscription product : subscriptions) {
newList.add(product.code);
}
if (!newList.containsAll(unsubscribe)) {
log.error("Few products are not subscribed");
}
for (int i = 0; i < subscriptions.size(); i++) {
if(unsubscribe.contains(subscriptions.get(i).code)) {
subscriptions.remove(i);
}
}
This could be better. I am looking for a better, more optimized solution.
Using removeIf will clean up your code considerably:
List<Subscription> subscriptions = ... ;
List<String> unsubscribe = ...;
subscriptions.removeIf(s -> unsubscribe.contains(s.code));
You can also do it with streams:
List<Subscription> newList = subscriptions
.stream()
.filter(it -> !unsubscribe.contains(it.code))
.collect(Collectors.toList());
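Two details are worth spelling out. The index-based loop in the question skips the element that follows each removal, because the remaining elements shift left while i still advances; removeIf avoids that. Also, contains on a List is a linear scan, so copying the codes into a HashSet first keeps each lookup cheap. Below is a minimal sketch under those assumptions. The Subscription class is a hypothetical stand-in for the real type, with a public code field to match the snippets above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SubscriptionCleanup {
    // Hypothetical stand-in for the real subscription type.
    static final class Subscription {
        final String code;
        final String value;
        final String reason;

        Subscription(String code, String value, String reason) {
            this.code = code;
            this.value = value;
            this.reason = reason;
        }
    }

    // Removes the matching subscriptions in place and returns the requested
    // codes that were not subscribed in the first place.
    static List<String> unsubscribe(List<Subscription> subscriptions, List<String> unsubscribe) {
        Set<String> subscribedCodes = new HashSet<>();
        for (Subscription s : subscriptions) {
            subscribedCodes.add(s.code);
        }
        List<String> notSubscribed = new ArrayList<>();
        for (String code : unsubscribe) {
            if (!subscribedCodes.contains(code)) {
                notSubscribed.add(code);
            }
        }
        Set<String> toRemove = new HashSet<>(unsubscribe);
        subscriptions.removeIf(s -> toRemove.contains(s.code));
        return notSubscribed;
    }
}
```

The returned list can drive the "Few products are not subscribed" log message from the question. Note that the subscriptions list must be mutable (for example an ArrayList) for removeIf to work.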

Compare two String[] arrays and print out the strings which differ

I've got a list of all the file names in a folder and a list of files which have been manually "checked" by a developer. How would I go about comparing the two arrays so that we print out only those which are not contained in the master list?
public static void main(String[] args) throws java.lang.Exception {
String[] list = {"my_purchases", "my_reservation_history", "my_reservations", "my_sales", "my_wallet", "notifications", "order_confirmation", "payment", "payment_methods", "pricing", "privacy", "privacy_policy", "profile_menu", "ratings", "register", "reviews", "search_listings", "search_listings_forms", "submit_listing", "submit_listing_forms", "terms_of_service", "transaction_history", "trust_verification", "unsubscribe", "user", "verify_email", "verify_shipping", "404", "account_menu", "auth", "base", "dashboard_base", "dashboard_menu", "fiveohthree", "footer", "header", "header_menu", "listings_menu", "main_searchbar", "primary_navbar"};
String[] checked = {"404", "account_menu", "auth", "base", "dashboard_base", "dashboard_menu", "fiveohthree", "footer", "header", "header_menu", "listings_menu"};
ArrayList<String> ar = new ArrayList<String>();
for(int i = 0; i < checked.length; i++)
{
if(!Arrays.asList(list).contains(checked[i]))
ar.add(checked[i]);
}
}
Change your loop to:
ArrayList<String> ar = new ArrayList<String>();
for(int i = 0; i < checked.length; i++) {
if(!Arrays.asList(list).contains(checked[i]))
ar.add(checked[i]);
}
ArrayList ar should be outside of the for loop. Otherwise ar will be created every time when element of checked array exists in list.
Edit:
if(!Arrays.asList(list).contains(checked))
With this statement you are checking whether the checked reference is not the element of list. It should be checked[i] to check whether the element of checked exists in list or not.
If you want to print the elements of list that are not in checked, use:
for(int i = 0; i < list.length; i++) {
if(!Arrays.asList(checked).contains(list[i]))
ar.add(list[i]);
}
System.out.println(ar);
Your updated solution seems kind of odd to me, not sure why you would add list[i] to the result list. Generally this sounds like something hashsets are made for:
String[] list = { "my_purchases", "my_reservation_history","my_reservations","my_sales", "my_wallet", "notifications", "order_confirmation", "payment", "payment_methods", "pricing", "privacy", "privacy_policy", "profile_menu", "ratings", "register", "reviews", "search_listings", "search_listings_forms", "submit_listing", "submit_listing_forms", "terms_of_service", "transaction_history", "trust_verification", "unsubscribe", "user", "verify_email", "verify_shipping", "404", "account_menu", "auth", "base", "dashboard_base", "dashboard_menu", "fiveohthree", "footer", "header", "header_menu", "listings_menu", "main_searchbar", "primary_navbar"};
String[] checked = { "404", "account_menu", "auth", "base", "dashboard_base", "dashboard_menu", "fiveohthree", "footer", "header", "header_menu", "listings_menu"};
HashSet<String> s1 = new HashSet<String>(Arrays.asList(checked));
s1.removeAll(Arrays.asList(list));
System.out.println(s1);
for (String s: checked) { // go through all in second list
if (!Arrays.asList(list).contains(s)) { // if string not in master list (arrays have no contains method)
System.out.println(s); // print that string
}
}
First of all, I think your code has some errors:
s1 is not defined
ar is not defined
you mean to use Arrays.toString instead of Array.toString
So I fixed your code (using Java 8) and it should work like that:
public static void main(String[] args) throws java.lang.Exception {
String[] list = {"my_purchases", "my_reservation_history", "my_reservations", "my_sales", "my_wallet", "notifications", "order_confirmation", "payment", "payment_methods", "pricing", "privacy", "privacy_policy", "profile_menu", "ratings", "register", "reviews", "search_listings", "search_listings_forms", "submit_listing", "submit_listing_forms", "terms_of_service", "transaction_history", "trust_verification", "unsubscribe", "user", "verify_email", "verify_shipping", "404", "account_menu", "auth", "base", "dashboard_base", "dashboard_menu", "fiveohthree", "footer", "header", "header_menu", "listings_menu", "main_searchbar", "primary_navbar"};
String[] checked = {"404", "account_menu", "auth", "base", "dashboard_base", "dashboard_menu", "fiveohthree", "footer", "header", "header_menu", "listings_menu"};
final List<String> result = Stream.of(list)
.filter(listEntry -> Stream.of(checked)
.filter(checkedEntry -> checkedEntry.equals(listEntry)).findFirst().orElse(null) == null)
.collect(Collectors.toList());
System.out.println(result);
}
If you don't want to use Java 8, you have to replace the usage of Streams and filters and collect with the appropriate functions in Java 7 (see e.g., Satya's post).
Anyways, I should mention that there are better (regarding performance) implementations to solve your problem, e.g.,
you could sort your lists prior to searching for duplicates,
you could use, e.g., hash-based implementations to increase the speed when searching for duplicates,
you could move the code outside of the inner loop,
and many more
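The hash-based suggestion can be sketched as a set difference. This is only an illustration under the assumption that the goal is "entries of list that are not in checked". The class and method names are made up, and a LinkedHashSet is used so that the result keeps the order of the first array.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class UncheckedFiles {
    // Hypothetical helper: returns the entries of 'all' that do not occur in 'checked'.
    static Set<String> difference(String[] all, String[] checked) {
        Set<String> result = new LinkedHashSet<>(Arrays.asList(all));
        // Wrapping 'checked' in a HashSet keeps every membership test at roughly constant time.
        result.removeAll(new HashSet<>(Arrays.asList(checked)));
        return result;
    }
}
```

Calling UncheckedFiles.difference(list, checked) with the arrays above and printing the result lists the files that still need checking; swapping the two arguments gives the checked names that are missing from the folder.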

How to use indri for indexing in java?

import lemurproject.indri.*;
import java.io.*;
public class Indritest {
public static void main(String[] args) throws Exception {
String [] stopWordList = {"a", "an", "and", "are", "as", "at", "be",
"by","for", "from", "has", "he", "in", "is",
"it", "its", "of", "on", "that", "the", "to",
"was", "were", "will", "with"};
String myIndex = "C:/Program Files/lemur/lemur4.12/src/app/obj/myIndex5";
try {
IndexEnvironment envI = new IndexEnvironment();
envI.setStoreDocs(true);
// create an Indri repository
envI.setMemory(256000000);
envI.setStemmer("krovetz");
envI.setStopwords(stopWordList);
envI.setIndexedFields( new String[] {"article", "header", "p", "title", "link"});
envI.open(myIndex);
envI.create( myIndex );
// add xml files to the just created index i.e myIndex
// xml_data is a folder which contains the list of xml files to be added
File filesDir = new File("C:/NetbeanProg2/xml_data");
File[] files = filesDir.listFiles();
int noOffiles = files.length;
for (int i = 0; i < noOffiles; i++) {
System.out.println(files[i].getCanonicalPath() + "\t" + files[i].getCanonicalFile());
envI.addFile(files[i].getCanonicalPath(), "xml");
}
} catch (Exception e) {
System.out.println("issue is: " + e);
}
}
}
I have found this code from a tutorial but it isn't working. It's giving me an exception.
Exception in thread "main" java.lang.UnsatisfiedLinkError: C:\Program Files\Indri\Indri 5.9\bin\indri_jni.dll: Can't find dependent libraries
In the myIndex variable I have provided the path of my IndexUI.jar file.
I am new to Indri and don't have much idea about its usage. I have downloaded Indri 5.9.
The issue was the version of Indri.

Extract data inside nested braces

I want to extract the content between the first nested braces and the second nested braces separately. I am totally stuck with this; can anyone help me? My file read.txt contains the data below. I just read it into a string "s".
BufferedReader br=new BufferedReader(new FileReader("read.txt"));
while(br.ready())
{
String s=br.readLine();
System.out.println(s);
}
Output
{ { "John", "ran" }, { "NOUN", "VERB" } },
{ { "The", "dog", "jumped"}, { "DET", "NOUN", "VERB" } },
{ { "Mike","lives","in","Poland"}, {"NOUN","VERB","DET","NOUN"} },
i.e., my output should look like
"John", "ran"
"NOUN", "VERB"
"The", "dog", "jumped"
"DET", "NOUN", "VERB"
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"
Use this regex:
(?<=\{)(?!\s*\{)[^{}]+
See the matches in the Regex Demo.
In Java:
Pattern regex = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
}
Explanation
The lookbehind (?<=\{) asserts that what precedes the current position is a {
The negative lookahead (?!\s*\{) asserts that what follows is not optional whitespace then {
[^{}]+ matches any chars that are not curlies
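Here is a self-contained sketch of the pattern above applied to the sample data from the question. The class and method names are made up for illustration, and each match is trimmed because the text inside the braces carries surrounding spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceExtractor {
    // Innermost brace content: preceded by '{' and not followed by another '{'.
    private static final Pattern INNER = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");

    // Hypothetical helper: returns the trimmed content of every innermost brace pair.
    static List<String> extract(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = INNER.matcher(input);
        while (m.find()) {
            groups.add(m.group().trim());
        }
        return groups;
    }
}
```

For the first line of read.txt, extract returns two strings: "John", "ran" and "NOUN", "VERB".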
If you split on "}," you get each set of words as a single string; after that it is just a matter of removing the curly braces.
As per your code
BufferedReader br=new BufferedReader(new FileReader("read.txt"));
while(br.ready())
{
String s=br.readLine();
String [] words = s.split ("},");
for (int x = 0; x < words.length; x++) {
String printme = words[x].replace("{", "").replace("}", "").trim();
System.out.println(printme); // print the cleaned group
}
}
You could always remove the opening brackets, then split by '},' which would leave you with the list of strings you've asked for. (If that is all one string, of course)
String s = input.replace("{","");
String[] splitString = s.split("},");
This would first remove the opening brackets:
"John", "ran" }, "NOUN", "VERB" } },
"The", "dog", "jumped"}, "DET", "NOUN", "VERB" } },
"Mike","lives","in","Poland"},"NOUN","VERB","DET","NOUN"} },
Then it would split by },
"John", "ran"
"NOUN", "VERB" }
"The", "dog", "jumped"
"DET", "NOUN", "VERB" }
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"}
Then you just need to tidy them up with another replace!
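Put together, the replace-then-split steps might look like the sketch below. It assumes the whole input is available as one string; the class and method names are hypothetical, and the final replace and trim are the "tidy up" step.

```java
import java.util.ArrayList;
import java.util.List;

class BraceSplitter {
    // Hypothetical helper: strips '{', splits on "},", then removes leftover '}' and whitespace.
    static List<String> splitGroups(String input) {
        String withoutOpen = input.replace("{", "");
        List<String> groups = new ArrayList<>();
        for (String part : withoutOpen.split("},")) {
            String cleaned = part.replace("}", "").trim();
            if (!cleaned.isEmpty()) {   // skip fragments that held only braces or whitespace
                groups.add(cleaned);
            }
        }
        return groups;
    }
}
```

This relies on every group being terminated by "}," exactly as in the sample data; input that formats the braces differently would need the regex-based approaches instead.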
Another approach could be searching for {...} substring with no inner { or } characters and take only its inner part without { and }.
Regex describing such substring can look like
\\{(?<content>[^{}]+)\\}
Explanation:
\\{ is escaped { so now it represents { literal (normally it represents start of quantifier {x,y} so it needed to be escaped)
(?<content>...) is named-capturing group, it will store only part between { and } and later we would be able to use this part (instead of entire match which would also include { })
[^{}]+ represents one or more non { } characters
\\} escaped } which means it represents }
DEMO:
String input = "{ { \"John\", \"ran\" }, { \"NOUN\", \"VERB\" } },\r\n" +
"{ { \"The\", \"dog\", \"jumped\"}, { \"DET\", \"NOUN\", \"VERB\" } },\r\n" +
"{ { \"Mike\",\"lives\",\"in\",\"Poland\"}, {\"NOUN\",\"VERB\",\"DET\",\"NOUN\"} },";
Pattern p = Pattern.compile("\\{(?<content>[^{}]+)\\}");
Matcher m = p.matcher(input);
while(m.find()){
System.out.println(m.group("content").trim());
}
Output:
"John", "ran"
"NOUN", "VERB"
"The", "dog", "jumped"
"DET", "NOUN", "VERB"
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"
