Find the most common word from user input - java

I'm very new to Java and am creating a software application that lets a user enter text into a field; the program then runs through all of the text and identifies the most common word. At the moment, my code looks like this:
JButton btnMostFrequentWord = new JButton("Most Frequent Word");
btnMostFrequentWord.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent e) {
String text = textArea.getText();
String[] words = text.split("\\s+");
HashMap<String, Integer> occurrences = new HashMap<String, Integer>();
for (String word : words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
}
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + occurrences.values());
}
});
This just prints the counts for all the words, but I would like it to tell me the single most common word instead. Any help would be really appreciated.

Just after your for loop, you can sort the map entries by value in descending order and select the first.
for (String word: words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
}
Map.Entry<String,Integer> tempResult = occurrences.entrySet().stream()
.sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
.findFirst().get();
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + tempResult.getKey());
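Sorting every entry is not strictly needed when only the single largest count is wanted. Here is a minimal sketch, assuming the same whitespace split as the question; the class name MostFrequentWord and the method mostFrequent are made up for illustration and are not part of the original code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MostFrequentWord {
    // Returns the word with the highest count, or null for blank input.
    // Ties are broken arbitrarily, because HashMap iteration order is unspecified.
    static String mostFrequent(String text) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                occurrences.merge(word, 1, Integer::sum); // insert 1, or add 1 to the old count
            }
        }
        if (occurrences.isEmpty()) {
            return null;
        }
        // One linear scan over the entries instead of a full sort.
        return Collections.max(occurrences.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent("the cat and the dog and the bird"));
    }
}
```

In the action listener, the result of mostFrequent(textArea.getText()) could be passed to JOptionPane.showMessageDialog in place of tempResult.getKey().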

For anyone who is more familiar with Java, here is a very easy way to do it with Java 8:
List<String> words = Arrays.asList(text.split("\\s+"));
Collections.sort(words, Comparator.comparingInt(word -> {
return Collections.frequency(words, word);
}).reversed());
The most common word is stored in words.get(0) after sorting.
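One thing to keep in mind with the snippet above is that Collections.frequency rescans the whole list on every comparison, so it gets slow for long inputs. A possible alternative, sketched here under the assumption that Java 8 streams are acceptable, counts each word once with Collectors.groupingBy and then takes the maximum entry. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamWordCount {
    // Returns the most frequent word, or an empty Optional for blank input.
    static Optional<String> mostFrequent(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                // Build word -> number of occurrences in a single pass.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent("a b b c b a").orElse("(no words)"));
    }
}
```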

I would do something like this
int max = 0;
String a = null;
for (String word : words) {
int value = 0;
if(occurrences.containsKey(word)){
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
if(max < value+1){
max = value+1;
a = word;
}
}
System.out.println(a);
You could sort instead, and the solution would be much shorter, but I think this runs faster, since it finds the maximum in the same single pass that builds the counts.

You can either iterate through the occurrences map afterwards and find the max, or track the maximum while counting, as below:
String text = textArea.getText();
String[] words = text.split("\\s+");
HashMap<String, Integer> occurrences = new HashMap<>();
int mostFreq = -1;
String mostFreqWord = null;
for (String word : words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
value = value + 1;
occurrences.put(word, value);
if (value > mostFreq) {
mostFreq = value;
mostFreqWord = word;
}
}
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + mostFreqWord);

Related

Creating a table in a for loop java

I have two arrays whose values are words. Each word in the first array is associated with a text (String), and for each word in the second array I want to show how many times (int) it is repeated in that text. The expected table should look like this:
This is the code that I've written so far:
keyW = txtKeyword.getText();
search = textField.getText();
System.out.println("String for car = " + search);
System.out.println("String keyword = " + keyW);
btnUpload.setEnabled(false);
btnNewButton_1.setEnabled(false);
btnNewButton.setEnabled(false);
txtKeyword.setEnabled(false);
textField.setEditable(false);
//waitLabel.setVisible(true);
int iar = 0;
int item;
Map<String, Integer> dictionary = new HashMap<String, Integer>();
String[] searchArray = search.split(",");
String[] itemsFromArray1 = new String[searchArray.length];
//Keyword1 = ("Searched Key"+ "\r\n\t ");
//listKeys.add(Keyword1);
for (iar = 0; iar < searchArray.length; iar++) {
itemsFromArray1[iar] = searchArray[iar].trim();
Keyword1 = (searchArray[iar]);
//listKeys.add(Keyword1);
}
String[] items = keyW.split(",");
for (item = 0; item < searchArray.length; item++) {
WebDriver driver = new HtmlUnitDriver();
((HtmlUnitDriver) driver).setJavascriptEnabled(true);
driver.get("https://en.wikipedia.org/wiki/" + searchArray[item]);
tstr1 = driver.findElement(By.xpath("//*[@id='content']")).getText();
driver.quit();
String[] itemsFromArray = new String[items.length];
for (int i = 0; i < items.length; i++) {
itemsFromArray[i] = items[i].trim();
}
for (String string : itemsFromArray) {
int i = countWords(tstr1, string);
dictionary.put(searchArray[item].concat(string), i);
System.out.println("ARRAY " + dictionary);
}
}
private static int countWords(String tstr1, String string) {
tstr1 = tstr1.toLowerCase();
string = string.toLowerCase();
int posCount = 0;
String positive = string;
Pattern positivePattern = Pattern.compile(positive);
Matcher matcher = positivePattern.matcher(tstr1);
while (matcher.find()) {
posCount++;
}
return posCount;
}
I tried to achieve this with Map<String, Integer> dictionary = new HashMap<String, Integer>(); but the results (dictionary.put(searchArray[item], i);) are wrong. Can anyone give me an idea of how to solve this? Thanks!
****UPDATE****
Now the console output looks something like this:
ARRAY { boyanimal=4, catfree=18, catanimal=60, boyfree=2, catgender=0, boygender=6, windowfree=5}
ARRAY { boyanimal=4, catfree=18, catanimal=60, boyfree=2, windowanimal=4, catgender=0, boygender=6, windowfree=5}
ARRAY { boyanimal=4, catfree=18, catanimal=60, boyfree=2, windowanimal=4, catgender=0, boygender=6, windowgender=0, windowfree=5}
Some values are repeated. How can I make it display just like a table?
Try using this:
Map<String, Integer> tableMap = new HashMap<String, Integer>();
and build each key like this:
tableMap.put("Word1-Search1",23);
Using this, you will always have a unique combination for each key.
Do you actually need to keep the data in a map? If not, you could use a two-dimensional String array to store it instead.
Answering your latest update:
I think you're getting multiple copies because of this line.
dictionary.put(searchArray[item].concat(string), i);
I think the concat is being applied to the entire row of elements. I would use my debugger to analyze this and see what the value of searchArray[item] is and what the value of string is.
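It may also be worth noting that in the posted code the line System.out.println("ARRAY " + dictionary) sits inside the inner loop, so the whole map is printed again after every put; the map itself holds each key only once. Below is a rough sketch of one way to print the result as a table a single time, after all counting is done. It assumes a nested map (search term to keyword to count) instead of concatenated keys; the class name CountTable and the render method are invented for this example, and the sample counts are copied from the console output in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CountTable {
    // Builds a header row of keywords, then one row per search term with its counts.
    static String render(Map<String, Map<String, Integer>> table, String[] keywords) {
        StringBuilder out = new StringBuilder(String.format("%-10s", ""));
        for (String keyword : keywords) {
            out.append(String.format("%-10s", keyword));
        }
        out.append('\n');
        for (Map.Entry<String, Map<String, Integer>> row : table.entrySet()) {
            out.append(String.format("%-10s", row.getKey()));
            for (String keyword : keywords) {
                // A missing cell is shown as 0.
                out.append(String.format("%-10d", row.getValue().getOrDefault(keyword, 0)));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] searches = {"boy", "cat", "window"};
        String[] keywords = {"animal", "free", "gender"};
        int[][] counts = {{4, 2, 6}, {60, 18, 0}, {4, 5, 0}}; // sample data from the question
        Map<String, Map<String, Integer>> table = new LinkedHashMap<>();
        for (int s = 0; s < searches.length; s++) {
            for (int k = 0; k < keywords.length; k++) {
                table.computeIfAbsent(searches[s], key -> new LinkedHashMap<>())
                     .put(keywords[k], counts[s][k]);
            }
        }
        System.out.print(render(table, keywords)); // printed once, outside the counting loops
    }
}
```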

Java Inverted Index program

I am writing an inverted index program in Java which returns the frequency of terms among multiple documents. I have been able to return the number of times a word appears in the entire collection, but I have not been able to return which documents the word appears in. This is the code I have so far:
import java.util.*; // Provides TreeMap, Iterator, Scanner
import java.io.*; // Provides FileReader, FileNotFoundException
public class Run
{
public static void main(String[ ] args)
{
// **THIS CREATES A TREE MAP**
TreeMap<String, Integer> frequencyData = new TreeMap<String, Integer>( );
Map[] mapArray = new Map[5];
mapArray[0] = new HashMap<String, Integer>();
readWordFile(frequencyData);
printAllCounts(frequencyData);
}
public static int getCount(String word, TreeMap<String, Integer> frequencyData)
{
if (frequencyData.containsKey(word))
{ // The word has occurred before, so get its count from the map
return frequencyData.get(word); // Auto-unboxed
}
else
{ // No occurrences of this word
return 0;
}
}
public static void printAllCounts(TreeMap<String, Integer> frequencyData)
{
System.out.println("-----------------------------------------------");
System.out.println(" Occurrences Word");
for(String word : frequencyData.keySet( ))
{
System.out.printf("%15d %s\n", frequencyData.get(word), word);
}
System.out.println("-----------------------------------------------");
}
public static void readWordFile(TreeMap<String, Integer> frequencyData)
{
int total = 0;
Scanner wordFile;
String word; // A word read from the file
Integer count; // The number of occurrences of the word
int counter = 0;
int docs = 0;
//**FOR LOOP TO READ THE DOCUMENTS**
for(int x=0; x<Docs.length; x++)
{ //start of for loop [*
try
{
wordFile = new Scanner(new FileReader(Docs[x]));
}
catch (FileNotFoundException e)
{
System.err.println(e);
return;
}
while (wordFile.hasNext( ))
{
// Read the next word and get rid of the end-of-line marker if needed:
word = wordFile.next( );
// This makes the Word lower case.
word = word.toLowerCase();
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = getCount(word, frequencyData) + 1;
frequencyData.put(word, count);
total = total + count;
counter++;
docs = x + 1;
}
} //End of for loop *]
System.out.println("There are " + total + " terms in the collection.");
System.out.println("There are " + counter + " unique terms in the collection.");
System.out.println("There are " + docs + " documents in the collection.");
}
// Array of documents
static String[] Docs = {"words.txt", "words2.txt"};
}
Instead of simply having a Map from word to count, create a Map from each word to a nested Map from document to count. In other words:
Map<String, Map<String, Integer>> wordToDocumentMap;
Then, inside your loop which records the counts, you want to use code which looks like this:
Map<String, Integer> documentToCountMap = wordToDocumentMap.get(currentWord);
if(documentToCountMap == null) {
// This word has not been found anywhere before,
// so create a Map to hold document-map counts.
documentToCountMap = new TreeMap<>();
wordToDocumentMap.put(currentWord, documentToCountMap);
}
Integer currentCount = documentToCountMap.get(currentDocument);
if(currentCount == null) {
// This word has not been found in this document before, so
// set the initial count to zero.
currentCount = 0;
}
documentToCountMap.put(currentDocument, currentCount + 1);
Now you're capturing the counts on a per-word and per-document basis.
Once you've completed the analysis and you want to print a summary of the results, you can run through the map like so:
for(Map.Entry<String, Map<String,Integer>> wordToDocument :
wordToDocumentMap.entrySet()) {
String currentWord = wordToDocument.getKey();
Map<String, Integer> documentToWordCount = wordToDocument.getValue();
for(Map.Entry<String, Integer> documentToFrequency :
documentToWordCount.entrySet()) {
String document = documentToFrequency.getKey();
Integer wordCount = documentToFrequency.getValue();
System.out.println("Word " + currentWord + " found " + wordCount +
" times in document " + document);
}
}
For an explanation of the for-each structure in Java, see this tutorial page.
For a good explanation of the features of the Map interface, including the entrySet method, see this tutorial page.
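On Java 8 or later, the null checks in the counting code above can be folded into Map.computeIfAbsent and Map.merge. The following is only a sketch of that idea: it reads documents from in-memory strings rather than files to stay self-contained, and the class name InvertedIndex and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndex {
    // word -> (document name -> occurrences of the word in that document)
    private final Map<String, Map<String, Integer>> wordToDocumentMap = new TreeMap<>();

    void addDocument(String documentName, String contents) {
        for (String token : contents.toLowerCase().split("\\s+")) {
            String word = token.replaceAll("[^a-z0-9]", ""); // same clean-up idea as the question
            if (word.isEmpty()) {
                continue;
            }
            wordToDocumentMap
                    .computeIfAbsent(word, key -> new TreeMap<>()) // create the inner map on first sight
                    .merge(documentName, 1, Integer::sum);         // start at 1 or add 1
        }
    }

    // Per-document counts for a word; empty if the word was never seen.
    Map<String, Integer> countsFor(String word) {
        return wordToDocumentMap.getOrDefault(word, Collections.emptyMap());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.addDocument("words.txt", "The cat saw the dog.");
        index.addDocument("words2.txt", "The end");
        System.out.println(index.countsFor("the"));
    }
}
```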
Try adding a second map from each word to the set of document names it appears in, like this:
Map<String, Set<String>> filenames = new HashMap<String, Set<String>>();
...
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = getCount(word, frequencyData) + 1;
frequencyData.put(word, count);
Set<String> filenamesForWord = filenames.get(word);
if (filenamesForWord == null) {
filenamesForWord = new HashSet<String>();
}
filenamesForWord.add(Docs[x]);
filenames.put(word, filenamesForWord);
total = total + count;
counter++;
docs = x + 1;
When you need the set of filenames in which a particular word was encountered, you just get() it from the filenames map. Here is an example that prints, for every word, all the file names in which it was encountered:
public static void printAllCounts(TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames) {
System.out.println("-----------------------------------------------");
System.out.println(" Occurrences Word");
for(String word : frequencyData.keySet( ))
{
System.out.printf("%15d %s\n", frequencyData.get(word), word);
for (String filename : filenames.get(word)) {
System.out.println(filename);
}
}
System.out.println("-----------------------------------------------");
}
I've put a scanner into the main method, and searching for a word now returns the documents the word occurs in. I also return how many times the word occurs, but I can only get the total across all three documents, and I want it to return how many times the word occurs in each document. I need this in order to calculate tf-idf; if you have a complete answer for the whole tf-idf calculation, I would appreciate it. Cheers
Here is my code:
import java.util.*; // Provides TreeMap, Iterator, Scanner
import java.io.*; // Provides FileReader, FileNotFoundException
public class test2
{
public static void main(String[ ] args)
{
// **THIS CREATES A TREE MAP**
TreeMap<String, Integer> frequencyData = new TreeMap<String, Integer>();
Map<String, Set<String>> filenames = new HashMap<String, Set<String>>();
Map<String, Integer> countByWords = new HashMap<String, Integer>();
Map[] mapArray = new Map[5];
mapArray[0] = new HashMap<String, Integer>();
readWordFile(countByWords, frequencyData, filenames);
printAllCounts(countByWords, frequencyData, filenames);
}
public static int getCount(String word, TreeMap<String, Integer> frequencyData)
{
if (frequencyData.containsKey(word))
{ // The word has occurred before, so get its count from the map
return frequencyData.get(word); // Auto-unboxed
}
else
{ // No occurrences of this word
return 0;
}
}
public static void printAllCounts( Map<String, Integer> countByWords, TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames)
{
System.out.println("-----------------------------------------------");
System.out.print("Search for a word: ");
String worde;
int result = 0;
Scanner input = new Scanner(System.in);
worde=input.nextLine();
if(!filenames.containsKey(worde)){
System.out.println("The word does not exist");
}
else{
for(String filename : filenames.get(worde)){
System.out.println(filename);
System.out.println(countByWords.get(worde));
}
}
System.out.println("\n-----------------------------------------------");
}
public static void readWordFile(Map<String, Integer> countByWords ,TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames)
{
Scanner wordFile;
String word; // A word read from the file
Integer count; // The number of occurrences of the word
int counter = 0;
int docs = 0;
//**FOR LOOP TO READ THE DOCUMENTS**
for(int x=0; x<Docs.length; x++)
{ //start of for loop [*
try
{
wordFile = new Scanner(new FileReader(Docs[x]));
}
catch (FileNotFoundException e)
{
System.err.println(e);
return;
}
while (wordFile.hasNext( ))
{
// Read the next word and get rid of the end-of-line marker if needed:
word = wordFile.next( );
// This makes the Word lower case.
word = word.toLowerCase();
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = countByWords.get(word);
if(count != null){
countByWords.put(word, count + 1);
}
else{
countByWords.put(word, 1);
}
Set<String> filenamesForWord = filenames.get(word);
if (filenamesForWord == null) {
filenamesForWord = new HashSet<String>();
}
filenamesForWord.add(Docs[x]);
filenames.put(word, filenamesForWord);
counter++;
docs = x + 1;
}
} //End of for loop *]
System.out.println("There are " + counter + " terms in the collection.");
System.out.println("There are " + docs + " documents in the collection.");
}
// Array of documents
static String Docs [] = {"Document1.txt", "Document2.txt", "Document3.txt"};
}
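For the per-document counts and the tf-idf part, here is one possible sketch. It assumes one common variant of the formula, tf = (count of the word in the document) / (total words in the document) and idf = ln(number of documents / number of documents containing the word); other weightings exist, so treat this as an illustration rather than the definitive calculation. The documents are passed in as in-memory strings instead of being read from Document1.txt and friends, and the class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;

class TfIdfSketch {
    // document name -> (word -> count of the word in that document)
    static Map<String, Map<String, Integer>> countPerDocument(Map<String, String> documents) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            Map<String, Integer> wordCounts = new HashMap<>();
            for (String token : doc.getValue().toLowerCase().split("\\s+")) {
                String word = token.replaceAll("[^a-z0-9]", "");
                if (!word.isEmpty()) {
                    wordCounts.merge(word, 1, Integer::sum);
                }
            }
            counts.put(doc.getKey(), wordCounts);
        }
        return counts;
    }

    // Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    static double tfIdf(String word, String document, Map<String, Map<String, Integer>> counts) {
        Map<String, Integer> inDoc = counts.get(document);
        if (inDoc == null || !inDoc.containsKey(word)) {
            return 0.0;
        }
        int docLength = 0;
        for (int c : inDoc.values()) {
            docLength += c;
        }
        int docsWithWord = 0;
        for (Map<String, Integer> wordCounts : counts.values()) {
            if (wordCounts.containsKey(word)) {
                docsWithWord++;
            }
        }
        double tf = (double) inDoc.get(word) / docLength;
        double idf = Math.log((double) counts.size() / docsWithWord);
        return tf * idf;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new HashMap<>();
        documents.put("Document1.txt", "cat cat dog"); // stand-in contents, not the real files
        documents.put("Document2.txt", "dog bird");
        Map<String, Map<String, Integer>> counts = countPerDocument(documents);
        System.out.println(counts.get("Document1.txt").get("cat")); // count in one document
        System.out.println(tfIdf("cat", "Document1.txt", counts));
    }
}
```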

Java Unicode Characters

I'm familiar with this kind of problem for ASCII, but I have no experience with the same problems for Unicode characters. For example, how do I return the word that occurs most frequently, given a string array containing words? Thanks in advance!
P.S.: You can always use an array whose length is 256 to represent all the characters in ASCII, but you can't do that when it comes to Unicode. Is a HashMap a must, and is it the best way to solve the problem? I heard that there are better ways to solve it. Below is what I can think of:
String str = "aa df ds df df"; // assume they are Unicode
String[] words = str.split(" ");
HashMap<String, Integer> map = new HashMap<String, Integer>();
for (String word : words){
if (map.containsKey(word)){
int f = map.get(word);
map.put(word, f+1);
} else{
map.put(word, 1);
}
}
int max = 0;
String maxWord = "";
for (String word : words){
int f = map.get(word);
if (f > max){
max = f;
maxWord = word;
}
}
System.out.println(maxWord + " " +max);
// Inspired by GameKyuubi. It can be solved by sorting the array and counting the most frequently used word, using constant space.
Arrays.sort(words);
int max = 0;
int count = 0;
String maxWord = "";
String prev = "";
for (String word : words){
if (prev.equals("") || word.equals(prev)){
count++;
} else{
count = 1;
}
if (max < count){
max = count;
maxWord = word;
}
prev = word;
}
System.out.println(maxWord + " " +max);
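As far as I can tell, the HashMap version needs no changes for non-ASCII words, because String.equals and String.hashCode operate on the string's characters regardless of script. One caveat that may or may not matter for your data: the same visible word can be encoded in more than one way (for example a precomposed accented letter versus a base letter plus a combining accent). A small sketch, assuming you want those treated as the same word, using java.text.Normalizer; the class name UnicodeWordCount is made up.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class UnicodeWordCount {
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // NFC maps a base letter plus combining accent onto the precomposed form,
            // so both spellings end up under the same key.
            String key = Normalizer.normalize(word, Normalizer.Form.NFC);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The first two entries look identical on screen but use different code points.
        String[] words = {"caf\u00e9", "cafe\u0301", "\u65e5\u672c\u8a9e", "caf\u00e9"};
        System.out.println(count(words));
    }
}
```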

How do I count occurrences of words in a line

I am fairly new to Java. I want to count the occurrences of words in a particular line. So far I can only count the words, but I have no idea how to count occurrences.
Is there a simple way to do this?
Scanner file = new Scanner(new FileInputStream("/../output.txt"));
int count = 0;
while (file.hasNextLine()) {
String s = file.nextLine();
count++;
if(s.contains("#AVFC")){
System.out.printf("There are %d words on this line ", s.split("\\s").length-1);
System.out.println(count);
}
}
file.close();
Output:
There are 4 words on this line 1
There are 8 words on this line 13
There are 3 words on this line 16
Simplest way I can think of is to use String.split("\\s"), which will split based on spaces.
Then have a HashMap containing a word as the key with the value being the number of times it is used.
HashMap<String, Integer> mapOfWords = new HashMap<String, Integer>();
while (file.hasNextLine()) {
String s = file.nextLine();
String[] words = s.split("\\s");
int count;
for (String word : words) {
if (mapOfWords.get(word) == null) {
mapOfWords.put(word, 1);
}
else {
count = mapOfWords.get(word);
mapOfWords.put(word, count + 1);
}
}
}
Here is the implementation you requested, which skips lines that contain certain words:
HashMap<String, Integer> mapOfWords = new HashMap<String, Integer>();
while (file.hasNextLine()) {
String s = file.nextLine();
String[] words = s.split("\\s");
int count;
if (isStringWanted(s) == false) {
continue;
}
for (String word : words) {
if (mapOfWords.get(word) == null) {
mapOfWords.put(word, 1);
}
else {
count = mapOfWords.get(word);
mapOfWords.put(word, count + 1);
}
}
}
private boolean isStringWanted(String s) {
String[] checkStrings = new String[] {"chelsea", "Liverpool", "#LFC"};
for (String check : checkStrings) {
if (s.contains(check)) {
return false;
}
}
return true;
}
Try the code below; it may solve your problem. In addition, you can call String.toLowerCase() on each word before you put it into the HashMap.
String line ="a a b b b b a q c c";
...
Map<String,Integer> map = new HashMap<String,Integer>();
Scanner scanner = new Scanner(line);
while (scanner.hasNext()) {
String s = scanner.next();
Integer count = map.put(s,1);
if(count!=null) map.put(s,count + 1);
}
...
System.out.println(map);
Result:
{b=4, c=2, q=1, a=3}
Another simple option is to store the split words in an ArrayList, then iterate over it and use Collections.frequency (http://www.tutorialspoint.com/java/util/collections_frequency.htm). Note that each call scans the whole list, so this is simple rather than fast for long inputs.
Check Guava's Multiset. Its description starts with 'The traditional Java idiom for e.g. counting how many times a word occurs in a document is something like:'. There you will find some code snippets showing how to do that without a Multiset.
BTW: If you only wanted to count the number of words in your string, why not just count the spaces? You could use StringUtils from Apache Commons. It's much better than creating an array of the split parts. Also have a look at their implementation.
int count = StringUtils.countMatches(string, " ");
In a given String, occurrences of another String can be counted using String#indexOf(String, int) in a loop:
String haystack = "This is a string";
String needle = "i";
// Start from the first match so that an occurrence at index 0 is not skipped.
int index = haystack.indexOf(needle);
while (index != -1) {
    System.out.println(String.format("Found %s in %s at index %s.", needle, haystack, index));
    index = haystack.indexOf(needle, index + 1);
}
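If a count is wanted rather than the positions, the same indexOf loop can be wrapped in a small helper. This is a sketch only; the class and method names are invented, and it counts non-overlapping matches (so "aa" is found twice in "aaaa").

```java
class OccurrenceCounter {
    // Counts non-overlapping occurrences of needle in haystack.
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0; // an empty needle would otherwise match at every position
        }
        int count = 0;
        int index = haystack.indexOf(needle);
        while (index != -1) {
            count++;
            // Continue searching just past the match that was found.
            index = haystack.indexOf(needle, index + needle.length());
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("This is a string", "i"));
    }
}
```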

Manipulating a user's input

So I'm trying to manipulate the user's input in such a way that when I find a certain string in his input I turn that into a variable and replace the string with the name of the variable. (jumbled explanation I know, maybe an example will make it more clear).
public class Test {
static List<String> refMap = new ArrayList<String>();
public static void main(String[] args) {
String x = "PROPERTY_X";
String y = "PROPERTY_Y";
refMap.add(x);
refMap.add(y);
String z = "getInteger(\"PROPERTY_X\")";
String text = "q=PROPERTY_X+10/(200*PROPERTY_X)";
String text1 = "if(PROPERTY_X==10){"
+ "j=1;"
+ "PROPERTY_X=5; "
+ "if(true){"
+ "m=4/PROPERTY_X"
+ "}"
+ "}";
detectEquals(text);
}
public static String detectEquals(String text) {
String a = null;
text = TestSplitting.addDelimiters(text);
String[] newString = text.split(" ");
List<String> test = Arrays.asList(newString);
StringBuilder strBuilder = new StringBuilder();
HashMap<String, Integer> signs = new HashMap<String, Integer>();
HashMap<String, Integer> references = new HashMap<String, Integer>();
List<String> referencesList = new ArrayList<String>();
List<Integer> indexList = new ArrayList<Integer>();
int index = 0;
for (int i = 0; i < test.size(); i++) {
a = test.get(i).trim();
//System.out.println("a= " + a);
strBuilder.append(a);
index = strBuilder.length() - a.length();
if (a.equals("=")) {
signs.put(a, index);
indexList.add(index);
// System.out.println("signs map--> : "+signs.get(a));
}
if (refMap.contains(a)) {
references.put(a, index);
// System.out.println("reference index-> "+references.get(a));
// System.out.println("reference-> "+references.toString());
}
}
//stuck here
for (String s : references.keySet()) {
//System.out.println("references-> " + s);
int position = references.get(s);
for (int j : indexList) {
if (j <= position) {
System.out.println(j);
}
}
//strBuilder.insert(j - 1, "temp1=\r\n");
}
System.out.println(strBuilder);
return a;
}
}
Say the user inputs the content of the string "text", I'm trying to parse that input so when I find "PROPERTY_X", I want to create a variable out of it and place it right before the occurrence of text, and then replace "PROPERTY_X" with the name of the newly created variable.
The reason I'm also searching for "=" sign is because I only want to do the above for the first occurrence of "PROPERTY_X" in the whole input and then just replace "PROPERTY_X" with tempVar1 wherever else I find "PROPERTY_X".
ex:
tempVar1=PROPERTY_X;
q=tempVar1+10/(200*tempVar1);
Things get more complex as the user input gets more complex, but for the moment I'm only trying to do it right for the first input example I created and then take it from there :).
As you can see, I'm a bit stuck on the logic part, the way I went with it was this:
I find all the "=" signs in the string (when I move on to more complex inputs I will also need to search for conditions like if, for, else, while) and save each of them and their index to a map, then I do the same for the occurrences of "PROPERTY_X" and their indexes. Then I try to find the index of the "=" which is closest to the index of the "PROPERTY_X" and insert my new variable there, after which I go on to replace what I need with the name of the variable.
Oh the addDelimiters() method does a split based on some certain delimiters, basically the "text" string once inserted in the list will look something like this:
q
=
PROPERTY_X
+
10
etc..
Any suggestions are welcome.
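For the first, single-statement example, one possible starting point is to skip the index bookkeeping: emit the temp-variable declaration in front of the statement and replace every whole-word occurrence of the property. This is only a sketch for that simple case. It does not handle inputs such as text1, where PROPERTY_X is assigned a new value partway through, and the class name TempVarRewriter, the rewrite method and the tempVarN naming scheme are assumptions taken from the expected output above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TempVarRewriter {
    // Prepends "tempVarN=NAME;" for each known property found in the input
    // and replaces every whole-word occurrence of NAME with tempVarN.
    static String rewrite(String input, List<String> properties) {
        StringBuilder declarations = new StringBuilder();
        String body = input;
        int counter = 1;
        for (String property : properties) {
            // Word boundaries keep PROPERTY_X from matching inside PROPERTY_XY.
            Matcher matcher = Pattern.compile("\\b" + Pattern.quote(property) + "\\b").matcher(body);
            if (matcher.find()) {
                String tempName = "tempVar" + counter++;
                declarations.append(tempName).append('=').append(property).append(";\n");
                body = matcher.replaceAll(tempName); // replaceAll resets the matcher first
            }
        }
        return declarations + body;
    }

    public static void main(String[] args) {
        List<String> refMap = Arrays.asList("PROPERTY_X", "PROPERTY_Y");
        System.out.println(rewrite("q=PROPERTY_X+10/(200*PROPERTY_X)", refMap));
    }
}
```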
