Java Inverted Index program

Java Inverted Index program - java

I am writing an inverted index program on java which returns the frequency of terms among multiple documents. I have been able to return the number times a word appears in the entire collection, but I have not been able to return which documents the word appears in. This is the code I have so far:
import java.util.*; // Provides TreeMap, Iterator, Scanner
import java.io.*; // Provides FileReader, FileNotFoundException
public class Run
{
public static void main(String[ ] args)
{
// **THIS CREATES A TREE MAP**
TreeMap<String, Integer> frequencyData = new TreeMap<String, Integer>( );
Map[] mapArray = new Map[5];
mapArray[0] = new HashMap<String, Integer>();
readWordFile(frequencyData);
printAllCounts(frequencyData);
}
public static int getCount(String word, TreeMap<String, Integer> frequencyData)
{
if (frequencyData.containsKey(word))
{ // The word has occurred before, so get its count from the map
return frequencyData.get(word); // Auto-unboxed
}
else
{ // No occurrences of this word
return 0;
}
}
public static void printAllCounts(TreeMap<String, Integer> frequencyData)
{
System.out.println("-----------------------------------------------");
System.out.println(" Occurrences Word");
for(String word : frequencyData.keySet( ))
{
System.out.printf("%15d %s\n", frequencyData.get(word), word);
}
System.out.println("-----------------------------------------------");
}
public static void readWordFile(TreeMap<String, Integer> frequencyData)
{
int total = 0;
Scanner wordFile;
String word; // A word read from the file
Integer count; // The number of occurrences of the word
int counter = 0;
int docs = 0;
//**FOR LOOP TO READ THE DOCUMENTS**
for(int x=0; x<Docs.length; x++)
{ //start of for loop [*
try
{
wordFile = new Scanner(new FileReader(Docs[x]));
}
catch (FileNotFoundException e)
{
System.err.println(e);
return;
}
while (wordFile.hasNext( ))
{
// Read the next word and get rid of the end-of-line marker if needed:
word = wordFile.next( );
// This makes the Word lower case.
word = word.toLowerCase();
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = getCount(word, frequencyData) + 1;
frequencyData.put(word, count);
total = total + count;
counter++;
docs = x + 1;
}
} //End of for loop *]
System.out.println("There are " + total + " terms in the collection.");
System.out.println("There are " + counter + " unique terms in the collection.");
System.out.println("There are " + docs + " documents in the collection.");
}
// Array of documents
static String Docs [] = {"words.txt", "words2.txt",};

Instead of simply having a Map from word to count, create a Map from each word to a nested Map from document to count. In other words:
Map<String, Map<String, Integer>> wordToDocumentMap;
Then, inside your loop which records the counts, you want to use code which looks like this:
Map<String, Integer> documentToCountMap = wordToDocumentMap.get(currentWord);
if(documentToCountMap == null) {
// This word has not been found anywhere before,
// so create a Map to hold document-map counts.
documentToCountMap = new TreeMap<>();
wordToDocumentMap.put(currentWord, documentToCountMap);
}
Integer currentCount = documentToCountMap.get(currentDocument);
if(currentCount == null) {
// This word has not been found in this document before, so
// set the initial count to zero.
currentCount = 0;
}
documentToCountMap.put(currentDocument, currentCount + 1);
Now you're capturing the counts on a per-word and per-document basis.
Once you've completed the analysis and you want to print a summary of the results, you can run through the map like so:
for(Map.Entry<String, Map<String,Integer>> wordToDocument :
wordToDocumentMap.entrySet()) {
String currentWord = wordToDocument.getKey();
Map<String, Integer> documentToWordCount = wordToDocument.getValue();
for(Map.Entry<String, Integer> documentToFrequency :
documentToWordCount.entrySet()) {
String document = documentToFrequency.getKey();
Integer wordCount = documentToFrequency.getValue();
System.out.println("Word " + currentWord + " found " + wordCount +
" times in document " + document);
}
}
For an explanation of the for-each structure in Java, see this tutorial page.
For a good explanation of the features of the Map interface, including the entrySet method, see this tutorial page.

Try adding second map word -> set of document name like this:
Map<String, Set<String>> filenames = new HashMap<String, Set<String>>();
...
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = getCount(word, frequencyData) + 1;
frequencyData.put(word, count);
Set<String> filenamesForWord = filenames.get(word);
if (filenamesForWord == null) {
filenamesForWord = new HashSet<String>();
}
filenamesForWord.add(Docs[x]);
filenames.put(word, filenamesForWord);
total = total + count;
counter++;
docs = x + 1;
When you need to get a set of filenames in which you encountered a particular word, you'll just get() it from the map filenames. Here is the example that prints out all the file names, in which we have encountered a word:
public static void printAllCounts(TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames) {
System.out.println("-----------------------------------------------");
System.out.println(" Occurrences Word");
for(String word : frequencyData.keySet( ))
{
System.out.printf("%15d %s\n", frequencyData.get(word), word);
for (String filename : filenames.get(word)) {
System.out.println(filename);
}
}
System.out.println("-----------------------------------------------");
}

I've put a scanner into the main methode, and the word I search for will return the documents the word occurce in. I also return how many times the word occurs, but I will only get it to be the total of times in all of three documents. And I want it to return how many times it occurs in each document. I want this to be able to calculate tf-idf, if u have a total answer for the whole tf-idf I would appreciate. Cheers
Here is my code:
import java.util.*; // Provides TreeMap, Iterator, Scanner
import java.io.*; // Provides FileReader, FileNotFoundException
public class test2
{
public static void main(String[ ] args)
{
// **THIS CREATES A TREE MAP**
TreeMap<String, Integer> frequencyData = new TreeMap<String, Integer>();
Map<String, Set<String>> filenames = new HashMap<String, Set<String>>();
Map<String, Integer> countByWords = new HashMap<String, Integer>();
Map[] mapArray = new Map[5];
mapArray[0] = new HashMap<String, Integer>();
readWordFile(countByWords, frequencyData, filenames);
printAllCounts(countByWords, frequencyData, filenames);
}
public static int getCount(String word, TreeMap<String, Integer> frequencyData)
{
if (frequencyData.containsKey(word))
{ // The word has occurred before, so get its count from the map
return frequencyData.get(word); // Auto-unboxed
}
else
{ // No occurrences of this word
return 0;
}
}
public static void printAllCounts( Map<String, Integer> countByWords, TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames)
{
System.out.println("-----------------------------------------------");
System.out.print("Search for a word: ");
String worde;
int result = 0;
Scanner input = new Scanner(System.in);
worde=input.nextLine();
if(!filenames.containsKey(worde)){
System.out.println("The word does not exist");
}
else{
for(String filename : filenames.get(worde)){
System.out.println(filename);
System.out.println(countByWords.get(worde));
}
}
System.out.println("\n-----------------------------------------------");
}
public static void readWordFile(Map<String, Integer> countByWords ,TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames)
{
Scanner wordFile;
String word; // A word read from the file
Integer count; // The number of occurrences of the word
int counter = 0;
int docs = 0;
//**FOR LOOP TO READ THE DOCUMENTS**
for(int x=0; x<Docs.length; x++)
{ //start of for loop [*
try
{
wordFile = new Scanner(new FileReader(Docs[x]));
}
catch (FileNotFoundException e)
{
System.err.println(e);
return;
}
while (wordFile.hasNext( ))
{
// Read the next word and get rid of the end-of-line marker if needed:
word = wordFile.next( );
// This makes the Word lower case.
word = word.toLowerCase();
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = countByWords.get(word);
if(count != null){
countByWords.put(word, count + 1);
}
else{
countByWords.put(word, 1);
}
Set<String> filenamesForWord = filenames.get(word);
if (filenamesForWord == null) {
filenamesForWord = new HashSet<String>();
}
filenamesForWord.add(Docs[x]);
filenames.put(word, filenamesForWord);
counter++;
docs = x + 1;
}
} //End of for loop *]
System.out.println("There are " + counter + " terms in the collection.");
System.out.println("There are " + docs + " documents in the collection.");
}
// Array of documents
static String Docs [] = {"Document1.txt", "Document2.txt", "Document3.txt"};
}

Related

Find the most common word from user input

I'm very new to Java creating a software application that allows a user to input text into a field and the program runs through all of the text and identifies what the most common word is. At the moment, my code looks like this:
JButton btnMostFrequentWord = new JButton("Most Frequent Word");
btnMostFrequentWord.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent e) {
String text = textArea.getText();
String[] words = text.split("\\s+");
HashMap<String, Integer> occurrences = new HashMap<String, Integer>();
for (String word : words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
}
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + occurrences.values());
}
}
This just prints what the values of the words are, but I would like it to tell me what the number one most common word is instead. Any help would be really appreciated.

Just after your for loop, you can sort the map by value then reverse the sorted entries by value and select the first.
for (String word: words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
}
Map.Entry<String,Integer> tempResult = occurrences.entrySet().stream()
.sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
.findFirst().get();
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + tempResult.getKey());

For anyone who is more familiar with Java, here is a very easy way to do it with Java 8:
List<String> words = Arrays.asList(text.split("\\s+"));
Collections.sort(words, Comparator.comparingInt(word -> {
return Collections.frequency(words, word);
}).reversed());
The most common word is stored in words.get(0) after sorting.

I would do something like this
int max = 0;
String a = null;
for (String word : words) {
int value = 0;
if(occurrences.containsKey(word)){
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
if(max < value+1){
max = value+1;
a = word;
}
}
System.out.println(a);
You could sort it, and the solution would be much shorter, but I think this runs faster.

You can either iterate through occurrences map and find the max or
Try like below
String text = textArea.getText();;
String[] words = text.split("\\s+");
HashMap<String, Integer> occurrences = new HashMap<>();
int mostFreq = -1;
String mostFreqWord = null;
for (String word : words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
value = value + 1;
occurrences.put(word, value);
if (value > mostFreq) {
mostFreq = value;
mostFreqWord = word;
}
}
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + mostFreqWord);

How can I convert a String into ArrayList by counting occurrence of each characters?

I have a Input String as :
String str="1,1,2,2,2,1,3";
I want count each id occurrence and store them into List,and I want output Like this:
[
{
"count": "3",
"ids": "1, 2"
}
{
"count": "1",
"ids": "3"
}
]
I tried by using org.springframework.util.StringUtils.countOccurrencesOf(input, "a"); like this. But after counting not getting the things like I want.

This will give you the desired result. You first count the occurrences of each character, then you group by count each character in a new HashMap<Integer, List<String>>.
Here's a working example:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
public class Test {
public static void main(String[] args) {
String str = "1,1,2,2,2,1,3";
String[] list = str.split(",");
HashMap<String, Integer> occr = new HashMap<>();
for (int i = 0; i < list.length; i++) {
if (occr.containsKey(list[i])) {
occr.put(list[i], occr.get(list[i]) + 1);
} else {
occr.put(list[i], 1);
}
}
HashMap<Integer, List<String>> res = new HashMap<>();
for (String key : occr.keySet()) {
int count = occr.get(key);
if (res.containsKey(count)) {
res.get(count).add(key);
} else {
List<String> l = new ArrayList<>();
l.add(key);
res.put(count, l);
}
}
StringBuffer sb = new StringBuffer();
sb.append("[\n");
for (Integer count : res.keySet()) {
sb.append("{\n");
List<String> finalList = res.get(count);
sb.append("\"count\":\"" + count + "\",\n");
sb.append("\"ids\":\"" + finalList.get(0));
for (int i = 1; i < finalList.size(); i++) {
sb.append("," + finalList.get(i));
}
sb.append("\"\n}\n");
}
sb.append("\n]");
System.out.println(sb.toString());
}
}
EDIT: A more generalised solution
Here's the method that returns a HashMap<Integer,List<String>>, which contains the number of occurrences of a string as a key of the HashMap where each key has a List<String> value which contains all the strings that occur key number of times.
public HashMap<Integer, List<String>> countOccurrences(String str, String delimiter) {
// First, we count the number of occurrences of each string.
String[] list = str.split(delimiter);
HashMap<String, Integer> occr = new HashMap<>();
for (int i = 0; i < list.length; i++) {
if (occr.containsKey(list[i])) {
occr.put(list[i], occr.get(list[i]) + 1);
} else {
occr.put(list[i], 1);
}
}
/** Now, we group them by the number of occurrences,
* All strings with the same number of occurrences are put into a list;
* this list is put into a HashMap as a value, with the number of
* occurrences as a key.
*/
HashMap<Integer, List<String>> res = new HashMap<>();
for (String key : occr.keySet()) {
int count = occr.get(key);
if (res.containsKey(count)) {
res.get(count).add(key);
} else {
List<String> l = new ArrayList<>();
l.add(key);
res.put(count, l);
}
}
return res;
}

You need to do some boring transfer, I'm not sure if you want to keep the ids sorted. A simple implementation is:
public List<Map<String, Object>> countFrequency(String s) {
// Count by char
Map<String, Integer> countMap = new HashMap<String, Integer>();
for (String ch : s.split(",")) {
Integer count = countMap.get(ch);
if (count == null) {
count = 0;
}
count++;
countMap.put(ch, count);
}
// Count by frequency
Map<Integer, String> countByFrequency = new HashMap<Integer, String>();
for (Map.Entry<String, Integer> entry : countMap.entrySet()) {
String chars = countByFrequency.get(entry.getValue());
System.out.println(entry.getValue() + " " + chars);
if (chars == null) {
chars = "" + entry.getKey();
} else {
chars += ", " + entry.getKey();
}
countByFrequency.put(entry.getValue(), chars);
}
// Convert to list
List<Map<String, Object>> result = new ArrayList<Map<String, Object>>();
for (Map.Entry<Integer, String> entry : countByFrequency.entrySet()) {
Map<String, Object> item = new HashMap<String, Object>();
item.put("count", entry.getKey());
item.put("ids", entry.getValue());
result.add(item);
}
return result;
}

Hey check the below code, it help you to achieve your expected result
public class Test
{
public static void main(String args[])
{
String str = "1,1,2,2,2,1,3"; //Your input string
List<String> listOfIds = Arrays.asList(str.split(",")); //Splits the string
System.out.println("List of IDs : " + listOfIds);
HashMap<String, List<String>> map = new HashMap<>();
Set<String> uniqueIds = new HashSet<>(Arrays.asList(str.split(",")));
for (String uniqueId : uniqueIds)
{
String frequency = String.valueOf(Collections.frequency(listOfIds, uniqueId));
System.out.println("ID = " + uniqueId + ", frequency = " + frequency);
if (!map.containsKey(frequency))
{
map.put(frequency, new ArrayList<String>());
}
map.get(frequency).add(uniqueId);
}
for (Map.Entry<String, List<String>> entry : map.entrySet())
{
System.out.println("Count = "+ entry.getKey() + ", IDs = " + entry.getValue());
}
}
}

One of the approach i can suggest you is to
put each "character" in hashMap as a key and "count" as a value.
Sample code to do so is
String str = "1,1,2,2,2,1,3";
HashMap<String, String> map = new HashMap();
for (String c : str.split(",")) {
if (map.containsKey( c)) {
int count = Integer.parseInt(map.get(c));
map.put(c, ++count + "");
} else
map.put(c, "1");
}
System.out.println(map.toString());
}

<!--first you split string based on "," and store into array, after that iterate array end of array lenght in side loop create new map and put element in map as a Key and set value as count 1 again check the key and increase count value in map-->
like....
String str="1,1,2,2,2,1,3";
String strArray=str.split(",");
Map strMap= new hashMap();
for(int i=0; i < strArray.length(); i++){
if(!strMap.containsKey(strArray[i])){
strMap.put(strArray[i],1)
}else{
strMap.put(strArray[i],strMap.get(strArray[i])+1)
}
}

String str="1,1,2,2,2,1,3";
//Converting given string to string array
String[] strArray = str.split(",");
//Creating a HashMap containing char as a key and occurrences as a value
Map<String,Integer> charCountMap = new HashMap<String, Integer>();
//checking each element of strArray
for(String num :strArray){
if(charCountMap.containsKey(num))
{
//If char is present in charCountMap, incrementing it's count by 1
charCountMap.put(num, charCountMap.get(num)+1);
}
else
{
//If char is not present in charCountMap, and putting this char to charCountMap with 1 as it's value
charCountMap.put(num, 1);
}
}
//Printing the charCountMap
for (Map.Entry<String, Integer> entry : charCountMap.entrySet())
{
System.out.println("ID ="+entry.getKey() + " count=" + entry.getValue());
}
}

// Split according to comma
HashMap<String, Integer> hm = new HashMap<String, Integer>();
for (String key : tokens) {
if (hm.containsKey(key)) {
Integer currentCount = hm.get(key);
hm.put(key, ++currentCount);
} else {
hm.put(key, 1);
}
}
// Organize info according to ID
HashMap<Integer, String> result = new HashMap<Integer, String>();
for (Map.Entry<String, Integer> entry : hm.entrySet()) {
Integer newKey = entry.getValue();
if (result.containsKey(newKey)) {
String newValue = entry.getKey() + ", " + result.get(newKey);
result.put(newKey, newValue);
} else {
result.put(newKey, entry.getKey());
}
}

And here is a complete Java 8 streaming solution for the problem. The main idea is to first build a map of the occurances of each id, which results in:
{1=3, 2=3, 3=1}
(first is ID and second the count) and then to group by it by the count:
public static void main(String[] args) {
String str = "1,1,2,2,2,1,3";
System.out.println(
Pattern.compile(",").splitAsStream(str)
.collect(groupingBy(identity(), counting()))
.entrySet().stream()
.collect(groupingBy(i -> i.getValue(), mapping( i -> i.getKey(), toList())))
);
}
which results in:
{1=[3], 3=[1, 2]}
This is the most compact version I could come up with. Is there anything even smaller?
EDIT: By the way here is the complete class, to get all static method imports right:
import static java.util.function.Function.identity;
import java.util.regex.Pattern;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;
public class Java8StreamsTest6 {
public static void main(String[] args) {
String str = "1,1,2,2,2,1,3";
System.out.println(
Pattern.compile(",").splitAsStream(str)
.collect(groupingBy(identity(), counting()))
.entrySet().stream()
.collect(groupingBy(i -> i.getValue(), mapping(i -> i.getKey(), toList())))
);
}
}

get the two most used words in sentence in java

How do I get the two most used words in a sentence for example here after it count the total number of appearances of all the words it should also display the two most used words
import javax.swing.*;
import java.util.*;
import java.awt.event.*;
import java.util.Map;
import java.util.HashMap;
public class Tokenizer
{
public static void main(String[] args)
{
int index = 0; int tokenCount; int i =0;
Map<String,Integer> wordCount = new HashMap<String,Integer>();
Map<Integer,Integer> letterCount = new HashMap<Integer,Integer>();
String message="The Quick brown fox jumps over the lazy brown dog";
StringTokenizer string = new StringTokenizer(message);
tokenCount = string.countTokens();
System.out.println("Number of tokens = " + tokenCount);
while (string.hasMoreTokens()) {
String word = string.nextToken().toLowerCase();
Integer count = wordCount.get(word);
Integer lettercount = letterCount.get(word);
if(count == null) {
wordCount.put(word, 1);
}
else {
wordCount.put(word, count + 1);
}
}
for (String words : wordCount.keySet())
{System.out.println("Word : " + words + " has count :" +wordCount.get(words));
}
}

Iterate thorough the HashMap and then keep track of the highest counts.
int first, second;
first = second = Integer.MIN_VALUE;
String firstWord, secondWord;
for (Map.Entry<String, Integer> entry : map.entrySet())
{
int count = entry.getValue();
String word = entry.getKey();
if (count > first)
{
second = first;
secondWord = firstWord;
first = count;
firstWord = word;
}
else if (count > second && count != first)
{
second = count;
secondWord = word;
}
}
System.out.println(firstWord + " " + first);
System.out.println(secondWord + " " + second);

You need to iterate over map's entry set.
This will return you entry object which will contain key and max value.
Map.Entry<String, Integer> max = null;
for (Map.Entry<String, Integer> entry : map.entrySet())
{
if (max == null || entry.getValue().compareTo(max .getValue()) > 0)
{
max = entry;
}
}
For second most used word,i would say you can remove the max one and then again from this way,you can retrieve second one.

How do you find words in a text file and print the most frequent word shown using array?

I'm having trouble of figuring out how to find the most frequent word and the most frequent case-insensitive word for a program. I have a scanner that reads through the text file and a while loop, but still doesn't know how to implement what I'm trying to find. Do I use a different string function to read and print the word out?
Here is my code as of now:
public class letters {
public static void main(String[] args) throws FileNotFoundException {
FileInputStream fis = new FileInputStream("input.txt");
Scanner scanner = new Scanner(fis);
String word[] = new String[500];
while (scanner.hasNextLine()) {
String s = scanner.nextLine();
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
}
}
String []roll = s.split("\\s");
for(int i=0;i<roll.length;i++){
String lin = roll[i];
//System.out.println(lin);
}
}
This is what I have so far. I need the output to say:
Word:
6 roll
Case-insensitive word:
18 roll
And here is my input file:
#
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
ROll tIDE ROll!
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
roll tide roll!
Roll Tide Roll !
#
65-43+21= 43
65.0-43.0+21.0= 43.0
65 -43 +21 = 43
65.0 -43.0 +21.0 = 43.0
65 - 43 + 21 = 43
65.00 - 43.0 + 21.000 = +0043.0000
65 - 43 + 21 = 43
I just need it to find the most occuring word(Which is the maximal consecutive sequence of letters)(which is roll) and print out how many times it is located(which is 6) . If anybody can help me on this, that would be really great! thanks

Consider using a Map<String,Integer> for the word then you can implement this to count words and will be work for any number of words. See Documentation for Map.
Like this (would require modification for case insensitive)
public Map<String,Integer> words_count = new HashMap<String,Integer>();
//read your line (you will have to determine if this line should be split or is equations
//also just noticed that the trailing '!' would need to be removed
String[] words = line.split("\\s+");
for(int i=0;i<words.length;i++)
{
String s = words[i];
if(words_count.ketSet().contains(s))
{
Integer count = words_count.get(s) + 1;
words_count.put(s, count)
}
else
words_count.put(s, 1)
}
Then you have the number of occurrences for each word in the string and to get the most occurring do something like
Integer frequency = null;
String mostFrequent = null;
for(String s : words_count.ketSet())
{
Integer i = words_count.get(s);
if(frequency == null)
frequency = i;
if(i > frequency)
{
frequency = i;
mostFrequent = s;
}
}
Then to print
System.out.println("The word "+ mostFrequent +" occurred "+ frequency +" times");

Start with accumulating all the words into a Map as follows:
...
String[] roll = s.split("\\s+");
for (final String word : roll) {
Integer qty = words.get(word);
if (qty == null) {
qty = 1;
} else {
qty = qty + 1;
}
words.put(word, qty);
}
...
Then you need to figure out which has the biggest score:
String bestWord;
int maxQty = 0;
for(final String word : words.keySet()) {
if(words.get(word) > maxQty) {
maxQty = words.get(word);
bestWord = word;
}
}
System.out.println("Word:");
System.out.println(Integer.toString(maxQty) + " " + bestWord);
And last you need to merge all forms of the same word together:
Map<String, Integer> wordsNoCase = new HashMap<String, Integer>();
for(final String word : words.keySet()) {
Integer qty = wordsNoCase.get(word.toLowerCase());
if(qty == null) {
qty = words.get(word);
} else {
qty += words.get(word);
}
wordsNoCase.put(word.toLowerCase(), qty);
}
words = wordsNoCase;
Then re-run the previous code snippet to find the word with the biggest score.

Try to use HashMap for better results. You need to use BufferedReader and Filereader for taking input file as follows:
FileReader text = new FileReader("file.txt");
BufferedReader textFile = new BufferedReader(text);
The Bufferedreader object textfile needs to passed as a parameter to the method below:
public HashMap<String, Integer> countWordFrequency(BufferedReader textFile) throws IOException
{
/*This method finds the frequency of words in a text file
* and saves the word and its corresponding frequency in
* a HashMap.
*/
HashMap<String, Integer> mapper = new HashMap<String, Integer>();
StringBuffer multiLine = new StringBuffer("");
String line = null;
if(textFile.ready())
{
while((line = textFile.readLine()) != null)
{
multiLine.append(line);
String[] words = line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
for(String word : words)
{
if(!word.isEmpty())
{
Integer freq = mapper.get(word);
if(freq == null)
{
mapper.put(word, 1);
}
else
{
mapper.put(word, freq+1);
}
}
}
}
textFile.close();
}
return mapper;
}
The line line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" "); is used for replacing all the characters other than alphabets, the it makes all the words in lower case (which solves your case insensitive problem) and then splits the words seperated by spaces.
/*This method finds the highest value in HashMap
* and returns the same.
*/
public int maxFrequency(HashMap<String, Integer> mapper)
{
int maxValue = Integer.MIN_VALUE;
for(int value : mapper.values())
{
if(value > maxValue)
{
maxValue = value;
}
}
return maxValue;
}
The above code returns that value in hashmap which is highest.
/*This method prints the HashMap Key with a particular Value.
*/
public void printWithValue(HashMap<String, Integer> mapper, Integer value)
{
for (Entry<String, Integer> entry : mapper.entrySet())
{
if (entry.getValue().equals(value))
{
System.out.println("Word : " + entry.getKey() + " \nFrequency : " + entry.getValue());
}
}
}
Now you can print the most frequent word along with its frequency as above.

/* i have declared LinkedHashMap containing String as a key and occurrences as a value.
* Creating BufferedReader object
* Reading the first line into currentLine
* Declere while-loop & splitting the currentLine into words
* iterated using for loop. Inside for loop, i have an if else statement
* If word is present in Map increment it's count by 1 else set to 1 as value
* Reading next line into currentLine
*/
public static void main(String[] args) {
Map<String, Integer> map = new LinkedHashMap<String, Integer>();
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader("F:\\chidanand\\javaIO\\Student.txt"));
String currentLine = reader.readLine();
while (currentLine!= null) {
String[] input = currentLine.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
for (int i = 0; i < input.length; i++) {
if (map.containsKey(input[i])) {
int count = map.get(input[i]);
map.put(input[i], count + 1);
} else {
map.put(input[i], 1);
}
}
currentLine = reader.readLine();
}
String mostRepeatedWord = null;
int count = 0;
for (Entry<String, Integer> m:map.entrySet())
{
if(m.getValue() > count)
{
mostRepeatedWord = m.getKey();
count = m.getValue();
}
}
System.out.println("The most repeated word in input file is : "+mostRepeatedWord);
System.out.println("Number Of Occurrences : "+count);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}

How to Count Repetition of Words in Array List?

I've these code for searching occurrence in Array-List but my problem is how I can get result
out side of this for loop in integer type cause I need in out side , may be there is another way for finding
occurrence with out using for loop can you help me ?
thank you...
List<String> list = new ArrayList<String>();
list.add("aaa");
list.add("bbb");
list.add("aaa");
Set<String> unique = new HashSet<String>(list);
for (String key : unique) {
int accurNO = Collections.frequency(list, key);
System.out.println(key + ": " accurNO);
}

You should declare a map like Map<String, Integer> countMap = new HashMap<String, Integer>(); before the loop, and populate it within the loop.
Map<String, Integer> countMap = new HashMap<String, Integer>();
for (String key : unique) {
int accurNO = Collections.frequency(list, key);
coutMap.put(key, accurNO);
//...
}
//now you have a map with keys and their frequencies in the list

Set unique = new HashSet(list);
and
Collections.frequency(list, key);
are too much overhead.
Here is how i would do it
List<String> list = new ArrayList<String>();
list.add("aaa");
list.add("bbb");
list.add("aaa");
Map<String, Integer> countMap = new HashMap<>();
for (String word : list) {
Integer count = countMap.get(word);
if(count == null) {
count = 0;
}
countMap.put(word, (count.intValue()+1));
}
System.out.println(countMap.toString());
Output
{aaa=2, bbb=1}
EDIT output one by one: iterate over the set of entries of the map
for(Entry<String, Integer> entry : countMap.entrySet()) {
System.out.println("frequency of '" + entry.getKey() + "' is "
+ entry.getValue());
}
Output
frequency of 'aaa' is 2
frequency of 'bbb' is 1
EDIT 2 No need for looping
String word = null;
Integer frequency = null;
word = "aaa";
frequency = countMap.get(word);
System.out.println("frequency of '" + word + "' is " +
(frequency == null ? 0 : frequency.intValue()));
word = "bbb";
frequency = countMap.get(word);
System.out.println("frequency of '" + word + "' is " +
(frequency == null ? 0 : frequency.intValue()));
word = "foo";
frequency = countMap.get(word);
System.out.println("frequency of '" + word + "' is " +
(frequency == null ? 0 : frequency.intValue()));
Output
frequency of 'aaa' is 2
frequency of 'bbb' is 1
frequency of 'foo' is 0
Note that you will always have a collection and you need extract the count from it for a particular word one way or another.

List<String> list = new ArrayList<String>();
list.add("aaa");
list.add("bbb");
list.add("aaa");
Map<String,Integer> countMap = new HashMap();
Set<String> unique = new HashSet<String>(list);
for (String key : unique) {
int accurNO = Collections.frequency(list, key);
countMap.put(key,accurNO);
System.out.println(key + ": " accurNO);
}

The Map answers work, but you can extend this answer to solve more problems.
You create a class that has the field values you need, and put the class in a List.
import java.util.ArrayList;
import java.util.List;
public class WordCount {
private String word;
private int count;
public WordCount(String word) {
this.word = word;
this.count = 0;
}
public void addCount() {
this.count++;
}
public String getWord() {
return word;
}
public int getCount() {
return count;
}
}
class AccumulateWords {
List<WordCount> list = new ArrayList<WordCount>();
public void run() {
list.add(new WordCount("aaa"));
list.add(new WordCount("bbb"));
list.add(new WordCount("ccc"));
// Check for word occurrences here
for (WordCount wordCount : list) {
int accurNO = wordCount.getCount();
System.out.println(wordCount.getWord() + ": " + accurNO);
}
}
}

I would sort the list first to avoid going thru the whole list with Collections.frequency every time. The code will be longer but much more efficient
List<String> list = new ArrayList<String>();
list.add("aaa");
list.add("bbb");
list.add("aaa");
Map<String, Integer> map = new HashMap<String, Integer>();
Collections.sort(list);
String last = null;
int n = 0;
for (String w : list) {
if (w.equals(last)) {
n++;
} else {
if (last != null) {
map.put(last, n);
}
last = w;
n = 1;
}
}
map.put(last, n);
System.out.println(map);
output
{aaa=2, bbb=1}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java Inverted Index program - java

Related

Find the most common word from user input

How can I convert a String into ArrayList by counting occurrence of each characters?

get the two most used words in sentence in java

How do you find words in a text file and print the most frequent word shown using array?

How to Count Repetition of Words in Array List?

Categories

Resources