Get the two most used words in a sentence in Java - java

How do I get the two most used words in a sentence? For example, in the program below, after it counts the total number of appearances of every word, it should also display the two most used words.
import javax.swing.*;
import java.util.*;
import java.awt.event.*;
import java.util.Map;
import java.util.HashMap;
public class Tokenizer
{
public static void main(String[] args)
{
int index = 0; int tokenCount; int i =0;
Map<String,Integer> wordCount = new HashMap<String,Integer>();
Map<Integer,Integer> letterCount = new HashMap<Integer,Integer>();
String message="The Quick brown fox jumps over the lazy brown dog";
StringTokenizer string = new StringTokenizer(message);
tokenCount = string.countTokens();
System.out.println("Number of tokens = " + tokenCount);
while (string.hasMoreTokens()) {
String word = string.nextToken().toLowerCase();
Integer count = wordCount.get(word);
Integer lettercount = letterCount.get(word);
if(count == null) {
wordCount.put(word, 1);
}
else {
wordCount.put(word, count + 1);
}
}
for (String words : wordCount.keySet())
{System.out.println("Word : " + words + " has count :" +wordCount.get(words));
}
}
}

Iterate through the HashMap and keep track of the two highest counts.
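// wordCount is the Map<String, Integer> built in the question.
int first = Integer.MIN_VALUE;
int second = Integer.MIN_VALUE;
// Locals must be initialized, otherwise the code does not compile.
String firstWord = null;
String secondWord = null;
for (Map.Entry<String, Integer> entry : wordCount.entrySet())
{
    int count = entry.getValue();
    String word = entry.getKey();
    if (count > first)
    {
        // New leader: the old leader becomes the runner-up.
        second = first;
        secondWord = firstWord;
        first = count;
        firstWord = word;
    }
    else if (count > second)
    {
        // A tie with the leader still counts as second place,
        // so "the" and "brown" (2 each) are both reported.
        second = count;
        secondWord = word;
    }
}
System.out.println(firstWord + " " + first);
System.out.println(secondWord + " " + second);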

You need to iterate over the map's entry set.
The loop below leaves max holding the entry with the highest count, so both the word (key) and its count (value) are available.
Map.Entry<String, Integer> max = null;
for (Map.Entry<String, Integer> entry : map.entrySet())
{
if (max == null || entry.getValue().compareTo(max.getValue()) > 0)
{
max = entry;
}
}
For the second most used word, you can remove the max entry from the map and run the same loop again; the new maximum is the second most used word.
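As an alternative to tracking two maxima by hand, the counted entries can be sorted and the first two taken. This is only a sketch: the class name TopTwoWords and the helpers countWords and topN are made up for illustration, and breaking ties alphabetically is an assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopTwoWords {

    // Counts words case-insensitively, splitting on runs of whitespace.
    static Map<String, Integer> countWords(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : sentence.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to n entries, highest count first; equal counts are
    // ordered alphabetically (an assumption made for this sketch).
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        String message = "The Quick brown fox jumps over the lazy brown dog";
        for (Map.Entry<String, Integer> e : topN(countWords(message), 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints "brown 2" then "the 2"
    }
}
```

Sorting costs more than a single pass, but it generalizes to the top three or top ten without extra variables.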

Related

Handling an Array in Java (two columns)

For instance, suppose I have the following String:
String S = "5,a\n" +
"6,b\n" +
"9,a";
The format is always the same - one digit, then comma, then one character and then line end character.
To loop over each row of the String I use
for(String a : S.split("\\n")){}
I want to find the character with the highest total when the rows are grouped by character. For instance, there is only one "b", so its value is 6, whereas "a" appears on two lines, so its value is 5 + 9 = 14. Since 14 is the maximum here, I want to find "a" and 14 and save them in variables.
You can do something like this:
public static void main (String[] args) throws java.lang.Exception
{
String S = "5,a\n" +
"6,b\n" +
"9,a";
String[] lines = S.split("\\n");
Map<String, Integer> map = new HashMap<String, Integer>();
for( String t : lines )
{
String[] e = t.split(",");
Integer digit = Integer.parseInt(e[0]);
String c = e[1];
if ( map.get(c) != null )
{
Integer val = map.get(c);
val += digit;
map.put( c, val );
}
else
{
map.put( c, digit );
}
}
int max = 0;
String maxKey = null;
for ( String k : map.keySet() )
{
if ( map.get(k) > max )
{
max = map.get(k);
maxKey = k;
}
}
System.out.println("The maximum key is : " + maxKey );
System.out.println("The maximum value is : " + max );
}
Output is :
The maximum key is : a
The maximum value is : 14
Use a HashMap to store each pair, with the letter as the key. If the entry doesn't exist, put the first number. If it exists, get the entry and add the number, and then put the sum.
import java.util.HashMap;
import java.util.Map;
public class ParseTest {
public static void main(String[] args) {
String S = "5,a\n" + "6,b\n" + "9,a";
String maxKey = null;
int maxVal = 0;
Map<String, Integer> sums = new HashMap<>();
for (String a : S.split("\\n")) {
String[] split = a.split(",");
int value = Integer.parseInt(split[0]);
String key = split[1];
if (sums.containsKey(key)) {
sums.put(key, sums.get(key) + value);
} else {
sums.put(key, value);
}
if (sums.get(key) > maxVal) {
maxVal = sums.get(key);
maxKey = key;
}
}
System.out.println("Max key: " + maxKey + ", Sum: " + maxVal);
}
}
After finishing my answer, I found that many similar answers had already been posted :). Anyway, here is my solution:
public static void main(String[] args) {
String S = "5,a\n6,b\n9,a";
Map<String, Integer> map = new HashMap<String, Integer>();
String highestAmountChar = "";
int highestAmount = 0;
for (String str : S.split("\\n")) {
String[] amountChar = str.split(",");
if (map.get(amountChar[1]) == null) {
map.put(amountChar[1], Integer.parseInt(amountChar[0]));
} else {
map.put(amountChar[1], map.get(amountChar[1]) + Integer.parseInt(amountChar[0]));
}
if (highestAmount < map.get(amountChar[1])) {
highestAmount = map.get(amountChar[1]);
highestAmountChar = amountChar[1];
}
}
System.out.println("The character " + highestAmountChar + " has highest amount " + highestAmount);
}
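The put-or-add pattern shared by the answers above can be written more compactly with Map.merge (Java 8 and later), and the maximum can be picked with Collections.max. The following is a minimal sketch; the class and method names (SumByLetter, sumByLetter, largest) are invented for illustration, and the input is assumed to be well formed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SumByLetter {

    // Adds up the number on each "digit,letter" line under its letter.
    static Map<String, Integer> sumByLetter(String input) {
        Map<String, Integer> sums = new HashMap<>();
        for (String line : input.split("\n")) {
            String[] parts = line.split(",");
            // merge stores the value if the key is new,
            // otherwise it adds the value to the existing sum.
            sums.merge(parts[1], Integer.parseInt(parts[0]), Integer::sum);
        }
        return sums;
    }

    // Entry with the largest sum; throws NoSuchElementException on an empty map.
    static Map.Entry<String, Integer> largest(Map<String, Integer> sums) {
        return Collections.max(sums.entrySet(), Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> max = largest(sumByLetter("5,a\n6,b\n9,a"));
        System.out.println(max.getKey() + " " + max.getValue()); // prints "a 14"
    }
}
```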
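You could use something like this without a HashMap or any other collection. Note that it picks the letter that appears on the most lines and then sums that letter's numbers; this gives the right result for the example, but it is not the same as picking the letter with the largest sum in general.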
import java.util.Arrays;
public class Test {
public static void main(String args[]) {
String S = "5,a\n" +
"6,b\n" +
"9,a";
// Separate the string by number and letter
String[] separated = S.split("\\n");
// Create a new array to store the letters only
char[] letters = new char[separated.length];
// Write the letter
for (int i = 0; i < letters.length; i++) {
letters[i] = separated[i].charAt(2);
}
// Sort them haha
Arrays.sort(letters);
// And now find out which letter is repeated most
// Store the first letter
char previous = letters[0];
// Make it the most repeated one for now
char mostRepeated = letters[0];
int count = 1;
int maxCount = 1;
for (int i = 1; i < letters.length; i++) {
// since the array is sorted if the actual letter is the same as the previous one then keep counting
if (letters[i] == previous)
count++;
else {
if (count > maxCount) {
mostRepeated = letters[i - 1];
maxCount = count;
}
previous = letters[i];
count = 1;
}
}
char answer = count > maxCount ? letters[letters.length-1] : mostRepeated;
// Once you get the letter now just add all the numbers that goes with it
int sum = 0;
for (String s:separated) {
if (s.charAt(2) == answer) {
sum += Character.getNumericValue(s.charAt(0));
}
}
// Print the result: the letter and its sum
System.out.println(answer + " " + sum);
}
}

Counting duplicate words in Java: my code counts characters instead of words

I want to count the duplicate words in a String in Java, but my code only counts characters.
I am using core Java only.
public static void getCount(String name) {
Map<Character, Integer> names = new HashMap<Character, Integer>();
for(int i = 0; i < name.length(); i++) {
char c = name.charAt(i);
Integer count = names.get(c);
System.out.println(names.get(c));
System.out.println("the count"+count);
if (count == null) {
count = 0;
}
names.put(c, count + 1);
System.out.println("111111111111");
}
Set<Character> a = names.keySet();
for (Character t : a) {
System.out.println(t + " Occurred " + names.get(t) + " times");
}
}
I think this may help you:
String name = "banana";
Map<Character, Integer> countMap = new HashMap<>();
name.chars()
.forEach((int i) -> countMap.put((char) i, countMap.getOrDefault((char) i, 0) + 1));
countMap.forEach((Character c,Integer count)->System.out.println("Character: "+c+" count: "+count));
Or, with a collector (I think this will perform better; it needs import java.util.stream.Collectors):
Map<Integer, Long> countMap2 = name.chars().boxed().collect(Collectors.groupingBy(Integer::intValue, Collectors.counting()));
countMap2.forEach((Integer c, Long count) -> {
if (count > 1) {
System.out.println("Character: " + (char) c.intValue() + " count: " + count);
}
});
The program reads a sentence and returns the duplicate word with the highest frequency. Please test this code rigorously.
import java.util.*;
public class DupsWords {
public static String countDupsWords(String[] arr){
Hashtable<String,Integer> ht = new Hashtable<String,Integer>();
for(int i=0;i<arr.length;i++){
if(ht.containsKey(arr[i])){
ht.put(arr[i],ht.get(arr[i])+1);
} else{
ht.put(arr[i],1);
}
}
Set<String> keys=ht.keySet();
String result=null;
int max=0;
for(String itr : keys){
int count=ht.get(itr);
// Keep the most frequent word, but only if it occurs more than once.
if(count>max){
max=count;
if(max>1){
result=itr;
}
}
}
if(result ==null){
return "No Duplicate";
}
else{
System.out.print("count is "+ max +" for ");
return ("'"+ result +"'" + " as a duplicate word");
}
}
public static void main(String args[ ]){
Scanner scan = new Scanner(System.in);
System.out.println("Enter the String");
String s= scan.nextLine();
String[] arr=s.split(" ");
System.out.print(countDupsWords(arr));
}
}
If the goal is to count the duplicate words in a given string, this code does it:
public static void getCount(String name) {
java.util.StringTokenizer stoken = new java.util.StringTokenizer(name, " ");
boolean flag = true;
if (stoken.countTokens() > 1) {
java.util.Map<String, Integer> wordCountMap = new java.util.HashMap<String, Integer>();
while (stoken.hasMoreElements()) {
String str = stoken.nextElement().toString();
if (wordCountMap.containsKey(str)) {
wordCountMap.put(str, wordCountMap.get(str) + 1);
} else {
wordCountMap.put(str, 1);
}
}
System.out.println("Checking for Duplicates..");
for (String values : wordCountMap.keySet()) {
if (wordCountMap.get(values) > 1) {
flag = false;
System.out.println(values + "\t\t["+ wordCountMap.get(values) + "]");
}
}
}
if (flag) {
System.out.println("No duplicate words");
}
}
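The same duplicate-word count can also be expressed with the stream collectors used in the character-counting answer above. This is a sketch only: the class name DuplicateWords and the method duplicates are hypothetical, words are split on whitespace, and the comparison is case-sensitive, which may or may not be what the question needs.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateWords {

    // Returns only the words that occur more than once, with their counts,
    // sorted alphabetically because the result is a TreeMap.
    static Map<String, Long> duplicates(String sentence) {
        Map<String, Long> counts = Arrays.stream(sentence.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
        // Drop every word that appears just once.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> result = duplicates("the cat and the hat and the bat");
        if (result.isEmpty()) {
            System.out.println("No duplicate words");
        }
        for (Map.Entry<String, Long> e : result.entrySet()) {
            System.out.println(e.getKey() + " occurred " + e.getValue() + " times");
        }
        // prints "and occurred 2 times" then "the occurred 3 times"
    }
}
```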

Java Inverted Index program

I am writing an inverted index program in Java which returns the frequency of terms among multiple documents. I have been able to return the number of times a word appears in the entire collection, but I have not been able to return which documents the word appears in. This is the code I have so far:
import java.util.*; // Provides TreeMap, Iterator, Scanner
import java.io.*; // Provides FileReader, FileNotFoundException
public class Run
{
public static void main(String[ ] args)
{
// **THIS CREATES A TREE MAP**
TreeMap<String, Integer> frequencyData = new TreeMap<String, Integer>( );
Map[] mapArray = new Map[5];
mapArray[0] = new HashMap<String, Integer>();
readWordFile(frequencyData);
printAllCounts(frequencyData);
}
public static int getCount(String word, TreeMap<String, Integer> frequencyData)
{
if (frequencyData.containsKey(word))
{ // The word has occurred before, so get its count from the map
return frequencyData.get(word); // Auto-unboxed
}
else
{ // No occurrences of this word
return 0;
}
}
public static void printAllCounts(TreeMap<String, Integer> frequencyData)
{
System.out.println("-----------------------------------------------");
System.out.println(" Occurrences Word");
for(String word : frequencyData.keySet( ))
{
System.out.printf("%15d %s\n", frequencyData.get(word), word);
}
System.out.println("-----------------------------------------------");
}
public static void readWordFile(TreeMap<String, Integer> frequencyData)
{
int total = 0;
Scanner wordFile;
String word; // A word read from the file
Integer count; // The number of occurrences of the word
int counter = 0;
int docs = 0;
//**FOR LOOP TO READ THE DOCUMENTS**
for(int x=0; x<Docs.length; x++)
{ //start of for loop [*
try
{
wordFile = new Scanner(new FileReader(Docs[x]));
}
catch (FileNotFoundException e)
{
System.err.println(e);
return;
}
while (wordFile.hasNext( ))
{
// Read the next word and get rid of the end-of-line marker if needed:
word = wordFile.next( );
// This makes the Word lower case.
word = word.toLowerCase();
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = getCount(word, frequencyData) + 1;
frequencyData.put(word, count);
total = total + count;
counter++;
docs = x + 1;
}
} //End of for loop *]
System.out.println("There are " + total + " terms in the collection.");
System.out.println("There are " + counter + " unique terms in the collection.");
System.out.println("There are " + docs + " documents in the collection.");
}
// Array of documents
static String Docs [] = {"words.txt", "words2.txt",};
}
Instead of simply having a Map from word to count, create a Map from each word to a nested Map from document to count. In other words:
Map<String, Map<String, Integer>> wordToDocumentMap;
Then, inside your loop which records the counts, you want to use code which looks like this:
Map<String, Integer> documentToCountMap = wordToDocumentMap.get(currentWord);
if(documentToCountMap == null) {
// This word has not been found anywhere before,
// so create a Map to hold document-map counts.
documentToCountMap = new TreeMap<>();
wordToDocumentMap.put(currentWord, documentToCountMap);
}
Integer currentCount = documentToCountMap.get(currentDocument);
if(currentCount == null) {
// This word has not been found in this document before, so
// set the initial count to zero.
currentCount = 0;
}
documentToCountMap.put(currentDocument, currentCount + 1);
Now you're capturing the counts on a per-word and per-document basis.
Once you've completed the analysis and you want to print a summary of the results, you can run through the map like so:
for(Map.Entry<String, Map<String,Integer>> wordToDocument :
wordToDocumentMap.entrySet()) {
String currentWord = wordToDocument.getKey();
Map<String, Integer> documentToWordCount = wordToDocument.getValue();
for(Map.Entry<String, Integer> documentToFrequency :
documentToWordCount.entrySet()) {
String document = documentToFrequency.getKey();
Integer wordCount = documentToFrequency.getValue();
System.out.println("Word " + currentWord + " found " + wordCount +
" times in document " + document);
}
}
For an explanation of the for-each structure in Java, see this tutorial page.
For a good explanation of the features of the Map interface, including the entrySet method, see this tutorial page.
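The nested-map idea above can be sketched end to end as follows. To keep the example self-contained it reads documents from an in-memory map instead of files; the class name InvertedIndexSketch, the method buildIndex, and the sample document names are all made up for illustration. It uses computeIfAbsent and merge (Java 8 and later) in place of the explicit null checks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndexSketch {

    // documents: document name -> full text of that document.
    // Result: word -> (document name -> occurrences in that document).
    static Map<String, Map<String, Integer>> buildIndex(Map<String, String> documents) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (String raw : doc.getValue().split("\\s+")) {
                // Same normalization as the question: lower case, strip punctuation.
                String word = raw.toLowerCase().replaceAll("[^a-z0-9]", "");
                if (word.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(word, k -> new TreeMap<>())
                     .merge(doc.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("doc1.txt", "The cat sat. The end");
        documents.put("doc2.txt", "A cat");
        buildIndex(documents).forEach((word, perDocument) ->
                System.out.println(word + " -> " + perDocument));
    }
}
```

The size of each inner map is the number of documents containing the word, which is the document frequency needed later for tf-idf.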
Try adding a second map from each word to the set of document names, like this:
Map<String, Set<String>> filenames = new HashMap<String, Set<String>>();
...
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = getCount(word, frequencyData) + 1;
frequencyData.put(word, count);
Set<String> filenamesForWord = filenames.get(word);
if (filenamesForWord == null) {
filenamesForWord = new HashSet<String>();
}
filenamesForWord.add(Docs[x]);
filenames.put(word, filenamesForWord);
total = total + count;
counter++;
docs = x + 1;
When you need the set of filenames in which a particular word was encountered, just get() it from the filenames map. Here is an example that prints, for each word, all the file names in which it was encountered:
public static void printAllCounts(TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames) {
System.out.println("-----------------------------------------------");
System.out.println(" Occurrences Word");
for(String word : frequencyData.keySet( ))
{
System.out.printf("%15d %s\n", frequencyData.get(word), word);
for (String filename : filenames.get(word)) {
System.out.println(filename);
}
}
System.out.println("-----------------------------------------------");
}
I've put a Scanner into the main method, and searching for a word now returns the documents the word occurs in. I also return how many times the word occurs, but I can only get the total across all three documents, and I want the count for each document separately. I need this to calculate tf-idf; if you have a complete answer for the whole tf-idf calculation, I would appreciate it. Cheers.
Here is my code:
import java.util.*; // Provides TreeMap, Iterator, Scanner
import java.io.*; // Provides FileReader, FileNotFoundException
public class test2
{
public static void main(String[ ] args)
{
// **THIS CREATES A TREE MAP**
TreeMap<String, Integer> frequencyData = new TreeMap<String, Integer>();
Map<String, Set<String>> filenames = new HashMap<String, Set<String>>();
Map<String, Integer> countByWords = new HashMap<String, Integer>();
Map[] mapArray = new Map[5];
mapArray[0] = new HashMap<String, Integer>();
readWordFile(countByWords, frequencyData, filenames);
printAllCounts(countByWords, frequencyData, filenames);
}
public static int getCount(String word, TreeMap<String, Integer> frequencyData)
{
if (frequencyData.containsKey(word))
{ // The word has occurred before, so get its count from the map
return frequencyData.get(word); // Auto-unboxed
}
else
{ // No occurrences of this word
return 0;
}
}
public static void printAllCounts( Map<String, Integer> countByWords, TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames)
{
System.out.println("-----------------------------------------------");
System.out.print("Search for a word: ");
String worde;
int result = 0;
Scanner input = new Scanner(System.in);
worde=input.nextLine();
if(!filenames.containsKey(worde)){
System.out.println("The word does not exist");
}
else{
for(String filename : filenames.get(worde)){
System.out.println(filename);
System.out.println(countByWords.get(worde));
}
}
System.out.println("\n-----------------------------------------------");
}
public static void readWordFile(Map<String, Integer> countByWords ,TreeMap<String, Integer> frequencyData, Map<String, Set<String>> filenames)
{
Scanner wordFile;
String word; // A word read from the file
Integer count; // The number of occurrences of the word
int counter = 0;
int docs = 0;
//**FOR LOOP TO READ THE DOCUMENTS**
for(int x=0; x<Docs.length; x++)
{ //start of for loop [*
try
{
wordFile = new Scanner(new FileReader(Docs[x]));
}
catch (FileNotFoundException e)
{
System.err.println(e);
return;
}
while (wordFile.hasNext( ))
{
// Read the next word and get rid of the end-of-line marker if needed:
word = wordFile.next( );
// This makes the Word lower case.
word = word.toLowerCase();
word = word.replaceAll("[^a-zA-Z0-9\\s]", "");
// Get the current count of this word, add one, and then store the new count:
count = countByWords.get(word);
if(count != null){
countByWords.put(word, count + 1);
}
else{
countByWords.put(word, 1);
}
Set<String> filenamesForWord = filenames.get(word);
if (filenamesForWord == null) {
filenamesForWord = new HashSet<String>();
}
filenamesForWord.add(Docs[x]);
filenames.put(word, filenamesForWord);
counter++;
docs = x + 1;
}
} //End of for loop *]
System.out.println("There are " + counter + " terms in the collection.");
System.out.println("There are " + docs + " documents in the collection.");
}
// Array of documents
static String Docs [] = {"Document1.txt", "Document2.txt", "Document3.txt"};
}
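Once the counts are kept per word and per document, a tf-idf score can be derived from them. The sketch below assumes one common variant, raw term frequency multiplied by ln(N / df); other weightings exist and the question does not say which one is wanted. The class name TfIdfSketch, the method tfIdf, and the hand-built sample index are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class TfIdfSketch {

    // index: word -> (document name -> occurrences in that document).
    // totalDocs: number of documents in the collection (N).
    static double tfIdf(Map<String, Map<String, Integer>> index,
                        int totalDocs, String word, String document) {
        Map<String, Integer> perDocument = index.get(word);
        if (perDocument == null || !perDocument.containsKey(document)) {
            return 0.0; // the word does not occur in this document
        }
        double tf = perDocument.get(document);                           // raw term frequency
        double idf = Math.log((double) totalDocs / perDocument.size()); // ln(N / df)
        return tf * idf;
    }

    public static void main(String[] args) {
        // Tiny hand-built index standing in for the result of reading the files.
        Map<String, Map<String, Integer>> index = new HashMap<>();
        index.computeIfAbsent("cat", k -> new HashMap<>()).put("Document1.txt", 2);
        index.computeIfAbsent("the", k -> new HashMap<>()).put("Document1.txt", 1);
        index.computeIfAbsent("the", k -> new HashMap<>()).put("Document2.txt", 1);

        System.out.println(tfIdf(index, 2, "cat", "Document1.txt"));
        // "the" appears in every document, so its idf and its score are 0.0
        System.out.println(tfIdf(index, 2, "the", "Document1.txt"));
    }
}
```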

Create word count of text using hashmap

I am trying to create a program as a tutorial for myself on HashMaps. I ask the user to enter some text, split it into words, store the words in a HashMap, and then increase the count if a word repeats. This is my program:
import java.util.*;
import java.lang.*;
import javax.swing.JOptionPane;
import java.io.*;
public class TestingTables
{
public static void main(String args[])
{
{
String s = JOptionPane.showInputDialog("Enter any text.");
String[] splitted = s.split(" ");
HashMap hm = new HashMap();
int x;
for (int i=0; i<splitted.length ; i++) {
hm.put(splitted[i], i);
System.out.println(splitted[i] + " " + i);
if (hm.containsKey(splitted[i])) {
x = ((Integer)hm.get(splitted[i])).intValue();
hm.put(splitted[i], new Integer(x+1)); }
}
}
}
}
When I input "random random random", I get:
random 0
random 1
random 2
What do I need to change so I get:
random 3
Also, do I need to use an iterator to print out the hashmap, or is what I used OK?
Your initialization, hm.put(splitted[i], i), is wrong: it stores the index of the word, not a count.
You should initialize to 0 or to 1 (a count, not an index).
So do this loop first:
for (int i = 0; i < splitted.length; i++) {
if (!hm.containsKey(splitted[i])) {
hm.put(splitted[i], 1);
} else {
hm.put(splitted[i], (Integer) hm.get(splitted[i]) + 1);
}
}
Then just do one more loop (iterate through the keys of the HashMap) and print the counts out.
for (Object word : hm.keySet()){
System.out.println(word + " " + (Integer) hm.get(word));
}
import java.util.*;
import java.lang.*;
import javax.swing.JOptionPane;
import java.io.*;
public class TestingTables
{
public static void main(String args[])
{
{
String s = JOptionPane.showInputDialog("Enter any text.");
String[] splitted = s.split(" ");
Map<String, Integer> hm = new HashMap<String, Integer>();
for (int i=0; i<splitted.length ; i++) {
if (hm.containsKey(splitted[i])) {
int cont = hm.get(splitted[i]);
hm.put(splitted[i], cont + 1);
} else {
hm.put(splitted[i], 1);
}
}
}
}
}
Your Map declaration is wrong: a raw HashMap has no type parameters, so declare it as Map<String, Integer>.
This should work; it's a pretty simple implementation.
Map<String, Integer> hm = new HashMap<String, Integer>();
int x;
for (int i = 0; i < splitted.length; i++) {
if (hm.containsKey(splitted[i])) {
x = hm.get(splitted[i]);
hm.put(splitted[i], x + 1);
} else {
hm.put(splitted[i], 1);
}
}
for (String key : hm.keySet()) {
System.out.println(key + " " + hm.get(key));
}
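For reference, the counting and printing steps can be combined in one small program that needs no dialog box. This is a sketch under a few assumptions: the class name WordCountSketch and the method count are invented here, words are split on any whitespace, and getOrDefault (Java 8 and later) replaces the containsKey check. It also shows that an explicit Iterator is not required; a for-each loop over entrySet() is enough.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordCountSketch {

    // Counts how often each word occurs; LinkedHashMap keeps first-seen order.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields one empty token
            }
            counts.put(word, counts.getOrDefault(word, 0) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> entry : count("random random random").entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
        // prints "random 3"
    }
}
```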

How to Count Repetition of Words in Array List?

I have this code for finding the occurrences of each word in an ArrayList, but my problem is how to get the result outside of the for loop as an integer, because I need it outside the loop. Maybe there is another way to find the occurrences without using a for loop. Can you help me?
Thank you.
List<String> list = new ArrayList<String>();
list.add("aaa");
list.add("bbb");
list.add("aaa");
Set<String> unique = new HashSet<String>(list);
for (String key : unique) {
int accurNO = Collections.frequency(list, key);
System.out.println(key + ": " + accurNO);
}
You should declare a map like Map<String, Integer> countMap = new HashMap<String, Integer>(); before the loop, and populate it within the loop.
Map<String, Integer> countMap = new HashMap<String, Integer>();
for (String key : unique) {
int accurNO = Collections.frequency(list, key);
countMap.put(key, accurNO);
//...
}
//now you have a map with keys and their frequencies in the list
Set unique = new HashSet(list);
and
Collections.frequency(list, key);
are too much overhead: Collections.frequency rescans the whole list for every unique key.
Here is how I would do it:
List<String> list = new ArrayList<String>();
list.add("aaa");
list.add("bbb");
list.add("aaa");
Map<String, Integer> countMap = new HashMap<>();
for (String word : list) {
Integer count = countMap.get(word);
if(count == null) {
count = 0;
}
countMap.put(word, (count.intValue()+1));
}
System.out.println(countMap.toString());
Output
{aaa=2, bbb=1}
EDIT output one by one: iterate over the set of entries of the map
for(Map.Entry<String, Integer> entry : countMap.entrySet()) {
System.out.println("frequency of '" + entry.getKey() + "' is "
+ entry.getValue());
}
Output
frequency of 'aaa' is 2
frequency of 'bbb' is 1
EDIT 2 No need for looping
String word = null;
Integer frequency = null;
word = "aaa";
frequency = countMap.get(word);
System.out.println("frequency of '" + word + "' is " +
(frequency == null ? 0 : frequency.intValue()));
word = "bbb";
frequency = countMap.get(word);
System.out.println("frequency of '" + word + "' is " +
(frequency == null ? 0 : frequency.intValue()));
word = "foo";
frequency = countMap.get(word);
System.out.println("frequency of '" + word + "' is " +
(frequency == null ? 0 : frequency.intValue()));
Output
frequency of 'aaa' is 2
frequency of 'bbb' is 1
frequency of 'foo' is 0
Note that you will always have a collection, and you need to extract the count for a particular word from it one way or another.
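If only the count of one known word is needed as an int outside any loop, Collections.frequency can be called directly, since it already returns an int. A minimal sketch, with the class name FrequencyLookup and the helper occurrences made up for illustration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FrequencyLookup {

    // Number of elements in the list that equal the given word.
    static int occurrences(List<String> words, String word) {
        return Collections.frequency(words, word);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("aaa", "bbb", "aaa");
        int aaa = occurrences(list, "aaa"); // plain int, usable anywhere afterwards
        int foo = occurrences(list, "foo");
        System.out.println("aaa: " + aaa); // prints "aaa: 2"
        System.out.println("foo: " + foo); // prints "foo: 0"
    }
}
```

Each call scans the whole list, so this suits a handful of lookups; for counts of every word, the single-pass map shown above is the better fit.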
List<String> list = new ArrayList<String>();
list.add("aaa");
list.add("bbb");
list.add("aaa");
Map<String,Integer> countMap = new HashMap<>();
Set<String> unique = new HashSet<String>(list);
for (String key : unique) {
int accurNO = Collections.frequency(list, key);
countMap.put(key,accurNO);
System.out.println(key + ": " + accurNO);
}
The Map answers work, but you can extend this answer to solve more problems.
You create a class that has the field values you need, and put the class in a List.
import java.util.ArrayList;
import java.util.List;
public class WordCount {
private String word;
private int count;
public WordCount(String word) {
this.word = word;
this.count = 0;
}
public void addCount() {
this.count++;
}
public String getWord() {
return word;
}
public int getCount() {
return count;
}
}
class AccumulateWords {
List<WordCount> list = new ArrayList<WordCount>();
public void run() {
list.add(new WordCount("aaa"));
list.add(new WordCount("bbb"));
list.add(new WordCount("ccc"));
// Check for word occurrences here
for (WordCount wordCount : list) {
int accurNO = wordCount.getCount();
System.out.println(wordCount.getWord() + ": " + accurNO);
}
}
}
I would sort the list first to avoid going through the whole list with Collections.frequency every time. The code will be longer but much more efficient.
List<String> list = new ArrayList<String>();
list.add("aaa");
list.add("bbb");
list.add("aaa");
Map<String, Integer> map = new HashMap<String, Integer>();
Collections.sort(list);
String last = null;
int n = 0;
for (String w : list) {
if (w.equals(last)) {
n++;
} else {
if (last != null) {
map.put(last, n);
}
last = w;
n = 1;
}
}
map.put(last, n);
System.out.println(map);
output
{aaa=2, bbb=1}
