Word frequency in Programming Pearls - java

In "Programming Pearls" I came across the following problem: "print words in order of decreasing frequency". As I understand it, the problem is this. Suppose we are given a string array, call it s (I chose the words at random; the particular words do not matter):
String s[]={"cat","cat","dog","fox","cat","fox","dog","cat","fox"};
We see that string "cat" occurs 4 times, "fox" 3 times and "dog" 2 times. So the desired result will be this:
cat
fox
dog
I have written the following code in Java:
import java.util.*;
public class string {
public static void main(String[] args){
String s[]={"fox","cat","cat","fox","dog","cat","fox","dog","cat"};
Arrays.sort(s);
int counts;
int count[]=new int[s.length];
for (int i=0;i<s.length-1;i++){
counts=1;
while (s[i].equals(s[i+1])){
counts++;
}
count[i]=counts;
}
}
}
I have sorted the array and created a count array where I write the number of occurrences of each word in array.
My problem is that somehow the index of the integer array element and the string array element is not the same. How can I print words according to the maximum elements of the integer array?

To keep track of the count of each word, I would use a Map which maps a word to its current count.
String s[]={"cat","cat","dog","fox","cat","fox","dog","cat","fox"};
Map<String, Integer> counts = new HashMap<String, Integer>();
for (String word : s) {
if (!counts.containsKey(word))
counts.put(word, 0);
counts.put(word, counts.get(word) + 1);
}
To print the result, go through the keys in the map and get the final value.
for (String word : counts.keySet())
System.out.println(word + ": " + (float) counts.get(word) / s.length);
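The loop above prints relative frequencies in whatever order the map happens to iterate. To get the ordering the question asks for, one option is to sort the distinct words by their counts. The following is a minimal sketch under that assumption; the class name FrequencyOrder, the method name byDecreasingFrequency, and the alphabetical tie-break are illustrative choices, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencyOrder {
    // Returns the distinct words, most frequent first.
    // Ties are broken alphabetically (an assumption; the question does not specify).
    static List<String> byDecreasingFrequency(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Store 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>(counts.keySet());
        result.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // higher count first
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }

    public static void main(String[] args) {
        String[] s = {"cat", "cat", "dog", "fox", "cat", "fox", "dog", "cat", "fox"};
        for (String word : byDecreasingFrequency(s)) {
            System.out.println(word); // prints cat, fox, dog on separate lines
        }
    }
}
```

Sorting the key list keeps the counting step unchanged, so it can be bolted onto the HashMap approach without restructuring it.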

Related

Java - Sorting a String array by word length [duplicate]

I need to sort the String array from least to greatest by word length. I then need to print out the words next to their length.
Example Output:
1: I, U, K
6: Joseph, Delete
But I need help sorting the array first.
Code:
package Objects;
import java.io.*;
import java.util.*;
import java.util.Arrays;
import java.lang.Object;
public class WordLength {
public static void main(String[] args) throws Exception{
Scanner scan = new Scanner(new File("wordlength.dat"));
int thresh = scan.nextInt(); //Scans the first integer
String[] array = new String[thresh]; //Creates an empty array with length of the threshold
for(int i = 0; i < array.length; i++) {
array[i] = scan.nextLine();
/* I need to sort the array from least to
greatest here so I can use the if statement
below.
*/
//Prints the word array from least to greatest by length
if(array[i].length() == i) {
System.out.print(i + ": " + array[i]);
}
}
}
}
You can use a custom comparator to sort the String array. For example
String [] arrr = {"World","Ho","ABCD"};
Arrays.sort(arrr,new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o1.length() - o2.length();
}
});
System.out.println(arrr[0]); // Output: Ho
One of the Arrays.sort overloads accepts a Comparator as its second parameter. By implementing its compare method, you define your own comparison function that decides whether one string should be placed before or after another.
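On Java 8 or later, the same comparator can be written more compactly with Comparator.comparingInt. This is a small sketch of that variant; the class name SortByLength and the helper sortedByLength are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortByLength {
    // Returns a copy of the input sorted from shortest to longest word.
    // Arrays.sort on objects is stable, so words of equal length keep their input order.
    static String[] sortedByLength(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        String[] arrr = {"World", "Ho", "ABCD"};
        System.out.println(Arrays.toString(sortedByLength(arrr))); // [Ho, ABCD, World]
    }
}
```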
I don't code in Java, but would this pseudocode work?
Grab the list of words.
Loop through them.
Get the length of the current word.
Add that word to that array number: a 5-letter word would go in array element 5.
Continue the loop until all the words have been added to the proper array.
Display each of the array elements.
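The pseudocode above translates to Java fairly directly if the "array element per length" is replaced by a sorted map from length to a list of words. This is only a sketch of that idea: the class and method names are hypothetical, and a TreeMap is used here instead of a plain array so that no maximum word length has to be known in advance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByLength {
    // Groups words by length; TreeMap keeps the lengths in ascending order.
    static Map<Integer, List<String>> groupByLength(String[] words) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String word : words) {
            // Create the bucket for this length on first use, then add the word to it.
            groups.computeIfAbsent(word.length(), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] words = {"I", "Joseph", "U", "Delete", "K"};
        for (Map.Entry<Integer, List<String>> entry : groupByLength(words).entrySet()) {
            System.out.println(entry.getKey() + ": " + String.join(", ", entry.getValue()));
        }
        // prints:
        // 1: I, U, K
        // 6: Joseph, Delete
    }
}
```

With this grouping, no explicit sort of the array is needed to produce the output format shown in the question.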

modifying algorithm to generate unique permutations in a string that contains duplicates

I'm aware of handling the issue with duplicates if I were to use a swap and permute method for generating permutations as shown here.
However, I'm using a different approach where I place current character between any two characters, at the beginning and at the end, of all of the permutations generated without the current character.
How can I modify my code below to give me only unique permutations of a string that contains duplicates?
import java.util.ArrayList;
public class Permutations {
public static void main(String[] args) {
String str = "baab";
System.out.println(fun(str, 0));
System.out.println("number of Permutations =="+fun(str, 0).size());
}
static ArrayList<String> fun(String str, int index)
{
if(index == str.length())
{
ArrayList<String> al = new ArrayList<String>();
al.add("");
return al;
}
/* get return from lower frame */
ArrayList<String> rec = fun(str, index+1);
/* get character here */
char c = str.charAt(index);
/* to each of the returned Strings in ArrayList, add str.charAt(j) */
ArrayList<String> ret = new ArrayList<String>();
for(int i = 0;i<rec.size();i++)
{
String here = rec.get(i);
ret.add(c + here);
for(int j = 0;j<here.length();j++)
ret.add(here.substring(0,j+1) + c + here.substring(j+1,here.length()));
}
return ret;
}
}
At the moment, a string such as "bab" generates the following output, which contains abb and bba multiple times.
[bab, abb, abb, bba, bba, bab]
number of Permutations ==6
PS : I do not want to use a hashmap/Set to keep track of my duplicates and see whether they were encountered previously.
When you're iterating through the string and adding the character at each position, if you find a character in the string that is the same as the one you are inserting, break after inserting the new character immediately before it. This means that strings with the same character more than once can only be formed one way (by inserting in reverse order) so duplicates can't happen.
for(int j = 0;j<here.length();j++)
{
if(here.charAt(j) == c)
break;
ret.add(here.substring(0,j+1) + c + here.substring(j+1,here.length()));
}
A general approach to solving these problems involving generating sets without duplicates is to think of a property that only one of each set of duplicates will have, and then enforce that as a constraint. For example in this case the constraint is "all duplicated characters are added in reverse order" (forward order would work just as well, but you'd have to flip the loop direction). For a combination problem where the order isn't important, the constraint could be "items in each list are in ascending order". And so on.
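For reference, here is how the question's method might look with the early break applied. It is a sketch rather than the answerer's exact code: the class name UniquePermutations and the method name permutations are illustrative, and List replaces ArrayList in the signatures.

```java
import java.util.ArrayList;
import java.util.List;

class UniquePermutations {
    // Builds the permutations of str.substring(index) by inserting each character
    // into the permutations of the remaining characters.
    static List<String> permutations(String str, int index) {
        if (index == str.length()) {
            List<String> base = new ArrayList<>();
            base.add("");
            return base;
        }
        List<String> rec = permutations(str, index + 1);
        char c = str.charAt(index);
        List<String> ret = new ArrayList<>();
        for (String here : rec) {
            ret.add(c + here); // insert at the front
            for (int j = 0; j < here.length(); j++) {
                // Never insert c after an equal character: stop at its first occurrence.
                if (here.charAt(j) == c) {
                    break;
                }
                ret.add(here.substring(0, j + 1) + c + here.substring(j + 1));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(permutations("bab", 0)); // [bab, abb, bba]
    }
}
```

No set or map is needed, which matches the constraint stated in the question's postscript.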

How can I use a string array as key in hash map?

I've made a String array out of a .txt file and now want to make a HashMap with its strings as keys. I don't want the whole array to be one key mapped to one value; I want each word to be its own key in the HashMap.
private static String[] readAndConvertInputFile() {
String str = StdIn.readAll();
String conv = str.replaceAll("\'s", "").replaceAll("[;,?.:*/\\-_()\"\'\n]", " ").replaceAll(" {2,}", " ").toLowerCase();
return conv.split(" "); }
So the information in the string is like ("word", "thing", "etc.", "pp.", "thing").
My value should be the frequency of the word in the text. So for example key: "word" value: 1, key: "thing" value: 2 and so on... I'm clueless and would be grateful if someone could help me, at least with the key. :)
You can create a Map while using the String value at each array index as the key, and an Integer as the value to keep track of how many times a word appeared.
Map<String,Integer> occurences = new HashMap<String,Integer>();
Then, when you want to increment, check whether the Map already contains the key: if it does, increase its value by 1; otherwise, set it to 1.
if (occurences.containsKey(word)) {
occurences.put(word, occurences.get(word) + 1);
} else {
occurences.put(word, 1);
}
So, while you are looping over your string array, convert the String to lower case (if you want to ignore case for word occurrences), and increment the map using the if statement above.
for (String word : words) {
word = word.toLowerCase(); // remove if you want case sensitivity
if (occurences.containsKey(word)) {
occurences.put(word, occurences.get(word) + 1);
} else {
occurences.put(word, 1);
}
}
A full example is shown below. I converted the words to lowercase to ignore case when using them as keys in the map; if you want to keep case, remove the line where I convert to lowercase.
public static void main(String[] args) {
String s = "This this the has dog cat fish the cat horse";
String[] words = s.split(" ");
Map<String, Integer> occurences = new HashMap<String, Integer>();
for (String word : words) {
word = word.toLowerCase(); // remove if you want case sensitivity
if (occurences.containsKey(word)) {
occurences.put(word, occurences.get(word) + 1);
} else {
occurences.put(word, 1);
}
}
for(Entry<String,Integer> en : occurences.entrySet()){
System.out.println("Word \"" + en.getKey() + "\" appeared " + en.getValue() + " times.");
}
}
This gives output like the following (HashMap does not guarantee iteration order):
Word "cat" appeared 2 times.
Word "fish" appeared 1 times.
Word "horse" appeared 1 times.
Word "the" appeared 2 times.
Word "dog" appeared 1 times.
Word "this" appeared 2 times.
Word "has" appeared 1 times.
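Yes, you can use an array (regardless of element type) as a HashMap key.
No, you shouldn't do so. Arrays inherit identity-based equals and hashCode from Object, so two arrays with the same contents count as different keys, which is unlikely to be the behavior you want.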
In your particular case, I don't see why you even propose using an array as a key in the first place. You seem to want Strings drawn from among your array elements as keys.
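To see why an array makes a poor key, the short sketch below compares a lookup with an array key against a lookup with a List key holding the same contents. The class and method names are invented for this illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArrayKeyDemo {
    // Looks up an array key using a second array with identical contents.
    static boolean arrayKeyFound() {
        Map<String[], Integer> byArray = new HashMap<>();
        byArray.put(new String[] {"word", "thing"}, 1);
        // Arrays compare by identity, so an equal-looking array is a different key.
        return byArray.containsKey(new String[] {"word", "thing"});
    }

    // Looks up a List key using a second list with identical contents.
    static boolean listKeyFound() {
        Map<List<String>, Integer> byList = new HashMap<>();
        byList.put(Arrays.asList("word", "thing"), 1);
        // Lists compare element by element, so the lookup succeeds.
        return byList.containsKey(Arrays.asList("word", "thing"));
    }

    public static void main(String[] args) {
        System.out.println("array key found: " + arrayKeyFound()); // false
        System.out.println("list key found: " + listKeyFound());   // true
    }
}
```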
You could construct a word frequency table like so:
Map<String, Integer> computeFrequencies(String[] words) {
Map<String, Integer> frequencies = new HashMap<String, Integer>();
for (String word: words) {
Integer wordFrequency = frequencies.get(word);
frequencies.put(word,
(wordFrequency == null) ? 1 : (wordFrequency + 1));
}
return frequencies;
}
In Java 8, using streams:
String[] array=new String[]{"a","b","c","a"};
// the merge function receives the two values stored for a repeated key, so add them together
Map<String,Integer> map1=Arrays.stream(array).collect(Collectors.toMap(x->x,x->1,Integer::sum));

Java program - Counts all the words from a text file, and counts frequency of each word

I'm a beginner programmer and I'm trying to write a program that opens a text file with a large text inside and counts how many words it contains.
Then it should print how many different words are in the text, and print the frequency of each word in the text.
I intended to use one String array to store all unique words and one int array to store the frequencies.
The program counts the words, but I'm unsure how to write the code correctly to get the list of words and how often each is repeated in the text.
I wrote this:
import easyIO.*;
import java.util.*;
class Oblig3A{
public static void main(String[] args){
int cont = 0;
In read = new In ("alice.txt");
In read2 = new In ("alice.txt");
while(read.endOfFile() == false)
{
String info = read.inWord();
System.out.println(info);
cont = cont + 1;
}
System.out.println(UniqueWords);
final int AN_WORDS = cont;
String[] words = new String[AN_WORDS];
int[] frequency = new int[AN_WORDS];
int i = 0;
while(read2.endOfFile() == false){
words[i] = read2.inWord();
i = i + 1;
}
}
}
Ok, here is what you need to do:
1. Use a BufferedReader to read the lines of text from the file, one by one.
2. Create a HashMap<String,Integer> to store the word, frequency relations.
3. When you read each line of text, use split() to get all the words in the line of text in an array of String[]
4. Iterate over each word. For each word, retrieve the value from the HashMap. If you get a null value, you have found the word for the first time, so put it in the HashMap with the value 1.
If you get a non-null value, increment it and put it back in the HashMap.
5. Repeat until you reach EOF.
Done !
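The steps above can be sketched roughly as follows. This is one possible reading of them, not a definitive implementation: the class name WordCounter and the method countWords are hypothetical, words are split on whitespace only (punctuation stays attached), and the file name alice.txt is taken from the question.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class WordCounter {
    // Reads lines until EOF and counts how often each whitespace-separated word occurs.
    static Map<String, Integer> countWords(BufferedReader reader) throws IOException {
        Map<String, Integer> frequencies = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) { // readLine returns null at EOF
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // split yields "" for blank lines and leading whitespace
                }
                Integer count = frequencies.get(word);
                // null means the word is new; otherwise increment the stored count
                frequencies.put(word, count == null ? 1 : count + 1);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("alice.txt"))) {
            Map<String, Integer> frequencies = countWords(reader);
            int total = 0;
            for (int count : frequencies.values()) {
                total += count;
            }
            System.out.println("Total words: " + total);
            System.out.println("Different words: " + frequencies.size());
            for (Map.Entry<String, Integer> entry : frequencies.entrySet()) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }
    }
}
```

Because the map holds both the unique words and their counts, the two parallel arrays from the question are no longer needed.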
You can use a
Map<String, Integer> map = new HashMap<String, Integer>();
And then add the words to the map, checking whether the key is already there. If it is not, add it to the map with a counter initialized to 1; otherwise increment the counter.
if(!map.containsKey(word))
{
map.put(word, 1);
}
else
{
map.put(word, map.get(word) + 1);
}
In the end you will have a map with all the words that the file contains and an Integer for each that represents how many times the word appears in the text.
You basically need a hash here. In Java, you can use a HashMap<String, Integer>, which will store words and their frequencies.
So when you read in a new word, look it up in the HashMap, say h: if it exists, increase its frequency; otherwise add the word with frequency = 1.
If you can use a library you may want to consider using a Guava Multiset, it has the counting functionality already built in:
public void count() throws IOException {
Multiset<String> countSet = HashMultiset.create();
BufferedReader bufferedReader = new BufferedReader(new FileReader("alice.txt"));
String line;
while ((line = bufferedReader.readLine()) != null) {
List<String> words = Arrays.asList(line.split("\\W+"));
countSet.addAll(words);
}
bufferedReader.close();
for (Entry<String> entry : countSet.entrySet()) {
System.out.println("word: " + entry.getElement() + " count: " + entry.getCount());
}
}

Print words which occurs more than once from a string

I am trying to find and print the words in a string that occur more than once, and it almost works. I am, however, fighting with a small problem: the words are printed out twice, since they occur twice in the sentence. I want them printed only once.
This is my code:
public class Main {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
String sentence = "is this a sentence or is this not ";
String[] myStringArray = sentence.split(" "); //Split the sentence by space.
int[] count = new int[myStringArray.length];
for (int i = 0; i < myStringArray.length; i++){
for (int j = 0; j < myStringArray.length; j++){
if (myStringArray[i].matches(myStringArray[j]))
count[i]++;
//else break;
}
}
for (int i = 0; i < myStringArray.length; i++) {
if (count[i] > 1)
System.out.println("1b. - Tokens that occurs more than once: " + myStringArray[i] + "\n");
}
}
}
You can try for (int i = 0; i < myStringArray.length; i+=2) instead.
Break on the first match, after incrementing. Then it won't also increment the second match.
Your code has some problems with it.
Notice that your code performs n^2 comparisons for a list of n elements.
If a word occurs twice, both of its positions in the array end up with a count of 2, so the word is printed twice.
What you need to keep track of is the set of words you have already seen, and check if a new word you encounter has already been seen or not.
If you had 3 occurrences of one word in your sentence, each occurrence would have a count of 3. That count is redundant data that doesn't need to be stored for each token, but rather just once per word.
All this can be done easily if you know how a Map works.
Here is an implementation that would work.
import java.util.HashMap;
import java.util.Map;
public class Main {
public static void main(String[] args) {
String sentence = "is this a sentence or is this not ";
String[] myStringArray = sentence.split("\\s"); //Split the sentence by space.
Map <String, Integer> wordOccurrences = new HashMap <String, Integer> (myStringArray.length);
for (String word : myStringArray)
if (wordOccurrences.containsKey(word))
wordOccurrences.put(word, wordOccurrences.get(word) + 1);
else wordOccurrences.put(word, 1);
for (String word : wordOccurrences.keySet())
if (wordOccurrences.get(word) > 1)
System.out.println("1b. - Tokens that occurs more than once: " + word + "\n");
}
}
We want to find the repeating words from an input string. So, I suggest the following approach which is fairly simple:
Make a HashMap instance. The key (String) will be the word and the value (Integer) will be the frequency of its occurrence.
Split the string using the split("\\s") method to make an array of only words.
Introduce an Integer variable 'frequency' with initial value 0 to hold the current word's count.
Iterate over the string array and look up each word's frequency: if it is 0 (the word is not in the map yet), add the word to the map with frequency 1;
if the key (word) exists, only increment the frequency by 1.
So you are now left with each word and its frequency.
For example, if input string is "We are getting dirty as this earth is getting polluted. We must stop it."
So, the map will be
{ ("We",2), ("are",1), ("getting",2), ("dirty",1), ("as",1), ("this",1), ("earth",1), ("is",1), ("polluted.",1), ("must",1), ("stop",1), ("it.",1) }
Now you know what the next step is and how to use the map. I agree with Kaushik.
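The remaining step, picking out the words whose frequency is greater than 1, might look like the sketch below. The class and method names are illustrative, and a LinkedHashMap is an added assumption so that repeated words come out in the order they first appear.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RepeatedWords {
    // Returns each word that occurs more than once, listed a single time.
    static List<String> repeatedWords(String sentence) {
        Map<String, Integer> frequency = new LinkedHashMap<>(); // keeps first-seen order
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                frequency.merge(word, 1, Integer::sum);
            }
        }
        List<String> repeated = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : frequency.entrySet()) {
            if (entry.getValue() > 1) {
                repeated.add(entry.getKey());
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        String input = "We are getting dirty as this earth is getting polluted. We must stop it.";
        System.out.println(repeatedWords(input)); // [We, getting]
    }
}
```

Since the map has one entry per distinct word, each repeated word is reported once no matter how many times it occurs.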
