How do I count occurrences of words in a line - Java

I am fairly new to Java. I want to count the occurrences of each word in a particular line. So far I can only count the words on a line, and I have no idea how to count occurrences.
Is there a simple way to do this?
Scanner file = new Scanner(new FileInputStream("/../output.txt"));
int count = 0;
while (file.hasNextLine()) {
String s = file.nextLine();
count++;
if(s.contains("#AVFC")){
System.out.printf("There are %d words on this line ", s.split("\\s").length-1);
System.out.println(count);
}
}
file.close();
Output:
There are 4 words on this line 1
There are 8 words on this line 13
There are 3 words on this line 16

The simplest way I can think of is to use String.split("\\s"), which splits on whitespace.
Then use a HashMap with each word as the key and the number of times it occurs as the value.
HashMap<String, Integer> mapOfWords = new HashMap<String, Integer>();
while (file.hasNextLine()) {
String s = file.nextLine();
String[] words = s.split("\\s");
int count;
for (String word : words) {
if (mapOfWords.get(word) == null) {
mapOfWords.put(word, 1);
}
else {
count = mapOfWords.get(word);
mapOfWords.put(word, count + 1);
}
}
}
Here is the implementation you requested, which skips lines that contain certain words:
HashMap<String, Integer> mapOfWords = new HashMap<String, Integer>();
while (file.hasNextLine()) {
String s = file.nextLine();
String[] words = s.split("\\s");
int count;
if (isStringWanted(s) == false) {
continue;
}
for (String word : words) {
if (mapOfWords.get(word) == null) {
mapOfWords.put(word, 1);
}
else {
count = mapOfWords.get(word);
mapOfWords.put(word, count + 1);
}
}
}
private boolean isStringWanted(String s) {
String[] checkStrings = new String[] {"chelsea", "Liverpool", "#LFC"};
for (String check : checkStrings) {
if (s.contains(check)) {
return false;
}
}
return true;
}

Try the code below; it may solve your problem. In addition, you can call String.toLowerCase() on each word before you put it into the HashMap, so that the count ignores case.
String line ="a a b b b b a q c c";
...
Map<String,Integer> map = new HashMap<String,Integer>();
Scanner scanner = new Scanner(line);
while (scanner.hasNext()) {
String s = scanner.next();
Integer count = map.put(s,1);
if(count!=null) map.put(s,count + 1);
}
...
System.out.println(map);
Result:
{b=4, c=2, q=1, a=3}
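The lower-casing suggestion above can be sketched as follows. This is only an illustration, not the answerer's code: the class name WordCounter and the method countWords are made up here, and the sketch uses Map.merge (Java 8+) in place of the put-then-check pattern.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class WordCounter {
    // Hypothetical helper: counts the words of one line, ignoring case.
    public static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank line yields one empty token
            }
            // merge() stores 1 for a new key, otherwise adds 1 to the old value
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // HashMap iteration order is not guaranteed, so the print order may vary
        System.out.println(countWords("a A b B b b a q c C"));
    }
}
```

Lower-casing with Locale.ROOT keeps the result independent of the machine's default locale.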

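The simplest approach would be to store the split data in an ArrayList, then iterate over your ArrayList and use [Collections.frequency](http://www.tutorialspoint.com/java/util/collections_frequency.htm). Note that each call scans the whole list, so this is not the fastest option for long inputs.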
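A minimal sketch of the Collections.frequency idea, assuming the line is split on whitespace. The class name FrequencyDemo and the helper frequencyOf are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

public class FrequencyDemo {
    // Hypothetical helper: how often does `word` occur in `line`?
    public static int frequencyOf(String line, String word) {
        List<String> words = Arrays.asList(line.trim().split("\\s+"));
        // Collections.frequency walks the whole list and counts equal elements
        return Collections.frequency(words, word);
    }

    public static void main(String[] args) {
        String line = "a a b b b b a q c c";
        List<String> words = Arrays.asList(line.split("\\s+"));
        // LinkedHashSet keeps each distinct word once, in first-seen order
        for (String w : new LinkedHashSet<>(words)) {
            System.out.println(w + " = " + Collections.frequency(words, w));
        }
    }
}
```

Because every call rescans the list, the total work grows with the number of distinct words times the list length; a HashMap count does a single pass instead.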

Check Guava's Multiset. Their description starts with 'The traditional Java idiom for e.g. counting how many times a word occurs in a document is something like:'. There you will find some code snippets showing how to do that without a Multiset.
BTW: If you only want to count the number of words in your string, why not just count the spaces? (With single spaces between words, the word count is the number of spaces plus one.) You could use StringUtils from Apache Commons. It's much better than creating an array of the split parts. Also have a look at their implementation.
int count = StringUtils.countMatches(string, " ");

Occurrences of a given substring within a String can be found using String#indexOf(String, int) in a loop:
String haystack = "This is a string";
String needle = "i";
// start at the first match so that a match at index 0 is not skipped
int index = haystack.indexOf(needle);
while (index != -1) {
System.out.println(String.format("Found %s in %s at index %s.", needle, haystack, index));
index = haystack.indexOf(needle, index + 1);
}
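The loop above reports positions; to get an actual count, the same indexOf pattern can be wrapped in a method. This is a sketch under assumptions: SubstringCounter and countOccurrences are names invented for the example, and overlapping matches are counted.

```java
public class SubstringCounter {
    // Hypothetical helper: counts (possibly overlapping) occurrences of needle.
    public static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0; // an empty needle would match forever at the end of the string
        }
        int count = 0;
        int index = haystack.indexOf(needle);
        while (index != -1) {
            count++;
            // continue one character later, so overlapping matches are found too
            index = haystack.indexOf(needle, index + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("This is a string", "i")); // prints 3
    }
}
```

To count only non-overlapping matches, advance by needle.length() instead of 1.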

Related

Finding the longest word ArrayList /Java

I want to write a method which finds the longest String (word). The output should be the longest word; if two words have the same length, the output should be: "More than one longest word".
I used an ArrayList and almost had a solution, but something goes wrong. The problem appears when two words have the same length.
The output is :
More than one longest word
More than one longest word
14 incrementation is the longest word
Please check out piece of my code and help me to find the answer :)
public class LongestWord {
public static void main(String[] args) {
ArrayList<String> wordsList = new ArrayList<String>();
wordsList.add("december");
wordsList.add("california");
wordsList.add("cat");
wordsList.add("implementation");
wordsList.add("incrementation");
int largestString = wordsList.get(0).length();
int index = 0;
for (int i = 0; i < wordsList.size(); i++) {
if (wordsList.get(i).length() > largestString) {
largestString = wordsList.get(i).length();
index = i;
}else if(wordsList.get(i).length() == largestString){
largestString = wordsList.get(i).length();
index = i;
System.out.println("More than one longest word");
}
}
System.out.println(largestString +" " + wordsList.get(index) +" is the longest word ");
}
}
The fact is that you can't tell which word is the longest until you have iterated over the whole list.
So iterate over the list:
if the word is longer than the previous longest: clear the list and save the word
if the word has the same length as the longest: save the word
if the word is shorter: do nothing
List<String> wordsList = Arrays.asList(
"december", "california", "cat",
"implementation", "incrementation");
int maxLength = Integer.MIN_VALUE;
List<String> largestStrings = new ArrayList<>();
for (String s : wordsList) {
if (s.length() > maxLength) {
maxLength = s.length();
largestStrings.clear();
largestStrings.add(s);
} else if (s.length() == maxLength) {
largestStrings.add(s);
}
}
if (largestStrings.size() > 1) {
System.out.println("More than one longest word");
System.out.println(largestStrings);
} else {
System.out.println(largestStrings.get(0) + " is the longest word");
}
Gives
More than one longest word
[implementation, incrementation]
azro is right. You can solve the problem using two loops. I'm not sure, but the code below should work:
for (int i = 0; i < wordsList.size(); i++) {
if (wordsList.get(i).length() > largestString) {
largestString = wordsList.get(i).length();
index = i;
}
}
for (int i = 0; i < wordsList.size(); i++) {
if (i != index && wordsList.get(index).length() == wordsList.get(i).length()) {
System.out.println("More than one longest word");
break;
}
}
You can do this with one loop iteration. Storing the longest word(s) as you go.
import java.util.*;
public class Test {
public static void main(String[] args) {
final Collection<String> words = Arrays.asList(
"december", "california", "cat",
"implementation", "incrementation");
final Collection<String> longestWords = findLongestWords(words);
if (longestWords.size() == 1) {
System.out.printf("The longest word is: %s\n", longestWords.iterator().next());
} else if (longestWords.size() > 1) {
System.out.printf("More than one longest word. The longest words are: %s\n", longestWords);
}
}
private static final Collection<String> findLongestWords(final Collection<String> words) {
// using a Set, so that duplicate words are stored only once.
final Set<String> longestWords = new HashSet<>();
// remember the current length of the longest word
int lengthOfLongestWord = Integer.MIN_VALUE;
// iterate over all the words
for (final String word : words) {
// this word is longer than the previously thought longest word: clear the set and update the longest length.
if (word.length() > lengthOfLongestWord) {
lengthOfLongestWord = word.length();
longestWords.clear();
}
// this word is as long as the current longest word: add it to the Set.
if (word.length() == lengthOfLongestWord) {
longestWords.add(word);
}
}
// return an unmodifiable Set containing the longest word(s)
return Collections.unmodifiableSet(longestWords);
}
}
My two cents: do it in a single loop. This can be improved further.
ArrayList<String> wordsList = new ArrayList<String>();
wordsList.add("december");
wordsList.add("california");
wordsList.add("cat");
wordsList.add("implementation");
wordsList.add("incrementation");
String result;
int length = Integer.MIN_VALUE;
Map<String,String> map = new HashMap<>();
for(String word: wordsList){
if(word.length() >= length) {
length = word.length();
if (map.containsKey(String.valueOf(word.length())) || map.containsKey( "X" + word.length())) {
map.remove(String.valueOf(word.length()));
map.put("X" + word.length(), word);
} else {
map.put(String.valueOf(word.length()), word);
}
}
}
result = map.get(String.valueOf(length)) == null ? "More than one longest word" :
map.get(String.valueOf(length)) + " is the longest word";
System.out.println(result);
Here is one approach. I am using a set to hold the results as there is no reason to include duplicate words if they exist.
iterate over the words
if the current word length is > maxLength, clear the set and add the word, and update maxLength
if equal to the maxLength, just add the word.
List<String> wordsList = List.of("december", "implementation",
"california", "cat", "incrementation");
int maxLength = Integer.MIN_VALUE;
Set<String> results = new HashSet<>();
for (String word : wordsList) {
int len = word.length();
if (len >= maxLength) {
if (len > maxLength) {
results.clear();
maxLength = len;
}
results.add(word);
}
}
System.out.printf("The longest word%s -> %s%n", results.size() > 1 ? "s" : "", results);
prints
The longest words -> [implementation, incrementation]
I changed your code to suggest a different approach to the problem. Honestly, I hope you'll find it fascinating and helpful.
There are two versions of it: one that doesn't care about finding more than one longest word (it prints just the first one, but you can change it as you prefer), and one that does.
First solution:
import java.util.ArrayList;
import java.util.List;
public class LongestWord {
public static void main(String[] args) {
List<String> wordsList = new ArrayList<>();
wordsList.add("december");
wordsList.add("california");
wordsList.add("cat");
wordsList.add("implementation");
wordsList.add("incrementation");
wordsList.stream()
.max(LongestWord::compare)
.ifPresent(a -> System.out.println(a.toUpperCase() + " is the longest word with length of: " + a.length()));
}
// orders strings by length
private static int compare(String a1, String b1) {
return a1.length() - b1.length();
}
}
Second solution:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class LongestWord {
public static void main(String[] args) {
List<String> wordsList = new ArrayList<>();
wordsList.add("december");
wordsList.add("california");
wordsList.add("cat");
wordsList.add("implementation");
wordsList.add("incrementation");
// length of the longest word, or 0 for an empty list
int max_length = wordsList.stream()
.max(LongestWord::compare)
.map(String::length).orElse(0);
// keep every word that has that length
List<String> finalWordsList = wordsList.stream()
.filter(word -> word.length() == max_length)
.collect(Collectors.toList());
if (finalWordsList.size() > 1) {
System.out.println("More than one longest word");
} else {
System.out.println(finalWordsList.get(0) + " is the longest word");
}
}
// orders strings by length
private static int compare(String a1, String b1) {
return a1.length() - b1.length();
}
}
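As a variation on the stream solutions above, the words can be grouped by length in a single pass, after which the longest group is read off a sorted map. This is a sketch of a different technique (Collectors.groupingBy into a TreeMap), not the answerer's code; LongestWords and longestWords are names made up for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LongestWords {
    // Hypothetical helper: returns all words that share the maximum length.
    public static List<String> longestWords(List<String> words) {
        // TreeMap keeps the keys (lengths) sorted, so the last entry is the longest group
        TreeMap<Integer, List<String>> byLength = words.stream()
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
        if (byLength.isEmpty()) {
            return Collections.emptyList();
        }
        return byLength.lastEntry().getValue();
    }

    public static void main(String[] args) {
        List<String> longest = longestWords(Arrays.asList(
                "december", "california", "cat", "implementation", "incrementation"));
        if (longest.size() > 1) {
            System.out.println("More than one longest word: " + longest);
        } else {
            System.out.println(longest.get(0) + " is the longest word");
        }
    }
}
```

This stores every word, so it uses more memory than tracking only the current maximum; the upside is that the grouping step is a single expression.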

Find the most common word from user input

I'm very new to Java and am creating a software application that allows a user to input text into a field; the program runs through all of the text and identifies the most common word. At the moment, my code looks like this:
JButton btnMostFrequentWord = new JButton("Most Frequent Word");
btnMostFrequentWord.addActionListener(new ActionListener() {
public void actionPerformed(ActionEvent e) {
String text = textArea.getText();
String[] words = text.split("\\s+");
HashMap<String, Integer> occurrences = new HashMap<String, Integer>();
for (String word : words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
}
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + occurrences.values());
}
});
This just prints what the values of the words are, but I would like it to tell me what the number one most common word is instead. Any help would be really appreciated.
Just after your for loop, you can sort the map entries by value in descending order and select the first.
for (String word: words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
}
Map.Entry<String,Integer> tempResult = occurrences.entrySet().stream()
.sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
.findFirst().get();
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + tempResult.getKey());
For anyone who is more familiar with Java, here is a very easy way to do it with Java 8:
List<String> words = Arrays.asList(text.split("\\s+"));
Collections.sort(words, Comparator.comparingInt(word -> {
return Collections.frequency(words, word);
}).reversed());
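The most common word is stored in words.get(0) after sorting. Note that Collections.frequency scans the whole list on every comparison, so this is slow for long texts.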
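Another Java 8 option is to count each word once and then pick the entry with the highest count. The sketch below uses Collectors.groupingBy with Collectors.counting, which is a different technique from the sort shown above; MostFrequentWord and mostFrequent are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MostFrequentWord {
    // Hypothetical helper: returns the most frequent word, or empty for blank text.
    public static Optional<String> mostFrequent(String text) {
        // one pass to build word -> count
        Map<String, Long> counts = Arrays.stream(text.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // one pass over the distinct words to find the largest count
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent("the cat and the hat and the bat").orElse("(no words)"));
    }
}
```

If several words share the highest count, which one is returned depends on the map's iteration order, so treat ties as unspecified.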
I would do something like this
int max = 0;
String a = null;
for (String word : words) {
int value = 0;
if(occurrences.containsKey(word)){
value = occurrences.get(word);
}
occurrences.put(word, value + 1);
if(max < value+1){
max = value+1;
a = word;
}
}
System.out.println(a);
You could sort it, and the solution would be much shorter, but I think this runs faster.
You can either iterate through the occurrences map afterwards and find the max, or
track the max while counting, like below:
String text = textArea.getText();
String[] words = text.split("\\s+");
HashMap<String, Integer> occurrences = new HashMap<>();
int mostFreq = -1;
String mostFreqWord = null;
for (String word : words) {
int value = 0;
if (occurrences.containsKey(word)) {
value = occurrences.get(word);
}
value = value + 1;
occurrences.put(word, value);
if (value > mostFreq) {
mostFreq = value;
mostFreqWord = word;
}
}
JOptionPane.showMessageDialog(null, "Most Frequent Word: " + mostFreqWord);

Extract word from a line between a specific position and the next delimiter Java

I have a text file which contains many lines; every line contains many words separated by a delimiter, like "hello,world,I,am,here".
I want to extract the word between a given position and the next delimiter. For example:
if the position is 7, the string is "world", and if the position is 1, the string is "hello".
I would recommend using the split() method. With commas delimiting the words you would do this:
String[] words = "hello,world,I,am,here".split(",");
Then you can get the words by position by indexing into the array:
words[3] // would yield "am"
Note that the parameter to split() is a regular expression, so if you aren't familiar with them see the docs here (or google for a tutorial).
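Because the argument to split() is a regular expression, a delimiter such as "|" or "." needs quoting before it is treated literally. A small sketch, assuming a pipe-delimited line; SplitDemo and splitLiteral are names invented for this example, while Pattern.quote is the standard java.util.regex method.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Hypothetical helper: splits on the delimiter taken literally, not as a regex.
    public static String[] splitLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String line = "hello|world|I|am|here";
        // "|" is the regex alternation operator, so this does not split on the pipe
        System.out.println(Arrays.toString(line.split("|")));
        // quoting the delimiter gives the five words
        System.out.println(Arrays.toString(splitLiteral(line, "|")));
    }
}
```

For a plain comma no quoting is needed, since "," has no special meaning in a regular expression.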
Just implement the following code, which takes advantage of the split() method available on all String objects:
String line = "hello,world,I,am,here";
String[] words = line.split(",");
public static String wordAtPosition(String line, int position) {
String[] words = line.split(",");
int index = 0;
for (String word : words) {
index += word.length() + 1; // +1 accounts for the delimiter after the word
if (position < index) {
return word;
}
}
return null;
}
Example
String line = "hello,world,I,am,here";
String word = wordAtPosition(line, 7);
System.out.println(word); // prints "world"
First get the substring, then split it and take the first element of the array.
public class Test {
public static void main(String[] args) {
Test test = new Test();
String t = test.getStringFromLocation("hello,world,I,am,here", 1, ",");
System.out.println(t);
t = test.getStringFromLocation("hello,world,I,am,here", 7, ",");
System.out.println(t);
t = test.getStringFromLocation("hello,world,I,am,here", 6, ",");
System.out.println(t);
}
public String getStringFromLocation(final String input, int position,
String delimiter) {
if (position == 0) {
return null;
}
int absolutePosition = position - 1;
String[] value = input.substring(absolutePosition).split(delimiter);
return value.length > 0 ? value[0] : null;
}
}
Not the most readable solution, but it covers the corner cases. The split solutions are nice, but they do not reflect the position in the original string, since they skip the ',' characters when counting.
String line = "hello,world,I,am,here";
int position = new Random().nextInt(line.length());
int startOfWord = -1;
int currentComa = line.indexOf(",", 0);
while (currentComa >= 0 && currentComa < position) {
startOfWord = currentComa;
currentComa = line.indexOf(",", currentComa + 1);
}
int endOfWord = line.indexOf(",", position);
if(endOfWord < 0) {
endOfWord = line.length();
}
String word = line.substring(startOfWord + 1, endOfWord);
System.out.println("position " + position + ", word " + word);
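The same lookup can be written without a loop by searching backwards and forwards from the position. This is a sketch under assumptions: the position is taken as a 0-based index into the line, a position that falls on a delimiter yields the word before it, and WordAtPosition and wordAt are hypothetical names.

```java
public class WordAtPosition {
    // Hypothetical helper: returns the word that contains the 0-based position.
    public static String wordAt(String line, int position, char delimiter) {
        if (position < 0 || position >= line.length()) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        // last delimiter strictly before the position (-1 if none), plus one
        int start = line.lastIndexOf(delimiter, position - 1) + 1;
        // first delimiter at or after the position, or the end of the line
        int end = line.indexOf(delimiter, position);
        if (end < 0) {
            end = line.length();
        }
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        String line = "hello,world,I,am,here";
        System.out.println(wordAt(line, 0, ','));  // prints hello
        System.out.println(wordAt(line, 7, ','));  // prints world
        System.out.println(wordAt(line, 20, ',')); // prints here
    }
}
```

For 1-based positions, as in the question, subtract 1 before calling the method.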

Java Unicode Characters

I'm familiar with this kind of problem for ASCII, but I have no experience with the same problems for Unicode characters. For example, how do I return the word that occurs most frequently, given a string array containing words? Thanks in advance!
P.S.: With ASCII you can always use an array of length 256 to represent all the characters, but you can't do that with Unicode. Is a HashMap a must, and is it the best way to solve the problem? I heard that there are better ways to solve it. Below is what I can think of:
String str = "aa df ds df df"; // assume they are Unicode
String[] words = str.split(" ");
HashMap<String, Integer> map = new HashMap<String, Integer>();
for (String word : words){
if (map.containsKey(word)){
int f = map.get(word);
map.put(word, f+1);
} else{
map.put(word, 1);
}
}
int max = 0;
String maxWord = "";
for (String word : words){
int f = map.get(word);
if (f > max){
max = f;
maxWord = word;
}
}
System.out.println(maxWord + " " +max);
// Inspired by GameKyuubi. It can also be solved by sorting the array and counting runs of equal words, using constant extra space for the counters.
Arrays.sort(words);
int max = 0;
int count = 0;
String maxWord = "";
String prev = "";
for (String word : words){
if (prev.equals("") || word.equals(prev)){
count++;
} else{
count = 1;
}
if (max < count){
max = count;
maxWord = word;
}
prev = word;
}
System.out.println(maxWord + " " +max);
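On the Unicode part of the question: a HashMap keyed by String works for any Unicode text, because String.equals and String.hashCode operate on the characters regardless of script. One thing that can differ from ASCII is that the same visible word may be encoded in more than one way (for example a precomposed "é" versus "e" plus a combining accent). The sketch below normalizes each word to NFC with java.text.Normalizer before counting; whether that step is wanted is an assumption about the input, and UnicodeWordCount and mostFrequent are hypothetical names.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class UnicodeWordCount {
    // Hypothetical helper: most frequent word after NFC normalization, or null if empty.
    public static String mostFrequent(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String raw : words) {
            // NFC makes composed and decomposed spellings compare as equal
            String word = Normalizer.normalize(raw, Normalizer.Form.NFC);
            int count = counts.merge(word, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "caf\u00e9" is precomposed; "cafe\u0301" uses a combining acute accent
        String[] words = {"caf\u00e9", "\u65e5\u672c\u8a9e", "cafe\u0301", "\u65e5\u672c\u8a9e", "caf\u00e9"};
        System.out.println(mostFrequent(words));
    }
}
```

The maximum is tracked while counting, so no second pass over the words is needed.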

How do you find words in a text file and print the most frequent word shown using array?

I'm having trouble figuring out how to find the most frequent word and the most frequent case-insensitive word for a program. I have a scanner that reads through the text file and a while loop, but I still don't know how to implement what I'm trying to find. Do I use a different string function to read and print the word out?
Here is my code as of now:
public class letters {
public static void main(String[] args) throws FileNotFoundException {
FileInputStream fis = new FileInputStream("input.txt");
Scanner scanner = new Scanner(fis);
String word[] = new String[500];
while (scanner.hasNextLine()) {
String s = scanner.nextLine();
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
}
}
String []roll = s.split("\\s");
for(int i=0;i<roll.length;i++){
String lin = roll[i];
//System.out.println(lin);
}
}
This is what I have so far. I need the output to say:
Word:
6 roll
Case-insensitive word:
18 roll
And here is my input file:
#
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
ROll tIDE ROll!
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
roll tide roll!
Roll Tide Roll !
#
65-43+21= 43
65.0-43.0+21.0= 43.0
65 -43 +21 = 43
65.0 -43.0 +21.0 = 43.0
65 - 43 + 21 = 43
65.00 - 43.0 + 21.000 = +0043.0000
65 - 43 + 21 = 43
I just need it to find the most frequently occurring word (a word being a maximal consecutive sequence of letters), which is roll, and print out how many times it occurs, which is 6. If anybody can help me with this, that would be really great! Thanks.
Consider using a Map<String,Integer> for the words; you can then count any number of distinct words. See the documentation for Map.
Like this (it would require modification for the case-insensitive count):
public Map<String,Integer> words_count = new HashMap<String,Integer>();
//read your line (you will have to determine if this line should be split or is equations
//also just noticed that the trailing '!' would need to be removed
String[] words = line.split("\\s+");
for(int i=0;i<words.length;i++)
{
String s = words[i];
if(words_count.containsKey(s))
{
Integer count = words_count.get(s) + 1;
words_count.put(s, count);
}
else
words_count.put(s, 1);
}
Then you have the number of occurrences for each word in the string and to get the most occurring do something like
Integer frequency = null;
String mostFrequent = null;
for(String s : words_count.keySet())
{
Integer i = words_count.get(s);
// also record the word on the first iteration, otherwise mostFrequent can stay null
if(frequency == null || i > frequency)
{
frequency = i;
mostFrequent = s;
}
}
Then to print
System.out.println("The word "+ mostFrequent +" occurred "+ frequency +" times");
Start with accumulating all the words into a Map as follows:
...
String[] roll = s.split("\\s+");
for (final String word : roll) {
Integer qty = words.get(word);
if (qty == null) {
qty = 1;
} else {
qty = qty + 1;
}
words.put(word, qty);
}
...
Then you need to figure out which has the biggest score:
String bestWord = null;
int maxQty = 0;
for(final String word : words.keySet()) {
if(words.get(word) > maxQty) {
maxQty = words.get(word);
bestWord = word;
}
}
System.out.println("Word:");
System.out.println(Integer.toString(maxQty) + " " + bestWord);
And last you need to merge all forms of the same word together:
Map<String, Integer> wordsNoCase = new HashMap<String, Integer>();
for(final String word : words.keySet()) {
Integer qty = wordsNoCase.get(word.toLowerCase());
if(qty == null) {
qty = words.get(word);
} else {
qty += words.get(word);
}
wordsNoCase.put(word.toLowerCase(), qty);
}
words = wordsNoCase;
Then re-run the previous code snippet to find the word with the biggest score.
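Since the question defines a word as a maximal consecutive sequence of letters, the tokens can also be pulled out with a regular expression instead of split(), which sidesteps the trailing "!" and the equation lines. A minimal sketch, assuming only ASCII letters count; LetterWords and count are names invented for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterWords {
    // a word is a maximal run of ASCII letters (assumption based on the question)
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Hypothetical helper: counts words, optionally folding case.
    public static Map<String, Integer> count(String text, boolean ignoreCase) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            String word = matcher.group();
            if (ignoreCase) {
                word = word.toLowerCase(Locale.ROOT);
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "roll tide roll!\nRoll Tide Roll!\nROLL TIDE ROLL!\n65-43+21= 43";
        System.out.println(count(text, false).get("roll")); // prints 2
        System.out.println(count(text, true).get("roll"));  // prints 6
    }
}
```

The word with the highest count can then be found with the same max loop as in the snippet above.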
Try using a HashMap for better results. You need to use BufferedReader and FileReader to read the input file, as follows:
FileReader text = new FileReader("file.txt");
BufferedReader textFile = new BufferedReader(text);
The BufferedReader object textFile needs to be passed as a parameter to the method below:
public HashMap<String, Integer> countWordFrequency(BufferedReader textFile) throws IOException
{
/*This method finds the frequency of words in a text file
* and saves the word and its corresponding frequency in
* a HashMap.
*/
HashMap<String, Integer> mapper = new HashMap<String, Integer>();
StringBuffer multiLine = new StringBuffer("");
String line = null;
if(textFile.ready())
{
while((line = textFile.readLine()) != null)
{
multiLine.append(line);
String[] words = line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
for(String word : words)
{
if(!word.isEmpty())
{
Integer freq = mapper.get(word);
if(freq == null)
{
mapper.put(word, 1);
}
else
{
mapper.put(word, freq+1);
}
}
}
}
textFile.close();
}
return mapper;
}
The line line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" "); replaces every character that is not a letter with a space, then converts all the words to lower case (which solves your case-insensitivity problem), and then splits the words separated by spaces.
/*This method finds the highest value in HashMap
* and returns the same.
*/
public int maxFrequency(HashMap<String, Integer> mapper)
{
int maxValue = Integer.MIN_VALUE;
for(int value : mapper.values())
{
if(value > maxValue)
{
maxValue = value;
}
}
return maxValue;
}
The above code returns the highest value in the HashMap.
/*This method prints the HashMap Key with a particular Value.
*/
public void printWithValue(HashMap<String, Integer> mapper, Integer value)
{
for (Entry<String, Integer> entry : mapper.entrySet())
{
if (entry.getValue().equals(value))
{
System.out.println("Word : " + entry.getKey() + " \nFrequency : " + entry.getValue());
}
}
}
Now you can print the most frequent word along with its frequency as above.
/* A LinkedHashMap holds each word as the key and its number of occurrences as the value.
* Create a BufferedReader object.
* Read the first line into currentLine.
* In a while loop, split currentLine into words
* and iterate over them with a for loop. Inside the for loop there is an if-else statement:
* if the word is already in the map, increment its count by 1; otherwise store 1 as its value.
* Read the next line into currentLine.
*/
public static void main(String[] args) {
Map<String, Integer> map = new LinkedHashMap<String, Integer>();
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader("F:\\chidanand\\javaIO\\Student.txt"));
String currentLine = reader.readLine();
while (currentLine!= null) {
String[] input = currentLine.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
for (int i = 0; i < input.length; i++) {
if (input[i].isEmpty()) {
continue; // skip empty tokens produced by consecutive spaces
}
if (map.containsKey(input[i])) {
int count = map.get(input[i]);
map.put(input[i], count + 1);
} else {
map.put(input[i], 1);
}
}
currentLine = reader.readLine();
}
String mostRepeatedWord = null;
int count = 0;
for (Entry<String, Integer> m:map.entrySet())
{
if(m.getValue() > count)
{
mostRepeatedWord = m.getKey();
count = m.getValue();
}
}
System.out.println("The most repeated word in input file is : "+mostRepeatedWord);
System.out.println("Number Of Occurrences : "+count);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
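The try/finally handling above can be shortened with try-with-resources (Java 7+), which closes the reader automatically even when an exception is thrown. The following is a sketch rather than a drop-in replacement: FileWordCount and count are hypothetical names, the method accepts any Reader so it can be tried without a file, and IOException is wrapped in UncheckedIOException as a design choice for the example.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class FileWordCount {
    // Hypothetical helper: counts lower-cased letter-only words from any Reader.
    public static Map<String, Integer> count(Reader source) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // the reader is closed automatically when the try block exits
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String cleaned = line.replaceAll("[^a-zA-Z]", " ").toLowerCase(Locale.ROOT);
                for (String word : cleaned.split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // reads the file named by the first argument, or a small sample if none is given
        Reader source = args.length > 0
                ? new FileReader(args[0])
                : new StringReader("roll tide roll!\nRoll Tide Roll!");
        System.out.println(count(source));
    }
}
```

Passing a Reader instead of a hard-coded path keeps the counting logic independent of where the text comes from.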
