Java - Word Frequency - java

I've created a Java program in Eclipse. The program counts how many words there are of each length. For example, if the user entered 'I went to the shop', the program would produce the output '1 1 1 2', that is, 1 word of length 1 ('I'), 1 word of length 2 ('to'), 1 word of length 3 ('the') and 2 words of length 4 ('went', 'shop').
These are the results I'm getting. I don't want the lines with a count of 0 to be shown. How can I hide these and show only the results with non-zero counts (1, 2, 3, 4, 5)?
The cat sat on the mat
words[1]=0
words[2]=1
words[3]=5
words[4]=0
words[5]=0
import java.util.Scanner;
import java.io.*;
public class mallinson_Liam_8
{
public static void main(String[] args) throws Exception
{
Scanner scan = new Scanner(new File("body.txt"));
while(scan.hasNext())
{
String s;
s = scan.nextLine();
String input = s;
String strippedInput = input.replaceAll("\\W", " ");
System.out.println("" + strippedInput);
String[] strings = strippedInput.split(" ");
int[] counts = new int[6];
int total = 0;
String text = null;
for (String str : strings)
if (str.length() < counts.length)
counts[str.length()] += 1;
for (String s1 : strings)
total += s1.length();
for (int i = 1; i < counts.length; i++){
System.out.println("words["+ i + "]="+counts[i]);
StringBuilder sb = new StringBuilder(i).append(i + " letter words: ");
for (int j = 1; j <= counts[i]; j++) {
}}}}}

I know you asked for Java, but just for comparison, here is how I'd do it in Scala:
val s = "I went to the shop"
val sizes = s.split("\\W+").groupBy(_.length).mapValues(_.size)
// sizes = Map(2 -> 1, 4 -> 2, 1 -> 1, 3 -> 1)
val sortedSizes = sizes.toSeq.sorted.map(_._2)
// sortedSizes = ArrayBuffer(1, 1, 1, 2)
println(sortedSizes.mkString(" "))
// outputs: 1 1 1 2

Simply add a check before you print...
for (int i = 1; i < counts.length; i++) {
if (counts[i] > 0) { //filter out 0-count lengths
System.out.println("words["+ i + "]="+counts[i]);
}
}

Add an if-statement that checks whether the number of words of length 'i' is equal to 0.
If it is, don't show it; otherwise, show it.
for (int i = 1; i < counts.length; i++) {
if (counts[i] != 0) {
System.out.println("words[" + i + "]="+counts[i]);
}
}
Edit:
bbill beat me to it. Our answers both work.
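Putting the question's counting loop together with the zero check from the answers above, a minimal self-contained sketch might look like the following. The class name WordLengthCounts, the method countByLength and its maxLength parameter are illustrative assumptions, and the input is a hard-coded string rather than body.txt.

```java
class WordLengthCounts {
    // Returns counts[i] = number of words of length i, for lengths 1..maxLength.
    // Longer words are ignored, as in the question's fixed-size array.
    static int[] countByLength(String line, int maxLength) {
        int[] counts = new int[maxLength + 1];
        // \\W+ treats any run of non-word characters as a single separator
        for (String word : line.split("\\W+")) {
            if (!word.isEmpty() && word.length() <= maxLength) {
                counts[word.length()]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countByLength("The cat sat on the mat", 5);
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > 0) { // skip lengths that never occurred
                System.out.println("words[" + i + "]=" + counts[i]);
            }
        }
    }
}
```

For the sample line this prints only words[2]=1 and words[3]=5.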

I'd use the Java 8 Stream API.
See my example:
// import java.nio.file.*;
import java.util.*;
import java.util.stream.Collectors;
public class CharacterCount {
public static void main(String[] args) {
// define input
String input = "I went to the shop";
// String input = new String(Files.readAllBytes(Paths.get("body.txt")));
// calculate output
String output =
// split input by whitespaces and other non-word-characters
Arrays.stream(input.split("\\W+"))
// group words by length of word
.collect(Collectors.groupingBy(String::length))
// iterate over each group of words
.values().stream()
// count the words for this group
.map(List::size)
// join all values into one, space separated string
.map(Object::toString).collect(Collectors.joining(" "));
// print output to console
System.out.println(output);
}
}
It outputs:
1 1 1 2
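One caveat with the stream version: Collectors.groupingBy returns a map whose iteration order is not specified, so the counts are not guaranteed to come out sorted by word length. A hedged variant that asks for a TreeMap explicitly is sketched below; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedLengthCounts {
    // Groups words by length into a TreeMap so the counts come out in ascending length order.
    static String countsByLength(String input) {
        Map<Integer, Long> counts = Arrays.stream(input.split("\\W+"))
                .filter(word -> !word.isEmpty()) // a leading separator yields an empty token
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.counting()));
        return counts.values().stream()
                .map(Object::toString)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(countsByLength("I went to the shop")); // 1 1 1 2
    }
}
```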

Related

How to find all consecutive letters with count from string provided by user?

I am trying to write code in Java that finds all the consecutive letters in a string provided by the user and reports the count of each.
For example:
User has provided string: "aaastt rr".
I am expecting the result as below:
a - 3
t - 2
r - 2
I have written the code below as per my understanding, but I am not getting the expected result.
import java.util.Scanner;
public class ConsecutiveCharacters {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter string: ");
char s[] = sc.nextLine().toCharArray();
int count = 1;
for(int i =0;i<s.length-1;i++){
if(s[i]==s[i+1]){
count++;
System.out.println(s[i] + "-" + count);
}
}
}
}
I am getting result:
a-2
a-3
t-4
r-5
which is not what I am expecting.
Please have a look and let me know what I am missing.
Many thanks in advance.
You are never resetting your counter when you run into a new character within the array.
Start with the first character and increment the count as you go. Whenever a new character is found, record the previous character and its count (only if the count is greater than 1), then switch to the new character and reset the count. Note the edge case where the string ends with consecutive characters.
Scanner sc = new Scanner(System.in);
System.out.println("Enter string: ");
char s[] = sc.nextLine().toCharArray();
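// LinkedHashMap (java.util) keeps insertion order, so results print in the order the runs were first found
LinkedHashMap<Character, Integer> charsFound = new LinkedHashMap<>();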
int count = 1;
char c = s[0];
for(int i = 1;i < s.length; i++)
{
//check the edge case where the last of the array is consecutive chars
if(c==s[i] && count >= 1 && s.length - 1 == i)
{
if(!charsFound.containsKey(c))
charsFound.put(c, ++count);
else if(charsFound.get(c) < ++count)
charsFound.put(c, count);
}
//increment the count if the character is the same one
else if(c==s[i])
{
count++;
}
//consecutive chain is broken, reset the count and our current character
else
{
if(count > 1)
{
if(!charsFound.containsKey(c))
charsFound.put(c, count);
else if(charsFound.get(c) < count)
charsFound.put(c, count);
}
//reset your variables for a new character
c = s[i];
count = 1;
}
}
for (char knownCharacters : charsFound.keySet())
if (charsFound.get(knownCharacters) > 1)
System.out.println(knownCharacters + "-" + charsFound.get(knownCharacters));
Output
Enter string:
aabbbt s.r r rr
a-2
b-3
r-2
Enter string:
aaastt rr
a-3
t-2
r-2
Enter string:
aayy t t t.t ty ll fffff
a-2
y-2
l-2
f-5
Enter string:
aa b aa c aaaaa
a-5
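The same idea (scan each run, keep the longest run per character, report only runs longer than 1) can be sketched more compactly. This is an alternative formulation, not the answerer's code; the names ConsecutiveRuns and longestRuns are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConsecutiveRuns {
    // Maps each character to the length of its longest run, keeping only runs longer than 1.
    // LinkedHashMap preserves the order in which the runs were first seen.
    static Map<Character, Integer> longestRuns(String s) {
        Map<Character, Integer> runs = new LinkedHashMap<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i;
            while (j < s.length() && s.charAt(j) == c) {
                j++; // advance to the end of the current run
            }
            int length = j - i;
            if (length > 1) {
                runs.merge(c, length, Integer::max); // keep the longest run for this character
            }
            i = j; // the next run starts where this one ended
        }
        return runs;
    }

    public static void main(String[] args) {
        longestRuns("aaastt rr").forEach((c, n) -> System.out.println(c + "-" + n));
    }
}
```

Because the inner loop always stops at the end of the string, the trailing-run edge case needs no special handling, and an empty input simply yields an empty map.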

How do I get each number a user inputs as set of numbers in string with each number separated by a space?

For example, the user enters "1 2 3 4", how do I extract those four numbers and put them into separate spots in an array?
I'm just a beginner so please excuse my lack of knowledge.
for (int i = 0; i < students; i++) {
scanner.nextLine();
tempScores[i] = scanner.nextLine();
tempScores[i] = tempScores[i] + " ";
tempNum = "";
int scoreCount = 0;
for (int a = 0; a < tempScores[i].length(); a++) {
System.out.println("Scorecount " + scoreCount + " a " + a );
if (tempScores[i].charAt(a) != ' ') {
tempNum = tempNum + tempScores[i].charAt(a);
} else if (tempScores[i].charAt(a) == ' ') {
scores[scoreCount] = Integer.valueOf(tempNum);
tempNum = "";
scoreCount ++;
}
}
You can use String.split(String) which takes a regular expression, \\s+ matches one or more white space characters. Then you can use Integer.parseInt(String) to parse the String(s) to int(s). Finally, you can use Arrays.toString(int[]) to display your int[]. Something like
String line = "1 2 3 4";
String[] tokens = line.split("\\s+");
int[] values = new int[tokens.length];
for (int i = 0; i < tokens.length; i++) {
values[i] = Integer.parseInt(tokens[i]);
}
System.out.println(Arrays.toString(values));
Outputs
[1, 2, 3, 4]
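The split-then-parse steps above can also be written as a small helper. This is only a sketch: the names ParseScores and parseInts are invented here, and it assumes every token is a valid integer (Integer.parseInt throws NumberFormatException otherwise).

```java
import java.util.Arrays;

class ParseScores {
    // Splits a line on runs of whitespace and parses each token as an int.
    static int[] parseInts(String line) {
        String trimmed = line.trim(); // avoid an empty first token from leading spaces
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseInts("1 2 3 4"))); // [1, 2, 3, 4]
    }
}
```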
If you are very sure that the numbers will be separated by a single space, then you could just use the split() method of String as below and parse each piece individually:
String input = sc.nextLine(); // use an sc.hasNextLine() check first
if (input != null && !input.trim().isEmpty()) {
String[] numStrings = input.trim().split(" ");
// convert each numeric String to an actual number with Integer.parseInt(String)
}

Digit Frequency In A String

I am supposed to do this:
For an input number, print the frequency of each digit in the order of its occurrence. For example:
Input:56464
Output:
Number-Frequency
5 -1
6 -2
4 -2
I cannot use any other libraries except java.lang and Scanner to input
So I tried this :
package practice2;
import java.util.Scanner;
public class DigitFrequency2
{
private static Scanner sc;
public static void main(String[] args)
{
sc = new Scanner(System.in);
System.out.println("Enter an integer number");
String sb = sc.nextLine();
System.out.println("Number\tFrequency");
int i,x,c = 0;
for(i=0;i<sb.length();i++)
{
c = 0;
for(x = i+1;x<sb.length();x++)
{
if(sb.charAt(i) == sb.charAt(x) && sb.charAt(i) != '*' && sb.charAt(x) != '*')
{
c++;
sb.replace(sb.charAt(x),'*');
}
}
if(c>0)
{
System.out.println(sb.charAt(i)+" \t"+c);
}
}
}
}
Number Frequency
6 1
4 1
Where am I going wrong? Please help.
A simple way is the following. Each pass takes the first character, removes every occurrence of it from the line, and prints how much shorter the line became.
Scanner in = new Scanner(System.in);
while (true) {
System.out.print("Input String: ");
String line = in.nextLine();
while (!line.isEmpty()) {
char c = line.charAt(0);
int length = line.length();
line = line.replace(String.valueOf(c), "");
System.out.println(c + " " + (length - line.length()));
}
}
There are a few problems with sb.replace(sb.charAt(x),'*');:
replace replaces all occurrences of the character, not just the first one, which is why your c can't be greater than 1.
Strings are immutable, so replace can't edit the original string; it returns a new one with the characters replaced, which you can store back in the sb reference.
Anyway, if you were able to use other Java resources besides java.lang.* and java.util.Scanner, a simple approach would be a Map from each character to its number of occurrences. Very helpful here is the merge method added in Java 8, which takes a key, an initial value, and a function that combines the old and new values.
So your code can look like:
String sb = ...
Map<Character, Integer> map = new TreeMap<>();
for (char ch : sb.toCharArray()) {
map.merge(ch, 1, Integer::sum);
}
map.forEach((k, v) -> System.out.println(k + "\t" + v));
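Note that a TreeMap prints the digits in sorted order, while the question asks for order of occurrence. Assuming java.util is allowed, swapping in a LinkedHashMap keeps first-occurrence order. The sketch below uses made-up names (DigitFrequency, frequencies):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DigitFrequency {
    // Counts each character; LinkedHashMap remembers the order of first occurrence.
    static Map<Character, Integer> frequencies(String digits) {
        Map<Character, Integer> map = new LinkedHashMap<>();
        for (char ch : digits.toCharArray()) {
            map.merge(ch, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Number\tFrequency");
        frequencies("56464").forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```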
The problem is that, as mentioned, String is immutable, so String.replace() just returns a new string and does not (cannot) modify the original. Either use a StringBuilder or store the returned value (e.g. sb = sb.replace(sb.charAt(x),'*');).
Going further, since you initialize c with 0, it will stay 0 if there is no other occurrence of the character in question (sb.charAt(i)), so your algorithm won't detect and print digits that occur only once (because later you only print if c > 0).
Counting occurrences (frequency) of characters or digits in a string is a simple operation; it does not require creating new strings, and it can be done by looping over the characters only once.
Here is a more efficient solution (one of the fastest). Since digits are in the range '0'..'9', you can count the occurrences in an array while looping over the characters only once. No need to replace anything. The order of first occurrence is "remembered" in a second char array, order.
char[] order = new char[10];
int[] counts = new int[10];
for (int i = 0, j = 0; i < sb.length(); i++)
if (counts[sb.charAt(i) - '0']++ == 0)
order[j++] = sb.charAt(i); // First occurrence of the digit
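And print in order, stopping at the first unused slot of the order array (the bounds check matters when all ten digits occur and the array is full):
System.out.println("Number\tFrequency");
for (int i = 0; i < order.length && order[i] != 0; i++)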
System.out.println(order[i] + "\t" + counts[order[i] - '0']);
Example output:
Enter an integer number
56464
Number Frequency
5 1
6 2
4 2
For completeness here's the complete main() method:
public static void main(String[] args) {
System.out.println("Enter an integer number");
String sb = new Scanner(System.in).nextLine();
char[] order = new char[10];
int[] counts = new int[10];
for (int i = 0, j = 0; i < sb.length(); i++)
if (counts[sb.charAt(i) - '0']++ == 0)
order[j++] = sb.charAt(i); // First occurrence of the digit
System.out.println("Number\tFrequency");
for (int i = 0; i < order.length && order[i] != 0; i++)
System.out.println(order[i] + "\t" + counts[order[i] - '0']);
}
Note:
If you want to make your code safe against invalid inputs (that may contain non-digits), you could use Character.isDigit(). Here is only the for loop, which is safe against any input:
for (int i = 0, j = 0; i < sb.length(); i++) {
char ch = sb.charAt(i);
if (Character.isDigit(ch)) {
if (counts[ch - '0']++ == 0)
order[j++] = ch; // First occurrence of ch
}
}
This code should also print the frequency of each character in the user's input. It sorts the characters first, so the output is in sorted order rather than order of occurrence:
public static void main(String args[])
{
// requires import java.util.Arrays; and import java.util.Scanner;
Scanner in = new Scanner(System.in);
System.out.println("Please enter numbers ");
String time = in.nextLine(); //USER INPUT
time = time.replace(":", "");
char digit[] = time.toCharArray();
int[] count = new int[digit.length];
Arrays.sort(digit); // equal characters are now adjacent
// count[start] ends up holding the length of the run that begins at index start
for (int i = 0, start = 0; i < digit.length; i++)
{
if (digit[i] != digit[start])
{
start = i;
}
count[start]++;
}
for (int i = 0; i < digit.length; i++)
{
if (count[i] > 0)
{
System.out.println(digit[i] + " appears " + count[i]+" time(s)");
}
}
}

Java I want to print the number of characters in a string

I am working on printing the number of characters in a user's input. Say the user enters "here is a random test", which totals 17 characters (not counting spaces). Here is what I have so far; it only prints the words on separate lines.
import java.text.*;
import java.io.*;
public class test {
public static void main (String [] args) throws IOException {
BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
String inputValue;
inputValue = input.readLine();
String[] words = inputValue.split("\\s+");
for (int i = 0; i < words.length; i++) {
System.out.println(words[i]);
}
}
}
str.replaceAll("\\s+","") returns a copy of str with all whitespace removed; it does not modify str itself, so the result must be assigned back.
str.length() returns the number of characters in the String str.
So when you get the input from the user, do this:
inputValue=inputValue.replaceAll("\\s+","");
System.out.println(inputValue.length());
Change your for...loop to this:
int total = 0;
for (int i = 0; i < words.length; i++) {
total += words[i].length();
}
System.out.println(total);
Essentially, we're looping through the array of words, getting each word's length, then adding that number of characters to the total counter.
I think we can avoid iterating over the words if we assume the words are separated by single blanks and there is no leading or trailing whitespace. Here is an example:
public static void main(String args[]) {
String test = "here is a random test";
String[] array = test.split("\\s+");
int size = array.length > 0 ? (test.length() - array.length + 1) : test.length();
System.out.println("Size:" + size);
}
To get the total count, add each word's length to a running total, then print it after the for loop.
int count =0;
for (int i = 0; i < words.length; i++) {
count = count + words[i].length();
}
System.out.println(count );
With a few modifications, the length is printed for each word and for the whole user input.
public static void main(String[] args) throws IOException {
BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
String inputValue;
inputValue = input.readLine();
String[] words = inputValue.split("\\s+");
System.out.println("Length of user input = " + inputValue.length());
for (int i = 0 ; i < words.length ; i++) {
System.out.println(words[i]);
System.out.println("Length of word = " + words[i].length());
}
}
Output
here is a random test
Length of user input = 21
here
Length of word = 4
is
Length of word = 2
a
Length of word = 1
random
Length of word = 6
test
Length of word = 4
You can do something like this if you are concerned only with whitespaces:
inputValue = input.readLine();
int len = inputValue.replaceAll(" ", "").length(); // replaceAll does not affect the original string; it returns a copy with the spaces removed
System.out.println(len);
System.out.println(inputValue);
So the output for the sample you provided would be:
17
here is a random test
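If the goal is to ignore all whitespace (tabs and newlines as well as spaces) without building an intermediate string, one possible sketch uses String.chars(). The class and method names are assumptions for illustration only.

```java
class NonWhitespaceCount {
    // Counts the characters of s that are not whitespace, without creating a new String.
    static long countNonWhitespace(String s) {
        return s.chars()
                .filter(c -> !Character.isWhitespace(c))
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countNonWhitespace("here is a random test")); // 17
    }
}
```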

Word count from text

This is my code to work out the length of a word:
public class WordCount {
public static void main (String args []) {
String text;
text = "Java";
System.out.println (text);
//Work out the length
String [] input = text.split(" ");
int MaxWordLength = 0;
int WordLength = 0;
for (int i = 0; i < input.length; i++)
{
MaxWordLength = input[i].length();
WordLength = MaxWordLength;
} //End of working out length
//Work out no. of words
int[] intWordCount = new int[WordLength + 1];
for(int i = 0; i < input.length; i++) {
intWordCount[input[i].length()]++; }
for (int i = 1; i < intWordCount.length; i++) {
System.out.println("There are " + intWordCount[i] + " words of length " + MaxWordLength);
}
}
}
The problem I am having is that when it prints out the length of the word, I get these results:
Java
There are 0 words of length 4
There are 0 words of length 4
There are 0 words of length 4
There are 1 words of length 4
But when I change the text to "J" this prints out:
J
There are 1 words of length 1
Any idea why it's doing that?
P.S. I'm kind of new to Java and any help would be appreciated.
I am not sure whether you want to count letters or words, because your code looks to me like it counts letters.
You just need to change this line from
String [] input = text.split(" ");
to
String [] input = text.split("");
and your program works perfectly.
input: Java
output: There are 4 letters of length 1 <- Hope this is the expected result for you
Source: Splitting words into letters in Java
You can achieve this more simply, with less headache, by using streams in Java 8.
Code:
import java.util.*;
public class LambdaTest
{
public static void main (String[] args)
{
String[] st = "Hello".split("");
Collection<String> myList = Arrays.asList(st);
System.out.println("your word has " + myList.stream().count() + " letters");
}
}
Output:
your word has 5 letters (each letter clearly has length 1)
My answer now that you have clarified what your issue is:
Code:
import java.util.*;
public class WordCount
{
public static void main (String[] args)
{
String text = "Java is awesome for Me";
System.out.println (text);
String [] input = text.split(" ");
List<Integer> list = new ArrayList<>();
for (int i = 0; i < input.length; i++)
{
list.add(input[i].length());
}
Set<Integer> unique = new HashSet<Integer>(list);
for (Integer length : unique) {
System.out.println("There are " + Collections.frequency(list, length) + " words of length " + length);
}
}
}
output:
There are 2 words of length 2
There are 1 words of length 3
There are 1 words of length 4
There are 1 words of length 7
Note: Read about HashSet and Set in Java
Source: http://javarevisited.blogspot.com/2012/06/hashset-in-java-10-examples-programs.html
Let's walk through this:
public class WordCount {
public static void main (String args []) {
String text;
text = "Java";
text is equal to "Java".
System.out.println (text);
Prints "Java"
//Work out the length
String [] input = text.split(" ");
This splits the string "Java" on spaces, of which there are none. So input (which I'd recommend be renamed to something more indicative, like inputs) is equal to an array of one element, and that one element is equal to "Java".
int MaxWordLength = 0;
int WordLength = 0;
for (int i = 0; i < input.length; i++)
{
MaxWordLength = input[i].length();
For each element, of which there is only one, MaxWordLength is set to the length of the first (and only) element, which is "Java"...whose length is 4.
WordLength = MaxWordLength;
So WordLength is now equal to 4.
} //End of working out length
//Work out no. of words
int[] intWordCount = new int[WordLength + 1];
This creates an int array of [WordLength + 1] elements (which is equal to [4 + 1], or 5), where each is initialized to zero.
for(int i = 0; i < input.length; i++) {
intWordCount[input[i].length()]++; }
For each element in input, of which there is only one, this sets the input[i].length()-th element--the fifth, since input[i] is "Java" and its length is four--to itself, plus one (because of the ++).
Therefore, after this for loop, the array is now equal to [0, 0, 0, 0, 1].
for (int i = 1; i < intWordCount.length; i++) {
System.out.println("There are " + intWordCount[i] + " words of length " + MaxWordLength);
So this prints one line per index from 1 to 4, each ending with MaxWordLength (always 4) rather than i, which gives the undesired output.
}
}
}
Your output is different when the input is only "J" because the intWordCount array now has just input[i].length() + 1 = 2 elements, so the printing loop, which starts at index 1, runs only once, and MaxWordLength is 1. The value of that last element is still set to "itself plus one": "itself" is initialized to zero (as all int-array elements are) and then incremented by one (with ++).
for (int i = 1; i < intWordCount.length; i++) {
System.out.println("There are " + intWordCount[i] + " words of length " + MaxWordLength);
}
1) You print out words with intWordCount[i] == 0, which is why you have the "There are 0 words of length X"
2) System.out.println("There are " ... + MaxWordLength); should probably be System.out.println("There are " ... + i);, so you have "There are 0 words of length 1" , "There are 0 words of length 2", etc
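Applying both points to the question's program gives something like the sketch below. It also takes the true maximum word length, since the original loop keeps the length of the last word instead. The class name WordLengthHistogram and the helper countByLength are illustrative, not from the original post.

```java
class WordLengthHistogram {
    // counts[i] = number of words of length i; the array is sized by the longest word.
    static int[] countByLength(String text) {
        String[] words = text.trim().split("\\s+");
        int maxLength = 0;
        for (String word : words) {
            maxLength = Math.max(maxLength, word.length()); // true maximum, not the last word
        }
        int[] counts = new int[maxLength + 1];
        for (String word : words) {
            counts[word.length()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countByLength("Java is awesome for Me");
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > 0) { // point 1: skip lengths with no words
                // point 2: print the loop index i, not MaxWordLength
                System.out.println("There are " + counts[i] + " words of length " + i);
            }
        }
    }
}
```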
I know this question was solved a long time ago, but here is another solution using new features of Java 8. Using Java streams, the whole exercise can be written in one statement:
Arrays.asList(new String[]{"Java my love"}) //start with a list containing 1 string item
.stream() //start the stream
.flatMap(x -> Stream.of(x.split(" "))) //split the string into words
.map((String x) -> x.length()) //compute the length of each word
.sorted((Integer x, Integer y) -> x-y) //sort words length (not necessary)
.collect(Collectors.groupingBy(x -> x, Collectors.counting())) //this is tricky: collect results to a map: word length -> count
.forEach((x,y) -> {System.out.println("There are " + y + " word(s) with " + x + " letter(s)");}); //now print each result
In a few years' time this will probably be a preferred method for solving such problems. Anyway, it is worth knowing that such an alternative exists.
To count occurrences of a word in a text, we can use the Pattern class with a while loop:
I. Case Sensitive word counts
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CountWordsInText {
public static void main(String[] args) {
String paragraph = "I am at office right now."
+ "I love to work at office."
+ "My Office located at center of kathmandu valley";
String searchWord = "office";
Pattern pattern = Pattern.compile(searchWord);
Matcher matcher = pattern.matcher(paragraph);
int count = 0;
while (matcher.find()) {
count++;
}
System.out.println(count);
}
}
II. Case Insensitive word counts
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CountWordsInTextCaseInsensitive {
public static void main(String[] args) {
String paragraph = "I am at office right now."
+ "I love to work at oFFicE."
+"My OFFICE located at center of kathmandu valley";
String searchWord = "office";
Pattern pattern = Pattern.compile(searchWord, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(paragraph);
int count = 0;
while (matcher.find())
count++;
System.out.println(count);
}
}
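Both programs above count substring matches, so a search for "office" would also count "offices". If only whole words should count, one hedged variation wraps the quoted search word in \b word boundaries. The names WholeWordCount and countWord are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordCount {
    // Counts whole-word occurrences of word in text, optionally ignoring case.
    static int countWord(String text, String word, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        // Pattern.quote makes the word literal; \\b requires a word boundary on each side
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String paragraph = "I am at office right now. I love to work at oFFicE. My OFFICE has offices.";
        System.out.println(countWord(paragraph, "office", false)); // 1
        System.out.println(countWord(paragraph, "office", true));  // 3
    }
}
```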
I don't know, but using the length method this much in order to figure out how the length mechanism works is like defining a word using the word itself. Figuring out how the length method works is an honorable quest, but you should probably avoid using the length method while doing so.
