I am trying to make a word counter in Java. I'm trying to count words by separating them with spaces.
I've managed to get rid of the spaces before and after a sentence with the trim() method. However, I haven't been able to handle the case where the user types more than one space between two words. For example, so far the string "hello world" with multiple spaces between hello and world outputs a word count greater than two. This is the code I have tried so far to fix this problem:
public void countWord(){
String tokens[] = userInput.trim().split(" ");
int counter = tokens.length;
for(int i = 0; i < tokens.length; ++i) {
if(Objects.equals(" ", tokens[i])) {
--counter;
}
}
System.out.printf("Total word count is: %d", counter);
}
As you can see, I create a word-counting integer that holds the number of tokens created. Then I look for tokens that contain only " " and decrement the word count once for each of them. However, this does not solve my problem.
Try splitting with a regex:
userInput.split("\\s+");
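A minimal sketch of a complete method built on that idea, assuming the text is passed in as a parameter rather than read from the question's userInput field (countWords is a hypothetical name). Splitting an empty string still returns one element, so blank input gets its own check:

```java
public class WordCounter {

    // Counts words separated by any run of whitespace (spaces, tabs, newlines).
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split(...) returns [""], which would otherwise count as 1 word
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.printf("Total word count is: %d%n", countWords("  hello    world  "));
    }
}
```

The trim() call matters: without it, leading whitespace would produce an empty first token.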
You've already split() on spaces, so there will be no more spaces in any of the tokens as split() returns:
the array of strings computed by splitting this string around matches of the given regular expression
(Emphasis mine)
However, if there are extra spaces in your String, split(" ") produces extra empty tokens, which throws off the length. Instead, use split("\\s+"), which splits on runs of whitespace. Then just use the length of the array, as split() already returns all the tokens separated by whitespace, which are the words:
System.out.printf("Total word count is: %d", tokens.length);
Which will print 5 for the test String
"Hello this is a String"
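To see where the extra tokens come from, compare the two splits on a string with several spaces between the words (the class name is arbitrary):

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String input = "hello   world"; // three spaces between the words
        // Splitting on a single space yields an empty string between each pair of adjacent spaces
        System.out.println(Arrays.toString(input.split(" ")));    // [hello, , , world]
        // Splitting on runs of whitespace yields only the words
        System.out.println(Arrays.toString(input.split("\\s+"))); // [hello, world]
    }
}
```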
If you intend to count the words, try one of the following, in addition to those others have mentioned.
This solution uses StringTokenizer:
String words = "The Hello World word counter by using StringTokenizer";
StringTokenizer st = new StringTokenizer(words);
System.out.println(st.countTokens()); // => 8
You can also take advantage of a regex by splitting the string on runs of non-word characters (\W+), which leaves only the words:
String words = "The Hello World word counter by using regex";
int counter = words.split("\\W+").length;
System.out.println(counter); // => 8
Or use Scanner to write your own counter method:
public static int counter(String words) {
Scanner scanner = new Scanner(words);
int count = 0;
while(scanner.hasNext()) {
count += 1;
scanner.next();
}
return count;
}
If you want to count the spaces, as your title says, you can use StringUtils from Apache Commons Lang:
int count = StringUtils.countMatches("The Hello World space counter by using StringUtils", " ");
System.out.println(count);
Or, if you use Spring, its StringUtils class is also available to you:
int count = StringUtils.countOccurrencesOf("The Hello World space counter by using Spring-StringUtils", " ");
System.out.println(count);
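If you would rather not add Commons Lang or Spring just for this, a plain-JDK sketch (Java 8+, using String.chars()) does the same count; countSpaces is a hypothetical helper name:

```java
public class SpaceCounter {

    // Counts the space characters in text using only the JDK.
    static long countSpaces(String text) {
        return text.chars().filter(c -> c == ' ').count();
    }

    public static void main(String[] args) {
        System.out.println(countSpaces("The Hello World space counter by using plain Java"));
    }
}
```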
I think you can easily fix it by checking whether tokens[i].equals(""), i.e. whether a token is an empty string, instead of comparing it with " ". Splitting on a single space when the input contains multiple adjacent spaces creates empty String objects in the array, so this should work.
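Applied to the loop from the question, that fix looks roughly like this. It is a sketch: userInput is passed as a parameter here instead of being a field, and the method returns the count so it can be checked:

```java
public class EmptyTokenCounter {

    // Counts the tokens produced by split(" "), skipping the empty strings
    // that appear wherever two or more spaces are adjacent.
    static int countWord(String userInput) {
        String[] tokens = userInput.trim().split(" ");
        int counter = tokens.length;
        for (String token : tokens) {
            if (token.isEmpty()) { // compare with "" rather than " "
                --counter;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.printf("Total word count is: %d%n", countWord("hello     world"));
    }
}
```

As a side effect, empty input now gives 0, because "".split(" ") returns a single empty token that the loop subtracts.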
Why don't you collapse every run of whitespace into a single space and then split:
String tokens[] = userInput.trim().replaceAll("\\s+", " ").split(" ");
When I input any sentence, my program reports that it is a palindrome, and I think my replaceAll() calls aren't working in some cases. This is likely due to an error on my part, because using the Scanner class in Java is new to me (I'm more used to input in C++ and Python 3). I added comments to make my intentions clearer.
import java.util.Scanner;
public class PalindromeTest
{
public static void main (String[] args)
{
Scanner stringScan = new Scanner(System.in); //Scanner for strings, avoids reading ints as strings
Scanner intScan = new Scanner(System.in); //Scanner for ints, avoids reading strings as ints
String forwardPal = ""; //Variables for the rest of the program
String reversePal = "";
String trimForward = "";
char tempChar;
int revCount;
int revPalLength;
int quit;
while (true) //Loop to keep the program running, problem is in here
{
System.out.println("Please enter a word or a sentence."); //Prompts user to enter a word or sentence, I assume that the program is counting
forwardPal = stringScan.nextLine();
trimForward = forwardPal.replaceAll(" " , ""); //Trims the forwardPal string of characters that are not letters
trimForward = trimForward.replaceAll("," , "");
trimForward = trimForward.replaceAll("." , "");
trimForward = trimForward.replaceAll("!", "");
trimForward = trimForward.replaceAll(":", "");
trimForward = trimForward.replaceAll(";", "");
revPalLength = trimForward.length() ; //Makes the reverse palindrome length equal to the length of the new trimmed string entered
for (revCount = revPalLength - 1; revCount >= 0; revCount--) //Loop to count the reverse palindrome and add each character to the string reversePal iteratively
{
tempChar = trimForward.charAt(revCount);
reversePal += tempChar;
System.out.println(reversePal);
}
if (trimForward.equalsIgnoreCase(reversePal)) //Makes sure that the palindrome forward is the same as the palindrome backwards
{
System.out.println("Congrats, you have a palindrome"); //Output if the sentence is a palindrome
}
else
{
System.out.println("Sorry, that's not a palindrome"); //Output if the sentence isn't a palindrome
}
System.out.println("Press -1 to quit, any other number to enter another sentence."); //Loops to ask if the user wants to continue
quit = intScan.nextInt(); //Checks if the user input a number
if (quit == -1) //If the user inputs -1, quit the program and close the strings
{
stringScan.close();
intScan.close();
break;
}
}
}
}
Your problem is this line
trimForward = trimForward.replaceAll("." , "");
replaceAll() takes its first argument as a regex, and in a regex the dot matches any character, so this line replaces every character with "".
Instead, use
trimForward = trimForward.replace("." , "");
In fact, all your lines should use replace() instead of replaceAll(), as none of them takes advantage of regex. Only use replaceAll() if you plan to take advantage of it.
Or, if you do want to use a regex, this one does all of that in one neat line (inside a character class, the dot is a literal dot):
trimForward = forwardPal.replaceAll("[ ,.!:;]", "");
I hope this was of some help.
The simplest fix is to use replace() instead of replaceAll().
replace() replaces all occurrences of the given plain text.
replaceAll() replaces all occurrences of the given regex.
Since some of your replacements have special meaning in regex (specifically the dot ., which means any character), you can't use replaceAll() the way you currently are (you'd have to escape the dot).
It's common, and quite reasonable, to assume that replace() replaces one occurrence and replaceAll() replaces all occurrences. They are poorly named methods, because they interpret their parameters differently, yet both replace all matches.
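A quick way to see the difference between the two methods on a string containing dots (the class name is arbitrary):

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replaceAll(".", ""));   // empty line: the regex "." matches every character
        System.out.println(s.replaceAll("\\.", "")); // abc: the escaped dot matches only a literal '.'
        System.out.println(s.replace(".", ""));      // abc: replace() treats its argument as plain text
    }
}
```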
As an aside, you may find this briefer solution of interest:
forwardPal = forwardPal.replaceAll("\\W", ""); // remove non-word chars
boolean isPalindrome = new StringBuilder(forwardPal).reverse().toString().equalsIgnoreCase(forwardPal); // ignore case, as in your original check
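A minimal sketch that wraps that aside in a reusable method (isPalindrome is a hypothetical name). Because the reversed string is built fresh on each call, it also avoids a separate problem in the question's loop: reversePal is never reset, so each new sentence is appended to the previous reversal.

```java
public class PalindromeCheck {

    // Removes non-word characters (\W), then compares the text with its reverse, ignoring case.
    static boolean isPalindrome(String sentence) {
        String cleaned = sentence.replaceAll("\\W", "");
        String reversed = new StringBuilder(cleaned).reverse().toString();
        return cleaned.equalsIgnoreCase(reversed);
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome("A man, a plan, a canal: Panama")); // true
        System.out.println(isPalindrome("Hello, world!"));                  // false
    }
}
```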
Say I have some text as below.
For example, this is what I have:
A simple sentence consists of only one clause. A compound sentence consists of two or more independent clauses. A complex sentence has at least one independent clause plus at least one dependent clause. A set of words with no independent clause may be an incomplete sentence, also called a sentence fragment.
I want only the first 10 words of the text above.
I'm trying to produce the following string:
A simple sentence consists of only one clause. A compound
I tried this:
bigString.split(" " ,10).toString()
But it returns the whole bigString wrapped in [] as an array.
Thanks in advance.
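Assume bigString is a String holding your text. The first thing you want to do is split the string into single words (the regex \s+ matches any run of whitespace, including line breaks):
String[] words = bigString.split("\\s+");
How many words do you want to extract?
int n = 10;
Put the words together:
String newString = "";
for (int i = 0; i < n && i < words.length; i++) { newString = newString + (i == 0 ? "" : " ") + words[i]; } // no leading space, and no exception if there are fewer than n words
System.out.println(newString);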
Hope this is what you needed.
If you want to know more about regular expressions (i.e. how to tell Java where to split), see here: How to split a string in Java
If you call split() with a limit (yours is 10), it won't just give you the first 10 parts and stop: it gives you the first 9 parts, and the 10th element of the array contains the rest of the input String. Printing that array (e.g. with Arrays.toString()) therefore shows the whole input. Note that calling toString() directly on an array does not concatenate its elements; it only returns the array's type and hash code. To get what you initially wanted, split with a limit of 11, drop the last element (the rest of the input) and join the others:
String[] myArray = bigString.split(" ", 11);
String firstTen = String.join(" ", Arrays.copyOf(myArray, Math.min(10, myArray.length))); // drop the 11th element, which holds the rest of the input
System.out.println(firstTen);
This will help you:
String[] strings = Arrays.stream(bigString.split(" "))
.limit(10)
.toArray(String[]::new);
String firstTen = String.join(" ", strings); // the first 10 words as one String
Here is exactly what you want:
String[] result = new String[10];
// regex \s matches a whitespace character: [ \t\n\x0B\f\r]
String[] raw = bigString.split("\\s", 11);
// the last entry of raw holds the rest of the text, so only the first 10 entries are copied
System.arraycopy(raw, 0, result , 0, 10);
System.out.println(Arrays.toString(result));
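Combining the ideas above into one helper, here is a sketch under the assumption that words are separated by any whitespace (including line breaks) and that text with fewer than n words should simply return all of them; firstWords is a hypothetical name:

```java
import java.util.Arrays;

public class FirstWords {

    // Returns the first n whitespace-separated words of text, joined by single spaces.
    static String firstWords(String text, int n) {
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n <= 0) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        // copyOf truncates to n elements; Math.min avoids padding with nulls for short input
        return String.join(" ", Arrays.copyOf(words, Math.min(n, words.length)));
    }

    public static void main(String[] args) {
        String bigString = "A simple sentence consists of only one clause. "
                + "A compound sentence consists of two or more independent clauses.";
        System.out.println(firstWords(bigString, 10));
    }
}
```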
I'm a Java beginner, so please bear with me if this has an extremely easy answer.
Say I have code that looks like this:
String str;
String [] splits;
str = "The words never line up in such a way ";
splits = str.split(" ");
for (int i = 0; i < splits.length; i++)
System.out.println(splits[i]);
What does Java do at the end of the string? After "way" there is a space; since there is no value after the space does Java decide not to split again?
Thanks so much!
According to the Java documentation for split(), http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String),
split(String regex) is equivalent to split(regex, 0), which discards trailing empty strings. Specifically, from the docs:
"Trailing empty strings are therefore not included in the resulting array."
So the last element in the array after the split will be "way"
You can confirm this by executing the code you mentioned.
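To see the effect of the limit argument directly, you can compare the default split with a negative limit, which the Javadoc says keeps trailing empty strings. A short sketch (the class name is arbitrary):

```java
import java.util.Arrays;

public class TrailingSplit {
    public static void main(String[] args) {
        String str = "in such a way ";
        // limit 0 (what split(String) uses): trailing empty strings are discarded
        System.out.println(Arrays.toString(str.split(" ")));     // [in, such, a, way]
        // negative limit: applied as often as possible, trailing empty strings are kept
        System.out.println(Arrays.toString(str.split(" ", -1))); // [in, such, a, way, ]
    }
}
```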
You will not get a trailing empty string after the last delimiter if you use the split method. Example:
class Main
{
public static void main (String[] args)
{
String str;
String [] splits;
str = "The words never line up in such a way "; // a delimiter at the end, with nothing after it
splits = str.split(" ");
for (int i = 0; i < splits.length; i++)
System.out.println(splits[i]);
System.out.println("END");
}
}
OUTPUT
The
words
never
line
up
in
such
a
way
END
As you can see, no empty string is produced for the delimiter at the end.
Now compare a string with two spaces between "way" and "yeah":
class Main
{
public static void main (String[] args)
{
String str;
String [] splits;
str = "The words never line up in such a way  yeah"; // two spaces between "way" and "yeah"
splits = str.split(" ");
for (int i = 0; i < splits.length; i++)
System.out.println(splits[i]);
System.out.println("END");
}
}
OUTPUT
The
words
never
line
up
in
such
a
way

yeah
END
This time there is an extra string after the delimiter. It is also an empty string, but it is not trailing, so it stays in the array.
I've been looking at the Javadoc, and here is what it says about String.split:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So this method calls split with two arguments, whose documentation says:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
thanks
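The positive-limit case from that quote can be checked with a short example; note how the last entry keeps the rest of the input, delimiters included:

```java
import java.util.Arrays;

public class SplitLimit {
    public static void main(String[] args) {
        String s = "a b c d";
        // limit 2: the pattern is applied at most once; the last entry holds the rest of the input
        System.out.println(Arrays.toString(s.split(" ", 2))); // [a, b c d]
        // limit 3: the pattern is applied at most twice
        System.out.println(Arrays.toString(s.split(" ", 3))); // [a, b, c d]
    }
}
```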
I am trying to write a regular expression that will count the number of times two words co-occur within a certain proximity (within 5 words of each other) in a string, without double counting words.
For example, if I had a string:
"The man liked his big hat. The hat was very big."
In this case, the regex should see the "big hat" in the first sentence and the "hat was very big" in the second sentence, returning a total of 2. Note that in the second sentence there are several words between "hat" and "big", and they appear in a different order than in the first sentence, but they still occur within a 5-word window.
If regular expressions are not the correct way to approach this problem, please let me know what I should try instead.
A bit like Stephen C's answer, but using library classes to help with the mechanics.
String input = "The man liked his big hat. The hat was very big";
int proximity = 5;
// split input into words
String[] words = input.split("[\\W]+");
// create a Deque of the first <proximity> words
Deque<String> haystack = new LinkedList<String>(Arrays.asList(Arrays.copyOfRange(words, 0, proximity)));
// count duplicates in the first <proximity> words
int count = haystack.size() - new HashSet<String>(haystack).size();
System.out.println("initial matches: " + count);
// process the rest of the words
for (int i = proximity; i < words.length; i++) {
String word = words[i];
System.out.println("matching '" + word + "' in [" + haystack + "]");
if (haystack.contains(word)) {
System.out.println("matched word " + word + " at index " + i);
count++;
}
// remove the first word
haystack.removeFirst();
// add the current word
haystack.addLast(word);
}
System.out.println("total matches:" + count);
If regular expressions are not the correct way to approach this problem, please let me know what I should try instead.
Regexes might work, but they are not the best way to do this.
A better way to do this is to break the input string into a sequence of words (e.g. using String.split(...)) and then loop through the sequence something like this:
String[] words = input.split("\\W+"); // split on non-word characters so that "hat." becomes "hat"
int count = 0;
for (int i = 0; i < words.length; i++) {
if (words[i].equals("big")) {
for (int j = i + 1; j < words.length && j - i < 5; j++) {
if (words[j].equals("hat")) {
count++;
}
}
}
}
// And repeat for "hat" followed by "big".
You may need to vary that depending on exactly what you are trying to count, but that's the general idea.
If you need to do this for many, many combinations of words, then it would be worth looking for a more efficient solution. But as a once-off or low volume use-case, simplest is best.
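The loop above counts every big/hat pair in range, so one word can end up in more than one pair, and it needs a second pass for the reverse order. Below is a minimal sketch that handles both orders in one pass and lets each occurrence take part in at most one pair. It assumes that this is what "without double counting" means and that two words co-occur when their positions differ by less than 5, as in the loop above. countPairs is a hypothetical name, and the greedy "pair with the oldest partner" rule is an illustrative choice, not something the question specifies:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;

public class CoOccurrence {

    // Counts pairs of wordA and wordB (in either order) whose positions differ by less
    // than `window`, using each occurrence in at most one pair.
    static int countPairs(String text, String wordA, String wordB, int window) {
        // Split on runs of non-word characters so that "hat." becomes "hat"
        String[] words = text.toLowerCase(Locale.ROOT).split("\\W+");
        String a = wordA.toLowerCase(Locale.ROOT);
        String b = wordB.toLowerCase(Locale.ROOT);
        Deque<Integer> openA = new ArrayDeque<>(); // positions of wordA still waiting for a partner
        Deque<Integer> openB = new ArrayDeque<>(); // positions of wordB still waiting for a partner
        int count = 0;
        for (int i = 0; i < words.length; i++) {
            boolean isA = words[i].equals(a);
            if (!isA && !words[i].equals(b)) {
                continue; // not one of the two words we care about
            }
            Deque<Integer> mine = isA ? openA : openB;
            Deque<Integer> other = isA ? openB : openA;
            // Drop partners that are now too far back to pair with this word
            while (!other.isEmpty() && i - other.peekFirst() >= window) {
                other.pollFirst();
            }
            if (other.isEmpty()) {
                mine.addLast(i);   // no partner in range yet; wait for a later one
            } else {
                other.pollFirst(); // pair with the oldest partner still in range
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "The man liked his big hat. The hat was very big.";
        System.out.println(countPairs(input, "big", "hat", 5)); // 2
    }
}
```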
Gee... all that code in the other answers... how about this one-line solution:
int count = input.split("big( \\b.*?){1,5}hat").length + input.split("hat( \\b.*?){1,5}big").length - 2;
This regex will match each occurrence of a word that is repeated within 5 words of itself:
([a-zA-Z]+)(?:[^ ]* ){0,5}\1[^a-zA-Z]
([a-zA-Z]+) matches a word; if your words can also contain digits, replace it with ([a-zA-Z0-9]+).
(?:[^ ]* ){0,5} matches between 0 and 5 words in between.
\1[^a-zA-Z] matches the repetition of your word, followed by a non-letter.
Then you can use this with a Pattern and find each occurrence of a repeated word.
I want to keep a count of every word in a file, and this count should not include non-letters like the apostrophe, comma, full stop, question mark, exclamation mark, etc., i.e. just letters of the alphabet.
I tried to use a delimiter like this, but it didn't handle the apostrophe.
Scanner fileScanner = new Scanner("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt");
int totalWordCount = 0;
//Firstly to count all the words in the file without the restricted characters
while (fileScanner.hasNext()) {
fileScanner.useDelimiter(("[.,:;()?!\" \t\n\r]+")).next();
totalWordCount++;
}
System.out.println("There are " + totalWordCount + " word(s)");
//Then later I create an array to store each individual word in the file for counting their lengths.
Scanner fileScanner2 = new Scanner("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt");
String[] words = new String[totalWordCount];
for (int i = 0; i < totalWordCount; ++i) {
words[i] = fileScanner2.useDelimiter(("[.,:;()?!\" \t\n\r]+")).next();
}
This doesn't seem to work!
How can I go about this?
It seems to me that you don't want to split on anything but spaces and line endings. For example, the word "they're" would be counted as two words if you used ' as a delimiter. Here's how you could change your original code to make it work:
Scanner fileScanner = new Scanner(new File("C:\\MyJavaFolder\\JavaAssignment1\\TestFile.txt"));
int totalWordCount = 0;
ArrayList<String> words = new ArrayList<String>();
//Firstly to count all the words in the file without the restricted characters
while (fileScanner.hasNext()) {
//Add words to an array list so you only have to go through the scanner once
words.add(fileScanner.next());//This defaults to whitespace
totalWordCount++;
}
System.out.println("There are " + totalWordCount + " word(s)");
fileScanner.close();
Pattern.compile() turns a string into a regular expression (useDelimiter(String) does the same with its argument). In the Pattern class, \s is a predefined character class that matches any whitespace character.
There is more information in the Pattern documentation.
Also, make sure to close your Scanner objects when you're done. Leaving one open could prevent your second scanner from opening.
Edit
If you want to count the letters in each word, you can add the following code to the code above:
int totalLetters = 0;
int[] lettersPerWord = new int[words.size()];
for (int wordNum = 0; wordNum < words.size(); wordNum++)
{
String word = words.get(wordNum);
word = word.replaceAll("[.,:;()?!\" \t\n\r\']+", "");
lettersPerWord[wordNum] = word.length();
totalLetters += word.length(); // accumulate, rather than overwrite, the running total
}
I have tested this code and it appears to work for me. According to the Javadoc, replaceAll() uses a regular expression to match, so it should match any of those characters and remove them.
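Putting the pieces together, here is a minimal sketch that reads from a String instead of a file so it can run anywhere (lettersPerWord is a hypothetical name; to read the file instead, you could construct the Scanner with new File(path), which throws FileNotFoundException). It splits on whitespace, as suggested above, and then strips everything except a-z and A-Z from each token:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LetterCounts {

    // Returns the number of letters in each whitespace-separated word of text.
    static List<Integer> lettersPerWord(String text) {
        List<Integer> counts = new ArrayList<>();
        // Scanner's default delimiter is whitespace, so "they're" stays a single token
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // Keep only a-z and A-Z: apostrophes, commas, full stops, etc. are dropped
                String lettersOnly = scanner.next().replaceAll("[^a-zA-Z]", "");
                if (!lettersOnly.isEmpty()) { // skip tokens such as "--" that contain no letters
                    counts.add(lettersOnly.length());
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> counts = lettersPerWord("They're here, aren't they?");
        System.out.println("There are " + counts.size() + " word(s), letters per word: " + counts);
    }
}
```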
Alternatively, instead of a delimiter, you can use the regex classes directly; using Pattern and Matcher with the group() method may be what you're looking for.
String pattern = "(.*)[.,:;()?!\" \t\n\r]+(.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(test); // test is the String you want to search
if (m.find( )) {
System.out.println("Found value: " + m.group(1) );
}
Experiment with those classes and you will see they are much closer to what you need.
You could try this regex as your delimiter:
fileScanner.useDelimiter("[^a-zA-Z']+").next();
This uses any run of characters that are neither letters nor apostrophes as the delimiter. That way your words will include apostrophes but not any other non-letter character.
Then you'll have to loop through each word and check for apostrophes, and account for them if you want the length to be accurate. You could just remove each apostrophe so the length matches the number of letters in the word, or you could create word objects with their own length fields, so that you can print each word as is and still know the number of letters in it.
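A self-contained sketch of that delimiter idea, written as [^a-zA-Z']+ (one or more characters that are neither letters nor apostrophes). It reads from a String rather than a file, and words is a hypothetical method name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ApostropheWords {

    // Splits text into words made only of letters and apostrophes; every run of
    // other characters (spaces, commas, question marks, ...) acts as a delimiter.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("[^a-zA-Z']+")) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : words("They're here, aren't they?")) {
            // The apostrophe is removed only when measuring the length
            System.out.println(word + " -> " + word.replace("'", "").length() + " letters");
        }
    }
}
```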