Java - Writing a syllable counter based on specifications

Specification for a syllable:
Each group of adjacent vowels (a, e, i, o, u, y) counts as one syllable (for example, the "ea" in "real" contributes one syllable, but the "e...a" in "regal" counts as two syllables). However, an "e" at the end of a word doesn't count as a syllable. Also each word has at least one syllable, even if the previous rules give a count of zero.
My countSyllables method:
public int countSyllables(String word) {
int count = 0;
word = word.toLowerCase();
for (int i = 0; i < word.length(); i++) {
if (word.charAt(i) == '\"' || word.charAt(i) == '\'' || word.charAt(i) == '-' || word.charAt(i) == ',' || word.charAt(i) == ')' || word.charAt(i) == '(') {
word = word.substring(0,i)+word.substring(i+1, word.length());
}
}
boolean isPrevVowel = false;
for (int j = 0; j < word.length(); j++) {
if (word.contains("a") || word.contains("e") || word.contains("i") || word.contains("o") || word.contains("u")) {
if (isVowel(word.charAt(j)) && !((word.charAt(j) == 'e') && (j == word.length()-1))) {
if (isPrevVowel == false) {
count++;
isPrevVowel = true;
}
} else {
isPrevVowel = false;
}
} else {
count++;
break;
}
}
return count;
}
The isVowel method which determines if a letter is a vowel:
public boolean isVowel(char c) {
if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
return true;
} else {
return false;
}
}
According to a colleague, this should result in 528 syllables when used on this text, but I can't seem to get it to equal that, and I don't know which of us is correct. Please help me develop my method into the correct algorithm or help show this is correct. Thank you.

One of the problems might be that you call the toLowerCase() method on the input but do not assign the result. Strings are immutable, so the call returns a new String and leaves word unchanged.
So changing
word.toLowerCase();
to
word = word.toLowerCase();
will help for sure.
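A minimal sketch of why the assignment matters. The class and method names here are made up for illustration; only String.toLowerCase() comes from the standard library.

```java
class LowerCaseDemo {
    // The result of toLowerCase() is discarded, so the caller gets the original text back.
    static String withoutAssignment(String word) {
        word.toLowerCase(); // returns a new String; word still refers to the original
        return word;
    }

    // Assigning the result makes the lower-cased copy the one that is used afterwards.
    static String withAssignment(String word) {
        word = word.toLowerCase();
        return word;
    }
}
```

With the input "Real", the first method hands back "Real" and the second hands back "real".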

I have just invented a new way to count syllables in Java.
My new library, The Lawrence Style Checker, can be viewed here: https://github.com/troywatson/Lawrence-Style-Checker
I counted your syllables for every word using my program and displayed the results here: http://pastebin.com/LyiBTcbb
With my dictionary method of counting syllables I got: 528 syllables total.
This is the exact number the questioner gave of the correct number of syllables. Yet I still dispute this number for reasons described below:
Strike rate: 99.4% correct
Words wrong: 2 / 337 words
Words wrong and wrong syllable counts: {resinous: 4, aardwolf: 3}
Here is my code:
Lawrence lawrence = new Lawrence();
// Turn the text into an array of sentences.
String sentences = "";
String[] sentences2 = sentences.split("(?<=[a-z])\\.\\s+");
int count = 0;
for (String sentence : sentences2) {
sentence = sentence.replace("-", " "); // split double words
for (String word : sentence.split(" ")) {
// Get rid of punctuation marks and spaces.
word = lawrence.cleanWord(word);
// If the word is null, skip it.
if (word.length() < 1)
continue;
// Print out the word and its syllable count on one line.
System.out.print(word + ",");
System.out.println(lawrence.getSyllable(word));
count += lawrence.getSyllable(word);
}
}
System.out.println(count);
bam!

This should be easily doable with some Regex:
Pattern p = Pattern.compile("[aeiouy]+?\\w*?[^e]");
String[] result = p.split(WHAT_EVER_THE_INPUT_IS);
result.length
Please note, that it is untested.

Not a direct answer (and I would give you one if I thought it was constructive; my count was about 538 on the last try), but I will give you a few hints that will be fundamental to creating the answer:
Divide up your problem: read lines, then split the lines up into words, then count the syllables for each word. Afterwards, add them up for all the lines.
Think about the order of things: first find all the syllables, and count each one by "walking" through the word. Factor in the special cases afterwards.
During design, use a debugger to step through your code. Chances are pretty high you make common mistakes like the toUpperCase() method. Better find those errors, nobody will create perfect code the first time around.
Print to console (advanced users use a log and keep the silenced log lines in the final program). Make sure to mark the println's using comments and remove them from the final implementation. Print things like line numbers and syllable counts so you can visually compare them with the text.
Once you have advanced a bit, you may use Matcher.find (regular expressions) with a Pattern to find the syllables. Regular expressions are difficult beasts to master. One common mistake is to have them do too much in one go.
This way you can quickly scan the text. One of the things you quickly will find out is that you will have to deal with the numbers in the text. So you need to check if a word is actually a word, otherwise, by your rules, it will have at least a single syllable.
If you have the feeling you are repeating things, like the isVowel and String.contains() methods using the same set of characters, you are probably doing something wrong. Repetition in source code is a code smell.
Using regexps, I counted about 538 (on the 4th go), but I haven't really checked each and every syllable (of course). Syllables per line of the text:
1 14
2 17
3 17
4 15
5 15
6 14
7 16
8 19
9 17
10 17
11 16
12 19
13 18
14 15
15 18
16 15
17 16
18 17
19 16
20 17
21 17
22 19
23 17
24 16
25 17
26 17
27 16
28 17
29 15
30 17
31 19
32 23
33 0
--- total ---
538
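The hints above could be sketched roughly as follows. This is only one reading of the specification, not a verified solution for the 528 target: it assumes that a single trailing 'e' is dropped before counting, that tokens without any letter (such as numbers) count as zero syllables, and that the class and method names are free to choose.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyllableSketch {
    // One match per group of adjacent vowels (y included, as in the specification).
    private static final Pattern VOWEL_GROUP = Pattern.compile("[aeiouy]+");

    static int countSyllables(String token) {
        // Keep letters only, so punctuation and digits never reach the counter.
        String word = token.toLowerCase().replaceAll("[^a-z]", "");
        if (word.isEmpty()) {
            return 0; // assumption: a token without letters is not a word
        }
        // Special case: an 'e' at the end of the word does not count.
        if (word.endsWith("e")) {
            word = word.substring(0, word.length() - 1);
        }
        int count = 0;
        Matcher m = VOWEL_GROUP.matcher(word);
        while (m.find()) {
            count++;
        }
        // Every real word has at least one syllable.
        return Math.max(count, 1);
    }

    static int countSyllablesInText(String text) {
        int total = 0;
        for (String line : text.split("\\R")) {          // lines
            for (String token : line.trim().split("\\s+")) { // words
                total += countSyllables(token);
            }
        }
        return total;
    }
}
```

Keeping the vowel set in a single Pattern avoids repeating the same characters in several places, which is the code smell mentioned in the hints.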

I would strongly suggest that you use Java's String API to its full ability. For example, consider String.split(String regex):
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29
This takes a String, and a regular expression, then returns an array of all the substrings, using your regular expression as a delimiter. If you make your regular expression match all consonants or whitespace, then you will end up with an array of Strings which are either empty (and therefore do not represent a syllable) or a sequence of vowels (which do represent a syllable). Count up the latter, and you will have a solution.
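A small sketch of the split idea, assuming the vowel set from the question's specification (a, e, i, o, u, y). It only counts vowel groups; the silent trailing 'e' and the "at least one syllable" rule are deliberately left out. The class and method names are hypothetical.

```java
class VowelGroupSplit {
    static int countVowelGroups(String word) {
        int groups = 0;
        // Every run of non-vowels acts as a delimiter, so the pieces are vowel runs or empty strings.
        for (String piece : word.toLowerCase().split("[^aeiouy]+")) {
            if (!piece.isEmpty()) {
                groups++;
            }
        }
        return groups;
    }
}
```

A word that starts with a consonant produces one leading empty string, which is why the empty pieces have to be skipped.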
Another alternative which also takes advantage of the String API and regular expressions is replaceAll:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29
In this case, you want a regular expression which takes the form [optional anything which isn't a vowel][one or more vowels][optional anything which isn't a vowel]. Run this regular expression on your String, and replace it with a single character (eg "1"). The end result is that each syllable will be replaced by a single character. Then all you need to do is String.length() and you'll know how many syllables you had.
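The replaceAll variant might look like the sketch below, again with the vowel set taken from the question and with made-up names. One thing to watch: a word without any vowel is never matched, so it would come back unchanged and its length would be misread as a syllable count. The sketch checks for that case first.

```java
class VowelGroupReplace {
    static int countVowelGroups(String word) {
        String lower = word.toLowerCase();
        // Without a vowel the pattern below never matches, so answer 0 directly.
        if (!lower.matches("(?s).*[aeiouy].*")) {
            return 0;
        }
        // [optional non-vowels][one or more vowels][optional non-vowels] becomes a single "1".
        String marked = lower.replaceAll("[^aeiouy]*[aeiouy]+[^aeiouy]*", "1");
        return marked.length();
    }
}
```

For "regal" the two matches are "reg" and "al", so the marked string is "11" and the method returns 2.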
Depending on the requirements of your solution, these may not work. If this is a homework question relating to algorithm design, this is almost certainly not the preferred answer, but it does have the benefit of being concise and makes good use of the built-in (and therefore highly optimized) Java APIs.

private static int countSyllables(String word)
{
//System.out.print("Counting syllables in " + word + "...");
int numSyllables = 0;
boolean newSyllable = true;
String vowels = "aeiouy";
char[] cArray = word.toCharArray();
for (int i = 0; i < cArray.length; i++)
{
if (i == cArray.length-1 && Character.toLowerCase(cArray[i]) == 'e'
&& newSyllable && numSyllables > 0) {
numSyllables--;
}
if (newSyllable && vowels.indexOf(Character.toLowerCase(cArray[i])) >= 0) {
newSyllable = false;
numSyllables++;
}
else if (vowels.indexOf(Character.toLowerCase(cArray[i])) < 0) {
newSyllable = true;
}
}
//System.out.println( "found " + numSyllables);
return numSyllables;
}
Another implementation can be found at this pastebin link:
https://pastebin.com/q6rdyaEd

This is my implementation for counting syllables
protected int countSyllables(String word)
{
// getNumSyllables method in BasicDocument (module 1) and
// EfficientDocument (module 2).
int syllables = 0;
word = word.toLowerCase();
if(word.contains("the ")){
syllables ++;
}
String[] split = word.split("e!$|e[?]$|e,|e |e[),]|e$");
ArrayList<String> tokens = new ArrayList<String>();
Pattern tokSplitter = Pattern.compile("[aeiouy]+");
for (int i = 0; i < split.length; i++) {
String s = split[i];
Matcher m = tokSplitter.matcher(s);
while (m.find()) {
tokens.add(m.group());
}
}
syllables += tokens.size();
return syllables;
}
It works fine for me.

Related

A java string exercise i came across

Look for patterns like "zip" and "zap" in the string -- length-3, starting with 'z' and ending with 'p'. Return a string where for all such words, the middle letter is gone, so "zipXzap" yields "zpXzp"
Here is a solution i got from someone:
public class Rough {
public static void main(String [] args){
StringBuffer mat = new StringBuffer("matziplzdpaztp");
for(int i = 0; i < mat.length() - 2; ++i){
if (mat.charAt(i) == 'z' & mat.charAt(i + 2) == 'p'){
mat.deleteCharAt(i + 1);
}
}
System.out.println(mat);
}
}
But why is it that the for loop condition (i < mat.length() -2) is not (i < mat.length())????

Because in the loop:
if (mat.charAt(i) == 'z' & mat.charAt(i + 2) == 'p'){
// -----------------------------------^^^^^
If i were bound by i < mat.length(), then i + 2 would be out of bounds.
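To see the bound at work, here is the same loop wrapped in a method (the method and class names are made up; StringBuilder is used in place of StringBuffer, which behaves the same here).

```java
class ZipZap {
    static String removeMiddle(String input) {
        StringBuilder sb = new StringBuilder(input);
        // Stop two short of the end so that sb.charAt(i + 2) is always a valid index.
        for (int i = 0; i < sb.length() - 2; i++) {
            if (sb.charAt(i) == 'z' && sb.charAt(i + 2) == 'p') {
                sb.deleteCharAt(i + 1); // drop the middle letter of z?p
            }
        }
        return sb.toString();
    }
}
```

Because sb.length() is re-read on every iteration, the bound shrinks along with the buffer each time a character is deleted.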

Because you don't have to go all the way to the end of the string: the patterns you are looking for are three characters long.

"2" is the length of the pattern without its first character. You only need to visit each position in the string and treat it as the first character of a possible match, so you can leave out the last positions, where the rest of the pattern would no longer fit.
In your case, the length of "z*p" is 3. You check every position in the string and treat it as the 'z', so you can ignore the "*p" part, which has length 2.

mat.length() returns 14 and the indices run from 0 to 13, so checking mat.charAt(i + 2) near the end throws java.lang.StringIndexOutOfBoundsException. If you still want to loop up to mat.length(), replacing the non-short-circuit AND '&' with the short-circuit AND '&&' skips the second charAt call whenever the first test fails. That only helps as long as no 'z' sits in the last two positions, so the length() - 2 bound remains the safe choice.

Counting special letters with array

So I have a program that counts the number of occurrences of each letter in a string, and for that I use
int[] charAmount = new int[30];
for(int i = 0; i<text.length(); i++){
char sign = text.charAt(i);
int value = sign;
if(value >= 97 && value <= 122){
charAmount[value-97]++; // 97 = 'a'
}
}
This works fine, but I also need to cover the letters 'æ' (230), 'ø' (248) & 'å' (229). How can I "assign" those three letters to the 26, 27 & 29th index of the charAmount array without using if tests or a switch?
EDIT: The code presented above is not the whole block, I also have a switch for the letters in question, but I am looking for a better solution.
BONUS PROBLEM: When I try to enter a string like "æææ" or something, the value of 'æ' is suddenly 8216. I use a Scanner to read the input.

Try this after your if:
else if (value == 230)
charAmount[26]++;
else if (value == 248)
charAmount[27]++;
else if (value == 229)
charAmount[29]++;
You can also build a lookup array that maps each character code to its slot in charAmount. The array would look like this:
spChars_to_Chars =
97 => 0
98 => 1
...
229 => 29
230 => 26
...
248 => 27
And then just do this in your if:
charAmount[spChars_to_Chars[value]]++;
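A sketch of that lookup-table idea, under a few assumptions: the table covers character codes below 256, unknown characters are marked with -1, and the slots 26, 27 and 29 are taken from the question as written. The names SLOT and LetterCounter are made up. A single validity check remains, but there is no per-letter if or switch.

```java
import java.util.Arrays;

class LetterCounter {
    // SLOT[code] is the index in charAmount for that character, or -1 if it is not counted.
    private static final int[] SLOT = new int[256];
    static {
        Arrays.fill(SLOT, -1);
        for (char c = 'a'; c <= 'z'; c++) {
            SLOT[c] = c - 'a';
        }
        SLOT[230] = 26; // ae ligature
        SLOT[248] = 27; // o with stroke
        SLOT[229] = 29; // a with ring, slot as given in the question
    }

    static int[] count(String text) {
        int[] charAmount = new int[30];
        for (int i = 0; i < text.length(); i++) {
            char sign = text.charAt(i);
            // One range/validity test replaces the chain of per-letter comparisons.
            if (sign < SLOT.length && SLOT[sign] >= 0) {
                charAmount[SLOT[sign]]++;
            }
        }
        return charAmount;
    }
}
```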

If you can use external libraries, I would rather try using Apache commons. They provide a function to count the matches of a given substring (your characters) in a bigger string:
StringUtils.countMatches(...)

Regex for number range from 0 to 31 excluding preceding zeros

I have written a regex for numbers from 0 to 31. It must not allow leading zeros.
[0-2]\\d|/3[0-2]
But it also allows preceding zeros.
01 invalid
02 invalid
Can someone tell me how to fix this?

You can use the following regex:
^(?:[0-9]|[12][0-9]|3[01])$
See demo
Your regex - [0-2]\\d|/3[0-2] - contains 2 alternatives: 1) [0-2]\\d matches a digit from 0-2 range first and then any 1 digit (with \\d), and 2) /3[0-2] matches /, then 3 and then 1 digit from 0-2 range. What is important is that without anchors (^ and $) this expression will match substrings in longer strings, and will match 01 in 010.
Since there has been some discussion about shorthand classes, here is a version with the shorthand class and here is also an example with matches() that requires full input to match and thus we do not need explicit anchors:
String pttrn = "(?:\\d|[12]\\d|3[01])";
System.out.println("31".matches(pttrn));
See demo
Note that the backslash should be doubled here.

You can try with the following pattern:
^(?:[12]?[0-9]|3[01])$
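The pattern reads as: an optional 1 or 2 followed by any digit (0-29), or a 3 followed by 0 or 1 (30-31). A quick way to try it out, with a hypothetical helper class:

```java
import java.util.regex.Pattern;

class DayRange {
    // 0-29 via an optional leading 1 or 2, then 30 and 31 as a separate alternative.
    private static final Pattern ZERO_TO_31 = Pattern.compile("^(?:[12]?[0-9]|3[01])$");

    static boolean isValid(String s) {
        // matches() requires the whole input to match, so "01" and "123" are rejected.
        return ZERO_TO_31.matcher(s).matches();
    }
}
```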

Just another non-Regex approach with data validation before attempting to convert a String to int. Here we are validating that the data is at least 1 character that is a digit, or the data is 2 characters that are digits and the first character is not a 0.
public static void main(String[] args) throws Exception {
List<String> data = new ArrayList() {{
add("01"); // Bad
add("1A"); // Bad
add("123"); // Bad
add("31"); // Good
add("-1"); // Bad
add("32"); // Bad
add("0"); // Good
add("15"); // Good
}};
for (String d : data) {
boolean valid = true;
if (d.isEmpty()) {
valid = false;
} else {
char firstChar = d.charAt(0);
if ((d.length() == 1 && Character.isDigit(firstChar)) ||
(d.length() == 2 &&
(Character.isDigit(firstChar) && firstChar != '0' &&
Character.isDigit(d.charAt(1))))) {
int myInt = Integer.parseInt(d);
valid = (0 <= myInt && myInt <= 31);
} else {
valid = false;
}
}
System.out.println(valid ? "Valid" : "Invalid");
}
}
Results:
Invalid
Invalid
Invalid
Valid
Invalid
Invalid
Valid
Valid

Another option:
\\b(?:[12]?\\d|3[01])\\b
Demo

This regex uses a plain capturing group rather than a non-capturing group:
^(\d|[12]\d|3[01])$
Explanation:
^ - start of line
\d - a single digit 0-9, or
[12]\d - the tens and twenties, or
3[01] - thirty and thirty-one
$ - end of line
Java DEMO

It is harder to maintain code with regex in it: see When you should not use Regular Expressions
In order to make your code more maintainable and easier for other developers to jump into and support, maybe you could consider converting your String to an Integer and then testing the value?
if((!inputString.startsWith("0") && inputString.length() == 2) || inputString.length() == 1){
Integer myInt = Integer.parseInt(inputString);
if( 0 <= myInt && myInt <= 31){
//execute logic...
}
}
you could also easily break this out into a utility method that is very descriptive such as:
private boolean isBetween0And31Inclusive(String inputString){
try{
if((!inputString.startsWith("0") && inputString.length() == 2) || inputString.length() == 1){
Integer myInt = Integer.parseInt(inputString);
if(0 <= myInt && myInt <= 31){
return true;
}
}
return false;
}catch(NumberFormatException exception){
return false;
}
}

How do I check if a char is a vowel?

This Java code is giving me trouble:
String word = <Uses an input>
int y = 3;
char z;
do {
z = word.charAt(y);
if (z!='a' || z!='e' || z!='i' || z!='o' || z!='u')) {
for (int i = 0; i==y; i++) {
wordT = wordT + word.charAt(i);
} break;
}
} while(true);
I want to check if the third letter of word is a non-vowel, and if it is I want it to return the non-vowel and any characters preceding it. If it is a vowel, it checks the next letter in the string, if it's also a vowel then it checks the next one until it finds a non-vowel.
Example:
word = Jaemeas then wordT must = Jaem
Example 2:
word=Jaeoimus then wordT must =Jaeoim
The problem is with my if statement, I can't figure out how to make it check all the vowels in that one line.

Clean method to check for vowels:
public static boolean isVowel(char c) {
return "AEIOUaeiou".indexOf(c) != -1;
}

Your condition is flawed. Think about the simpler version
z != 'a' || z != 'e'
If z is 'a' then the second half will be true since z is not 'e' (i.e. the whole condition is true), and if z is 'e' then the first half will be true since z is not 'a' (again, whole condition true). Of course, if z is neither 'a' nor 'e' then both parts will be true. In other words, your condition will never be false!
You likely want &&s there instead:
z != 'a' && z != 'e' && ...
Or perhaps:
"aeiou".indexOf(z) < 0

How about an approach using regular expressions? If you use the proper pattern you can get the results from the Matcher object using groups. In the code sample below the call to m.group(1) should return you the string you're looking for as long as there's a pattern match.
String wordT = null;
Pattern patternOne = Pattern.compile("^([\\w]{2}[AEIOUaeiou]*[^AEIOUaeiou]{1}).*");
Matcher m = patternOne.matcher("Jaemeas");
if (m.matches()) {
wordT = m.group(1);
}
Just a little different approach that accomplishes the same goal.

Actually there are much more efficient ways to check it but since you've asked what is the problem with yours, I can tell that the problem is you have to change those OR operators with AND operators. With your if statement, it will always be true.
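A tiny sketch that puts the two conditions side by side (the class and method names are made up for illustration):

```java
class VowelCondition {
    // The condition from the question: no character can equal all five vowels at once,
    // so at least one of the != tests always succeeds and the result is always true.
    static boolean brokenIsNonVowel(char z) {
        return z != 'a' || z != 'e' || z != 'i' || z != 'o' || z != 'u';
    }

    // With AND, the character has to differ from every vowel.
    static boolean isNonVowel(char z) {
        return z != 'a' && z != 'e' && z != 'i' && z != 'o' && z != 'u';
    }
}
```

For 'a', the first method still answers true, while the second correctly answers false.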

In case anyone comes across this and wants an easy comparison that can be used in many scenarios: it doesn't matter whether the letter is UPPERCASE or lowercase (A-Z and a-z).
bool vowel = ((1 << letter) & 2130466) != 0;
This is the easiest way I could think of. I tested this in C++ on a 64-bit PC, so results may differ elsewhere (formally, C++ leaves a shift by 32 or more undefined). Basically, a 32-bit integer only has 32 bits available, so the bits worth 64 and 32 in the letter's code are dropped from the shift count, and "<< letter" ends up shifting by a value from 1 to 26.
If you don't understand how bits work, sorry, I'm not going to go into depth, but 1 << N is the same as 2^N, i.e. it creates a power of two.
So with (1 << N) & X we check whether X contains the power of two that corresponds to our letter; 2130466 is the value with the bits for a, e, i, o and u set. If the result is not 0, the letter is a vowel.
This technique applies to anything you use bits for, and index values larger than 31 still work as long as they reduce to the range 0 to 31. The letters are 65-90 or 97-122, but because 32 is effectively removed until a remainder of 1-26 is left, they land on the right bit. That isn't literally how it works, but it gives you an idea of the process.
Keep in mind that if you have no guarantee about the incoming characters, you should check whether the character is below 'A' or above 'u' first, since the answer for those is always false anyway.
For example, the following returns a false positive: '!' (exclamation point) has the value 33, which produces the same bit as 'A' or 'a' would.
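Since the rest of this thread is Java, here is a Java sketch of the same idea. In Java the behaviour is well defined: for an int shift, only the low five bits of the shift distance are used. The names are made up, and the range check is an addition to guard against false positives such as '!'.

```java
class VowelMask {
    // Bits 1, 5, 9, 15 and 21 are set: the alphabet positions of a, e, i, o and u.
    static final int VOWEL_BITS = (1 << 1) | (1 << 5) | (1 << 9) | (1 << 15) | (1 << 21);

    static boolean isVowel(char letter) {
        boolean isAsciiLetter = (letter >= 'A' && letter <= 'Z') || (letter >= 'a' && letter <= 'z');
        // 'A' (65) and 'a' (97) both shift by 1 because only the low five bits of the distance count.
        return isAsciiLetter && ((1 << letter) & VOWEL_BITS) != 0;
    }
}
```

VOWEL_BITS evaluates to 2130466, the constant used above.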

For starters, you are checking if the letter is "not a" OR "not e" OR "not i" etc.
Let's say that the letter is i. Then the letter is not a, so that test returns "True", and the entire statement is True because i != a. I think what you are looking for is to AND the statements together, not OR them.
Once you do this, you need to look at how to increment y and check this again. If the first time you get a vowel, you want to see if the next character is a vowel too, or not. This only checks the character at location y=3.

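String word = "Jaemeas";
String wordT = "";
int y = 3;
// Walk forward from index 3 until the first non-vowel; stop at the end of the word.
while (y < word.length()) {
char z = word.charAt(y);
if (z != 'a' && z != 'e' && z != 'i' && z != 'o' && z != 'u') {
// Keep everything up to and including the non-vowel.
wordT = word.substring(0, y + 1);
break;
}
y++;
}
Here is my answer.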

I have declared a char[] constant for the VOWELS, then implemented a method that checks whether a char is a vowel or not (returning a boolean value). In my main method, I am declaring a string and converting it to an array of chars, so that I can pass the index of the char array as the parameter of my isVowel method:
public class FindVowelsInString {
static final char[] VOWELS = {'a', 'e', 'i', 'o', 'u'};
public static void main(String[] args) {
String str = "hello";
char[] array = str.toCharArray();
//Check with a consonant
boolean vowelChecker = FindVowelsInString.isVowel(array[0]);
System.out.println("Is this character a vowel? " + vowelChecker);
//Check with a vowel
boolean vowelChecker2 = FindVowelsInString.isVowel(array[1]);
System.out.println("Is this character a vowel? " + vowelChecker2);
}
private static boolean isVowel(char vowel) {
boolean isVowel = false;
for (int i = 0; i < FindVowelsInString.getVowel().length; i++) {
if (FindVowelsInString.getVowel()[i] == vowel) {
isVowel = true;
}
}
return isVowel;
}
public static char[] getVowel() {
return FindVowelsInString.VOWELS;
}
}

Java characters count in an array

Another problem I try to solve (NOTE this is not a homework but what popped into my head), I'm trying to improve my problem-solving skills in Java. I want to display this:
Students ID #
Carol McKane 920 11
James Eriol 154 10
Elainee Black 462 12
What I want to do is on the 3rd column, display the number of characters without counting the spaces. Give me some tips to do this. Or point me to Java's robust APIs, cause I'm not yet that familiar with Java's string APIs. Thanks.

It sounds like you just want something like:
public static int countNonSpaces(String text) {
int count = 0;
for (int i = 0; i < text.length(); i++) {
if (text.charAt(i) != ' ') {
count++;
}
}
return count;
}
You may want to modify this to use Character.isWhitespace instead of only checking for ' '. Also note that this will count pairs outside the Basic Multilingual Plane as two characters. Whether that will be a problem for you or not depends on your use case...
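A possible variation along those lines, assuming Java 8 or later: it uses Character.isWhitespace and walks code points instead of chars, so a surrogate pair counts once. The class and method names are hypothetical.

```java
class NonSpaceCounter {
    static long countNonWhitespace(String text) {
        return text.codePoints()                          // one int per Unicode code point
                   .filter(cp -> !Character.isWhitespace(cp))
                   .count();
    }
}
```

For the names in the question this gives 11, 10 and 12.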

Think of solving a problem and presenting the answer as two very different steps. I won't help you with the presentation in a table, but to count the number of characters in a String (without spaces) you can use this:
String name = "Carol McKane";
int numberOfCharacters = name.replaceAll("\\s", "").length();
The regular expression \\s matches all whitespace characters in the name string, and replaces them with "", or nothing.

Probably the shortest and easiest way:
String[][] students = { { "Carol McKane", "James Eriol", "Elainee Black" }, { "920", "154", "462" } };
for (int i = 0 ; i < students[0].length; i++) {
System.out.println(students[0][i] + "\t" + students[1][i] + "\t" + students[0][i].replace( " ", "" ).length() );
}
replace() returns a new string in which every occurrence of the substring " " has been removed. On this temporary string, without spaces, you can call length() to get the count.
The original String name remains unchanged.
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html
cheers

To learn more about it, you should read the API documentation for String and Character.
Here are some examples of how to do it:
// variation 1
int count1 = 0;
for (char character : text.toCharArray()) {
if (Character.isLetter(character)) {
count1++;
}
}
This uses a special short form of the "for" statement. Here's the long form for better understanding:
// variation 2
int count2 = 0;
for (int i = 0; i < text.length(); i++) {
char character = text.charAt(i);
if (Character.isLetter(character)) {
count2++;
}
}
BTW, removing whitespaces via replace method is not a good coding style to me and not quite helpful for understanding how string class works.
