Regex to count the number of syllables in a text - java

I searched the whole internet and, to my dismay, found no correct implementation of counting the syllables in a text with regex. First, I would like to clarify the definition of a syllable:
Syllables are defined as follows: a contiguous sequence of vowels makes up one syllable, except for a lone "e" at the end of a word when the word has another group of contiguous vowels. "y" is considered a vowel.
I used the following regular expression (with split in Java):
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Scanner;
class Graph {
private Map<Integer, ArrayList<Integer>> adjLists;
private int numberOfVertices;
private int numberOfEdges;
public Graph(int V){
adjLists = new HashMap<>(V);
for(int i=0; i<V; i++){
adjLists.put(i, new ArrayList<Integer>());
}
this.numberOfVertices = V;
this.numberOfEdges = 0;
}
public int getNumberOfEdges(){
return this.numberOfEdges;
}
public int getNumberOfVertices(){
return this.numberOfVertices;
}
public void addVertex(){
adjLists.put(getNumberOfVertices(), new ArrayList<Integer>());
this.numberOfVertices++;
}
public void addEdge(int u, int v){
adjLists.get(u).add(v);
adjLists.get(v).add(u);
this.numberOfEdges++;
}
public ArrayList<Integer> getNeighbours(int u){
return new ArrayList<Integer>(adjLists.get(u));
}
public void printTheGraph() {
for(Entry<Integer, ArrayList<Integer>> list: adjLists.entrySet()){
System.out.print(list.getKey()+": ");
for(Integer i: list.getValue()){
System.out.print(i+" ");
}
System.out.println();
}
}
}
@SuppressWarnings("resource")
public class AdjacencyListGraphTest {
public static void main(String[] args) throws Exception {
FileReader reader = new FileReader("graphData");
Scanner in = new Scanner(reader);
int E, V;
V = in.nextInt();
E = in.nextInt();
Graph graph = new Graph(V);
for(int i=0; i<E; i++){
int u, v;
u = in.nextInt();
v = in.nextInt();
graph.addEdge(u, v);
}
graph.printTheGraph();
}
}
But that didn't work.
The main problem is how to express the final "e" rule in a regex. The regex alone would suffice. Thank you.
P.S.: If you are unfamiliar with the topic, please don't point me to other Stack Overflow questions; none of them has a correctly implemented answer.

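This gives you the number of syllables (groups of contiguous vowels) in a word:
// requires: import java.util.regex.Matcher; and import java.util.regex.Pattern;
// Counts groups of contiguous vowels ("y" included); the lone final "e" rule is NOT applied here.
public int getNumVowels(String word) {
String regexp = "[bcdfghjklmnpqrstvwxz]*[aeiouy]+[bcdfghjklmnpqrstvwxz]*";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(word.toLowerCase());
int count = 0;
while (m.find()) {
count++; // each match contains exactly one run of vowels
}
return count;
}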
You can call it on every word in your string array:
String[] words = getText().split("\\s+");
for (String word : words ) {
System.out.println("Word: " + word + ", vowels: " + getNumVowels(word));
}
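The method above counts every vowel group, so it also counts the lone final "e" (for example, "make" comes out as 2). Below is a minimal sketch of one way to add the rule from the question. It assumes words are made of ASCII letters. It counts vowel groups with [aeiouy]+, then subtracts one when the word ends in a non-vowel followed by "e" and has at least one other vowel group. Note that this uses two regexes plus a little Java logic, not a single expression. SyllableCounter and its methods are hypothetical names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyllableCounter {
    // One run of contiguous vowels; 'y' counts as a vowel per the question's definition.
    private static final Pattern VOWEL_GROUP = Pattern.compile("[aeiouy]+");
    // A lone 'e' ending the word: a non-vowel followed by 'e' at the end of the input.
    private static final Pattern LONE_FINAL_E = Pattern.compile("[^aeiouy]e$");

    static int countWordSyllables(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        Matcher m = VOWEL_GROUP.matcher(w);
        int groups = 0;
        while (m.find()) {
            groups++;
        }
        // Discount the lone final 'e' only when the word has another vowel group.
        if (groups > 1 && LONE_FINAL_E.matcher(w).find()) {
            groups--;
        }
        return groups;
    }

    static int countTextSyllables(String text) {
        int total = 0;
        // Split on non-letters so trailing punctuation does not hide a final 'e'.
        for (String word : text.split("[^A-Za-z]+")) {
            total += countWordSyllables(word);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countWordSyllables("make")); // 1
        System.out.println(countWordSyllables("the"));  // 1 (the lone 'e' is the only group)
        System.out.println(countWordSyllables("free")); // 1 ("ee" is not a lone 'e')
        System.out.println(countTextSyllables("The cake is a lie.")); // 5
    }
}
```

Keeping the final-"e" check separate from the vowel-group regex keeps each pattern simple. It also avoids unbounded look-behind, which Java's regex engine rejects.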


Checking numbers in a string

Notice: I know there are plenty of ways to make this simpler, but they are not allowed. I am restricted to plain, basic Java: loops and hand-written methods.
Even arrays are not allowed, and neither is regex.
The task is to check the number in each word of a sentence and find the word with the greatest number that is also a POWER OF 3 (a perfect cube, such as 8, 27, 64 or 125).
Everything works fine until I enter something like this:
asdas8 dasjkj27 asdjkj64 asdjk333 asdjkj125
I get 64 as output instead of 125, because it stops checking when it reaches the first number that is NOT a power of 3.
How can I keep iterating to the end of the sentence instead of stopping when I reach a number that is not a power of 3? How should I modify this code to achieve that?
Edit: But if I enter more than one word after the one that FAILS THE CONDITION, it works just fine.
For instance:
asdas8 dasjkj27 asdjkj64 asdjk333 asdjkj125 asdash216
Here is my code:
import java.util.Scanner;

public class Nine {
static int num(String s) { // method to change string to int
int b = 0;
int o = 0;
for (int i = s.length() - 1; i >= 0; i--) {
char bi = s.charAt(i);
b += (bi - '0') * (int) Math.pow(10, o);
o++;
}
return b;
}
static boolean thirdPow(int a) {
boolean ntrec = false;
if (Math.cbrt(a) % 1 == 0)
ntrec = true;
return ntrec;
}
static int max(int a, int b) {
int max= 0;
if (a > b)
max= a;
else
max= b;
System.out.print(max);
return max;
}
static String search(String r) {
String current= ""; // 23aa64
String currentA= "";
String br = ""; // we store the number from the word here, e.g. 23
int bb = 0; // our string converted to a number
int p = 0;
for (int i = 0; i < r.length(); i++) {
current+= r.charAt(i);
if (r.charAt(i) == ' ') {
for (int j = 0; j < current.length(); j++) {
while ((int) current.charAt(j) > 47 && (int) current.charAt(j) < 58) {
br += current.charAt(j);
j++;
}
bb = num(br);
System.out.println("Third pow" + thirdPow(bb));
if (thirdPow(bb)) {
p = max(p, bb);
}
br = "";
}
current= "";
}
}
String pp = "" + p;
String finalRes= "";
for (int u = 0; u < r.length(); u++) {
currentA+= r.charAt(u);
if (r.charAt(u) == ' ') {
if (currentA.contains(pp))
finalRes+= currentA;
currentA= "";
}
}
System.out.println(p);
return finalRes;
}
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
System.out.print("Enter sentence: ");
String r = scan.nextLine();
System.out.println("Our string is : " + search(r));
}
}
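Looking at the code above, one likely cause is that search() only processes current when it reads a space. The last word of the sentence has no trailing space, so it is never examined. That also explains why adding another word after the failing one makes it work. The second loop that builds finalRes has the same limitation.

Below is a minimal sketch that stays within the stated limits (plain loops, no regex, no arrays apart from main's args). It assumes each word contains at most one run of digits (otherwise all its digits are concatenated) and that the number fits in an int. CubeWordFinder, isCube and findWord are hypothetical names.

```java
class CubeWordFinder {
    // A perfect cube has an integer cube root; rounding Math.cbrt guards
    // against tiny floating-point errors before the exact check.
    static boolean isCube(int n) {
        long root = Math.round(Math.cbrt(n));
        return root * root * root == n;
    }

    // Returns the word holding the largest cube, or "" if no word has one.
    static String findWord(String sentence) {
        String s = sentence + " "; // sentinel space: the last word is handled like the others
        String word = "";
        int number = 0;
        boolean hasDigits = false;
        int best = -1;
        String bestWord = "";
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                // End of a word: check the number collected from it
                if (hasDigits && isCube(number) && number > best) {
                    best = number;
                    bestWord = word;
                }
                word = "";
                number = 0;
                hasDigits = false;
            } else {
                word += c;
                if (c >= '0' && c <= '9') {
                    number = number * 10 + (c - '0'); // build the number digit by digit
                    hasDigits = true;
                }
            }
        }
        return bestWord;
    }

    public static void main(String[] args) {
        System.out.println(findWord("asdas8 dasjkj27 asdjkj64 asdjk333 asdjkj125"));
    }
}
```

Appending one space to the input is the smallest change that makes the final word go through the same check as the others. In the original search(), the equivalent fix would be r = r + " "; at the start of the method.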
I am assuming that words are separated by a single space and that every word contains a number mixed with non-digit characters.
Using regular expressions will reduce the code complexity considerably. Try this code:
String input = "asdas8 dasjkj27 asdjkj64 asdjk333 asdjkj125";
String[] extractWords = input.split(" "); // extracting each word
int[] numbers = new int[extractWords.length]; // an int array to store the number from each word
int i = 0;
for (String s : extractWords) {
numbers[i++] = Integer.parseInt(s.replaceAll("\\D+", "")); // strip the non-digits, then parse
}
Now, the "numbers" array will contain [8, 27, 64, 333, 125]
You can use your logic to find a maximum among them. Hope this helps.
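Building on the numbers approach, here is a sketch that finishes the job. It skips words without digits, because Integer.parseInt("") would throw a NumberFormatException, and returns the word with the largest perfect cube. It assumes the numbers fit in an int and that each word holds at most one number; otherwise all of a word's digits are concatenated. MaxCubeFromWords, isCube and wordWithLargestCube are hypothetical names. Like the answer, it uses arrays and regex, which the question's constraints rule out.

```java
class MaxCubeFromWords {
    // True if n is a perfect cube; rounding Math.cbrt avoids truncating a
    // result such as 3.9999999 down to 3.
    static boolean isCube(int n) {
        long root = Math.round(Math.cbrt(n));
        return root * root * root == n;
    }

    // Returns the word whose digits form the largest perfect cube, or null if none does.
    static String wordWithLargestCube(String input) {
        String best = null;
        int bestNumber = -1;
        for (String word : input.split(" ")) {
            String digits = word.replaceAll("\\D+", ""); // keep only the digits
            if (digits.isEmpty()) {
                continue; // no number in this word
            }
            int n = Integer.parseInt(digits);
            if (isCube(n) && n > bestNumber) {
                bestNumber = n;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(wordWithLargestCube("asdas8 dasjkj27 asdjkj64 asdjk333 asdjkj125"));
    }
}
```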
You can just do what I am doing. First split the sentence into words. I split on whitespace, hence the in.split("\\s+"). Then extract the numbers from each word and, among those that are powers of 3, keep the highest along with its word.
/* package whatever; // don't place package name! */
import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.*;
/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
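static boolean isPowOfThree(int num)
{
// round the cube root instead of truncating, so floating-point error cannot push it down by one
int temp = (int) Math.round(Math.cbrt(num));
return temp * temp * temp == num;
}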
public static void main (String[] args) throws java.lang.Exception
{
Scanner sc = new Scanner(System.in);
String in = sc.nextLine();
String[] words = in.split("\\s+");
String maxWord = ""; //init default word
int maxNum = -1; //init default num
for(String word : words)
{
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(word);
while (m.find())
{
String num = m.group();
if(isPowOfThree(Integer.parseInt(num)))
{
if(Integer.parseInt(num) > maxNum)
{
maxNum = Integer.parseInt(num);
maxWord = word;
}
}
}
}
if(maxNum > -1)
{
System.out.println("Word is : " + maxWord);
}
else
{
System.out.println("No word of power 3");
}
}
}
The problem can be solved with the \\d+ regular expression and Java's Pattern and Matcher API.
package com.company;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String i = "asdas8 dasjkj278 asdjkj64 asdjk333 asdjkj125";
Matcher matcher = Pattern.compile("\\d+").matcher(i);
List<Integer> numbers = new ArrayList<>();
while (matcher.find()){
numbers.add(Integer.parseInt(matcher.group()));
}
Collections.sort(numbers);
Collections.reverse(numbers);
Integer power3 = 0;
for (Integer n : numbers) {
if (isPowOfThree(n)) {
power3 = n;
break;
}
}
System.out.println(power3);
}
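static boolean isPowOfThree(int num) {
// round the cube root instead of truncating, so floating-point error cannot push it down by one
int temp = (int) Math.round(Math.cbrt(num));
return temp * temp * temp == num;
}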
}
With the \\d+ regular expression, each iteration of while (matcher.find()) yields the next run of digits in the given string. Once all the numbers are collected, sort the list in descending order. When we then iterate over it, the first number that is a power of 3 is the largest such number.
Brother, use string.split(" ") to form an array of words, then iterate through the array and extract the numbers with the regex \d+ (^[0-9] would match only a single digit at the start of a word). Then find the biggest number in the array that is a power of 3; it's as simple as that. Proceed step by step like this and your code will stay simple.

How do you identify if a char entry is a two or more digit number?

In other words, if I have a string delimited by spaces, how can I tell whether consecutive digit characters form a number with two or more digits?
Bear with me, I'm still pretty new to programming in general.
This is what I have so far:
import java.util.*;
public class calc
{
public static String itemList;
public static String str;
public static void main(String[] args)
{
Scanner sc = new Scanner(System.in);
str = sc.nextLine();
delimitThis();
sc.close();
}
public static void delimitThis() // Delimiter to treat variables and numbers as separate
{
List<String> items = Arrays.asList(str.split("\\s+"));
System.out.println(items);
for (int i = 0; i < str.length(); i++)
{
itemList = items.get(i);
category();
}
}
public static void category()////For Filtering between vars and constants and functions
{
for (int x = 0; x < itemList.length(); x++)
{
char ite = itemList.charAt(x);
if(Character.isDigit(ite))
{
System.out.println(ite + "is numeric"); //Will be replaced by setting of value in a 2 dimensional list
}
}
}
}
First of all, I want to fix your mistakes:
Mistake 1:
// bad
for (int i = 0; i < str.length(); i++)
{
itemList = items.get(i);
category();
}
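You are traversing List<String> items, but the loop bound is str.length(). That is the number of characters in the input, not the number of items, so items.get(i) will usually throw an IndexOutOfBoundsException. To process every item in items (assign it to itemList, then call category()), the code should be: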
// fixed
for (int i = 0; i < items.size(); i++)
{
itemList = items.get(i);
category();
}
Mistake 2:
for (int x = 0; x < itemList.length(); x++)
{
System.out.println(itemList);
}
I'm not sure what you wanted to do here; the code does not make sense to me. I assume you wanted to print every character of itemList on its own line, in which case the code should look like this:
for (int x = 0; x < itemList.length(); x++)
{
System.out.println(itemList.charAt(x));
}
That covers the mistakes. Now, to check whether a string is a number with two or more digits, use String.matches() with a regular expression; matches() succeeds only if the whole string matches the pattern:
if(itemList.matches("\\d\\d+")){
System.out.println(itemList + " is a two or more digit number");
}else{
System.out.println(itemList + " is NOT a two or more digit number");
}
The code looks like this in the end:
import java.util.*;
public class Calc
{
public static String itemList;
public static String str;
public static void main(String[] args)
{
Scanner sc = new Scanner(System.in);
str = sc.nextLine();
delimitThis();
sc.close();
}
public static void delimitThis() // Delimiter to treat variables and numbers as separate
{
List<String> items = Arrays.asList(str.split("\\s+"));
System.out.println(items);
for (int i = 0; i < items.size(); i++)
{
itemList = items.get(i);
category();
}
}
public static void category()////For Filtering between vars and constants and functions
{
for (int x = 0; x < itemList.length(); x++)
{
System.out.println(itemList.charAt(x));
}
// is 2 digit number or not?
if(itemList.matches("\\d\\d+")){
System.out.println(itemList + " is a two or more digit number");
}else{
System.out.println(itemList + " is NOT a two or more digit number");
}
}
}
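Since this answer relies on String.matches(), note the difference from Matcher.find(). matches() succeeds only when the whole string fits the pattern, while find() succeeds if the pattern occurs anywhere inside the string. Here is a small illustration; MatchesVsFind and its methods are names made up for this example.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern TWO_PLUS_DIGITS = Pattern.compile("\\d\\d+");

    // The whole string must be two or more digits (same behavior as String.matches)
    static boolean isMultiDigitNumber(String s) {
        return TWO_PLUS_DIGITS.matcher(s).matches();
    }

    // True if a run of two or more digits appears anywhere in s
    static boolean containsMultiDigitNumber(String s) {
        return TWO_PLUS_DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "7", "x42", "4a2"}) {
            System.out.println(s + ": matches=" + isMultiDigitNumber(s)
                    + ", find=" + containsMultiDigitNumber(s));
        }
    }
}
```

For example, x42 is not a two-digit number according to matches(), but find() still spots the 42 inside it.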
To check whether a string is at least a 2 digit number, use this regex: \d{2,}.
// requires: import java.util.regex.Pattern;
public static boolean is2OrMoreDigits(String s) {
Pattern p = Pattern.compile("\\d{2,}");
return p.matcher(s).matches(); // matches() requires the whole string to be digits
}
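If you would rather stay close to the question's character-by-character category() loop instead of using regex, the sketch below groups consecutive digit characters into runs. A run of length 2 or more is a multi-digit number. DigitRuns and digitRuns are hypothetical names. Note that Character.isDigit also accepts non-ASCII digits, such as Arabic-Indic digits.

```java
import java.util.ArrayList;
import java.util.List;

class DigitRuns {
    // Collects every run of consecutive digit characters in the input,
    // so "x 12 7 345y" yields ["12", "7", "345"].
    static List<String> digitRuns(String input) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                current.append(c);             // extend the current run
            } else if (current.length() > 0) {
                runs.add(current.toString());  // a non-digit ends the run
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());      // a run that reaches the end of the input
        }
        return runs;
    }

    public static void main(String[] args) {
        for (String run : digitRuns("x 12 7 345y")) {
            String kind = run.length() >= 2 ? "two or more digits" : "single digit";
            System.out.println(run + " -> " + kind);
        }
    }
}
```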

Duplicate Encoder codewars java Junit exception

I am doing a kata on Codewars named "Duplicate Encoder".
The code I have written does its job correctly, but JUnit (4.12) insists it does not, both on the website and in my IDE (Eclipse). I have no idea why. Could someone shed some light on this issue? Thanks.
The class to be tested:
package com.danman;
import java.util.*;
public class Person {
static String encode(String word){
word = word.toLowerCase();
List<String> bank = new ArrayList<>();
StringBuilder wordTwo = new StringBuilder("");
//1: create a list of all unique elements in the string
for (int n = 0; n < word.length(); n++) {
String temp = word.substring(n, n+1);
if (temp.equals(" ")){continue;}
bank.add(temp);
}
for (int r = 0; r <word.length(); r++){
List<String> bankTwo = bank;
Iterator<String> it = bankTwo.iterator();
String tempTwo = word.substring(r, r+1);
int count = 0;
//2: iterate through the list of elements and append the appropriate token to the StringBuilder
while (it.hasNext()){
if (it.next().equals(tempTwo)){
++count;
}
}
if (count <= 1){
wordTwo.append("(");
} else {
wordTwo.append(")");
}
}
word = wordTwo.toString();
return word;
}
public static void main(String[] args) {
// encode is static, so call it on the class rather than on an instance
System.out.println(Person.encode("Prespecialized"));
}
}
Junit file:
package com.danman;
import org.junit.Test;
import static org.junit.Assert.assertEquals;
public class PersonTest {
@Test
public void test() {
assertEquals(")()())()(()()(", Person.encode("Prespecialized"));
assertEquals("))))())))", Person.encode("   ()(   "));
}
}
As far as I understand your code, the first assert is OK. I don't know why encoding "   ()(   " should return "))))())))". You iterate through the bank list of characters in the given string (spaces are excluded from that list) and check whether each character of the word occurs more than once in bank. When you check a space, the answer is always no and ( is appended, because count is 0 (spaces are excluded from the bank list).
The second assert should rather be
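assertEquals("((()()(((", Person.encode("   ()(   "));
Maybe you need something like this. Unlike the code in the question, it does not skip spaces, so they are counted like any other character: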
import java.util.ArrayList;
import java.util.Iterator;
public class DuplicateEncoder {
static String encode(String word) {
word=word.toUpperCase();
ArrayList<String> stack1 =new ArrayList<>();
StringBuilder stringBuilder = new StringBuilder();
for(int i=0;i<word.length();i++){
String t = word.substring(i,i+1);
stack1.add(t);
}
for(int i=0;i<word.length();i++){
Iterator<String> iterator =stack1.iterator();
String t = word.substring(i,i+1);
int count=0;
while(iterator.hasNext()){
if(iterator.next().equals(t)){
count++;
}
}
if(count>1){
stringBuilder.append(")");
}
else stringBuilder.append("(");
}
word=stringBuilder.toString();
return word;
}
public static void main(String[] args) {
System.out.println(encode("Pup")); // prints ")()"
}
}
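Judging by the expected value "))))())))" in the question's test, spaces count like any other character. The question also lowercases the input, so matching is case-insensitive. The sketch below counts the characters in a HashMap first, so each position needs one lookup instead of a scan of the whole list. The test input "   ()(   " (three spaces on each side) is an assumption, reconstructed from the 9-character expected output. DuplicateEncoderSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class DuplicateEncoderSketch {
    static String encode(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        // First pass: count every character, spaces included
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < lower.length(); i++) {
            counts.merge(lower.charAt(i), 1, Integer::sum);
        }
        // Second pass: "(" for characters that appear once, ")" for repeated ones
        StringBuilder out = new StringBuilder(lower.length());
        for (int i = 0; i < lower.length(); i++) {
            out.append(counts.get(lower.charAt(i)) > 1 ? ')' : '(');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Prespecialized")); // )()())()(()()(
        System.out.println(encode("   ()(   "));      // ))))())))
    }
}
```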

Finding the count of a given word using a wordclass

I have an assignment where I have to create 3 classes: Oblig6 (main method), Word (Ord) and Wordlist (Ordliste). I have to find the number of times a word is repeated in a text using the word class.
I have a problem formulating the following part. I need the word class to make a new object of the given word if it's already in the word list (ArrayList ordliste). Then, the next time it finds the same word in the text, it has to add 1 to the total for that specific object defined by Ord(String s). I know that I'm creating a new object every time it finds a word that is in the word list; I need a suggestion on how to formulate it correctly.
Here is my code.
The word list class; the main problem is in void fraOrdtilOrdliste().
import java.util.Scanner;
import java.io.File;
import java.util.ArrayList;
public class Ordliste {
private ArrayList<String> ord = new ArrayList<String>();
private ArrayList<String> ordliste = new ArrayList<String>();
private int i = 0;
private int totalord = 0;
private int antallforekomster = 0;
// Reads the provided txt file and puts the words into a word list
void lesBok(String filnavn) throws Exception {
File file = new File(filnavn);
Scanner innlestfil = new Scanner(file);
while (innlestfil.hasNextLine()) {
ord.add(innlestfil.nextLine());
}
}
// Reads the ord ArrayList and compares the words to the ordliste ArrayList, adding them if they are not in it already.
// If they are there, creates a new Ord(String s) object for that word and adds to its count.
void fraOrdtilOrdliste () {
ordliste.add(ord.get(i));
for (i=0;i<ord.size();i++) {
Boolean unik = true;
for (int j = 0; j<ordliste.size();j++) {
if (ordliste.get(j).equalsIgnoreCase(ord.get(i))) {
unik = false;
new Ord(ordliste.get(j)).oekAntall();
}
}
if (unik) {
ordliste.add(ord.get(i));
}
}
}
// Using the Ord class as a counter for this method. If the word is registered beforehand, it will add 1.
void leggTilOrd(String s) {
for (i = 0; i < ord.size(); i++) {
if (ord.get(i).equalsIgnoreCase(s)) {
ord.add(i, s);
System.out.println("Suksess"); // "Success"
} else if (!ord.get(i).equalsIgnoreCase(s)) {
new Ord(s).oekAntall();
System.out.println("Antall okt"); // "Count increased"
return;
}
}
}
// Searches for the word in the wordlist and returns null if it does not exist.
Ord finnOrd(String s) {
for (i = 0; i < ord.size(); i++) {
if (!s.equalsIgnoreCase(ord.get(i))) {
System.out.println("null");
return null;
} else if (s.equalsIgnoreCase(ord.get(i))) {
System.out.println("Fant ordet"); // "Found the word"
}
}
return null;
}
// Prints out the total amount of words in the word list.
int antallOrd() {
for (i = 0; i < ordliste.size(); i++) {
totalord++;
}
System.out.println("Antall ord i ordlisten er: " + totalord); // "The number of words in the word list is: "
return totalord;
}
// Counts the total amounts of a word in the word list.
int antallForekomster(String s){
antallforekomster= new Ord(s).hentAntall();
System.out.println("Ordet forekommer " + antallforekomster + " ganger"); // "The word occurs ... times"
return antallforekomster;
}
}
Here is the word class.
OK, let me give it a shot, because I am not even sure I am reading your code correctly.
a) Define a class Word that has one member variable for the count and one member variable for the word String.
b) In your word list class, have a member variable that is a List. Every time you parse a word out, loop through the List comparing your string with the string of each Word. If one matches, increment the count in that Word object; if none matches, add a new Word.
The loop sounds really inefficient, but if you use a List, that's all you can really do. Your performance is therefore O(n^2), where n is the number of words in the given text.
WordList Class :
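import java.util.ArrayList;
import java.util.List;

public class WordList {
    static List<Word> words = new ArrayList<Word>();

    public static void countWord(String inputWord) {
        // Look for an existing entry and increment its count
        for (Word word : words) {
            if (word.getWord().equals(inputWord)) {
                word.setCount(word.getCount() + 1);
                return;
            }
        }
        // Not found: add the word with a count of 1
        // (adding after the loop avoids a ConcurrentModificationException)
        Word newWord = new Word();
        newWord.setWord(inputWord);
        newWord.setCount(1);
        words.add(newWord);
    }
}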
Word Class :
public class Word {
String word;
int count;
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
public int getCount() {
return count;
}
public void setCount(int count) {
this.count = count;
}
}
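Putting steps a) and b) together, here is a self-contained sketch. It uses case-insensitive matching like the question's equalsIgnoreCase calls, and it also shows the two queries the assignment needs (occurrences of a word and number of distinct words). CountingWordList, CountedWord, occurrences and distinctWords are hypothetical names standing in for Ordliste, Ord, antallForekomster and antallOrd.

```java
import java.util.ArrayList;
import java.util.List;

class CountingWordList {
    private final List<CountedWord> words = new ArrayList<>();

    // Step b): scan the list; bump the count on a match, otherwise append a new entry.
    void add(String s) {
        for (CountedWord w : words) {
            if (w.text.equalsIgnoreCase(s)) {
                w.count++;
                return;
            }
        }
        words.add(new CountedWord(s));
    }

    // Returns how often s was added (ignoring case), or 0 if never.
    int occurrences(String s) {
        for (CountedWord w : words) {
            if (w.text.equalsIgnoreCase(s)) {
                return w.count;
            }
        }
        return 0;
    }

    int distinctWords() {
        return words.size();
    }

    public static void main(String[] args) {
        CountingWordList list = new CountingWordList();
        for (String token : "the cat and The dog and the bird".split("\\s+")) {
            list.add(token);
        }
        System.out.println(list.occurrences("the")); // 3
        System.out.println(list.distinctWords());    // 5
    }
}

// Step a): one object per distinct word, holding the word and its count.
class CountedWord {
    final String text;
    int count;

    CountedWord(String text) {
        this.text = text;
        this.count = 1; // created on first sight, so it has been seen once
    }
}
```

The key point for the question is that the existing object found in the list is updated, instead of creating a new Ord each time.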

Java: I've created a list of word objects to include the name and the frequency, but having trouble updating the frequency

I'm working on a project which has a dictionary of words and I'm extracting them and adding them to an ArrayList as word objects. I have a class called Word as below.
What I'm wondering is how do I access these word objects to update the frequency? As part of this project, I need to only have one unique word, and increase the frequency of that word by the number of occurrences in the dictionary.
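public class Word {
    private final String word;
    private int freq;

    Word(String word)
    {
        this.word = word;
        this.freq = 0;
    }
    public String getWord() {
        return word;
    }
    public int getFreq() {
        return freq;
    }
    // Despite its name, this increments the frequency by one
    public void setFreq() {
        freq = freq + 1;
    }
}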
This is how I am adding the word objects to the ArrayList. I think it's OK?
String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
String newWord = st.nextToken();
word.add(new Word(newWord));
count++;
}
Instead of an ArrayList, use a Bag (for example, HashBag from Apache Commons Collections). It keeps the counts for you.
Use a Map from each word to its Word object. A HashSet might seem sufficient, but internally a HashSet uses a HashMap anyway, and a Map lets you look up the existing Word object directly. The following code also increases the frequency of words you have already inserted.
Map<String, Word> wordsMap = new HashMap<String, Word>();
String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
    String newWord = st.nextToken();
    Word existingWord = wordsMap.get(newWord);
    if (existingWord == null) {
        // first occurrence: create the Word (its frequency starts at 0)
        existingWord = new Word(newWord);
        wordsMap.put(newWord, existingWord);
    }
    // count this occurrence, including the first one
    existingWord.setFreq();
    count++;
}
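On Java 8 or later, the get/put/increment steps can be condensed with Map.computeIfAbsent. It creates the Word only when the key is missing and returns the stored instance either way. Below is a self-contained sketch; WordFrequencies and count are hypothetical names, and Word is a minimal stand-in for the question's class (the frequency starts at 0 and setFreq() adds one).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class WordFrequencies {
    // Same cleanup and tokenizing as the snippet above; every occurrence is counted.
    static Map<String, Word> count(String line) {
        Map<String, Word> wordsMap = new HashMap<>();
        String cleaned = line.replaceAll("[^a-zA-Z\\s]", "").toLowerCase();
        StringTokenizer st = new StringTokenizer(cleaned);
        while (st.hasMoreTokens()) {
            // Create the Word on first sight, then bump its frequency every time
            wordsMap.computeIfAbsent(st.nextToken(), Word::new).setFreq();
        }
        return wordsMap;
    }

    public static void main(String[] args) {
        Map<String, Word> m = count("The cat saw the other cat.");
        System.out.println(m.get("cat").getFreq()); // 2
    }
}

// Minimal version of the question's Word class
class Word {
    private final String word;
    private int freq;

    Word(String word) {
        this.word = word;
        this.freq = 0;
    }

    String getWord() { return word; }
    int getFreq() { return freq; }
    void setFreq() { freq = freq + 1; }
}
```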
I would solve the problem with the following code:
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Word {
private final String word;
private int frequency;
public Word(String word) {
this.word = word;
this.frequency = 0;
}
public String getWord() {
return word;
}
public int getFrequency() {
return frequency;
}
public void increaseFrequency() {
frequency++;
}
I didn't call this method setFrequency because it is not a real setter method. For a real setter method, you would pass it exactly one parameter.
public static List<Word> histogram(String sentence) {
First, compute the frequency of the individual words.
String[] words = sentence.split("\\W+");
Map<String, Word> histo = new HashMap<String, Word>();
for (String word : words) {
Word w = histo.get(word);
if (w == null) {
w = new Word(word);
histo.put(word, w);
}
w.increaseFrequency();
}
Then, sort the words such that words with higher frequency appear first.
If the frequency is the same, the words are sorted almost alphabetically.
List<Word> ordered = new ArrayList<Word>(histo.values());
Collections.sort(ordered, new Comparator<Word>() {
public int compare(Word a, Word b) {
int fa = a.getFrequency();
int fb = b.getFrequency();
if (fa < fb)
return 1;
if (fa > fb)
return -1;
return a.getWord().compareTo(b.getWord());
}
});
return ordered;
}
Finally, test the code with a simple example.
public static void main(String[] args) {
List<Word> freq = histogram("a brown cat eats a white cat.");
for (Word word : freq) {
System.out.printf("%4d %s\n", word.getFrequency(), word.getWord());
}
}
}
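For comparison, on Java 8 or later the same histogram (most frequent first, ties broken by String.compareTo) can be sketched with streams. It returns map entries rather than Word objects, so it is an alternative to the class-based design above, not a replacement for it. StreamHistogram is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamHistogram {
    // Counts each word, then orders by descending count and, for ties, by the word itself.
    static List<Map.Entry<String, Long>> histogram(String sentence) {
        Map<String, Long> counts = Arrays.stream(sentence.split("\\W+"))
                .filter(w -> !w.isEmpty()) // a leading delimiter yields an empty first token
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : histogram("a brown cat eats a white cat.")) {
            System.out.printf("%4d %s%n", e.getValue(), e.getKey());
        }
    }
}
```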
You can use a Multiset<String> from Google Collections (now part of Guava) instead of the Word class.
