Vowel regexp in jflex - java

So I did an exercise using JFlex, which is about counting the number of words in an input text file that contain more than 3 vowels. What I ended up doing was defining a token for a word, and then creating a Java function that receives this text as input and checks each character. If it's a vowel, I increment a counter and then check whether it's greater than 3; if it is, I increment the counter of matching words.
What I want to know is whether there is a regexp that could match a word with more than 3 vowels. I think it would be a cleaner solution. Thanks in advance.
tokens
Letra = [a-zA-Z]
Palabra = {Letra}+

Very simple. Use this if you want to check that a word contains at least 3 vowels (for "more than 3", i.e. at least 4, change {3} to {4}):
(?i)(?:[a-z]*[aeiou]){3}[a-z]*
You only care that the word contains at least 3 vowels, so the rest can be any alphabetic characters. The regex above works both with String.matches and in a Matcher loop, because a valid word (at least 3 vowels) cannot be a substring of an invalid word (fewer than 3 vowels).
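Applied to the question's "more than 3 vowels" (that is, at least 4), a minimal sketch could use {4} and check each word separately. The word pattern mirrors the question's Palabra macro. The class and method names are illustrative and do not come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VowelWordCounter {
    // "More than 3 vowels" means "at least 4", so the repetition count is {4}.
    private static final Pattern MORE_THAN_3_VOWELS =
            Pattern.compile("(?i)(?:[a-z]*[aeiou]){4}[a-z]*");
    // A word is a run of ASCII letters, like {Letra}+ in the JFlex spec.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    static int countWordsWithMoreThan3Vowels(String text) {
        Matcher words = WORD.matcher(text);
        int count = 0;
        while (words.find()) {
            // matches() requires the whole word to fit the vowel pattern.
            if (MORE_THAN_3_VOWELS.matcher(words.group()).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "Aeiou" and "queue" have 4+ vowels; "radio" and "banana" have 3.
        System.out.println(countWordsWithMoreThan3Vowels("Aeiou queue radio banana")); // 2
    }
}
```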
Beyond the scope of the question: for consonants you can use character class intersection, which Java regex supports: [a-z&&[^aeiou]]. So if you want to check for exactly 3 vowels (with String.matches):
(?i)(?:[a-z&&[^aeiou]]*[aeiou]){3}[a-z&&[^aeiou]]*
If you are using this in Matcher loop:
(?i)(?<![a-z])(?:[a-z&&[^aeiou]]*[aeiou]){3}[a-z&&[^aeiou]]*(?![a-z])
Note that I have to use look-around to make sure that the string matched (exactly 3 vowels) is not part of an invalid string (possible when it has more than 3 vowels).
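The following sketch shows why the look-arounds matter in a Matcher loop. It runs the exact-3 pattern above on lowercase input. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactlyThreeVowels {
    // Pattern from the answer: the look-arounds prevent a match from starting
    // or ending in the middle of a longer run of letters.
    private static final Pattern EXACTLY_3 = Pattern.compile(
            "(?i)(?<![a-z])(?:[a-z&&[^aeiou]]*[aeiou]){3}[a-z&&[^aeiou]]*(?![a-z])");

    static List<String> findWords(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = EXACTLY_3.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // "queue" has 4 vowels and "sky" has none, so only two words qualify.
        System.out.println(findWords("radio banana queue sky")); // [radio, banana]
    }
}
```

Without the look-arounds, find() would report "queu" inside "queue": a 3-vowel prefix of a 4-vowel word.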

Since you already wrote a Java method, you can do the counting in that method with a regex, as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class VowelChecker {
    private static final Pattern vowelRegex = Pattern.compile("[aeiouAEIOU]");
    public static void main(String[] args) {
        System.out.println(checkVowelCount("aeiou", 3));
        System.out.println(checkVowelCount("AEIWW", 3));
        System.out.println(checkVowelCount("HeLlO", 3));
    }
    // Returns true as soon as more than `threshold` vowels have been found.
    private static boolean checkVowelCount(String str, int threshold) {
        Matcher matcher = vowelRegex.matcher(str);
        int count = 0;
        while (matcher.find()) {
            if (++count > threshold) {
                return true;
            }
        }
        return false;
    }
}
Here threshold is the number of vowels that must be exceeded. Since you want more than 3 vowels, main passes 3. The output is as follows:
true
false
false
Hope this helps!
Thanks,
EG

I ended up using this regexp I came up with. If anyone has a better one, feel free to post it.
Cons = [bcdBCDfghFGHjklmnJKLMNpqrstPQRSTvwxyzVWXYZ]
Vocal = [aeiouAEIOU]
Match = {Cons}*{Vocal}{Cons}*{Vocal}{Cons}*{Vocal}{Cons}*{Vocal}({Cons}|{Vocal})*

Related

How can I split a string without knowing the split characters a-priori?

For my project I have to read various input graphs. Unfortunately, the input edges do not all have the same format: some files are comma-separated, others are tab-separated, etc. For example:
File 1:
123,45
67,89
...
File 2:
123 45
67 89
...
Rather than handling each case separately, I would like to automatically detect the split characters. Currently I have developed the following solution:
String str = "123,45";
String splitChars = "";
for (int i = 0; i < str.length(); i++) {
    if (!Character.isDigit(str.charAt(i))) {
        splitChars += str.charAt(i);
    }
}
String[] endpoints = str.split(splitChars);
Basically I pick the first row and select all the non-numeric characters, then I use the generated substring as split characters. Is there a cleaner way to perform this?
split takes a regexp, so your code can fail in several ways. If the separator has a special meaning in a regexp (say, +), it will fail. If a line contains more than 2 numbers, it will also fail: for 1,2,3 your splitChars becomes ",,". And if the first line is, say, 123 , 45, your splitChars string becomes " , ". split then only breaks on that exact three-character sequence, so a later line such as 67,89 is not split at all.
Why not write a regexp that matches digits and then find all sequences of digits, instead of focusing on the separators?
You're using regexps whether you want to or not, so let's make it official and use Pattern, while we are at it.
private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");
// then in your split method:
Matcher m = ALL_DIGITS.matcher(str);
// Generally, don't use arrays; a List is more flexible.
List<Integer> numbers = new ArrayList<Integer>();
while (m.find()) {
    numbers.add(Integer.parseInt(m.group(0)));
}
\d+ (written "\\d+" in a Java string literal) means one or more digits.
m.find() finds the next match (so, the next block of digits), returning false if there aren't any more.
m.group(0) retrieves the entire matched string.
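The snippet above relies on surrounding code. A self-contained sketch of the same Matcher approach follows. EdgeParser and parseNumbers are illustrative names. The sketch assumes every number fits in an int, since Integer.parseInt throws NumberFormatException otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EdgeParser {
    // One or more digits; whatever separates them is simply ignored.
    private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");

    static List<Integer> parseNumbers(String line) {
        Matcher m = ALL_DIGITS.matcher(line);
        List<Integer> numbers = new ArrayList<>();
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group(0)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Comma-, tab- and space-separated lines are all handled the same way.
        for (String line : new String[] {"123,45", "67\t89", " 10 , 20 "}) {
            System.out.println(parseNumbers(line));
        }
    }
}
```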
Split the string on \\D+, which means one or more non-digit characters.
Demo:
import java.util.Arrays;
public class Main {
    public static void main(String[] args) {
        // Test strings
        String[] arr = { "123,45", "67,89", "125 89", "678 129" };
        for (String s : arr) {
            System.out.println(Arrays.toString(s.split("\\D+")));
        }
    }
}
Output:
[123, 45]
[67, 89]
[125, 89]
[678, 129]
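There is one caveat with splitting on \D+. String.split keeps a leading empty string when the input starts with a separator, although it drops trailing empty strings. Some lines may start with whitespace. For that case, here is a sketch that strips leading non-digits first. splitDigits is an illustrative name.

```java
import java.util.Arrays;

class SplitDemo {
    // Remove any leading non-digits so split() does not produce an empty first element.
    static String[] splitDigits(String line) {
        return line.replaceFirst("^\\D+", "").split("\\D+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(" 123 45".split("\\D+"))); // [, 123, 45]
        System.out.println(Arrays.toString(splitDigits(" 123 45")));  // [123, 45]
    }
}
```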
Why not split on [^\d]+ (every run of non-digits)?
for (String n : "123,456 789".split("[^\\d]+")) {
    System.out.println(n);
}
Result:
123
456
789

Why are not all the test cases passing for the question below?

This is the coding question I am trying to solve. I tried, but not all the test cases passed, and I can't find the reason.
Identify possible words: While solving a case, Detective Bakshi stumbled upon a letter with many words that were each missing one character, i.e. one character in the word had been replaced by an underscore, e.g. “Fi_er”. He also found thin strips of paper that had groups of words separated by colons, e.g. “Fever:filer:Filter:Fixer:fiber:fibre:tailor:offer”. He figured out that each word with a missing character was one of the possible words on the strips of paper. Detective Bakshi has approached you (a computer programmer) for help in identifying the possible words for each incomplete word.
You are expected to write a function to identify the set of possible words.
The function identifyPossibleWords takes two strings as input
where,
input1 contains the incomplete word, and
input2 is the string containing a set of words separated by colons.
The function is expected to find all the possible words from input2 that can replace the incomplete word input1, and return the result in the format suggested below.
Example1 -
input1 = “Fi_er”
input2 = “Fever:filer:Filter:Fixer:fiber:fibre:tailor:offer”
output string should be returned as “FILER:FIXER:FIBER”
Note that –
The output string should contain the set of all possible words that can replace the incomplete word in input1
all words in the output string should be stored in UPPER-CASE
all words in the output string should appear in the order in which they appeared in input2, i.e. in the above example we have FILER followed by FIXER followed by FIBER.
While searching for input1 in input2, the case of the letters is ignored, i.e. “Fi_er” matches “filer” as well as “Fixer” and “fiber”.
IMPORTANT: If none of the words in input2 are possible candidates to replace input1, the output string should contain the string “ERROR-009”
Assumption(s):
Input1 will contain only a single word with only 1 character replaced by an underscore “_”
Input2 will contain a series of words separated by colons and NO space character in between
Input2 will NOT contain any other special character other than underscore and alphabetic characters.
My solution for the question is:
import java.io.*;
import java.util.*;
class UserMaincode
{
    public String indentifyPossibleWords(String input1, String input2)
    {
        input1=input1.toUpperCase();
        input2=input2.toUpperCase();
        String arr1[]=input1.split("_");
        String arr2[]=input2.split(":");
        StringBuilder sb=new StringBuilder("");
        for(int i=0;i<arr2.length;i++){
            if(arr2[i].matches(arr1[0]+"."+arr1[1])){
                sb.append(arr2[i]+":");
            }
        }
        if(sb.length()!=0){
            sb.deleteCharAt(sb.length()-1);
        }
        String s=sb.toString();
        if(s==""){
            return "ERROR-009";
        }
        return s;
    }
}
But some of the hidden test cases did not pass. Where could the problem be?
I found some code on the web that passes all the test cases; see this link:
https://www.csinfo360.com/2020/01/cracking-coding-interview-step-11.html
There are many ways to get the expected result. Since you tagged the question with regex, I'll provide a possible solution using regex, although this can be done without one too.
Proposed procedure:
1. Create a regex from input1 by replacing the _ (wherever it appears) with the regex dot (.) metacharacter.
2. Split input2 on :.
3. Keep track of the length of the split array.
4. For each item of the split array, match it against the regex from step 1:
   if it matches, append its upper-cased form to the result;
   otherwise, increment a counter.
5. If counter == length of the split array (i.e. no match was found), return "ERROR-009"; otherwise, return the appended result.
Implementation of the above procedure in java:
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main
{
    public static void main(String[] args) {
        System.out.println(identifyPossibleWords("Fi_er", "Fever:filer:Filter:Fixer:fiber:fibre:tailor:offer"));
        // Fever:fiqqer:Filter:Fixxer:fibber:fibre:tailor:offer returns ERROR-009
    }
    public static String identifyPossibleWords(String input1, String input2) {
        // Turn the single "_" into the regex "any character" metacharacter.
        input1 = input1.replace("_", ".");
        String[] words = input2.split(":");
        // The expected output is colon-separated, e.g. FILER:FIXER:FIBER.
        StringJoiner result = new StringJoiner(":");
        int counter = 0;
        final Pattern pattern = Pattern.compile(input1, Pattern.CASE_INSENSITIVE);
        for (String str : words) {
            Matcher matcher = pattern.matcher(str);
            if (matcher.matches()) result.add(matcher.group(0).toUpperCase());
            else counter++;
        }
        // Every word failed to match.
        if (counter == words.length) return "ERROR-009";
        return result.toString();
    }
}
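As the answer says, this can also be done without regex. The sketch below works under the problem's stated assumptions: exactly one underscore and only letters. It compares lengths, then compares character by character, ignoring case and treating '_' as a wildcard. PossibleWords and fits are illustrative names.

```java
import java.util.StringJoiner;

class PossibleWords {
    static String identifyPossibleWords(String input1, String input2) {
        StringJoiner result = new StringJoiner(":");
        for (String word : input2.split(":")) {
            if (fits(input1, word)) {
                result.add(word.toUpperCase());
            }
        }
        // An empty joiner means no candidate word fitted.
        return result.length() == 0 ? "ERROR-009" : result.toString();
    }

    // True if word has the same length and agrees with pattern everywhere except at '_'.
    private static boolean fits(String pattern, String word) {
        if (pattern.length() != word.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = Character.toUpperCase(pattern.charAt(i));
            char w = Character.toUpperCase(word.charAt(i));
            if (p != '_' && p != w) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(identifyPossibleWords("Fi_er", "Fever:filer:Filter:Fixer:fiber:fibre:tailor:offer"));
    }
}
```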
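Easy fix: s=="" compares references, not string contents, so it is not a reliable emptiness check. Test the length of the StringBuilder instead:
class UserMaincode
{
    // Method name as given in the problem statement (the question's code has "indentifyPossibleWords").
    public String identifyPossibleWords(String input1, String input2)
    {
        input1 = input1.toUpperCase();
        input2 = input2.toUpperCase();
        // A limit of -1 keeps a trailing empty string, so "FI_" still gives two parts.
        String arr1[] = input1.split("_", -1);
        String arr2[] = input2.split(":");
        StringBuilder sb = new StringBuilder("");
        for (int i = 0; i < arr2.length; i++) {
            if (arr2[i].matches(arr1[0] + "." + arr1[1])) {
                sb.append(arr2[i] + ":");
            }
        }
        if (sb.length() != 0) {
            sb.deleteCharAt(sb.length() - 1);
        }
        if (sb.length() == 0) { // this: check the length instead of s == ""
            return "ERROR-009";
        }
        return sb.toString();
    }
}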
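Another possible cause is worth checking, assuming some hidden tests put the underscore at the very end of input1. By default, String.split drops trailing empty strings. In the question's code, arr1 then has only one element, and arr1[1] throws ArrayIndexOutOfBoundsException. A small demonstration (the class and method names are illustrative):

```java
import java.util.Arrays;

class UnderscoreSplitDemo {
    // Default split: trailing empty strings are removed from the result.
    static String[] defaultSplit(String input1) {
        return input1.split("_");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] keepTrailing(String input1) {
        return input1.split("_", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultSplit("FI_"))); // [FI]
        System.out.println(Arrays.toString(keepTrailing("FI_"))); // [FI, ]
        // A leading underscore is fine either way: the empty string comes first.
        System.out.println(Arrays.toString(defaultSplit("_IX"))); // [, IX]
    }
}
```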

How to match two string using java Regex

String 1= abc/{ID}/plan/{ID}/planID
String 2=abc/1234/plan/456/planID
How can I match these two strings using Java regex so that it returns true? Basically {ID} can contain anything. Java regex should match abc/{anything here}/plan/{anything here}/planID
If "{anything here}" may be empty, you can use .*. The dot matches any character (except line terminators), and * means "zero or more of the preceding element". So .* means "any sequence of characters, including the empty one". If {anything here} must contain at least one character, use + instead of *. It means the same thing, but it requires at least one character.
My suggestion: abc/.+/plan/.+/planID
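If {ID} can contain anything, I assume it can also be empty.
So this regex should work:
str.matches("^abc.*plan.*planID$");
^abc: starts with abc
.*: zero or more of any character
planID$: ends with planID
(String.matches already requires the whole string to match, so ^ and $ are redundant there, but harmless.)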
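The answers differ in how much an {ID} may contain. .+ also matches text containing '/', so an ID that spans several path segments is accepted. If, as an assumption, each {ID} should be a single path segment, [^/]+ enforces that. The class and method names below are illustrative.

```java
class PathPatternDemo {
    // As in the answers: each {ID} is one or more of any character, including '/'.
    static boolean matchesLoose(String s) {
        return s.matches("abc/.+/plan/.+/planID");
    }

    // Assumption: each {ID} is a single non-empty path segment (no '/').
    static boolean matchesSingleSegment(String s) {
        return s.matches("abc/[^/]+/plan/[^/]+/planID");
    }

    public static void main(String[] args) {
        String ok = "abc/1234/plan/456/planID";
        String nested = "abc/12/34/plan/456/planID";
        System.out.println(matchesLoose(ok) + " " + matchesSingleSegment(ok));         // true true
        System.out.println(matchesLoose(nested) + " " + matchesSingleSegment(nested)); // true false
    }
}
```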
Here is a small piece of code; check it and adapt it to your requirements. It works for this input. Try your other test cases, and if one fails, please comment with that case. I'm using a regex specifically because you asked to match with Java regex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class MatchUsingRejex
{
    public static void main(String args[])
    {
        // Create a pattern to be searched
        Pattern pattern = Pattern.compile("abc/.+/plan/.+/planID");
        // Check whether the whole input matches the pattern
        Matcher isMatch = pattern.matcher("abc/1234/plan/456/planID");
        // matches() requires the entire input to match; find() would also accept the pattern inside a longer string
        if (isMatch.matches())
            System.out.println("Yes");
        else
            System.out.println("No");
    }
}
If the line always starts with 'abc' and ends with 'planID', the following works:
String s1 = "abc/{ID}/plan/{ID}/planID";
String s2 = "abc/1234/plan/456/planID";
String pattern = "(?i)abc(?:/\\S+)+planID$";
boolean b1 = s1.matches(pattern);
boolean b2 = s2.matches(pattern);

How to make a regular expression match based on a condition?

I'm trying to make a conditional regex. I know there are other posts on Stack Overflow, but they're too specific to their own problems.
The Question
How can I create a regular expression that only looks to match something given a certain condition?
An example
An example of this would be if we had a string (this is in Java):
String nums = "42 36 23827";
and we only want to match if there are the same number of x's at the end of the string as at the beginning.
What we want in this example
In this example, we would want a regex that checks whether there are the same number of x's at the end as at the beginning. The conditional part: if there are x's at the beginning, check whether there are that many at the end; if there are, it is a match.
Another example
An example of this would be if we had a list of numbers (this is in java) in string format:
String nums = "42 36 23827";
and we want to separate each number into a list
String splitSpace = "Regex goes here";
Pattern splitSpaceRegex = Pattern.compile(splitSpace);
Matcher splitSpaceMatcher = splitSpaceRegex.matcher(nums);
ArrayList<String> splitEquation = new ArrayList<String>();
while (splitSpaceMatcher.find()) {
    if (splitSpaceMatcher.group().length() != 0) {
        System.out.println(splitSpaceMatcher.group().trim());
        splitEquation.add(splitSpaceMatcher.group().trim());
    }
}
How can I make this into an array that looks like this:
["42", "36", "23827"]
You could try making a simple regex like this:
String splitSpace = "\\d+\\s+";
But that excludes "23827", because there is no space after it.
What we want in this example
In this example, we would want a regex that checks whether we are at the end of the string; if we are, we don't need the space, otherwise we do. As @YCF_L mentioned, we could just use the regex \\b\\d+\\b, but I am aiming for something conditional.
Conclusion
So, as a result, the question is, how do we make conditional regular expressions? Thanks for reading and cheers!
There are no conditionals in Java regexes.
I want a regex that checks if there are the same number of x's at the end as there are at the beginning. The conditional part: if there are x's at the beginning, check whether there are that many at the end; if there are, it is a match.
This may or may not be solvable. If you want to know if a specific string (or pattern) repeats, that can be done using a back reference; e.g.
^(\d+).+\1$
will match a line consisting of one or more digits, then one or more arbitrary characters, then the same digits that were matched at the start. The back reference \1 matches the string matched by group 1.
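Here is a small Java sketch of the back-reference pattern above. The class and method names are illustrative. In a Java string literal, the backslashes are doubled.

```java
import java.util.regex.Pattern;

class BackReferenceDemo {
    // Group 1 captures leading digits; \1 must then repeat exactly that text at the end.
    private static final Pattern SAME_DIGITS_AT_ENDS = Pattern.compile("^(\\d+).+\\1$");

    static boolean sameDigitsAtBothEnds(String s) {
        return SAME_DIGITS_AT_ENDS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(sameDigitsAtBothEnds("42 abc 42")); // true
        System.out.println(sameDigitsAtBothEnds("42 abc 43")); // false
    }
}
```

Because group 1 can backtrack to fewer digits, "12 x 1" also matches, with group 1 = "1". \1 only guarantees that the end repeats whatever group 1 finally captured.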
However if you want the same number of digits at the end as at the start (and that number isn't a constant) then you cannot implement this using a single (Java) regex.
Note that some regex languages / engines do support conditionals; see the Wikipedia Comparison of regular-expression engines page.
I would use split, which accepts a regex, like so:
String[] split = nums.split("\\s+"); // ["42", "36", "23827"]
If you want to use Pattern with Matcher, you can use the regex \b\d+\b, which uses word boundaries:
String regex = "\\b\\d+\\b";
Word boundaries prevent you from matching a number that is part of a word. For example, from "123 a4 5678 9b" you get just ["123", "5678"].
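Here is a quick check of the word-boundary example above. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // \b on both sides: the digits must not touch a letter, digit or underscore.
    private static final Pattern WHOLE_NUMBER = Pattern.compile("\\b\\d+\\b");

    static List<String> wholeNumbers(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = WHOLE_NUMBER.matcher(s);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wholeNumbers("42 36 23827"));    // [42, 36, 23827]
        System.out.println(wholeNumbers("123 a4 5678 9b")); // [123, 5678]
    }
}
```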
I do not see the "conditional" in the question. The problem is solvable with a straightforward regular expression: \b\d+\b.
A fully fledged Java example would look something like this:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Ideone {
    public static void main(String args[]) {
        final String sample = "123 45 678 90";
        final Pattern pattern = Pattern.compile("\\b\\d+\\b");
        final Matcher matcher = pattern.matcher(sample);
        final ArrayList<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group());
        }
        System.out.println(results);
    }
}
Output: [123, 45, 678, 90]

Find total number of occurrences of a substring

Suppose I want to find the total number of occurrences of the following substring:
any substring that starts with 1, followed by any number (0 or more) of 0's, and then followed by 1.
I formed a regular expression for it: 1[0]*1
Then I used the Pattern and Matcher classes of Java to do the rest of the work.
import java.util.regex.*;
class P_m
{
    public static void main(String[] args)
    {
        int s = 0;
        Pattern p = Pattern.compile("1[0]*1");
        Matcher matcher = p.matcher("1000010101");
        while (matcher.find())
            ++s;
        System.out.println(s);
    }
}
But the problem is that when two consecutive substrings overlap, the code above outputs one less than the actual number of occurrences. For example, the code above outputs 2, whereas it should be 3. Can I modify it to return the correct output?
Use a positive lookahead:
"10*(?=1)"
This matches the same pattern as you described (starts with 1, followed by zero or more 0, followed by 1), but the difference is that the final 1 is not included in the match. This way, that last 1 is not "consumed" by the match, and it can participate in further matches, effectively allowing the overlap that you asked for.
Pattern p = Pattern.compile("10*(?=1)");
Matcher matcher = p.matcher("1000010101");
int s = 0;
while (matcher.find()) ++s;
System.out.println(s);
Outputs 3 as you wanted.
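You may also need the matched text including the closing 1. A common variant for that puts the entire pattern inside a lookahead with a capturing group. Each match is then zero-width, Matcher.find() advances one position after an empty match, and group(1) holds each overlapping occurrence. A sketch (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    // The lookahead consumes nothing, but group 1 still captures the full "10...01" text.
    private static final Pattern OVERLAPPING = Pattern.compile("(?=(10*1))");

    static List<String> findAll(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = OVERLAPPING.matcher(s);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("1000010101")); // [100001, 101, 101]
    }
}
```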
