How to find words containing a given letter using Java regex

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Homework {
    public static void main(String[] args) {
        String[] words = { "Abendessen", "Affe", "Affen", "aber", "anders", "Attacke", "arrangieren", "Art", "Asien",
                "Bund", "Arten", "Biene", "Abend", "baden", "suchen", "A1rten", "Abend-Essen" };
        // The attempted pattern from the question (it does not do what is intended)
        Pattern pattern = Pattern.compile("[aA][a-z[n]+a-z]*");
        for (int i = 0; i < words.length; i++) {
            Matcher matcher = pattern.matcher(words[i]);
            if (matcher.find()) {
                System.out.println("OK: " + words[i]);
            }
        }
    }
}
The program should filter for words that begin with a or A and contain an n. The words may consist of letters only, and every letter from the second one onward must be lowercase.
These words should be matched: Abendessen, Affen, anders, arrangieren, Asien, Arten, Abend
I tried the regular expression above without much thought, and I believe it is wrong.

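Your current pattern [aA][a-z[n]+a-z]* does not require an n. In many regex flavors (PCRE, JavaScript) it reads as:
Character class [aA], then character class [a-z[n] repeated 1+ times. It is then followed by a-z]*, which matches the literal characters a, -, z and then ] repeated 0+ times.
That would, for example, match Abendessena-z]
In Java, nested brackets inside a character class form a union, so [a-z[n]+a-z] is a single class that matches a lowercase letter or +. With find(), the pattern therefore accepts any word containing an a or A, such as Affe or baden.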
What you might do is start the match with a or A, then match [a-z] 0+ times on both sides of a literal n, so that an n is guaranteed to be present:
\b[aA][a-z]*n[a-z]*\b
Explanation
\b Word boundary
[aA] Match a or A
[a-z]* Match 0+ times a-z
n Match n
[a-z]* Match 0+ times a-z
\b Word boundary
You might also use the anchors ^ and $ instead of \b to assert the start and the end of the string.
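A minimal sketch of the anchored variant, assuming each array element is tested as a whole string with matches(). The names WordFilter, startsWithAAndHasN and filter are illustrative and not from the question. Whole-string matching also rejects Abend-Essen, where \b combined with find() would match the Abend part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WordFilter {
    // a or A, then only lowercase letters, with at least one n among them
    private static final Pattern WORD = Pattern.compile("[aA][a-z]*n[a-z]*");

    static boolean startsWithAAndHasN(String word) {
        // matches() requires the whole string to match, so ^ and $ are implied
        return WORD.matcher(word).matches();
    }

    static List<String> filter(String[] words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (startsWithAAndHasN(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = { "Abendessen", "Affe", "Affen", "aber", "anders", "Attacke", "arrangieren", "Art", "Asien",
                "Bund", "Arten", "Biene", "Abend", "baden", "suchen", "A1rten", "Abend-Essen" };
        for (String word : filter(words)) {
            System.out.println("OK: " + word);
        }
    }
}
```

With the word list from the question, this should print exactly the seven expected words.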

Related

Java regex repeating capture groups

Considering the following string: "${test.one}${test.two}" I would like my regex to return two matches, namely "test.one" and "test.two". To do that I have the following snippet:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTester {
private static final Pattern pattern = Pattern.compile("\\$\\{((?:(?:[A-z]+(?:\\.[A-z0-9()\\[\\]\"]+)*)+|(?:\"[\\w/?.&=_\\-]*\")+)+)}+$");
public static void main(String[] args) {
String testString = "${test.one}${test.two}";
Matcher matcher = pattern.matcher(testString);
while (matcher.find()) {
for (int i = 0; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
}
}
I have some other stuff in there as well, because I want this to also be a valid match ${test.one}${"hello"}.
So, basically, I just want it to match on anything inside of ${} as long as it either follows the format: something.somethingelse (alphanumeric only there) or something.somethingElse() or "something inside of quotations" (alphanumeric plus some other characters). I have the main regex working, or so I think, but when I run the code, it finds two groups,
${test.two}
test.two
I want the output to be
test.one
test.two
Basically, the main problem with your regex is that it matches only at the end of the string, and [A-z] matches many more characters than just letters. Your grouping also seems off.
If you load your regex at regex101, you will see it matches
\$\{
( - start of a capturing group
(?: - start of a non-capturing group
(?:[A-z]+ - start of a non-capturing group, and it matches 1+ chars between A and z (your first mistake)
(?:\.[A-z0-9()\[\]\"]+)* - 0 or more repetitions of a . and then 1+ letters, digits, (, ), [, ], ", \, ^, _, and a backtick
)+ - repeat the non-capturing group 1 or more times
| - or
(?:\"[\w/?.&=_\-]*\")+ - 1 or more occurrences of ", 0 or more word, /, ?, ., &, =, _, - chars and then a "
)+ - repeat the group pattern 1+ times
) - end of the capturing group
}+ - 1+ } chars
$ - end of string.
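To illustrate the [A-z] point from the breakdown above, here is a small sketch comparing it with [A-Za-z]. The class and method names (RangeCheck, looseMatch, strictMatch) are made up for this example. The range A-z also covers the six ASCII characters that sit between Z and a.

```java
import java.util.regex.Pattern;

class RangeCheck {
    // [A-z] spans ASCII 65..122, which includes [ \ ] ^ _ and the backtick
    static boolean looseMatch(String s) {
        return Pattern.matches("[A-z]", s);
    }

    // [A-Za-z] covers only the 52 ASCII letters
    static boolean strictMatch(String s) {
        return Pattern.matches("[A-Za-z]", s);
    }

    public static void main(String[] args) {
        String[] samples = { "[", "\\", "]", "^", "_", "`", "q" };
        for (String s : samples) {
            System.out.println(s + "  [A-z]: " + looseMatch(s) + "  [A-Za-z]: " + strictMatch(s));
        }
    }
}
```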
To match any occurrence of your pattern inside a string, you need to use
\$\{(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*)}
Get the Group 1 value after a match is found. Details:
\$\{ - a ${ substring
(\"[^\"]*\"|\w+(?:\(\))?(?:\.\w+(?:\(\))?)*) - Capturing group 1:
\"[^\"]*\" - ", 0+ chars other than " and then a "
| - or
\w+(?:\(\))? - 1+ word chars and an optional () substring
(?:\.\w+(?:\(\))?)* - 0 or more repetitions of . and then 1+ word chars and an optional () substring
} - a } char.
See the Java demo:
String s = "${test.one}${test.two}\n${test.one}${test.two()}\n${test.one}${\"hello\"}";
Pattern pattern = Pattern.compile("\\$\\{(\"[^\"]*\"|\\w+(?:\\(\\))?(?:\\.\\w+(?:\\(\\))?)*)}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}
Output:
test.one
test.two
test.one
test.two()
test.one
"hello"
You could use the regular expression
(?<=\$\{")[a-z]+(?="\})|(?<=\$\{)[a-z]+\.[a-z]+(?:\(\))?(?=\})
which has no capture groups. The character classes [a-z] can be modified as required, provided they do not include a double quote, period or right brace.
Java's regex engine performs the following operations.
(?<=\$\{") # match '${"' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
(?="\}) # match '"}' in a positive lookahead
| # or
(?<=\$\{) # match '${' in a positive lookbehind
[a-z]+ # match 1+ lowercase letters
\.[a-z]+ # match '.' followed by 1+ lowercase letters
(?:\(\))? # optionally match `()`
(?=\}) # match '}' in a positive lookahead
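A short sketch of how this lookaround pattern could be used from Java; PlaceholderExtractor and extract are hypothetical names. Because the delimiters sit in lookarounds, matcher.group() already holds the inner text. Note that, unlike the capture-group answer, the quotes around "hello" are not part of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderExtractor {
    private static final Pattern PLACEHOLDER = Pattern.compile(
            "(?<=\\$\\{\")[a-z]+(?=\"\\})"                        // quoted form: ${"hello"}
            + "|(?<=\\$\\{)[a-z]+\\.[a-z]+(?:\\(\\))?(?=\\})");   // dotted form: ${test.one} or ${test.two()}

    static List<String> extract(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(input);
        while (matcher.find()) {
            // No capture groups: the whole match is the value of interest
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [test.one, test.two(), hello]
        System.out.println(extract("${test.one}${test.two()}${\"hello\"}"));
    }
}
```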

How do I write a multi-regex line?

I'm trying to write a line of regex that performs the following:
A string variable that can contain only:
The letters a to z (upper and lowercase) (zero or many times)
The hyphen character (zero or many times)
The single quote character (zero or one time)
The space character (zero or one time)
Tried searching through many regex websites
.matches("([a-zA-Z_0-9']*(\\s)?)(-)?"))
This allows close to what I want; however, you can't type a-z anymore after you have typed the space character. So it's sequential in a way. I want the validation to allow any sequence of those elements.
Expected:
Allowed to type a string that has any amount of a-zA-Z, zero to one space, zero to one dash, anywhere throughout the string.
This is a validation for that
"^(?!.*\\s.*\\s)(?!.*'.*')[a-zA-Z'\\s-]*$"
Expanded
^ # Begin
(?! .* \s .* \s ) # Max single whitespace
(?! .* ' .* ' ) # Max single, single quote
[a-zA-Z'\s-]* # Optional a-z, A-Z, ', whitespace or - characters
$ # End
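A minimal sketch of how the two-lookahead pattern above might be wrapped in a validator. NameValidator and isValid are illustrative names, and the sample inputs are assumptions rather than data from the question.

```java
import java.util.regex.Pattern;

class NameValidator {
    private static final Pattern ALLOWED = Pattern.compile(
            "^(?!.*\\s.*\\s)"      // reject a second whitespace character
            + "(?!.*'.*')"         // reject a second single quote
            + "[a-zA-Z'\\s-]*$");  // only letters, quote, whitespace and hyphen

    static boolean isValid(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "Mary-Ann O'Neil", "a-b-c", "a b c", "O''Neil", "abc1", "" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

The lookaheads only count occurrences, so the space, quote and hyphens may appear at any position in the string.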
I guess,
^(?!.*([ ']).*\\1)[A-Za-z' -]*$
might work OK.
Here,
(?!.*([ ']).*\\1)
we are saying that if a space or a single quote (') occurs twice in the string, the string is rejected, so only strings with at most one space and at most one single quote are kept.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex = "^(?!.*([ ']).*\\1)[A-Za-z' -]*$";
final String string = "abcAbc- ";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
Output
Full match: abcAbc-
Group 1: null
If you wish to simplify, modify or explore the expression, regex101.com explains it in its top right panel.

Regex to find a string containing more than a single whitespace with no leading/trailing whitespace

Currently I have:
Pattern p = Pattern.compile("\\s");
boolean invalidChar = p.matcher(text).find();
I want it to return true only when the text contains two or more consecutive whitespace characters.
Also, there should not be any whitespace at the beginning or end of the string.
So some valid/invalid text would be
12 34 56 = valid
ab-34 56 = valid
ab  34 = invalid
12  34 53 = invalid
Without regex:
public class Answ {
    public static boolean isValid(String s) {
        // Valid only if the string contains no two consecutive spaces
        return !s.contains("  ");
    }
    public static void main(String[] args) {
        String st1 = "12 34 56";
        System.out.println(isValid(st1));
    }
}
Try this:
(^\s{1,}|\s{2,}|\s$)
Final (with the backslashes doubled for the Java string literal):
Pattern p = Pattern.compile("(^\\s{1,}|\\s{2,}|\\s$)");
Since there can't be whitespace at the start and end of the string, and there cannot be two or more consecutive whitespaces inside, you may use
boolean isValid = s.matches("\\S+(?:\\s\\S+)*");
This expression will match the following:
^ (implicit, because matches() requires the whole string to match the regex pattern) - the start of the string
\S+ - 1 or more chars other than whitespaces
(?:\s\S+)* - zero or more sequences of:
\s - a single whitespace
\S+ - 1 or more chars other than whitespaces
$ (implicit in matches) - the end of the string.
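As a quick check of the matches() approach, here is a small sketch; WhitespaceCheck and isValid are illustrative names, and the inputs with doubled or edge spaces are assumed test cases.

```java
class WhitespaceCheck {
    static boolean isValid(String s) {
        // One or more non-whitespace runs, separated by exactly one whitespace character
        return s.matches("\\S+(?:\\s\\S+)*");
    }

    public static void main(String[] args) {
        String[] samples = { "12 34 56", "ab-34 56", "ab  34", " 12 34", "12 34 ", "" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Because \S+ requires at least one character, the empty string is rejected as well.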
You can use this pattern:
Pattern p = Pattern.compile("(?<!\\S)(?!\\S)");
Matcher m = p.matcher(text);
boolean invalidChar = m.find();
or boolean isValid = !m.find(), as you want.
Where (?<!\\S) means "not preceded by a non-whitespace" (that includes a preceding whitespace or the start of the string) and (?!\\S) "not followed by a non-whitespace" (that includes a following whitespace or the end of the string).
These two lookarounds describe all possible cases:
successive white-spaces (matches the position between the first two white-spaces)
white-space at the beginning or at the end
empty string
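The three cases listed above can be exercised with a short sketch. BadWhitespaceFinder and hasBadWhitespace are hypothetical names chosen for the example. The pattern is zero-width, so find() reports a position rather than any characters.

```java
import java.util.regex.Pattern;

class BadWhitespaceFinder {
    // A position with no non-whitespace character on either side
    private static final Pattern BAD_POSITION = Pattern.compile("(?<!\\S)(?!\\S)");

    static boolean hasBadWhitespace(String text) {
        return BAD_POSITION.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasBadWhitespace("12 34 56")); // single separators only
        System.out.println(hasBadWhitespace("ab  34"));   // two consecutive spaces
        System.out.println(hasBadWhitespace(" ab"));      // leading space
        System.out.println(hasBadWhitespace("ab "));      // trailing space
        System.out.println(hasBadWhitespace(""));         // empty string
    }
}
```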
Try this:
boolean isValid = text.matches("\\S(?!.*\\s\\s).*\\S");
Explanation:
\\S - the match begins with a non-whitespace character
(?!.*\\s\\s) - negative lookahead assertion to ensure there are no instances of two whitespace characters next to each other
.* - matches 0 or more of any character
\\S - the match ends with a non-whitespace character
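Note: the matches("regex") method returns true only if the regex matches the entire text string, so true here means the text is valid. Because the pattern needs a non-whitespace character at both ends, a one-character string such as "a" is rejected.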

Java Regex "-[0-9]{0,}" seems to match "-abc"

Regex:
"-[0-9]{0,}"
String:
"-abc"
According to the test here, that should not happen. I assume I'm doing something wrong in my code.
Code:
public static void main(String[] args) {
String s = "-abc";
String regex = "-[0-9]{0,}";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
if (matcher.group().length() == 0)
break;
// get the number less the dash
int beginIndex = matcher.start();
int endIndex = matcher.end();
String number = s.substring(beginIndex + 1, endIndex);
s = s.replaceFirst(regex, "negative " + number);
}
System.out.println(s);
}
Some context: The speech synthesis program I use cannot pronounce numbers with a leading negative sign, so it must be replaced with the word "negative".
-[0-9]{0,}
means your string must contain a -, which can then be followed by 0 or more digits.
So -abc is the zero-digit case.
You didn't specify ^ and $, so your regex also matches within foo-bar, lll-0 and even abc-.
{0,} has exactly the same meaning as *. Your regexp thus means "a dash that can be followed by digits". -abc contains a dash, so the pattern is found.
-\d+ should suit your needs better (don't forget to escape the backslash for java: -\\d+).
If you want the whole string to match the pattern, anchor your regexp with ^ and $: ^-\d+$.
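A minimal sketch of the replacement using -\d+ with a capture group. This swaps the question's find/substring loop for a single replaceAll call, and the names NegativeNumbers and spellOutNegatives are made up for the example. It assumes every dash directly followed by digits is a negative sign, which would also rewrite an expression such as 10-5.

```java
class NegativeNumbers {
    static String spellOutNegatives(String s) {
        // $1 refers to the digits captured after the dash
        return s.replaceAll("-(\\d+)", "negative $1");
    }

    public static void main(String[] args) {
        System.out.println(spellOutNegatives("-abc"));             // unchanged: no digit after the dash
        System.out.println(spellOutNegatives("-42"));              // negative 42
        System.out.println(spellOutNegatives("it is -5 outside")); // it is negative 5 outside
    }
}
```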

Non-greedy Regular Expression in Java

I have the following code:
public static void createTokens(){
String test = "test is a word word word word big small";
Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)").matcher(test);
while (mtch.find()){
for (int i = 1; i <= mtch.groupCount(); i++){
System.out.println(mtch.group(i));
}
}
}
And I get the following output:
word
w
But in my opinion it should be:
word
word
Could somebody please explain why?
Because your patterns are non-greedy, they match as little text as possible while still producing a match.
Remove the ? in the second group, and you'll get
word
word word big small
Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+\\s*)").matcher(test);
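Because \\s* matches any number of spaces, including zero, the single character w is enough to satisfy (\\s*.+?\\s*). To require whitespace around the captured text, try (\\s+.+?\\s+). Note that the literal spaces in the pattern already consume the single spaces in the sample string, so this variant needs additional whitespace in the input to match.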
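The behaviour of the variants discussed above can be compared side by side with a small sketch. LazyVsGreedy and groups are illustrative names. The last pattern, using (\\S+), is an assumption about what the asker wanted (one whole word per group) and is not taken from either answer.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns groups 1 and 2 of the first match, or null if the pattern is not found
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String test = "test is a word word word word big small";
        String[] patterns = {
            "test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)", // lazy second group stops after one character
            "test is a (\\s*.+?\\s*) word (\\s*.+\\s*)",  // greedy second group runs to the end
            "test is a (\\S+) word (\\S+)"                // one whitespace-free token per group
        };
        for (String p : patterns) {
            System.out.println(p + " -> " + Arrays.toString(groups(p, test)));
        }
    }
}
```

Since \\S+ cannot cross a space, it captures a whole word without relying on lazy quantifiers.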
