Java Regex ? (expr){num} confusion? - java

I'm trying to identify strings which contain exactly one integer.
That is exactly one string of contiguous digits e.g. "1234" (no dots, no commas).
So I thought this should do it: (This is with the Java String Escapes included):
(\\d+){1,}
So the "\d+" correctly a string of contiguous digits. (right?)
I included this expression as a sub-expression within "(" and ")" and then I'm trying to say "only one of these sub-expressions.
Here's the result of ( matcher.find() ) of checking various strings:
(note the regex from now on is'raw' here - NOT Java String Escaped).
Pattern:(\d+){1,}
Input String Result
1 true
XX-1234 true
do-not-match-no-integers false
do-not-match-1234-567 true
do-not-match-123-456 true
It seems the '1' in the pattern is applying to the "+\d" string, rather than the number of those contiguous strings.
Because if I change the number from 1 to 4; I can see the result change to the following:
Pattern:(\d+){4,}
Input String Result
1 false
XX-1234 true
do-not-match-no-integers false
do-not-match-1234-567 true
do-not-match-123-456 false
What am I missing here ?
Out of interest - if I take off the "(" and ")" altogether - I'm getting a different result again
Pattern:\d+{4,}
Input String Result
1 true
XX-1234 true
do-not-match-no-integers false
do-not-match-1234-567 true
do-not-match-123-456 true

Matcher.find() will try to find a match inside the String. You should try Matcher.matches() instead to see if the pattern fits in all the string.
In this way, the pattern you need is \d+
EDIT:
Seems that I misunderstood the question. One way to find if the String has only one integer, using the same pattern is:
int matchCounter = 0;
while (Matcher.find() || matchCounter < 2){
matchCounter++;
}
return matchCounter == 1

This is the regex:
^[^\d]*\d+[^\d]*$
That's zero or more non digits, followed by a substring of digits and then zero or more non digits again until the end of the string. Here is the java code (with escaped slashes):
class MainClass {
public static void main(String[] args) {
String regex="^[^\\d]*\\d+[^\\d]*$";
System.out.println("1".matches(regex)); // true
System.out.println("XX-1234".matches(regex)); // true
System.out.println("XX-1234-YY".matches(regex)); // true
System.out.println("do-not-match-no-integers".matches(regex)); // false
System.out.println("do-not-match-1234-567".matches(regex)); // false
System.out.println("do-not-match-123-456".matches(regex)); // false
}
}

You can use the RegEx ^\D*?(\d+)\D*?$
^\D*? makes sure there is no digits between the start of your line and your first group
(\d+) matches your digits
\D*?$ makes sure there is no digits between the your first group and the end of your line
Demo.
So, for your Java String, it would be : ^\\D*?(\\d+)\\D*?$

I think you will have to make sure your regex considers the entire string, using ^ and $.
To do that, you could match zero or more non-digits, followed by 1 or more digits, and then zero or more non-digits.
The following should do the trick:
^[^\d]*(\d+)[^\d]*$
Here it is on regex101.com: https://regex101.com/r/CG0RiL/2

Edit: As pointed out by Veselin Davidov my regex isn't correct.
If i understand you right you want it only to say true when the entire String matches the pattern. yes?
Then you have to call matcher.matches();
Also i think your pattern must be just \d+.
If you have problem with regex i can recommend you https://regex101.com/ it explains you why it matches something and gives you a quick preview.
I use it every time i have to write regex.

Related

String with N digits

I need some help with regex in Java.
We have string, and I want that String.matches give me "true" if our string contains N digits.
For example(N = 12):
+012345678900 - true
0123-4567-0000 - true;
but:
+0123456789 - false
0123-4567-000000 - false.
I tried this one (.*[0-9].*){N}and this one ^(.*[0-9].*){N}$. But it was incorrectly.
You may try this,
^(?:\\D*\\d){12}\\D*$
matches method won't need anchors, so
(?:\\D*\\d){12}\\D*
would be enough..
\\D matches any character but not of digit. So (?:\\D*\\d){12} ensures that there must be any no of non-dgit chars but it must contain exactly 12 digits. Last \\D* matches zero or more non-digit characters.

Why won't this string regex match?

I have a string and a simple pattern (a string with a wildcard). When I use the match function I would it expect it to return true for my text, but it doesn't it returns false.
String text = "test_1_2_3";
String pattern = "test_*"
text.matches(pattern);//this returns false
_* will matches the character _ literally between zero and more times ,instead you need .* that match any character between zero and more times:
"test_.*"
Demo
pattern = "test_*" means "test" and 0 or more "_"
Because your test_* pattern, combined with Matcher#matches, will match a whole input (i.e. from start to end), that matches the following conditions:
starts with test
followed by (and ending with) 0 instance of _, or more (greedy-quantified here).
Using Matcher#find would return true in this case, since it would match a partial test_.
So, your matches invocation would return true with the given Pattern, with inputs such as:
test_
test__
... and so on.
See API.
Your regexp will match test followed by zero or more '_' character.
I think you want this:
String text = "test_1_2_3";
String pattern = "test_.*";

How to match the first character in a String with a regexp?

I need a regular expression to evaluate if the first character of a word is a lowercase letter or not.
I have this java code: Character.toString(charcter).matches("[a-z?]")
For example if I have those words the result would be:
a13 => true
B54 => false
&32 => false
I want to match only one letter and I don't know if I need to use "?", "." or "{1}" after or inside "[a-z]"
There is a built in way to do this without regexes.
Character.isLowerCase(string.charAt(0))
Please use this for your needs: /^[a-z]/
You want to match if there's exactly one lowercase letter. As #Luiggi Medonza stated, you really do/should not need Regular Expressions for this, but if you want to use them, you most likely want this pattern:
[a-z]{1}
What ? does is an optional match. You want a strict match of length 1, so you need {1}.
#Ted Hopp mentioned that you don't need the {1}. Your entire match should look like this:
entire_string.matches("^[a-z].+$")
Again, using built-in string methods will be much faster/better to use.
Here I got similar requirement like in a string first character should alphabet from a-z or A-Z. than the user can type anything like number or some limited symbols.
Solution
public static boolean designationValidate(String n) {
int l = n.length();
if (l >= 4) {
Pattern pattern = Pattern.compile("^[a-zA-Z][a-zA-Z0-9-() ]*$");
Matcher matcher = pattern.matcher(n);
return (matcher.find() && matcher.group().equals(n));
} else
return false;
}
in above example I am validation minimum character should more than 3 length and start with alphabet. If you want any other symbols you can enter there.
The method will return true if expressions match otherwise return false.
May this will helpful for you.

Java regular expression: A-Z and - or _, but only once

I've only dabbled in regular expressions and was wondering if someone could help me make a Java regex, which matches a string with these qualities:
It is 1-14 characters long
It consists only of A-Z, a-z and the letters _ or -
The symbol - and _ must be contained only once (together) and not at the start
It should match
Hello-Again
ThisIsValid
AlsoThis_
but not
-notvalid
Not-Allowed-This
Nor-This_thing
VeryVeryLongStringIndeed
I've tried the following regex string
[a-zA-Z^\\-_]+[\\-_]?[a-zA-Z^\\-_]*
and it seems to work. However, I'm not sure how to do the total character limiting part with this approach. I've also tried
[[a-zA-Z]+[\\-_]?[a-zA-Z]*]{1,14}
but it matches (for example) abc-cde_aa which it shouldn't.
This ought to work:
(?![_-])(?!(?:.*[_-]){2,})[A-Za-z_-]{1,14}
The regex is quite complex, let my try and explain it.
(?![_-]) negative lookahead. From the start of the string assert that the first character is not _ or -. The negative lookahead "peeks" of the current position and checks that it doesn't match [_-] which is a character group containing _ and -.
(?!(?:.*[_-]){2,}) another negative lookahead, this time matching (?:.*[_-]){2,} which is a non capturing group repeated at least two times. The group is .*[_-], it is any character followed by the same group as before. So we don't want to see some characters followed by _ or - more than once.
[A-Za-z_-]{1,14} is the simple bit. It just says the characters in the group [A-Za-z_-] between 1 and 14 times.
The second part of the pattern is the most tricky, but is a very common trick. If you want to see a character A repeated at some point in the pattern at least X times you want to see the pattern .*A at least X times because you must have
zzzzAzzzzAzzzzA....
You don't care what else is there. So what you arrive at is (.*A){X,}. Now, you don't need to capture the group - this just slows down the engine. So we make the group non-capturing - (?:.*A){X,}.
What you have is that you only want to see the pattern once, so you want not to find the pattern repeated two or more times. Hence it slots into a negative lookahead.
Here is a testcase:
public static void main(String[] args) {
final String pattern = "(?![_-])(?!(?:.*[_-]){2,})[A-Za-z_-]{1,14}";
final String[] tests = {
"Hello-Again",
"ThisIsValid",
"AlsoThis_",
"_NotThis_",
"-notvalid",
"Not-Allow-This",
"Nor-This_thing",
"VeryVeryLongStringIndeed",
};
for (final String test : tests) {
System.out.println(test.matches(pattern));
}
}
Output:
true
true
true
false
false
false
false
false
Things to note:
the character - is special inside character groups. It must go at the start or end of a group otherwise it specifies a range
lookaround is tricky and often counter-intuitive. It will check for matches without consuming, allowing you to test multiple conditions on the same data.
the repetition quantifier {} is very useful. It has 3 states. {X} is repeated exactly X times. {X,} is repeated at least X times. And {X, Y} is repeated between X and Y times.
To check if string is in form XXX-XXX where -XXX or _XXX part is optional you can use
[a-zA-Z]+([-_][a-zA-Z]*)?
which is similar to what you already had
[[a-zA-Z]+[\\-_]?[a-zA-Z]*]
but you made crucial mistake and wrapped it entirely in [...] which makes it character class, and that is not what you wanted.
To check if matched part has only 1-14 length you can use look-ahead mechanism. Just place
(?=.{1,14}$)
at start of your regex to make sure that part from start of match till end of it (represented by $) contains of any 1-14 characters.
So your final regex can look like
String regex = "(?=.{1,14}$)[a-zA-Z]+([-_][a-zA-Z]*)?";
Demo
String [] data = {
"Hello-Again",
"ThisIsValid",
"AlsoThis_",
"-notvalid",
"Not-Allowed-This",
"Nor-This_thing",
"VeryVeryLongStringIndeed",
};
for (String s : data)
System.out.println(s + " : " + s.matches(regex));
Output:
Hello-Again : true
ThisIsValid : true
AlsoThis_ : true
-notvalid : false
Not-Allowed-This : false
Nor-This_thing : false
VeryVeryLongStringIndeed : false

Negative Lookaround Regex - Only one occurrence - Java

I am trying to find if a string contains only one occurrence of a word ,
e.g.
String : `jjdhfoobarfoo` , Regex : `foo` --> false
String : `wewwfobarfoo` , Regex : `foo` --> true
String : `jjfffoobarfo` , Regex : `foo` --> true
multiple foo's may happen anywhere in the string , so they can be non-consecutive,
I test the following regex matching in java with string foobarfoo, but it doesn't work and it returns true :
static boolean testRegEx(String str){
return str.matches(".*(foo)(?!.*foo).*");
}
I know this topic may seem duplicate , but I am surprised because when I use this regex : (foo)(?!.*foo).* it works !
Any idea why this happens ?
Use two anchored look-aheads:
static boolean testRegEx(String str){
return str.matches("^(?=.*foo)(?!.*foo.*foo.*$).*");
}
A couple of key points are that there is a negative look-ahead to check for 2 foo's that is anchored to start, and importantly containes an end of input.
If you want to check if a string contains another string exactly once, here are two possible solutions, (one with regex, one without)
static boolean containsRegexOnlyOnce(String string, String regex) {
Matcher matcher = Pattern.compile(regex).matcher(string);
return matcher.find() && !matcher.find();
}
static boolean containsOnlyOnce(String string, String substring) {
int index = string.indexOf(substring);
if (index != -1) {
return string.indexOf(substring, index + substring.length()) == -1;
}
return false;
}
All of them work fine. Here's a demo of your examples:
String str1 = "jjdhfoobarfoo";
String str2 = "wewwfobarfoo";
String str3 = "jjfffoobarfo";
String foo = "foo";
System.out.println(containsOnlyOnce(str1, foo)); // false
System.out.println(containsOnlyOnce(str2, foo)); // true
System.out.println(containsOnlyOnce(str3, foo)); // true
System.out.println(containsRegexOnlyOnce(str1, foo)); // false
System.out.println(containsRegexOnlyOnce(str2, foo)); // true
System.out.println(containsRegexOnlyOnce(str3, foo)); // true
You can use this pattern:
^(?>[^f]++|f(?!oo))*foo(?>[^f]++|f(?!oo))*$
It's a bit long but performant.
The same with the classical example of the ashdflasd string:
^(?>[^a]++|a(?!shdflasd))*ashdflasd(?>[^a]++|a(?!shdflasd))*$
details:
(?> # open an atomic group
[^f]++ # all characters but f, one or more times (possessive)
| # OR
f(?!oo) # f not followed by oo
)* # close the group, zero or more times
The possessive quantifier ++ is like a greedy quantifier + but doesn't allow backtracks.
The atomic group (?>..) is like a non capturing group (?:..) but doesn't allow backtracks too.
These features are used here for performances (memory and speed) but the subpattern can be replaced by:
(?:[^f]+|f(?!oo))*
The problem with your regex is that the first .* initially consumes the whole string, then backs off until it finds a spot where the rest of the regex can match. That means, if there's more than one foo in the string, your regex will always match the last one. And from that position, the lookahead will always succeed as well.
Regexes that you use for validating have to be more precise than the ones you use for matching. Your regex is failing because the .* can match the sentinel string, 'foo'. You need to actively prevent matches of foo before and after the one you're trying to match. Casimir's answer shows one way to do that; here's another:
"^(?>(?!foo).)*+foo(?>(?!foo).)*+$"
It's not quite as efficient, but I think it's a lot easier to read. In fact, you could probably use this regex:
"^(?!.*foo.*foo).+$"
It's a great deal more inefficient, but a complete regex n00b would probably figure out what it does.
Finally, notice that none of theses regexes--mine or Casimir's--uses lookbehinds. I know it seems like the perfect tool for the job, but no. In fact, lookbehind should never be the first tool you reach for. And not just in Java. Whatever regex flavor you use, it's almost always easier to match the whole string in the normal way than it is to use lookbehinds. And usually much more efficient, too.
Someone answered the question, but deleted it ,
The following short code works correctly :
static boolean testRegEx(String str){
return !str.matches("(.*?foo.*){0}|(.*?foo.*){2,}");
}
Any idea on how to invert the result inside the regex itself ?

Categories