Difference between "\\d+" and "\\d++" in java regex [duplicate] - java

This question already has answers here:
What is the difference between [0-9]+ and [0-9]++?
(2 answers)
Closed 2 years ago.
In java, what's the difference between "\\d+" and "\\d++"?
I know ++ is a possessive quantifier, but what's the difference in matching the numeric string?
What string can "\\d+" match that "\\d++" cannot?
Possessive quantifiers seem to matter only with the quantifier ".*". Is that true?

Possessive quantifiers will not back off, even if some backing off is required for the overall match.
So, for example, the regex \d++0 can never match any input, because \d++ will match all digits, including the 0 needed to match the last symbol of the regex.
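A minimal sketch of that behaviour, assuming a small helper named fullMatch (a name made up for this illustration) that wraps Pattern.matches. It also suggests the answer to the original question: on their own the two patterns accept the same strings, and the difference only shows when something after the quantifier needs a digit back.

```java
import java.util.regex.Pattern;

public class PossessiveVsGreedy {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy: \d+ first takes "1230", then gives back the final 0
        // so that the literal 0 in the pattern can match.
        System.out.println(fullMatch("\\d+0", "1230"));   // true

        // Possessive: \d++ takes "1230" and never gives anything back,
        // so nothing is left for the literal 0.
        System.out.println(fullMatch("\\d++0", "1230"));  // false

        // With nothing after the quantifier, both accept the same input.
        System.out.println(fullMatch("\\d+", "1230"));    // true
        System.out.println(fullMatch("\\d++", "1230"));   // true
    }
}
```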

\d+ means:
\d means a digit (a character in the range 0-9), and + means one or more times. So \d+ is one or more digits.
\d++ means (from the Quantifiers tutorial):
This is a possessive quantifier. It takes as many digits as it can, trying once (and only once) for a match. Unlike greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

Related

What does regular expression (?<=[\\S])[\\S]*\\s* do? [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 3 years ago.
I saw a regex expression in this other stackoverflow question but I didn't understand the meaning of each part.
String[] split = s.split("(?<=[\\S])[\\S]*\\s*");
The result of this is the Acronym of a sentence.
To understand a chained regex, should I read it from left to right or vice versa? How can I identify (or delimit) each part?
Thank you for your answers.
(?<=[\\S]) states that the match must be preceded by \\S, that is, any character that is not whitespace.
[\\S]* states that the regex should match zero or more non-whitespace characters.
\\s* matches zero or more whitespace characters.
In essence, the regex finds a non-whitespace character and matches all the non-whitespace characters that follow it, along with the whitespace after them.
The regex matches ohandas<space><space> and aramchand<space> from Mohandas Karamchand G
Thus, after using these matches to split the string, you end up with {"M", "K", "G"}
Note the two spaces that the regex matches after Mohandas, because the \\s* part matches zero or more spaces
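A short sketch to check this, using the same regex and sample input; the class and method names (AcronymSplit, initials) are made up for the illustration.

```java
import java.util.Arrays;

public class AcronymSplit {
    // Hypothetical helper: splits on everything that follows the first
    // non-whitespace character of each word, plus the trailing whitespace.
    static String[] initials(String s) {
        return s.split("(?<=[\\S])[\\S]*\\s*");
    }

    public static void main(String[] args) {
        // Two spaces after "Mohandas", as in the example above.
        String[] parts = initials("Mohandas  Karamchand G");
        System.out.println(Arrays.toString(parts)); // [M, K, G]
    }
}
```

The regex also produces an empty match at the very end of the input (after G), but String.split drops trailing empty strings, so it does not show up in the result.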
To decipher an unfamiliar regular expression you can use the websites https://regexr.com/ or https://regex101.com/
Both highlight the parts in colors and explain what they do. But you have to replace the double backslashes with single backslashes.

Range of characters in regex [duplicate]

This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 4 years ago.
Regex: ^[a-zA-Z]+(?:[\\s'.-]*[a-zA-Z]+)*$
I want to add another validation to it, i.e. minimum 3 characters and maximum 15 characters.
Regex: ^([a-zA-Z]+(?:[\\s'.-]*[a-zA-Z]+)*){3,28}$
This is validating for minimum characters but not for maximum characters.
Any help is appreciated.
You could use a positive lookahead (?=.{3,15}$) to check that the string has a length of 3 to 15 characters.
Because the minimum length of the string is 3 and has to start and end with a-zA-Z you can combine the 2 character classes in the middle in this case.
I think your pattern could be simplified by removing the repetition of the group due to the positive lookahead to:
^(?=.{3,15}$)[a-zA-Z]+[\\s'.a-zA-Z-]*[a-zA-Z]+$
Explanation
^ Start of the string
(?=.{3,15}$) Positive lookahead to assert the length is 3-15
[a-zA-Z]+ Match 1+ times a lower/upper case char a-z
[\\s'.a-zA-Z-]* Character class to match any of the listed 0+ times
[a-zA-Z]+ Match 1+ times a lower/upper case char a-z
$ End of the string
See the Java demo
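As a rough sketch of how the simplified pattern behaves, the following tries it on a few sample names. The class name, the isValid helper, and the sample inputs are all made up for this illustration.

```java
import java.util.regex.Pattern;

public class NameLengthCheck {
    // The simplified pattern from the answer above.
    private static final Pattern NAME =
            Pattern.compile("^(?=.{3,15}$)[a-zA-Z]+[\\s'.a-zA-Z-]*[a-zA-Z]+$");

    // Hypothetical helper: true if the whole string is an acceptable name.
    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Al"));               // false: only 2 characters
        System.out.println(isValid("Ann"));              // true:  exactly 3 characters
        System.out.println(isValid("Jean-Luc"));         // true:  8 characters
        System.out.println(isValid("Ann-"));             // false: must end with a letter
        System.out.println(isValid("Mary-Jane O'Neil")); // false: 16 characters
    }
}
```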

Regex for arithmetic expression [duplicate]

This question already has answers here:
In a java regex, how can I get a character class e.g. [a-z] to match a - minus sign?
(5 answers)
Closed 5 years ago.
The regex -?\d+ [+|-|*|/] -?\d+ matches the expression 1 + 3 without any problems, and also 1 + -2, but I don't know why it does not match 1 - 2. Could you explain why it does not match the - char correctly?
By my regex I wanted to achieve:
optional - at the beginning
string of digits
whitespace then operator then whitespace
optional - before second string of digits
A - unescaped in the middle of a character class creates a range. You can escape it or move it to the start or end of the character class. You also don't need/want the |s I'd guess.
You currently make a range between | and | which doesn't really make sense. You also could just use grouping instead of a character class.
(\+|-|\*|/)
With this approach the + and * need to be escaped because they are quantifiers when outside a character class.
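A small sketch comparing the original character class with the two suggested fixes. The constant names (BROKEN, FIXED, GROUPED) and the class name are made up for this example.

```java
import java.util.regex.Pattern;

public class ArithmeticRegex {
    // Original: inside [...] the sequence |-| is a range from '|' to '|',
    // so the '-' character itself is not part of the class.
    static final Pattern BROKEN = Pattern.compile("-?\\d+ [+|-|*|/] -?\\d+");

    // Fix 1: drop the '|' characters and put '-' last so it is a literal.
    static final Pattern FIXED = Pattern.compile("-?\\d+ [+*/-] -?\\d+");

    // Fix 2: use a group with alternation; + and * must be escaped here.
    static final Pattern GROUPED = Pattern.compile("-?\\d+ (\\+|-|\\*|/) -?\\d+");

    public static void main(String[] args) {
        System.out.println(BROKEN.matcher("1 + 3").matches());   // true
        System.out.println(BROKEN.matcher("1 - 2").matches());   // false
        System.out.println(BROKEN.matcher("1 | 2").matches());   // true, an unintended side effect
        System.out.println(FIXED.matcher("1 - 2").matches());    // true
        System.out.println(GROUPED.matcher("1 + -2").matches()); // true
    }
}
```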

What exactly does .*? do in regex? ".*?([a-m/]*).*" [duplicate]

This question already has answers here:
Greedy vs. Reluctant vs. Possessive Qualifiers
(7 answers)
Closed 9 years ago.
For ".*?([a-m/]*).*" matching the string "fall/2005", I thought the ".*" will match any character 0 or more times. However, since there is a ? following .*, it only matches for 0 or 1 repetitions. So I thought .*? will match 'f' but I'm wrong.
What is wrong in my logic?
The ? here acts as a 'modifier' if I can call it like that and makes .* match the least possible match (termed 'lazy') until the next match in the pattern.
In fall/2005, the first .*? will match up to the first match in ([a-m/]*), which is just before f. Hence, .*? matches 0 characters so that ([a-m/]*) will match fall/ and since ([a-m/]*) cannot match anymore, the next part of the pattern .* matches what's left in the string, meaning 2005.
By contrast, with .*([a-m/]*).* the first .* matches as much as possible (meaning the whole string) and only then tries to give characters back so the other terms can match. But because the other quantifiers can match 0 characters as well, nothing needs to be given back, and .* alone ends up matching the whole string (termed 'greedy').
Maybe a different example will help.
.*ab
In:
aaababaaabab
Here, .* will match as many characters as possible and then try to match ab. Thus, .* will match aaababaaab and the remainder will be matched by ab.
.*?ab
In:
aaababaaabab
Here, .*? will match as little as possible until it can match ab in that regex. The first occurrence of ab is here:
aaababaaabab
  ^^
And so, .*? matches aa while ab will match ab.
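The two cases can be checked with a few lines of Java. This is only a sketch; firstMatch is a helper name invented for the example, and it returns the first substring that Matcher.find() locates.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "aaababaaabab";
        // Greedy: .* runs to the end, then backs off just enough for the last "ab".
        System.out.println(firstMatch(".*ab", input));  // aaababaaabab
        // Lazy: .*? grows one character at a time until "ab" can match.
        System.out.println(firstMatch(".*?ab", input)); // aaab
    }
}
```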
In regex:
? : Occurs no or one times, ? is short for {0,1}
*? : ? after a quantifier makes it a reluctant quantifier, it tries to find the smallest match.
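Suppose you have an input string like this
this is stackoverflow
and you use the regex
.*
The output will be
this is stackoverflow
But if you use the regex
.*?(?=\s)
the output will be
this
Note that .*? on its own matches the empty string, because a reluctant quantifier takes as few characters as possible; it only expands until whatever follows it (here, a lookahead for whitespace) can match.
So .* gives you the whole string, and if you want only the characters before the first space, use .*? followed by something that marks where the match should stop.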
For more practical knowledge you can check http://regexpal.com/
The ? (question mark) is considered lazy here or so called not greedy.
Read Greedy vs. reluctant vs. possessive quantifiers
Your regular expression:
.*? any character except newline \n (0 or more times)
(matching the least amount possible)
( group and capture to \1:
[a-m/]* any character of: 'a' to 'm', '/' (0 or more times)
(matching the most amount possible)
) end of \1
.* any character except newline \n (0 or more times)
(matching the most amount possible)
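To see what the capture group actually holds for the string from the question, here is a small sketch. The helper name group1 and the class name are assumptions made for the example; the helper applies the pattern to the whole input with Matcher.matches().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyPrefixGroup {
    // Hypothetical helper: contents of group 1 if the whole input matches, else null.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Lazy prefix: .*? matches nothing, so the group can take "fall/".
        System.out.println("[" + group1(".*?([a-m/]*).*", "fall/2005") + "]"); // [fall/]
        // Greedy prefix: .* takes the whole string and the group is left empty.
        System.out.println("[" + group1(".*([a-m/]*).*", "fall/2005") + "]");  // []
    }
}
```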

Why does my regex containing \d{1,} together with a negative lookahead still match, where it shouldn't?

I'm trying to match a coordinate pair in a String using a Regex in Java. I explicitly want to exclude strings using negative lookahead.
to be matched:
558,228
558,228,
558,228,589
558,228,A,B,C
NOT to be matched:
558,228,<Text>
The Regex ^558,228(?!,<).* does the job, while ^\d{1,},\d{1,}(?!,<).* doesn't. It's the same regex with the metacharacter \d instead of values. Any ideas why?
The reason is the .* part at the end. It matches everything that wasn't matched earlier.
In combination with \d{1,}, which is allowed to match fewer than 3 digits, it will go like this:
^\d{1,},\d{1,}(?!,<) will match 558,22 and .* will match the remaining part 8,<Text>.
The problem is the \d{1,} part in combination with the .* at the end.
In your case
558,228,<Text>
The ^\d{1,},\d{1,}(?!,<) matches "558,22" and the .* matches the rest "8,<Text>"
You can solve this using the possessive quantifier ++
^\d+,\d++(?!,<)(.*)
See it here online on Regexr
\d++ uses the seldom used possessive quantifier, which is useful here. ++ means: match at least once, take as many as you can, and do not backtrack. That means it will not give back the digits once it has matched them.
Java Quantifier tutorial
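A sketch that runs both patterns against the sample inputs from the question. The class name, the constant names, and the accepts helper are made up for this illustration.

```java
import java.util.regex.Pattern;

public class CoordinateFilter {
    // Pattern from the question: \d{1,} may give back a digit to satisfy the lookahead.
    static final Pattern GREEDY = Pattern.compile("^\\d{1,},\\d{1,}(?!,<).*");
    // Pattern from the answer: \d++ keeps all its digits, so the lookahead sees ",<".
    static final Pattern POSSESSIVE = Pattern.compile("^\\d+,\\d++(?!,<)(.*)");

    // Hypothetical helper: true if the pattern matches the whole string.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "558,228", "558,228,", "558,228,589", "558,228,A,B,C", "558,228,<Text>"
        };
        for (String s : inputs) {
            System.out.println(s + "  greedy=" + accepts(GREEDY, s)
                    + "  possessive=" + accepts(POSSESSIVE, s));
        }
    }
}
```

Only the last input should differ: the greedy version accepts 558,228,<Text> by backtracking to 558,22, while the possessive version rejects it.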
