Java Regex negative lookahead wrong match - java

I'm looking for strings where the two first digits are present (in any order) in the digits that follow the space character.First I tried
(\d)(\d)\s\d*(\1|\2)\d*[\1\2&&[^\3]][\d]*
but it seems that I can't use brackets with backreferences.I tried using the lookahead feature instead with
(\d)(\d)\s\d*(\1|\2)\d*(?!\3(\1|\2))\d*
but I isn't right.The idea was "look for two digits, followed by a space, followed by zero or more digits, followed by either of the captured digits, followed by zero or more digits, followed by one of the captured digits which ISN'T the one I got before, followed by zero or more digits".21 20329 is a match.Why?How do I look for the strings I need?

This is simpler.
^(\d)(\d) (?=.*?\1)(?=.*?\2)\d+
See demo
The first lookahead ensures that the digit captured by Group 1 is present somewhere later in the string.
The second lookahead ensures that the digit captured by Group 2 is present somewhere later in the string.
If these conditions are met, the \d+ eats up all the digits after the space.

Related

Why does backreferencing capturing groups work for multiple digit numbers in Java?

Let's say that you have a string:
String string = "ab #1?AZa$ab #1?AZa$"
You're trying to verify that the tenth is a non-whitespace character, and that the twentieth character is the same as the tenth. Furthermore, there is corresponding verification with the 1st and 11th, the 2nd and 12th, the 3rd and 13th, etc. each with their own separate requirements (the full list is here) so you have to use 10 capturing groups. I found that the following regex still works to validate the aforementioned string:
string.matches("^([a-z])(\\w)(\\s)(\\W)(\\d)(\\D)([A-Z])([a-zA-Z])([aeiouAEIOU])(\\S)\\1\\2\\3\\4\\5\\6\\7\\8\\9\\10$") //returns true
My question regards the last backreference:
\\10
Shouldn't this be interpreted as "match with the first character" and then "match with 0" (the digit)? I don't see how this is interpreted as "match with the tenth character" without somehow grouping the 1 and 0 together into 10. Puzzlingly, surrounding the 1 and 0 with parentheses does not work.
The behavior for Java is documented in Pattern:
In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.

Regex format for a particular Match

I am trying to write a regex for the following format
PA-123456-067_TY
It's always PA, followed by a dash, 6 digits, another dash, then 3 digits, and ends with _TY
Apparently, when I write this regex to match the above format it shows the output correctly
^[^[PA]-]+-(([^-]+)-([^_]+))_([^.]+)
with all the Negation symbols ^
This does not work if I write the regex in the below format without negation symbols
[[PA]-]+-(([-]+)-([_]+))_([.]+)
Can someone explain to me why is this so?
The negation symbol means that the character cannot be anything within the specified class. Your regex is much more complicated than it needs to be and is therefore obfuscating what you really want.
You probably want something like this:
^PA-(\d+)-(\d+)_TY$
... which matches anything that starts with PA-, then includes two groups of numbers separated by a dash, then an underscore and the letters TY. If you want everything after the PA to be what you capture, but separated into the three groups, then it's a little more abstract:
^PA-(.+)-(.+)_(.+)$
This matches:
PA-
a capture group of any characters
a dash
another capture group of any characters
an underscore
all the remaining characters until end-of-line
Character classes [...] are saying match any single character in the list, so your first capture group (([^-]+)-([^_]+)) is looking for anything that isn't a dash any number of times followed by a dash (which is fine) followed by anything that isn't an underscore (again fine). Having the extra set of parentheses around that creates another capture group (probably group 1 as it's the first parentheses reached by the regex engine)... that part is OK but probably makes interpreting the answer less intuitive in this case.
In the re-write however, your first capture group (([-]+)-([_]+)) matches [-]+, which means "one or more dashes" followed by a dash, followed by any number of underscores followed by an underscore. Since your input does not have a dash immediately following PA-, the entire regex fails to find anything.
Putting the PA inside embedded character classes is also making things complicated. The first part of your first one is looking for, well, I'm not actually sure how [^[PA]-]+ is interpreted in practice but I suspect it's something like "not either a P or an A or a dash any number of times". The second one is looking for the opposite, I think. But you don't want any of that, you just want to start without anything other than the actual sequence of characters you care about, which is just PA-.
Update: As per the clarifications in the comments on the original question, knowing you want fixed-size groups of digits, it would look like this:
^PA-(\d{6})-(\d{3})_TY$
That captures PA-, then a 6-digit number, then a dash, then a 3-digit number, then _TY. The six digit number and 3 digit numbers will be in capture groups 1 and 2, respectively.
If the sizes of those numbers could ever change, then replace {x} with + to just capture numbers regardless of max length.
according to your comment this would be appropriate PA-\d{6}-\d{3}_TY
EDIT: if you want to match a line use it with anchors: ^PA-\d{6}-\d{3}_TY$

How to match a string in this way?

I need to check if a String matches this specific pattern.
The pattern is:
(Numbers)(all characters allowed)(numbers)
and the numbers may have a comma ("." or ",")!
For instance the input could be 500+400 or 400,021+213.443.
I tried Pattern.matches("[0-9],?.?+[0-9],?.?+", theequation2), but it didn't work!
I know that I have to use the method Pattern.match(regex, String), but I am not being able to find the correct regex.
Dealing with numbers can be difficult. This approach will deal with your examples, but check carefully. I also didn't do "all characters" in the middle grouping, as "all" would include numbers, so instead I assumed that finding the next non-number would be appropriate.
This Java regex handles the requirements:
"((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)"
However, there is a potential issue in the above. Consider the following:
300 - -200. The foregoing won't match that case.
Now, based upon the examples, I think the point is that one should have a valid operator. The number of math operations is likely limited, so I would whitelist the operators in the middle. Thus, something like:
"((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)"
Would, I think, be more appropriate. The [*/+-] can be expanded for the power operator ^ or whatever. Now, if one is going to start adding words (such as mod) in the equation, then the expression will need to be modified.
You can see this regular expression here
In your regex you have to escape the dot \. to match it literally and escape the \+ or else it would make the ? a possessive quantifier. To match 1+ digits you have to use a quantifier [0-9]+
For your example data, you could match 1+ digits followed by an optional part which matches either a dot or a comma at the start and at the end. If you want to match 1 time any character you could use a dot.
Instead of using a dot, you could also use for example a character class [-+*] to list some operators or list what you would allow to match. If this should be the only match, you could use anchors to assert the start ^ and the end $ of the string.
\d+(?:[.,]\d+)?.\d+(?:[.,]\d+)?
In Java:
String regex = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
Regex demo
That would match:
\d+(?:[.,]\d+)? 1+ digits followed by an optional part that matches . or , followed by 1+ digits
. Match any character (Use .+) to repeat 1+ times
Same as the first pattern

Regex for numbers

Im trying to create a regex of numbers where 7 should appear atleast once and it shouldn't include 9
/[^9]//d+
I'm not sure how to do make it include 7 at least once
Also, it fails for the following example
123459, it accepts the string, even tho, there is a 9 included in there
However, if my string is 95, it rejects it, which is right
Code
Method 1
See regex in use here
(?=\d*7)(?!\d*9)\d+
Method 2
See regex in use here
\b(?=\d*7)[0-8]+\b
Note: This method uses fewer steps (170) as opposed to Method 1 with 406 steps.
Alternatively, you can also replace [0-8] with [^9\D] as seen here, which is basically saying don't match 9 or \D (any non-digit character).
You can also use \b(?=[^7\D]*7)[0-8]+\b as seen here, which brings the number of steps down from 170 to 147.
Method 3
See regex in use here
\b[0-8]*7[0-8]*\b
Note: This method uses few steps than both methods above at 139 steps. The only issue with this regex is that you need to identify valid characters in multiple locations in the pattern.
Results
Input
**VALID**
123456780
7
1237412
**INVALID**
9
12345680
1234567890
12341579
Output
Note: Shown below are strings that match.
123456780
7
1237412
Explanation
Method 1
(?=\d*7) Positive lookahead ensuring what follows is any digit any number of times, followed by 7 literally
(?!\d*9) Negative lookahead ensuring what follows is not any digit any number of times, followed by 9 literally
\d+ Any digit one or more times
Method 2
\b Assert the position as a word boundary
(?=\d*7) Positive lookahead ensuring what follows is any digit any number of times, followed by 7 literally
[0-8]+ Match any character present in the set 0-8
\b Assert the position as a word boundary
Method 3
\b Assert the position as a word boundary
[0-8]* Match any digit (except 9) any number of times
7 Match the digit 7 literally
[0-8]* Match any digit (except 9) any number of times
\b Assert the position as a word boundary
One way to do it would be to use several lookaheads:
(?=[^7]*7)(?!.*9)^\d+$
See a demo on regex101.com.
Note that you need to double escape the backslashes in Java, so that it becomes:
(?=[^7]*7)(?!.*9)^\\d+$
This has got a bit complex but it works for your use case :
(?=.*^[0-68-9]*7[0-68-9]*$)(?=^(?:(?!9).)*$).*$
First expression matches exactly one occurence of 7, accepts just numbers and second expression tests non-occurence of 9.
Try here : https://regex101.com/r/5OHgIr/1
If I find out correctly, you need a regex that accept all numbers that include at least one 7 and exclude 9. so try this:
(?:[0-8]*7[0-8]*)+
If you want found only numbers in a normal text add \s first and last of regex.

Validating numeric input using RegEx

I'm trying to create a regular expression in Java to validate a number with the following constraints:
The number can be of any length but can only contain digits
First digit can be 0 - 9
Subsequent digits can be 0 - 9, but one of the digits must be non-zero.
For example: 042004359 is valid, but 0000000000 is not.
\\d+[1-9]\\d* should work, I'd think.
This should do what you need:
/^(?=.*[1-9])([0-9]+)$/
Whilst it matches all of digits [0-9] it contains a lookahead that makes sure there is at least one of [1-9].
I am fairly certain that Java allows can use lookaheads.
EDIT: This regular expression test page seems to imply that it can.
EDIT: If 0 is valid, then you can use this:
^((?=.*[1-9])([0-9]+)|0)$
This will make an exception for 0 on its own (notice the OR operator).
^(\d{1})(\d*?[1-9]{1}\d*)*$
^(\d{1}) - Line must start with 1 digit
(\d*?[1-9]{1}\d*)*$ - Line must end with zero or more 0-9 digits(? for conservative), then 1 1-9 digit, then zero or more digits. This pattern can repeat zero or more times.
Works with:
100000
100100
1010200
1
2
Maybe this is too complicated, lol.
Here's one solution using lookarounds: (?<=\D|^)\d+(?=[1-9])\d*
(?<=\D|^) # lookbehind for non-digit or beginning of line
\d+ # match any number of digits 0-9
(?=[1-9]) # but lookahead to make sure there is 1-9
\d* # then match all subsequent digits, once the lookahead is satisfied

Categories