Regex for numbers - java

I'm trying to create a regex for numbers in which 7 must appear at least once and 9 must not appear at all:
/[^9]//d+
I'm not sure how to make it require 7 at least once.
It also fails for the following example:
123459 is accepted, even though it contains a 9.
However, if my string is 95, it rejects it, which is right.

Code
Method 1
See regex in use here
(?=\d*7)(?!\d*9)\d+
Method 2
See regex in use here
\b(?=\d*7)[0-8]+\b
Note: This method uses fewer steps (170) as opposed to Method 1 with 406 steps.
Alternatively, you can also replace [0-8] with [^9\D] as seen here, which is basically saying don't match 9 or \D (any non-digit character).
You can also use \b(?=[^7\D]*7)[0-8]+\b as seen here, which brings the number of steps down from 170 to 147.
Method 3
See regex in use here
\b[0-8]*7[0-8]*\b
Note: This method uses fewer steps than both methods above, at 139 steps. The only drawback of this regex is that the set of valid characters has to be specified in multiple places in the pattern.
Results
Input
**VALID**
123456780
7
1237412
**INVALID**
9
12345680
1234567890
12341579
Output
Note: Shown below are strings that match.
123456780
7
1237412
Explanation
Method 1
(?=\d*7) Positive lookahead ensuring what follows is any digit any number of times, followed by 7 literally
(?!\d*9) Negative lookahead ensuring what follows is not any digit any number of times, followed by 9 literally
\d+ Any digit one or more times
Method 2
\b Assert the position as a word boundary
(?=\d*7) Positive lookahead ensuring what follows is any digit any number of times, followed by 7 literally
[0-8]+ Match any character present in the set 0-8, one or more times
\b Assert the position as a word boundary
Method 3
\b Assert the position as a word boundary
[0-8]* Match any digit (except 9) any number of times
7 Match the digit 7 literally
[0-8]* Match any digit (except 9) any number of times
\b Assert the position as a word boundary
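To try the three methods from Java, a minimal sketch could look like the following. The class name SevenNoNineDemo and the helper accepts are made up for illustration, and Matcher.matches() is used so that each pattern has to cover the whole input; the step counts quoted above come from the online tester and are not reproduced here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class SevenNoNineDemo {
    // The three patterns from the answer, written as Java string literals
    // (each regex backslash is doubled inside the literal).
    static final Pattern METHOD_1 = Pattern.compile("(?=\\d*7)(?!\\d*9)\\d+");
    static final Pattern METHOD_2 = Pattern.compile("\\b(?=\\d*7)[0-8]+\\b");
    static final Pattern METHOD_3 = Pattern.compile("\\b[0-8]*7[0-8]*\\b");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean accepts(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"123456780", "7", "1237412",
                           "9", "12345680", "1234567890", "12341579"};
        for (String s : inputs) {
            System.out.println(s + " -> " + accepts(METHOD_1, s) + " "
                    + accepts(METHOD_2, s) + " " + accepts(METHOD_3, s));
        }
    }
}
```

All three patterns should agree on the sample inputs listed under Results.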

One way to do it would be to use several lookaheads:
(?=[^7]*7)(?!.*9)^\d+$
See a demo on regex101.com.
Note that you need to double escape the backslashes in Java, so that it becomes:
(?=[^7]*7)(?!.*9)^\\d+$
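As a rough illustration of the escaping, here is how the pattern might be used in a Java program. The class EscapedLookaheadDemo and the method isValid are hypothetical names; find() is enough here because the pattern carries its own ^ and $ anchors.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class EscapedLookaheadDemo {
    // "\\d" in Java source is the two-character regex token \d.
    static final Pattern P = Pattern.compile("(?=[^7]*7)(?!.*9)^\\d+$");

    // True when the input is all digits, contains a 7, and contains no 9.
    static boolean isValid(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("1237412")); // contains 7, no 9
        System.out.println(isValid("123459"));  // no 7, has 9
        System.out.println(isValid("95"));      // no 7, has 9
    }
}
```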

This has got a bit complex but it works for your use case :
(?=.*^[0-68-9]*7[0-68-9]*$)(?=^(?:(?!9).)*$).*$
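The first expression matches exactly one occurrence of 7 and accepts only digits; the second expression checks that 9 does not occur. Note that inputs containing two or more 7s are therefore rejected, which is stricter than the "at least once" in the question.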
Try here : https://regex101.com/r/5OHgIr/1

If I understand correctly, you need a regex that accepts all numbers that include at least one 7 and no 9, so try this:
(?:[0-8]*7[0-8]*)+
If you want to find only such numbers inside normal text, add \s at the start and end of the regex.

Related

Please explain the output of this regex (starts with a positive lookahead)

Pattern p = Pattern.compile("(?=[1-9][0-9]{2})[0-9]*[05]");
Matcher m = p.matcher("101");
while (m.find()) {
    System.out.println(m.start() + ":" + m.end() + m.group());
}
Output------ >> 0:210
Please let me know why I am getting 10 as the output of m.group() here.
As far as I understand, m.group() should return nothing because [05] matches nothing.
Your Pattern, (?=[1-9][0-9]{2})[0-9]*[05] consists of 2 parts:
(?=[1-9][0-9]{2})
and
[0-9]*[05]
The first part is a zero-width positive lookahead which checks for three digits in a row, the first of which cannot be 0. This matches your 101.
The second part searches for any amount of numbers and then a 0 or a 5. This matches the first 2 characters of 101, thus the result is 10.
See Java - Pattern for more information.
What your Regex is looking for is:
[1-9]:
match a single character present in the list below
1-9 a single character in the range between 1 and 9
[0-9]{2}:
match a single character present in the list below
Quantifier: {2} Exactly 2 times
0-9 a single character in the range between 0 and 9
[0-9]*:
match a single character present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
[05]:
match a single character present in the list below
05 a single character in the list 05 literally
For the string "101" this matches the first 2 chars, 10,
so you are printing:
System.out.println(m.start() + ":" + m.end() + m.group());
where m.start() returns the start index of the previous match (the char at index 0),
m.end() returns the offset after the last character matched (2),
and m.group() returns the input subsequence matched by the previous match (10).
Concatenated with only the ":" in between, that prints 0:210.
That regex was meant to match a number that's a multiple of 5 and greater than or equal to 100, but it's useless without anchors. It should be:
^(?=[1-9][0-9]{2}$)[0-9]*[05]$
The anchors ensure that both the lookahead and the main part are examining the whole of the string. But the task doesn't require a lookahead anyway; this works just fine:
^[1-9][0-9][05]$
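The effect of the anchors can be checked with a small sketch like the one below. The class AnchoredLookaheadDemo and the helper firstMatch are illustrative names, not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class AnchoredLookaheadDemo {
    static final Pattern UNANCHORED =
            Pattern.compile("(?=[1-9][0-9]{2})[0-9]*[05]");
    static final Pattern ANCHORED =
            Pattern.compile("^(?=[1-9][0-9]{2}$)[0-9]*[05]$");

    // Returns the first match of p in input, or null when there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Without anchors the lookahead and the body inspect different spans.
        System.out.println(firstMatch(UNANCHORED, "101")); // 10
        // With anchors both must agree on the whole string.
        System.out.println(firstMatch(ANCHORED, "101"));   // null
        System.out.println(firstMatch(ANCHORED, "105"));   // 105
    }
}
```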
As @AlanMoore states, there has to be an alignment.
Assertions are self-contained entities; all they have to do is pass
to advance to the next construct.
Let's see what (?=[1-9][0-9]{2}) matches:
1111111110666
2222222222222222225666
33333333333333333333333330666
So far so good, on to the next construct.
Let's see what [0-9]*[05] matches.
Whatever this matches is the final answer.
1111111110666
2222222222222222225666
33333333333333333333333330666
What to learn is that, to get a cohesive answer, assertions have to be crafted to
coincide with the constructs that come after them.
Here is an example of a constraint that could be applied
after the assertion.
The assertion states it needs three digits and the first digit must be >= 1.
The constructs after the assertion state it can be any number of digits,
as long as it ends with a 0 or 5.
This last part is distressing since it will match only the 500000
So for sure, you need at least three digits.
That can be done like this:
[0-9]{2,}[05]
This says two things
There must be at least three digits, but can be more
It must end with a 0 or 5.
That's it. Put it all together, and it's:
(?=[1-9][0-9]{2})[0-9]{2,}[05]
Of course, this can be condensed to:
[1-9][0-9]+[05]
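A quick way to convince yourself that the long and the condensed forms agree is to run both against a few inputs. This is only a sketch: the class CondensedPatternDemo and its helpers are hypothetical, and matches() is used so that both patterns are applied to the whole input.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class CondensedPatternDemo {
    static final Pattern WITH_LOOKAHEAD =
            Pattern.compile("(?=[1-9][0-9]{2})[0-9]{2,}[05]");
    static final Pattern CONDENSED = Pattern.compile("[1-9][0-9]+[05]");

    static boolean longForm(String s) {
        return WITH_LOOKAHEAD.matcher(s).matches();
    }

    static boolean shortForm(String s) {
        return CONDENSED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"100", "500000", "1005", "50", "101", "005"};
        for (String s : inputs) {
            System.out.println(s + " -> " + longForm(s) + " " + shortForm(s));
        }
    }
}
```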

Java Regex: Optional Matching

I've been using the following Regex to extract a zip code from a bunch of text:
"\\d{5}\\-?[1-9]?[1-9]?[1-9]?[1-9]?"
My intention of making the last 4 [1-9] optional (using ? ) was to be able to extract both 5 digit zip codes and 5 digit zip codes with + 4 such as 11001-1010.
However, it only matches the first two digits of the last four numbers even though I put 4 digits at the end.
For example, in the zip code 11001-1010 it would match 11001-10.
Anyone know why?
Simple answer to question: For zip code 11001-1010 your regex would only match 11001-1 because the optional 4 digits after the - cannot be 0.
For the unstated question of how to fix that, it depends on whether you only want to match an optional +4, or you want to also match +3, +2, +1, and +0, like your expression would.
Matching Zip5 with optional +4, e.g. matching 11001-1010 and 11001:
"\\d{5}(?:-\\d{4})?"
Matching Zip5 with optional +N, e.g. matching 11001-1010, 11001-101, 11001-10, 11001-1, 11001-, and 11001:
"\\d{5}(?:-\\d{0,4})?"
Update
Now, if you want to make sure it doesn't match the 56789-1234 of 123456789-123456789 or abcd56789-1234qwerty, you can add a word-boundary check:
"\\b\\d{5}(?:-\\d{4})?\\b"
It's stopping at the first 0 in the suffix,
"\d{5}\-?[1-9]?[1-9]?[1-9]?[1-9]?"
So in your example, it only matches up to 11001-1
Does "\d{5}\-?[0-9]?[0-9]?[0-9]?[0-9]?" work ok?
The other answers are probably cleaner, but that is the bug.
Looks ok per this
You can use \\d{5}\\-\\d{0,4} which allows you to match 0 to 4 digits after -.
EDIT
From the comment : But then the - won’t be optional.
For that you can use \\d{5}(\\-\\d{0,4})? to make group of - and digits after dash optional.
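For extracting zip codes from a larger piece of text, the word-boundary variant from the update above could be wired up roughly as follows. The class ZipCodeDemo, the method extractZips, and the sample sentence are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class ZipCodeDemo {
    // Zip5 with an optional +4 suffix, bounded by word boundaries.
    static final Pattern ZIP = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    // Collects every zip code found in the text, in order of appearance.
    static List<String> extractZips(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ZIP.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Ship to 11001-1010 or 90210, not 123456789-123456789.";
        System.out.println(extractZips(text));
    }
}
```

The long digit runs are skipped because no five-digit window inside them is surrounded by word boundaries.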

Java Regex negative lookahead wrong match

I'm looking for strings where the two first digits are present (in any order) in the digits that follow the space character. First I tried
(\d)(\d)\s\d*(\1|\2)\d*[\1\2&&[^\3]][\d]*
but it seems that I can't use brackets with backreferences. I tried using the lookahead feature instead with
(\d)(\d)\s\d*(\1|\2)\d*(?!\3(\1|\2))\d*
but it isn't right. The idea was "look for two digits, followed by a space, followed by zero or more digits, followed by either of the captured digits, followed by zero or more digits, followed by one of the captured digits which ISN'T the one I got before, followed by zero or more digits". 21 20329 is a match. Why? How do I look for the strings I need?
This is simpler.
^(\d)(\d) (?=.*?\1)(?=.*?\2)\d+
See demo
The first lookahead ensures that the digit captured by Group 1 is present somewhere later in the string.
The second lookahead ensures that the digit captured by Group 2 is present somewhere later in the string.
If these conditions are met, the \d+ eats up all the digits after the space.
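In Java, the backreferences inside the lookaheads need doubled backslashes like everything else. A small sketch, with SharedDigitsDemo and bothDigitsFollow as hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class SharedDigitsDemo {
    // Groups 1 and 2 capture the two leading digits; each lookahead
    // checks that the captured digit occurs somewhere after the space.
    static final Pattern P =
            Pattern.compile("^(\\d)(\\d) (?=.*?\\1)(?=.*?\\2)\\d+");

    static boolean bothDigitsFollow(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(bothDigitsFollow("21 20329")); // 1 is missing
        System.out.println(bothDigitsFollow("21 20319")); // both present
    }
}
```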

Set minimum and maximum characters in a regular expression

I've written a regular expression that matches any number of letters with any number of single spaces between the letters. I would like that regular expression to also enforce a minimum and maximum number of characters, but I'm not sure how to do that (or if it's possible).
My regular expression is:
[A-Za-z](\s?[A-Za-z])+
I realized it was only matching two sets of letters surrounding a single space, so I modified it slightly to fix that. The original question is still the same though.
Is there a way to enforce a minimum of three characters and a maximum of 30?
Yes
Just as + means one or more, you can use {3,30} to match between 3 and 30 repetitions.
For example [a-z]{3,30} matches between 3 and 30 lowercase alphabet letters
From the documentation of the Pattern class
X{n,m} X, at least n but not more than m times
In your case, matching 3-30 letters followed by spaces could be accomplished with:
([a-zA-Z]\s){3,30}
That form requires trailing whitespace. If you don't want that, you can use the following (letter+space 2-29 times, then a letter):
([a-zA-Z]\s){2,29}[a-zA-Z]
If you'd like whitespace to count as characters, you need to divide those numbers by 2 to get
([a-zA-Z]\s){1,14}[a-zA-Z]
You can add \s? to that last one if the trailing whitespace is optional. These were all tested on RegexPlanet
If you'd like the entire string altogether to be between 3 and 30 characters you can use lookaheads adding (?=^.{3,30}$) at the beginning of the RegExp and removing the other size limitations
All that said, in all honesty I'd probably just test the String's length (length() in Java). It's more readable.
This is what you are looking for
^[a-zA-Z](\s?[a-zA-Z]){2,29}$
^ is the start of string
$ is the end of string
(\s?[a-zA-Z]){2,29} would match (\s?[a-zA-Z]) 2 to 29 times..
Actually Benjamin's answer will lead to the complete solution to the OP's question.
Using lookaheads it is possible to restrict the total number of characters AND restrict the match to a set combination of letters and (optional) single spaces.
The regex that solves the entire problem would become
(?=^.{3,30}$)^([A-Za-z][\s]?)+$
This will match AAA and A A, and it will fail to match AA  A (two spaces between the letters), since consecutive spaces are not allowed.
I tested this at http://regexpal.com/ and it does the trick.
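The difference between limiting the total length and limiting the number of letters can be seen in a short Java sketch. The class LengthLimitDemo and its method names are made up for illustration; the two patterns are the lookahead version above and the ^[a-zA-Z](\s?[a-zA-Z]){2,29}$ version from the earlier answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class LengthLimitDemo {
    // Total length (letters plus spaces) must be between 3 and 30.
    static final Pattern TOTAL_LENGTH =
            Pattern.compile("(?=^.{3,30}$)^([A-Za-z][\\s]?)+$");
    // Number of letters must be between 3 and 30; spaces are not counted.
    static final Pattern LETTER_COUNT =
            Pattern.compile("^[a-zA-Z](\\s?[a-zA-Z]){2,29}$");

    static boolean validByTotalLength(String s) {
        return TOTAL_LENGTH.matcher(s).find();
    }

    static boolean validByLetterCount(String s) {
        return LETTER_COUNT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(validByTotalLength("A A"));   // 3 characters
        System.out.println(validByLetterCount("A A"));   // only 2 letters
        System.out.println(validByTotalLength("AA  A")); // double space
    }
}
```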
You should use
[a-zA-Z ]{20}
[For allowed characters]{for limiting of the number of characters}

Validating numeric input using RegEx

I'm trying to create a regular expression in Java to validate a number with the following constraints:
The number can be of any length but can only contain digits
First digit can be 0 - 9
Subsequent digits can be 0 - 9, but one of the digits must be non-zero.
For example: 042004359 is valid, but 0000000000 is not.
\\d+[1-9]\\d* should work, I'd think.
This should do what you need:
/^(?=.*[1-9])([0-9]+)$/
Whilst it matches all of digits [0-9] it contains a lookahead that makes sure there is at least one of [1-9].
I am fairly certain that Java supports lookaheads.
EDIT: This regular expression test page seems to imply that it can.
EDIT: If 0 is valid, then you can use this:
^((?=.*[1-9])([0-9]+)|0)$
This will make an exception for 0 on its own (notice the OR operator).
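In Java the surrounding / delimiters are dropped and the pattern goes into a string literal. A minimal sketch of both variants, where NonZeroDigitsDemo and its method names are assumptions made for this example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class NonZeroDigitsDemo {
    // All digits, with at least one digit in 1-9 somewhere.
    static final Pattern NEEDS_NONZERO =
            Pattern.compile("^(?=.*[1-9])([0-9]+)$");
    // Same, but a lone "0" is also accepted.
    static final Pattern NONZERO_OR_ZERO =
            Pattern.compile("^((?=.*[1-9])([0-9]+)|0)$");

    static boolean hasNonZeroDigit(String s) {
        return NEEDS_NONZERO.matcher(s).find();
    }

    static boolean hasNonZeroDigitOrIsZero(String s) {
        return NONZERO_OR_ZERO.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNonZeroDigit("042004359"));  // valid
        System.out.println(hasNonZeroDigit("0000000000")); // all zeros
        System.out.println(hasNonZeroDigitOrIsZero("0"));  // lone zero
    }
}
```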
^(\d{1})(\d*?[1-9]{1}\d*)*$
^(\d{1}) - Line must start with 1 digit
(\d*?[1-9]{1}\d*)*$ - Line must end with zero or more 0-9 digits (the ? makes the quantifier lazy), then one digit 1-9, then zero or more digits. This pattern can repeat zero or more times.
Works with:
100000
100100
1010200
1
2
Maybe this is too complicated, lol.
Here's one solution using lookarounds: (?<=\D|^)\d+(?=[1-9])\d*
(?<=\D|^) # lookbehind for non-digit or beginning of line
\d+ # match any number of digits 0-9
(?=[1-9]) # but lookahead to make sure there is 1-9
\d* # then match all subsequent digits, once the lookahead is satisfied
