RegEx - Testing for 9 adjacent identical numbers - java

I am trying to test a string (in Java) for 9 adjacent identical numbers. I can test for adjacent identical numbers, but only for the next adjacent number:
boolean result = string.matches("/([0-9])\1/g");
I want to match 9 identical characters in a row. Is anyone able to help me?
Thanks
EDIT: Some examples:
"1111111111" should match
"222222222" should match
"3311111111133" should match
"1234567890" should not match

Try this regex: ([0-9])\1{8}
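In a Java string literal the backslash has to be doubled. Note also that String.matches() requires the whole string to match, so "3311111111133" from the question would be rejected. Matcher.find() looks for the run anywhere in the string. Here is a minimal sketch; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class NineInARow {
    // ([0-9]) captures one digit; \1{8} requires that same digit 8 more times.
    private static final Pattern NINE_SAME = Pattern.compile("([0-9])\\1{8}");

    // find() accepts the run anywhere in the input, not only as the whole string.
    static boolean hasNineIdentical(String s) {
        return NINE_SAME.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"1111111111", "222222222", "3311111111133", "1234567890"};
        for (String s : samples) {
            System.out.println(s + " -> " + hasNineIdentical(s));
        }
    }
}
```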

Java greedy quantifier:
X{n} matches X exactly n times.
X would be any digit, [0-9], and n would be 9:
([0-9]{9})
Edit:
[0-9]{9} matches any nine digits, not necessarily identical ones. This will match 9 identical digits:
([0-9])\1{8}
[0-9] matches any digit
([0-9]) captures that digit as group 1
\1 is a backreference to whatever group 1 captured
\1{8} matches that same digit 8 more times
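The difference between the first attempt and the edited pattern is easy to check in Java. [0-9]{9} accepts any nine digits, while the backreference version needs the same digit nine times. This is a small sketch with a hypothetical class name. It uses String.matches(), which tests the whole string, and String.repeat(), which needs Java 11+:

```java
class QuantifierVsBackreference {
    // Any nine digits, identical or not.
    static boolean nineDigits(String s) {
        return s.matches("[0-9]{9}");
    }

    // One captured digit followed by 8 repetitions of that same digit.
    static boolean nineSameDigits(String s) {
        return s.matches("([0-9])\\1{8}");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123456789", "7".repeat(9)}) {
            System.out.println(s + ": " + nineDigits(s) + " / " + nineSameDigits(s));
        }
    }
}
```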

I found Kirill Polishchuk's answer fine. However, in the rare case where you have to find 9 adjacent digits that are separated by spaces (as in a text file), you can do the following:
Input:
1111111111 4444444444 4455555555558899 567833333333339
The regex may be modified as follows:
'([0-9]) ?\1{8}' if you have only one space between columns,
or
'([0-9]) *\1{8}' if you have an uneven number of spaces between columns.
If you want to use it with grep, you can do it this way:
grep -E '([0-9]) ?\1{8}'
or
grep -E '([0-9]) *\1{8}'
Hope this helps.
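In Java the same pattern needs doubled backslashes. Because a line can hold several runs, calling Matcher.find() in a loop collects all of them. This sketch uses a hypothetical class name and only exercises the single-space variant. As written, that variant allows the optional space only directly after the first digit. String.repeat() needs Java 11+:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpacedRuns {
    // First digit, an optional single space, then the same digit 8 more times.
    private static final Pattern ONE_SPACE = Pattern.compile("([0-9]) ?\\1{8}");

    // Collects every non-overlapping match in the input, left to right.
    static List<String> findRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = ONE_SPACE.matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String line = "1 " + "1".repeat(8) + " x " + "2".repeat(9);
        System.out.println(findRuns(line));
    }
}
```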

Related

Java 8 regex: a capturing group in a pattern doesn't match, yet the whole pattern does match

This is my first question. Nice to e-meet everyone.
I have created the following regex pattern in Java 8 (this is just a simplified example of what I actually have in my code - for the sake of clarity):
(?<!a)([0-9])\,([0-9])(?!a)|(?<!b)([0-9]) ([0-9])(?!b)|(?<!c)([0-9])([0-9])(?!c)
So, in general, it consists of three alternatives:
1st one matches two single digits separated with a comma, for example:
1,1
2,0
4,5
2nd one matches two single digits separated with a space, for example:
1 1
2 0
4 5
3rd one matches two single digits in a row, for example:
11
20
45
Each alternative uses lookarounds and their content has to be slightly different for each one of them - that's why I couldn't just put everything together like that:
([0-9])[, ]?([0-9])
Each of the matched digits is enclosed in a capturing group and now I have a second line to 'call out' these captured numbers like this:
(?<!n)($1 $2|$3 $4|$5 $6)(?!n)
So at the end I need to match a text that would have the same digits separated with single space and not surrounded by 'n'. So if any of the examples shown above would be matched by the pattern from the 1st line, the 2nd line pattern should match these:
1 1
2 0
4 5
11 11
22 00
44 55
And not any of these:
n1 1
2,0
45
asd asd asd
The problem is the following: it returns a match even if the tested text does not contain the captured digits, as long as it contains a space. Here I do not get a match, which is correct:
aaaaaaaaa
bbbbbbbbb
aasdfasdf
but here I do get a match (apparently because there is a space or spaces):
abc abc
q w r t y
as df
Does anyone know whether this is normal? Even though the first line does not capture anything in those groups, the part outside the groups (the single space) is still matched, so the whole pattern returns a match. It is as if an empty capturing group became a zero-length match in the second line. Thanks in advance for any comment on this.
Your regex matches whitespace because the resulting pattern for the 1,1 string is (?<!n)(1 1| | )(?!n), and it can match a space that is neither preceded nor followed by n.
When a group referenced in the replacement string did not participate in the match, .replaceAll/.replaceFirst substitute an empty string for it. By contrast, Matcher.group(n) returns null for such a group after .find() / .matches(). So you still get the blank alternatives in the resulting pattern.
You may leverage this functionality AND the fact that each alternative has exactly two capturing groups by concatenating replacement backreferences in the string replacement pattern, getting rid of the alternations altogether:
SEARCH: (?<!a)([0-9]),([0-9])(?!a)|(?<!b)([0-9]) ([0-9])(?!b)|(?<!c)([0-9])([0-9])(?!c)
REPLACE: (?<!n)($1$3$5 $2$4$6)(?!n)
Note how the backreferences are concatenated: all backreferences to odd groups come first, then all backreferences to even groups are placed in a no-alternative pattern.
Note that even if the number of groups differs across the alternatives, you can add "fake" empty groups to the shorter ones, and this approach will still work.
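The two steps can be sketched in Java as follows. The class and method names are hypothetical. The replacement string concatenates the odd groups, then the even groups, as described above. It relies on replaceAll() expanding groups that did not participate to an empty string:

```java
import java.util.regex.Pattern;

class TwoStepPattern {
    // First-line pattern from the question: three alternatives, two groups each.
    static final String SEARCH =
        "(?<!a)([0-9]),([0-9])(?!a)|(?<!b)([0-9]) ([0-9])(?!b)|(?<!c)([0-9])([0-9])(?!c)";
    // Odd groups first, then even groups; only one pair is non-empty per match.
    static final String REPLACE = "(?<!n)($1$3$5 $2$4$6)(?!n)";

    // Builds the second-line pattern from a first-line input such as "1,1".
    static String buildSecondPattern(String firstLine) {
        return firstLine.replaceAll(SEARCH, REPLACE);
    }

    // Searches the text with the generated pattern.
    static boolean secondLineMatches(String firstLine, String text) {
        return Pattern.compile(buildSecondPattern(firstLine)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(buildSecondPattern("1,1"));
        System.out.println(secondLineMatches("1,1", "abc abc"));
    }
}
```

With this replacement, the generated pattern for 1,1 has no blank alternatives, so a lone space no longer matches.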

minimum number in a string should be 1 regex validation?

I have a String that I need to validate: it should only contain numbers separated by spaces (or just a single number), and no number may be less than 1. For example:
3 1 2
1 p 3
6 3 2
0 3 2
The first and third are valid strings; the others are not.
I came up with the regex below, but I am not sure how to check that no number in the string is less than 1:
str.matches("(\\d|\\s)+")
Regex used from here
Just replace \\d with [1-9].
\\d is just a shorthand for the class [0-9].
This is a better regex though: ([1-9]\\s)*[1-9]$, as it takes care of double digit issues and won't allow space at the end.
Not everything can or should be solved with regular expressions.
You could use a simple expression like
str.matches("\\d+(\\s+\\d+)*")
or something similar, to check that your input line contains only groups of digits separated by one or more whitespace characters.
If that matches, you split along the spaces and for each group of digits you turn it into a number and validate against the valid range.
I have a gut feeling that regular expressions are actually not sufficient for the kind of validation you need.
If the string should only contain numbers separated by single spaces (or just one number), no number may be less than 1, and numbers can have more than one digit (such as 10), you might use:
^[1-9]\\d*(?: [1-9]\\d*)*$
Note that if you want to match a space only, you can put a literal space in the pattern instead of \s, which also matches tabs, newlines and other whitespace.
Explanation
^ Assert the start of the string
[1-9]\\d* Match a number from 1 up
(?: [1-9]\\d*)* Repeat a number from 1 up with a prepended space
$ Assert end of the string
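You can check this pattern against the question's examples in Java. In this sketch the class name is hypothetical. String.matches() already anchors the whole string, so ^ and $ are redundant there but harmless:

```java
class PositiveNumberList {
    // Numbers >= 1 without leading zeros, separated by single spaces.
    static boolean isValid(String s) {
        return s.matches("^[1-9]\\d*(?: [1-9]\\d*)*$");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"3 1 2", "1 p 3", "6 3 2", "0 3 2", "10 1"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```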
Regex is part of the solution. But I don't think that regex alone can solve your problem.
This is my proposed solution:
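import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static boolean isValid(String str) {
    // Whole input: numbers separated by single spaces (no letters, no trailing space)
    Pattern pattern = Pattern.compile("\\d+( \\d+)*");
    Matcher matcher = pattern.matcher(str);
    // Every number must be at least 1, so "6 3 2" passes and "0 3 2" fails
    return matcher.matches() && Arrays.stream(matcher.group().split(" "))
            .mapToInt(Integer::parseInt)
            .min().getAsInt() >= 1;
}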
Pay attention to the matching method: matcher.matches() checks the pattern against the entire input. Don't use matcher.find(), because it will not reject invalid input such as "1 p 2".

Regex for numbers

I'm trying to create a regex for numbers in which 7 appears at least once and 9 does not appear.
/[^9]//d+
I'm not sure how to make it include 7 at least once.
Also, it fails for the following example
123459: it accepts the string, even though there is a 9 in it.
However, if my string is 95, it rejects it, which is right
Code
Method 1
(?=\d*7)(?!\d*9)\d+
Method 2
\b(?=\d*7)[0-8]+\b
Note: This method takes fewer steps (170) than Method 1 (406 steps).
Alternatively, you can replace [0-8] with [^9\D], which basically says: match neither 9 nor \D (any non-digit character).
You can also use \b(?=[^7\D]*7)[0-8]+\b, which brings the number of steps down from 170 to 147.
Method 3
\b[0-8]*7[0-8]*\b
Note: This method takes fewer steps than both methods above, at 139 steps. The only issue with this regex is that you need to list the valid characters in multiple places in the pattern.
Results
Input
**VALID**
123456780
7
1237412
**INVALID**
9
12345680
1234567890
12341579
Output
Note: Shown below are strings that match.
123456780
7
1237412
Explanation
Method 1
(?=\d*7) Positive lookahead ensuring what follows is any digit any number of times, followed by 7 literally
(?!\d*9) Negative lookahead ensuring what follows is not any digit any number of times, followed by 9 literally
\d+ Any digit one or more times
Method 2
\b Assert the position as a word boundary
(?=\d*7) Positive lookahead ensuring what follows is any digit any number of times, followed by 7 literally
[0-8]+ Match any character present in the set 0-8
\b Assert the position as a word boundary
Method 3
\b Assert the position as a word boundary
[0-8]* Match any digit (except 9) any number of times
7 Match the digit 7 literally
[0-8]* Match any digit (except 9) any number of times
\b Assert the position as a word boundary
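This sketch runs Method 3 over the VALID/INVALID inputs listed above, using a hypothetical class name. With String.matches() the whole string must match, so the \b anchors can be dropped:

```java
class SevenNoNine {
    // Digits 0-8 only, with at least one 7 somewhere (Method 3 without \b,
    // because String.matches() already requires the whole string to match).
    static boolean accept(String s) {
        return s.matches("[0-8]*7[0-8]*");
    }

    public static void main(String[] args) {
        String[] inputs = {"123456780", "7", "1237412", "9", "12345680", "1234567890", "12341579"};
        for (String s : inputs) {
            System.out.println(s + " -> " + accept(s));
        }
    }
}
```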
One way to do it would be to use several lookaheads:
(?=[^7]*7)(?!.*9)^\d+$
Note that you need to double escape the backslashes in Java, so that it becomes:
(?=[^7]*7)(?!.*9)^\\d+$
This got a bit complex, but it works for your use case:
(?=.*^[0-68-9]*7[0-68-9]*$)(?=^(?:(?!9).)*$).*$
The first lookahead requires exactly one occurrence of 7 and accepts only digits; the second lookahead checks that 9 does not occur.
Try it here: https://regex101.com/r/5OHgIr/1
If I understand correctly, you need a regex that accepts all numbers that include at least one 7 and exclude 9, so try this:
(?:[0-8]*7[0-8]*)+
If you only want to find numbers within normal text, add \s at the start and end of the regex.

Regex - how to extract integers only not float from text

Given a text:
Why should the number 12.8 be rounded to 13. It must be rather 11
What regex would extract only the integer values:
13
11
I tried this: \d+(?!\\.)
But still no luck.
You need to use lookarounds (lookbehind, lookahead) to check what happens before and after the digits you match:
a naive approach:
(?<![0-9]|[0-9]\.)[0-9]+(?!\.?[0-9])
an efficient approach:
[0-9](?<![0-9][0-9]|[0-9]\.[0-9])[0-9]*+(?!\.[0-9])
(because this one quickly discards positions where there is no digit)
Note: don't forget to escape the backslashes in the java string.
You can also write it like this:
\b[0-9](?<![0-9]\.[0-9])[0-9]*+(?!\.[0-9])
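This is a Java sketch of the "efficient approach", with the backslashes doubled for the string literal; the class name is hypothetical. A find() loop collects every integer in the text:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntegersOnly {
    // First digit, lookbehind rejects starts inside a number or after "digit.",
    // possessive [0-9]*+ takes the rest, lookahead rejects a following ".digit".
    private static final Pattern INTEGER =
            Pattern.compile("[0-9](?<![0-9][0-9]|[0-9]\\.[0-9])[0-9]*+(?!\\.[0-9])");

    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = INTEGER.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Why should the number 12.8 be rounded to 13. It must be rather 11"));
    }
}
```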
I solved it by applying two regexes. The command line below shows how they work:
echo "Why number 12.8 be rounded to 13. It must be rather 11" | grep -Po '\b\d+(\.\d+)?\b' | grep -Po '^\d+$'
The first regex selects all numbers, including floating-point ones. The second regex keeps only the integers.
In Java, use "\\b\\d+(\\.\\d+)?\\b" to select all numbers, and "^\\d+$" to select only integers.
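This is the same two-pass idea without grep, as a Java sketch with a hypothetical class name. Here the first pattern is written as \b\d+(\.\d+)?\b, so single-digit integers are also picked up. The second pass uses String.matches("\\d+"), which plays the role of ^\d+$:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoPassIntegers {
    // Pass 1: any number, with an optional fractional part.
    private static final Pattern NUMBER = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");

    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            String token = m.group();
            // Pass 2: keep the token only if it consists of digits alone.
            if (token.matches("\\d+")) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Why number 12.8 be rounded to 13. It must be rather 11"));
    }
}
```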

Java regex to identify 3 column table from ascii text

So I have incoming data that looks something like this:
Applications 7 days 6 days
And I'm trying to create regex that will match this line but not a line that has another column, like this:
Applications 7 days 6 days 5 days
The regex that I'm trying to use is:
^(.*?)(\s){4,}(.*?)(\s){4,}[^(\s){2}]+
Here [^(\s){2}]+ was meant to select everything up to a double space. There are two problems with this: it doesn't work to begin with, and the second line I have would still match.
Is there any regex I can use to only match the 3 column table and not the 4 column, 5 column, etc.?
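Be careful with character classes ([...]): most characters inside them are treated literally (as if they were escaped). So [^(\s){2}] is simply a negated set of the characters (, ), {, }, 2 and whitespace, not "anything up to a double space".
Try this regex: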
^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s))+)$
I replaced (.*?) with ((?:(?!\s\s).)+), which matches everything up to a sequence of two whitespace characters.
I added a $ at the end, so it won't match lines with more than three columns.
I also added some ?: so that those groups become non-capturing groups.
Finally, I removed the character class from the end of the regex and added a negative look-ahead.
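In the question, the example lines appear with single spaces between columns, apparently because the whitespace was collapsed. This Java sketch therefore assumes the columns are separated by four spaces; the class name is hypothetical. It checks that the regex above accepts the three-column line and rejects the four-column one:

```java
import java.util.regex.Pattern;

class ThreeColumns {
    // Groups 1 and 2 stop before any run of two whitespace characters;
    // group 3 must reach the end of the line ($), which rules out a 4th column.
    private static final Pattern THREE_COLS = Pattern.compile(
            "^((?:(?!\\s\\s).)+)(?:\\s){4,}((?:(?!\\s\\s).)+)(?:\\s){4,}((?:.(?!\\s\\s))+)$");

    static boolean isThreeColumns(String line) {
        return THREE_COLS.matcher(line).matches();
    }

    public static void main(String[] args) {
        String sep = "    "; // assumption: columns separated by four spaces
        System.out.println(isThreeColumns("Applications" + sep + "7 days" + sep + "6 days"));
        System.out.println(isThreeColumns("Applications" + sep + "7 days" + sep + "6 days" + sep + "5 days"));
    }
}
```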
Columns not ending with spaces
This one will not accept lines where the last column ends with a space:
^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s)(?!\s$))+)$
Notice the addition of a second negative look-ahead in the last group: (?!\s$).
Try this:
^[^\s]*(\s{2,}[^\s].*){2,}
This assumes there are at least 2 spaces before each column value.
