I want to know if there is a way to check if a string contains a certain pattern for a regex.
For example:
string.matches("something[0-9]x") would check if the string contains a substring of "something" with any single-digit integer following it, followed by "x". But let's say I want to check the same thing with no limit on that int, i.e. it could be 1000000. Is there something like a wildcard for an int that I can use?
Just use the + quantifier after your character class; it matches the preceding token one or more times:
string.matches("something[0-9]+x")
Regular expressions work on characters; they have no semantic understanding of those characters. So it doesn't make sense to talk about "integers" here; the best that you can do is to talk about "digits". The number "1" is one digit; "1234" is four.
In a regular expression, you can match one or more of the preceding pattern using "+", so the regex "something[0-9]+x" should do what you want. If you want an upper bound on the number of digits, then you can try something like "something[0-9]{1,5}x".
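A minimal Java sketch of the "+" quantifier, assuming the goal may be either a whole-string check or a substring check. Note that String.matches and Matcher.matches only succeed when the entire input matches, so looking for the pattern inside a longer string needs Matcher.find. The class and method names are illustrative, not from the answers above.

```java
import java.util.regex.Pattern;

final class DigitRunDemo {
    // "something", then one or more digits, then "x"
    private static final Pattern DIGITS_THEN_X = Pattern.compile("something[0-9]+x");

    // True only if the entire input has the form something<digits>x.
    static boolean matchesWhole(String input) {
        return DIGITS_THEN_X.matcher(input).matches();
    }

    // True if the pattern occurs anywhere inside the input.
    static boolean containsPattern(String input) {
        return DIGITS_THEN_X.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("something1000000x"));       // true
        System.out.println(matchesWhole("somethingx"));              // false: no digit
        System.out.println(containsPattern("say something42x now")); // true
        System.out.println(matchesWhole("say something42x now"));    // false: extra text
    }
}
```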
Yes, simply use +, so in your example: string.matches("something[0-9]+x")
It matches the string "something" followed by any digit from 0 to 9, which has to occur at least once, followed by "x". * means zero or more times, while + means at least once, with no upper limit.
If you write [0-9]{n,m}, n and m specify the range of allowed repetitions. For example:
[0-9]{2,3} matches any digit occurring 2 or 3 times. If you put only one number in the braces, as in [0-9]{2}, the digit has to occur exactly 2 times; [0-9]{2,} means at least 2 times.
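The quantifiers described above can be compared side by side. This is a small sketch using whole-string matching on digit-only patterns; the class and method names are made up for illustration.

```java
final class QuantifierDemo {
    static boolean zeroOrMore(String s) { return s.matches("[0-9]*"); }     // any count, including none
    static boolean oneOrMore(String s)  { return s.matches("[0-9]+"); }     // at least one digit
    static boolean twoOrThree(String s) { return s.matches("[0-9]{2,3}"); } // 2 or 3 digits
    static boolean exactlyTwo(String s) { return s.matches("[0-9]{2}"); }   // exactly 2 digits

    public static void main(String[] args) {
        System.out.println(zeroOrMore(""));     // true
        System.out.println(oneOrMore(""));      // false
        System.out.println(twoOrThree("123"));  // true
        System.out.println(twoOrThree("1234")); // false
        System.out.println(exactlyTwo("123"));  // false
    }
}
```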
But at last: simply learn to use google ... there are so many regexp sites with tutorials and stuff.
Consider the following regular expression, where X is any regex.
X{n}|X{m}
This regex would test for X occurring exactly n or m times.
Is there a regex quantifier that can test for an occurrence X exactly n or m times?
There is no single quantifier that means "exactly m or n times". The way you are doing it is fine.
An alternative is:
X{m}(X{k})?
where m < n and k is the value of n-m.
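As a rough illustration of the X{m}(X{k})? form, the following Java sketch builds that pattern for a given sub-regex. The helper name exactlyMOrN is hypothetical, and the sub-regex is wrapped in a non-capturing group so the quantifier applies to all of it.

```java
import java.util.regex.Pattern;

final class ExactlyMOrN {
    // Builds (?:X){m}(?:(?:X){k})? with k = n - m; x must already be a valid regex.
    static Pattern exactlyMOrN(String x, int m, int n) {
        if (m < 0 || n <= m) {
            throw new IllegalArgumentException("need 0 <= m < n");
        }
        int k = n - m;
        return Pattern.compile("(?:" + x + "){" + m + "}(?:(?:" + x + "){" + k + "})?");
    }

    public static void main(String[] args) {
        Pattern p = exactlyMOrN("ab", 2, 5);
        System.out.println(p.matcher("abab").matches());       // true: 2 repetitions
        System.out.println(p.matcher("ababab").matches());     // false: 3 repetitions
        System.out.println(p.matcher("ababababab").matches()); // true: 5 repetitions
    }
}
```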
Here is the complete list of quantifiers (ref. http://www.regular-expressions.info/reference.html):
?, ?? - 0 or 1 occurrences (?? is lazy, ? is greedy)
*, *? - any number of occurrences
+, +? - at least one occurrence
{n} - exactly n occurrences
{n,m} - n to m occurrences, inclusive
{n,m}? - n to m occurrences, lazy
{n,}, {n,}? - at least n occurrences
To get "exactly N or M", you need to write the quantified regex twice, unless m,n are special:
X{n,m} if m = n+1
(?:X{n}){1,2} if m = 2n
...
No, there is no such quantifier. But I'd restructure it to /X{n}(X{m-n})?/ (assuming n < m) to prevent problems in backtracking.
Very old post, but I'd like to contribute something that might be of help.
I've tried it exactly the way stated in the question and it does work but there's a catch:
The order of the quantities matters. Consider this:
#[a-f0-9]{6}|#[a-f0-9]{3}
This will find all occurrences of hex colour codes (they're either 3 or 6 digits long). But when I flip it around like this
#[a-f0-9]{3}|#[a-f0-9]{6}
it will only find the 3 digit ones or the first 3 digits of the 6 digit ones. This does make sense and a Regex pro might spot this right away, but for many this might be a peculiar behaviour. There are some advanced Regex features that might avoid this trap regardless of the order, but not everyone is knee-deep into Regex patterns.
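The effect of the alternation order can be reproduced with Matcher.find in Java. The sketch below uses a made-up input string and helper name; the third pattern, #(?:[a-f0-9]{3}){1,2}, is one possible spelling that does not depend on the order of alternatives, since the greedy quantifier tries two groups of three before settling for one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HexColourOrder {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "#fff #a1b2c3";
        // Longer alternative first: both codes are found whole.
        System.out.println(findAll("#[a-f0-9]{6}|#[a-f0-9]{3}", input)); // [#fff, #a1b2c3]
        // Shorter alternative first: the 6-digit code is cut to 3 digits.
        System.out.println(findAll("#[a-f0-9]{3}|#[a-f0-9]{6}", input)); // [#fff, #a1b]
        // Greedy quantifier on a 3-digit group: no alternation to order.
        System.out.println(findAll("#(?:[a-f0-9]{3}){1,2}", input));     // [#fff, #a1b2c3]
    }
}
```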
TLDR; (?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)
Looks like you want "x n times" or "x m times", I think a literal translation to regex would be (x{n}|x{m}).
Like this https://regex101.com/r/vH7yL5/1
or, in a case where you can have a sequence of more than m "x"s (assuming m > n), you can add 'following no "x"' and 'followed by no "x"', translating to [^x](x{n}|x{m})[^x], but that would assume that there is always a character before and after your "x"s, as you can see here: https://regex101.com/r/bB2vH2/1
you can change it to (?:[^x]|^)(x{n}|x{m})(?:[^x]|$), translating to "following no 'x' or following line start" and "followed by no 'x' or followed by line end". But still, it won't match two sequences with only one character between them (because the first match would require a character after, and the second a character before) as you can see here: https://regex101.com/r/oC5oJ4/1
Finally, to match the one character distant match, you can add a positive look ahead (?=) on the "no 'x' after" or a positive look behind (?<=) on the "no 'x' before", like this: https://regex101.com/r/mC4uX3/1
(?<=[^x]|^)(x{n}|x{m})(?:[^x]|$)
This way you will match only the exact number of 'x's you want.
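Here is a hedged Java sketch of that final pattern with n = 2 and m = 4 filled in as example values. The input string and class name are invented for the demonstration; group 1 holds the run of "x"s without the trailing delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExactRunDemo {
    // Runs of exactly 2 or exactly 4 'x' characters (example values for n and m).
    private static final Pattern RUN =
            Pattern.compile("(?<=[^x]|^)(x{2}|x{4})(?:[^x]|$)");

    // Returns the matched runs (group 1), skipping runs of any other length.
    static List<String> runs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Runs of length 3 and 1 are skipped; runs one character apart are both found.
        System.out.println(runs("xx xxxx xxx x xx")); // [xx, xxxx, xx]
    }
}
```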
Taking a look at Enhardened's answer, they state that their penultimate expression won't match sequences with only one character between them. There is an easy way to fix this without using look ahead/look behind, and that's to replace the start/end character with the boundary character. This lets you match against word boundaries which includes start/end. As such, the appropriate expression should be:
(?:[^x]|\b)(x{n}|x{m})(?:[^x]|\b)
As you can see here: https://regex101.com/r/oC5oJ4/2.
I want to check if a string is a positive natural number but I don't want to use Integer.parseInt() because the user may enter a number larger than an int. Instead I would prefer to use a regex to return false if a numeric String contains all "0" characters.
if(val.matches("[0-9]+")){
// We know that it will be a number, but what if it is "000"?
// what should I change to make sure
// "At Least 1 character in the String is from 1-9"
}
Note: the string must contain only 0-9 and it must not contain all 0s; in other words it must have at least 1 character in [1-9].
You'd be better off using BigInteger if you're trying to work with an arbitrarily large integer, however the following pattern should match a series of digits containing at least one non-zero character.
\d*[1-9]\d*
Debuggex Demo
Debuggex's unit tests seem a little buggy, but you can play with the pattern there. It's simple enough that it should be reasonably cross-language compatible, but in Java you'd need to escape the backslashes.
Pattern positiveNumber = Pattern.compile("\\d*[1-9]\\d*");
Note the above (intentionally) matches strings we wouldn't normally consider "positive natural numbers", as a valid string can start with one or more 0s, e.g. 000123. If you don't want to match such strings, you can simplify the pattern further.
[1-9]\d*
Debuggex Demo
Pattern exactPositiveNumber = Pattern.compile("[1-9]\\d*");
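The two patterns can be compared with a short Java sketch. It uses Matcher.matches so the whole input must be digits; the class and method names are illustrative, and the BigInteger line only shows that an arbitrarily long validated string can still be parsed.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

final class PositiveNumberCheck {
    private static final Pattern ALLOW_LEADING_ZEROS = Pattern.compile("\\d*[1-9]\\d*");
    private static final Pattern NO_LEADING_ZEROS = Pattern.compile("[1-9]\\d*");

    // Digits only, at least one of them non-zero; "000123" is accepted.
    static boolean isPositiveLenient(String s) {
        return ALLOW_LEADING_ZEROS.matcher(s).matches();
    }

    // Digits only, first digit non-zero; "000123" is rejected.
    static boolean isPositiveStrict(String s) {
        return NO_LEADING_ZEROS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPositiveLenient("000"));    // false
        System.out.println(isPositiveLenient("000123")); // true
        System.out.println(isPositiveStrict("000123"));  // false
        String huge = "123456789012345678901234567890";  // too large for int or long
        System.out.println(isPositiveStrict(huge));      // true
        System.out.println(new BigInteger(huge).signum() > 0); // true
    }
}
```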
If you want to match positive natural numbers, written in the standard way, without a leading zero, the regular expression you want is
[1-9]\d*
which matches any string of characters consisting only of digits, where the first digit is not zero. Don't forget to double the backslash ("[1-9]\\d*") if you write it as a Java String literal.
I made the following regex for only positive natural numbers:
^[1-9]\d*$
This will check that the number starts with a digit from 1 to 9 (so there can't be any zeros at the beginning) and that the rest of the characters are digits from 0 to 9. You can test it at https://regex101.com
I've written a regular expression that matches any number of letters with any number of single spaces between the letters. I would like that regular expression to also enforce a minimum and maximum number of characters, but I'm not sure how to do that (or if it's possible).
My regular expression is:
[A-Za-z](\s?[A-Za-z])+
I realized it was only matching two sets of letters surrounding a single space, so I modified it slightly to fix that. The original question is still the same though.
Is there a way to enforce a minimum of three characters and a maximum of 30?
Yes
Just like + means one or more, you can use {3,30} to match between 3 and 30 repetitions.
For example, [a-z]{3,30} matches between 3 and 30 lowercase letters.
From the documentation of the Pattern class
X{n,m} X, at least n but not more than m times
In your case, matching 3-30 letters followed by spaces could be accomplished with:
([a-zA-Z]\s){3,30}
That pattern requires trailing whitespace. If you don't want that, you can use 2-29 times letter+space, then a letter:
([a-zA-Z]\s){2,29}[a-zA-Z]
If you'd like whitespaces to count as characters you need to divide that number by 2 to get
([a-zA-Z]\s){1,14}[a-zA-Z]
You can add \s? to that last one if the trailing whitespace is optional. These were all tested on RegexPlanet
If you'd like the entire string altogether to be between 3 and 30 characters you can use lookaheads adding (?=^.{3,30}$) at the beginning of the RegExp and removing the other size limitations
All that said, in all honesty I'd probably just test the String's .length property. It's more readable.
This is what you are looking for
^[a-zA-Z](\s?[a-zA-Z]){2,29}$
^ is the start of string
$ is the end of string
(\s?[a-zA-Z]){2,29} would match (\s?[a-zA-Z]) 2 to 29 times.
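A quick Java sketch of this pattern shows one thing worth knowing: the {2,29} bound counts letters, so optional spaces do not count toward the limit of 30. The class name and test inputs are made up for illustration.

```java
import java.util.Collections;
import java.util.regex.Pattern;

final class LetterCountDemo {
    // One letter, then 2 to 29 more letters, each optionally preceded by one whitespace.
    private static final Pattern NAME = Pattern.compile("^[a-zA-Z](\\s?[a-zA-Z]){2,29}$");

    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc"));    // true: 3 letters
        System.out.println(isValid("a b c"));  // true: 3 letters, single spaces
        System.out.println(isValid("ab"));     // false: only 2 letters
        System.out.println(isValid("a  b c")); // false: two consecutive spaces
        // 30 letters separated by single spaces: 59 characters, still accepted.
        String spaced = String.join(" ", Collections.nCopies(30, "a"));
        System.out.println(spaced.length() + " " + isValid(spaced)); // 59 true
    }
}
```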
Actually Benjamin's answer will lead to the complete solution to the OP's question.
Using lookaheads it is possible to restrict the total number of characters AND restrict the match to a set combination of letters and (optional) single spaces.
The regex that solves the entire problem would become
(?=^.{3,30}$)^([A-Za-z][\s]?)+$
This will match AAA and A A, and it will fail to match AA  A (with two spaces between the letters), since there are two consecutive spaces.
I tested this at http://regexpal.com/ and it does the trick.
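The lookahead version can be checked in Java as well. In this sketch the lookahead limits the total length to 3-30 characters, spaces included, while the rest of the pattern enforces the letter and single-space structure. The class name and inputs are illustrative.

```java
import java.util.Collections;
import java.util.regex.Pattern;

final class LengthLookaheadDemo {
    // Lookahead bounds the total length; the body allows letters with optional single whitespace.
    private static final Pattern NAME =
            Pattern.compile("(?=^.{3,30}$)^([A-Za-z][\\s]?)+$");

    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("AAA"));   // true
        System.out.println(isValid("A A"));   // true: 3 characters in total
        System.out.println(isValid("AA  A")); // false: two consecutive spaces
        System.out.println(isValid("AB"));    // false: shorter than 3 characters
        String tooLong = String.join("", Collections.nCopies(31, "A"));
        System.out.println(isValid(tooLong)); // false: longer than 30 characters
    }
}
```

As written, the pattern also accepts a single trailing whitespace character (for example "AAA "), because the last letter may be followed by the optional [\s].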
You should use
[a-zA-Z ]{20}
[For allowed characters]{for limiting of the number of characters}
I want to create a regular expression in java using standard libraries that will accommodate the following sentence:
12 of 128
Obviously the numbers can be anything though... From 1 digit to many
Also, I'm not sure how to accommodate the word "of" but I thought maybe something along the lines of:
[\d\sof\s\d]
This should work for you:
(\d+\s+of\s+\d+)
This will assume that you want to capture the full block of text as "one group", and there can be one-or-more whitespace characters in between each (if only one space, you can change \s+ to just \s).
If you want to capture the numbers separately, you can try:
(\d+)\s+of\s+(\d+)
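A minimal Java sketch of reading the two numbers through the capture groups, assuming the whole input should have the form "N of M". The class and method names are hypothetical, and the numbers are returned as strings so that arbitrarily long digit runs do not overflow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OfParser {
    private static final Pattern N_OF_M = Pattern.compile("(\\d+)\\s+of\\s+(\\d+)");

    // Returns {first, second} as digit strings, or null if the input has another shape.
    static String[] parse(String input) {
        Matcher m = N_OF_M.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("12 of 128");
        System.out.println(parts[0] + " / " + parts[1]); // 12 / 128
        System.out.println(parse("12 of") == null);      // true
    }
}
```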
You want this:
\d+\sof\s\d+
The relevant change from what you already had is the addition of the two plus signs. Each one means that the preceding \d should match one or more digits.
Sample: http://regexr.com?32cao
This regexp
"\\d+ of \\d+"
will match at least one to any number of digits, followed by string " of " followed by one to any number of digits.
I'm new to regular expressions and I need to find a regular expression that matches one or more digits [1-9], only ONE '|' sign, one or more '*' signs and zero or more ',' signs.
The string should not contain any other characters.
This is what I have:
if(this.ruleString.matches("^[1-9|*,]*$"))
{
return true;
}
Is it correct?
Thanks,
Vinay
I think you should test separately for every type of symbols rather than write complex expression.
First, test that you don't have invalid symbols - "^[0-9|*,]+$"
Then test for digits "[1-9]", it should match at least one.
Then test for "\\|", "\\*" and "\\," and check the number of matches.
If all tests pass, then your string is valid.
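The separate checks could look roughly like this in Java. This sketch assumes the order of the symbols does not matter and that 0 is not an allowed digit, as the question's [1-9] suggests; both are assumptions, and the class and method names are made up.

```java
final class RuleStringCheck {
    // Counts how often character c occurs in s.
    static long count(String s, char c) {
        return s.chars().filter(ch -> ch == c).count();
    }

    static boolean isValid(String s) {
        // Step 1: only digits 1-9, '|', '*' and ',' are allowed (assumption: no 0).
        if (!s.matches("[1-9|*,]+")) {
            return false;
        }
        // Step 2: at least one digit.
        boolean hasDigit = s.chars().anyMatch(ch -> ch >= '1' && ch <= '9');
        // Step 3: exactly one '|', at least one '*'; commas are optional.
        return hasDigit && count(s, '|') == 1 && count(s, '*') >= 1;
    }

    public static void main(String[] args) {
        System.out.println(isValid("12|**,")); // true
        System.out.println(isValid("*,1|"));   // true: order is not checked here
        System.out.println(isValid("1||*"));   // false: two bars
        System.out.println(isValid("1|,"));    // false: no star
    }
}
```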
Nope, try this:
"^[1-9]+\\|\\*+,*$"
Please give us at least 10 strings that you want to accept and 10 that you want to reject, and tell us whether the symbols have to appear in a fixed sequence or their order doesn't matter. Then we can make a reliable regex.
By now, all I can offer is:
^[1-9]+\|{1}\*+,*$
This RegEx was tested against these sample strings, accepting them:
56421|*****,,,
2|*********,,,
1|*
7|*,
18|****
123456789|*
12|********,,
1516332|**,,,
111111|*
6|*****,,,,
And it was tested against these sample strings, rejecting them:
10|*,
2***525*|*****,,,
123456,15,22*66*****4|,,,*167
1|2*3,4,5,6*
,*|173,
|*,
||12211
12
1|,*
1233|54|***,,,,
I assume your given order is strict and all conditions apply at the same time.
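To re-run a few of those samples in Java, a sketch like the following can be used. The backslashes are doubled for the Java string literal; the class name is illustrative, and only a subset of the samples listed above is included.

```java
import java.util.regex.Pattern;

final class OrderedRuleDemo {
    // Digits 1-9, then exactly one '|', then one or more '*', then optional commas.
    private static final Pattern RULE = Pattern.compile("^[1-9]+\\|{1}\\*+,*$");

    static boolean accepts(String s) {
        return RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] expectedAccepted = { "56421|*****,,,", "1|*", "7|*,", "18|****" };
        String[] expectedRejected = { "10|*,", "|*,", "||12211", "12", "1|,*" };
        for (String s : expectedAccepted) {
            System.out.println(s + " -> " + accepts(s)); // each prints true
        }
        for (String s : expectedRejected) {
            System.out.println(s + " -> " + accepts(s)); // each prints false
        }
    }
}
```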
It looks like the pattern you need is
n-n, one or more times separated by commas
then a bar (|)
then n*n, one or more times separated by commas.
Here is a regular expression for that.
([1-9]{1}[0-9]*\-[0-9]+){1}
(,[1-9]{1}[0-9]*\-[0-9]+)*
\|
([1-9]{1}[0-9]*\*[0-9]+){1}
(,[1-9]{1}[0-9]*\*[0-9]+)*
But it is so complex, and it does not take into account details such as: for the case of n-m, you want n less than m (I guess).
And you likely want the same number of n-m items before the bar as x*y items after the bar.
It depends on whether you want to check the syntax completely or not (I hope you do).
Since this is so complex, it should be done with a set of code instead of a single regular expression.
This regex should work:
"^[1-9\\|\\*,-]*$"
Assert position at the beginning of the string «^»
Match a single character present in the list below «[1-9\|*,-]»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «»
A character in the range between “1” and “9” «1-9»
A | character «\|»
A * character «*»
The character “,” «,»
The character “-” «-»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»