How to restrict occurrence of a character in regex? - java

I want to check if a string consists of letters and digits only, and allow a - separator:
^[\w\d-]*$
Valid: TEST-TEST123
Now I want to check that the separator occurs only once at a time. Thus the following examples should be invalid:
Invalid: TEST--TEST, TEST------TEST, TEST-TEST--TEST.
Question: how can I restrict the repeated occurrence of a character?

You may use
^(?:[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*)?$
Or, in Java, you may use an alphanumeric \p{Alnum} character class to denote letters and digits:
^(?:\p{Alnum}+(?:-\p{Alnum}+)*)?$
See the regex demo
Details
^ - start of the string
(?: - start of an optional non-capturing group (it lets the pattern match an empty string; if you do not need that, remove this group)
\p{Alnum}+ - 1 or more letters or digits
(?:-\p{Alnum}+)* - zero or more repetitions of
- - a hyphen
\p{Alnum}+ - 1 or more letters or digits
)? - end of the optional non-capturing group
$ - end of string.
In code, you do not need the ^ and $ anchors if you use the pattern in the matches method since it anchors the match by default:
boolean valid = s.matches("(?:\\p{Alnum}+(?:-\\p{Alnum}+)*)?");
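For reference, here is a minimal sketch that runs the pattern against the samples from the question. The class name SeparatorCheck and the helper isValid are made up for illustration; only the regex itself comes from the answer above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class SeparatorCheck {
    // Alphanumeric runs joined by single hyphens. The outer optional group
    // also lets the empty string through, as described above.
    private static final Pattern VALID =
            Pattern.compile("(?:\\p{Alnum}+(?:-\\p{Alnum}+)*)?");

    static boolean isValid(String s) {
        // matches() requires the whole input to match, so ^ and $ are not needed
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "TEST-TEST123", "TEST--TEST", "TEST------TEST", "TEST-TEST--TEST", "-TEST", "TEST-"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which s.matches(...) would otherwise do.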

Related

Plus quantifier does not check the last character if last character has negated set

I need to match a string with the following constraints:
At least one alphanumeric character
Forbid specific characters (^*#:;)
Forbid dot at the end
I have the next pattern:
^[^*#:;]*[\p{Alnum}]+[^*#:;]*[^.*#:;]$
The problem is that when I have an alphanumeric character at the end, the string will not match the pattern.
For example:
$$$....1$ will match the pattern.
$$$....$1 will not.
As far as I understand, the problem is that [\p{Alnum}]+ does not check the last character.
Is there any possible way to do this with one regexp?
It seems the following should tick your boxes:
^(?=.*\p{Alnum})(?!.*[*#:;]).+(?<!\.)$
Where:
^ - Start string anchor.
(?=.*\p{Alnum}) - Positive lookahead to match at least a single alphanumeric character.
(?!.*[*#:;]) - Negative lookahead to prevent any of the characters mentioned in the character class.
.+ - 1+ characters other than newline.
(?<!\.) - Negative lookbehind that fails the match if the preceding (i.e. last) character is a dot.
$ - End string anchor.
See the online demo
Alternatively use a negated character class as you were doing instead of the negative lookahead:
^(?=.*\p{Alnum})[^*#:;\n]+(?<!\.)$
^ - Start string anchor.
(?=.*\p{Alnum}) - Positive lookahead to match at least a single alphanumeric character.
[^*#:;\n]+ - 1+ characters other than those mentioned in the character class.
(?<!\.) - Negative lookbehind that fails the match if the preceding (i.e. last) character is a dot.
$ - End string anchor.
See the online demo
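A quick way to try the lookaround version in Java is sketched below. The class LookaroundCheck and the method isValid are illustrative names only; the pattern is the first one from this answer, with backslashes doubled for the Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class LookaroundCheck {
    private static final Pattern VALID = Pattern.compile(
            "^(?=.*\\p{Alnum})"   // at least one alphanumeric character somewhere
            + "(?!.*[*#:;])"      // none of the forbidden characters anywhere
            + ".+"                // one or more characters other than line breaks
            + "(?<!\\.)$");       // the last character must not be a dot

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"$$$....1$", "$$$....$1", "abc.", "a*b", "$$$"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Both samples from the question now pass, because the alphanumeric requirement is checked by a lookahead instead of consuming a fixed position in the string.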

Regex validating eight or more char string that must contain at least two non-alphabetic characters

I am validating a password string that must consist of eight characters or more and must contain at least two non-alphabetic (i.e., not A-Za-z) characters using regular expression.
The code I have so far is
Pattern p = Pattern.compile("((?=2.*[^a-z[A-Z]]).{8,})");
Matcher m = p.matcher(pass);
I don't know whether my expression is correct.
I want to validate my password with eight characters or more and must contain at least two non-alphabetic (i.e., not A-Z) characters
You may use
s.matches("(?=(?:[a-zA-Z]*[^a-zA-Z]){2}).{8,}")
See the regex demo.
Another way of writing the same
s.matches("(?=.{8})(?:[a-zA-Z]*[^a-zA-Z]){2}.*")
Explanation
^ - (not necessary in .matches as the method requires a full string match) - start of string
(?= - start of a positive lookahead that requires, immediately to the right of the current location,
(?:[a-zA-Z]*[^a-zA-Z]){2} - a non-capturing group that matches 2 consecutive occurrences of:
[a-zA-Z]* - any 0+ ASCII letters
[^a-zA-Z] - any char other than an ASCII letter
) - end of the lookahead
.{8,} - any 8 or more chars other than line break chars, as many as possible
$ - (not necessary in .matches as the method requires a full string match) - end of string
In the (?=.{8})(?:[a-zA-Z]*[^a-zA-Z]){2}.* pattern, the lookahead requires at least 8 chars, then at least two non-letter chars are required by the (?:[a-zA-Z]*[^a-zA-Z]){2} pattern, and then .* matches the rest of the string.
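The requirement stated in the question (eight or more characters, at least two of them not ASCII letters) can be sketched as follows. Note that the lookahead in this sketch counts non-letters, i.e. it repeats [a-zA-Z]*[^a-zA-Z] twice. PasswordCheck and isValid are made-up names for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class PasswordCheck {
    private static final Pattern VALID = Pattern.compile(
            "(?=(?:[a-zA-Z]*[^a-zA-Z]){2})"  // two non-letters, each after 0+ letters
            + ".{8,}");                      // eight or more characters in total

    static boolean isValid(String password) {
        // matches() anchors at both ends, so ^ and $ are omitted
        return VALID.matcher(password).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abcdefgh", "abcdefg1", "abcdef12", "a1b2", "12345678"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Because the two subpatterns inside the group are mutually exclusive ([a-zA-Z] versus [^a-zA-Z]), the lookahead cannot backtrack excessively on long inputs.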

Regex to match a text with special chars in it

I need a regex to match a text with special chars -,.+\/& in it. The special chars must not be more than 2 subsequent and a special char can not be followed by space. More specifically I have to cover these cases:
some text/
/some text
some /text
I came up with this regex:
^[-\/,\.+\&]{0,1}[\p{L}]+[-\/,\.+\&]{0,1}([\s\-']?[-\/,\.+\&]{0,1}[\p{L}]+)([-\/,\.+\&]{0,1})$
It matches most of the cases that I need but fails to match for instance:
some te&xt. Any help will be appreciated. Thanks.
You can use
"^(?!.*(?:[-,.+/&]\\s|[-,.+/&]{2}))[^\\s\\d]+(?:\\s+[^\\s\\d]+)*$"
See the regex demo
Explanation:
^ - start of string
(?!.*(?:[-,.+/&]\\s|[-,.+/&]{2})) - a negative lookahead that fails the match if there is a special char [-,.+/&] followed with a whitespace \s, or 2 consecutive special chars from [-,.+/&] set
[^\\s\\d]+ - 1 or more characters other than digit and whitespace
(?:\\s+[^\\s\\d]+)* - 0+ sequences of:
\\s+ - 1+ whitespaces
[^\\s\\d]+ - 1 or more characters other than digit and whitespace
$ - end of string
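To see how the lookahead-based pattern behaves on the cases from the question, here is a small sketch. SpecialCharCheck and isValid are hypothetical names; the regex is the one given in this answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class SpecialCharCheck {
    private static final Pattern VALID = Pattern.compile(
            "^(?!.*(?:[-,.+/&]\\s|[-,.+/&]{2}))"  // no special char before whitespace, no two in a row
            + "[^\\s\\d]+(?:\\s+[^\\s\\d]+)*$");    // whitespace-separated chunks without digits

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "some text/", "/some text", "some /text", "some te&xt", "some/ text", "some//text"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The restrictions live in the single negative lookahead, so the consuming part of the pattern only has to describe the overall shape of the text.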
I found the solution:
^[-\/,\.+\&\s]{0,1}([\p{L}][-\/,\.+\&\s]{0,1})+([-\/,\.+\&\s]{0,1}([\p{L}][-\/,\.+\&\s]{0,1})+)([\p{L}][-\/,\.+\&\s]{0,1})([-\/,\.+\&\s]{0,1})$

How to find special pattern via regex

I have the following regex. It finds AB-434 but does not match TEMS-54534:
([a-zA-Z][a-zA-Z0-9_]+-[1-9][0-9]*)([^.]|\.[^0-9]|\.$|$)
Here are the sample inputs:
TEMS-54534
TEMS-5453
TEMS-1233
TEMS-12
CB-213
CB-2135
CB-12
ABC-2223
ABC-223
ABC-12
You seem to be looking for a pattern that starts with 1 ASCII letter followed with 1 or more alphanumeric or underscore characters followed with a - followed with one or more digits not starting with 0.
You can use
^[a-zA-Z][a-zA-Z0-9_]+-[1-9][0-9]*$
or
^[a-zA-Z]\w+-(?!0)\d+$
See the regex demo (and another one).
Explanation:
^ - start of string
[a-zA-Z][a-zA-Z0-9_]+ / [a-zA-Z]\w+ - an ASCII letter followed with 1+ alphanumerics/underscore chars
- - a hyphen
[1-9][0-9]* / (?!0)\d+ - a digit from 1-9 range followed with 0+ any digits (you can restrict it with {min,max} limiting quantifier if need be)
$ - end of string
More details:
[a-zA-Z0-9_] can be written as \w (if no Pattern.UNICODE_CHARACTER_CLASS is used)
In Java, do not forget to use double backslashes to escape metacharacters and shorthand character classes
If the pattern is used with String#matches(), the ^ at the start and $ at the end of the pattern are redundant.
And a Java demo:
List<String> strs = Arrays.asList("TEMS-54534","TEMS-5453","TEMS-1233","TEMS-12","CB-213",
        "CB-2135","CB-12","ABC-2223","ABC-223","ABC-12");
for (String str : strs)
    System.out.println(str.matches("[a-zA-Z]\\w+-(?!0)\\d+"));

How to match String only contain Alphanumeric characters, a dash and an underscore using Regex

All:
What I want to do is using Regex to match a string which only allow [A-Za-z0-9_-] and the format should be:
Started with only [A-Za-z0-9], and followed by [A-Za-z0-9_-]. There could be [_-] in the middle, but if there is any, it is only allowed once(both _ and - can exist, but each one only has one chance), and ended with [A-Za-z0-9].
I only know how to match Alphanumeric characters, a dash and an underscore, but have no idea how to limit their occurrence time.
Thanks
You can use negative lookahead:
^(?!.*(-[^-]*-|_[^_]*_))[A-Za-z0-9][\w-]*[A-Za-z0-9]$
RegEx Demo
Explanation:
^ - Start of string
(?!.*(-[^-]*-|_[^_]*_)) - Negative lookahead which means fail the match if there are 2 underscores or 2 hyphens ahead
[A-Za-z0-9] - Match 1 alphanumeric character
[\w-]* - Match 0 or more of [A-Za-z0-9_-] characters
[A-Za-z0-9] - Match 1 alphanumeric character at the end
$ - End of string
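A short Java sketch for trying this pattern is shown below. The names SingleSeparatorCheck and isValid are made up for illustration. Note that, as written, the pattern needs at least two characters, because it has a separate alphanumeric at the start and at the end.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class SingleSeparatorCheck {
    private static final Pattern VALID = Pattern.compile(
            "^(?!.*(-[^-]*-|_[^_]*_))"          // reject a second hyphen or a second underscore
            + "[A-Za-z0-9][\\w-]*[A-Za-z0-9]$");  // alphanumeric at both ends

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abc-def_ghi", "abc-def-ghi", "a_b_c", "-abc", "abc_", "ab", "a"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

If single-character inputs should also be accepted, one option is to make the tail optional, e.g. [A-Za-z0-9](?:[\w-]*[A-Za-z0-9])?, though that goes beyond what the answer states.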
