Regexp Java for password validation - java

I'm creating a regexp for password validation to be used in a Java application as a configuration parameter.
The regexp is:
^.*(?=.{8,})(?=..*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=]).*$
The password policy is:
1. At least 8 chars
2. Contains at least one digit
3. Contains at least one lower alpha char and one upper alpha char
4. Contains at least one char within a set of special chars (@#%$^ etc.)
5. Does not contain space, tab, etc.
I’m missing just point 5. I'm not able to have the regexp check for space, tab, carriage return, etc.
Could anyone help me?

Try this:
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}$
Explanation:
^ # start-of-string
(?=.*[0-9]) # a digit must occur at least once
(?=.*[a-z]) # a lower case letter must occur at least once
(?=.*[A-Z]) # an upper case letter must occur at least once
(?=.*[@#$%^&+=]) # a special character must occur at least once
(?=\S+$) # no whitespace allowed in the entire string
.{8,} # anything, at least eight places though
$ # end-of-string
It's easy to add, modify or remove individual rules, since every rule is an independent "module".
The (?=.*[xyz]) construct eats the entire string (.*) and then backtracks from the end until [xyz] can match, so it finds the last occurrence. It succeeds if [xyz] is found and fails otherwise.
The alternative would be using a reluctant quantifier: (?=.*?[xyz]). For a password check this will hardly make any difference; for much longer strings it could be the more efficient variant.
The most efficient variant (but the hardest to read and maintain, and therefore the most error-prone) would of course be (?=[^xyz]*[xyz]). For a regex of this length and for this purpose, I would advise against doing it that way, as it has no real benefits.
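A sketch comparing the three lookahead styles (the class name is illustrative), with a digit class standing in for [xyz]. For single-line input they accept the same strings; one caveat is that . does not match line terminators by default while [^0-9] does, so the styles can differ on multi-line input.

```java
import java.util.regex.Pattern;

// Illustrative class: the "at least one digit" rule written in the three styles discussed above
class LookaheadVariants {
    static final Pattern GREEDY = Pattern.compile("(?=.*[0-9]).*");       // eat everything, then backtrack
    static final Pattern RELUCTANT = Pattern.compile("(?=.*?[0-9]).*");   // expand one character at a time
    static final Pattern NEGATED = Pattern.compile("(?=[^0-9]*[0-9]).*"); // skip non-digits, then require a digit

    // True when all three variants give the same verdict for the input
    static boolean allAgree(String input) {
        boolean g = GREEDY.matcher(input).matches();
        boolean r = RELUCTANT.matcher(input).matches();
        boolean n = NEGATED.matcher(input).matches();
        return g == r && r == n;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abc1", "1abc", "a1b2c3"}) {
            System.out.println(s + " has a digit: " + GREEDY.matcher(s).matches()
                    + ", variants agree: " + allAgree(s));
        }
    }
}
```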

Simple example using regex:
public class PasswordValidation {
    public static void main(String[] args) {
        String passwd = "aaZZa44#";
        String pattern = "(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{8,}";
        System.out.println(passwd.matches(pattern));
    }
}
Explanations:
(?=.*[0-9]) a digit must occur at least once
(?=.*[a-z]) a lower case letter must occur at least once
(?=.*[A-Z]) an upper case letter must occur at least once
(?=.*[@#$%^&+=]) a special character must occur at least once
(?=\\S+$) no whitespace allowed in the entire string
.{8,} at least 8 characters
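String.matches() compiles the regex on every call (it is equivalent to Pattern.matches(regex, str)). If the check runs often, here is a sketch that compiles the same kind of pattern once and reuses it (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

class PasswordValidator {
    // Compiled once; Pattern instances are immutable and safe to share between threads
    private static final Pattern POLICY =
            Pattern.compile("(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{8,}");

    static boolean isValid(String password) {
        // matches() requires the whole input to match, so no ^ or $ is needed
        return password != null && POLICY.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("aaZZa44@"));  // true
        System.out.println(isValid("aaZZa44 @")); // false: contains a space
        System.out.println(isValid("aaZZa44"));   // false: too short, no special character
    }
}
```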

All the previously given answers use the same (correct) technique to use a separate lookahead for each requirement. But they contain a couple of inefficiencies and a potentially massive bug, depending on the back end that will actually use the password.
I'll start with the regex from the accepted answer:
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}$
First of all, since Java supports \A and \z, I prefer to use those to make sure the entire string is validated, independently of Pattern.MULTILINE. This doesn't affect performance, but avoids mistakes when regexes are recycled.
\A(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}\z
Checking that the password does not contain whitespace and checking its minimum length can be done in a single pass by putting the variable quantifier {8,} on the shorthand \S, which limits the allowed characters:
\A(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])\S{8,}\z
If the provided password does contain a space, all the checks will be done, only to have the final check fail on the space. This can be avoided by replacing all the dots with \S:
\A(?=\S*[0-9])(?=\S*[a-z])(?=\S*[A-Z])(?=\S*[@#$%^&+=])\S{8,}\z
The dot should only be used if you really want to allow any character. Otherwise, use a (negated) character class to limit your regex to only those characters that are really permitted. Though it makes little difference in this case, not using the dot when something else is more appropriate is a very good habit. I see far too many cases of catastrophic backtracking because the developer was too lazy to use something more appropriate than the dot.
Since there's a good chance the initial tests will find an appropriate character in the first half of the password, a lazy quantifier can be more efficient:
\A(?=\S*?[0-9])(?=\S*?[a-z])(?=\S*?[A-Z])(?=\S*?[@#$%^&+=])\S{8,}\z
But now for the really important issue: none of the answers mentions the fact that the original question seems to be written by somebody who thinks in ASCII. But in Java, strings are Unicode. Are non-ASCII characters allowed in passwords? If they are, are only ASCII spaces disallowed, or should all Unicode whitespace be excluded?
By default \s matches only ASCII whitespace, so its inverse \S matches all Unicode characters (whitespace or not) and all non-whitespace ASCII characters. If Unicode characters are allowed but Unicode spaces are not, the UNICODE_CHARACTER_CLASS flag can be specified to make \S exclude Unicode whitespace. If Unicode characters are not allowed, then [\x21-\x7E] can be used instead of \S to match all ASCII characters that are not a space or a control character.
Which brings us to the next potential issue: do we want to allow control characters? The first step in writing a proper regex is to exactly specify what you want to match and what you don't. The only 100% technically correct answer is that the password specification in the question is ambiguous because it does not state whether certain ranges of characters like control characters or non-ASCII characters are permitted or not.
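The three whitespace options can be compared side by side. The sketch below (the class name is illustrative) compiles the final regex from this answer with the default flags, with Pattern.UNICODE_CHARACTER_CLASS, and with every \S replaced by [\x21-\x7E]. It then tries a plain ASCII password, one containing a no-break space (U+00A0) and one containing a non-ASCII letter.

```java
import java.util.regex.Pattern;

class WhitespacePolicies {
    // The last regex from the answer, as a Java string literal
    static final String REGEX =
            "\\A(?=\\S*?[0-9])(?=\\S*?[a-z])(?=\\S*?[A-Z])(?=\\S*?[@#$%^&+=])\\S{8,}\\z";
    // Default flags: \S only excludes ASCII whitespace, so U+00A0 counts as a password character
    static final Pattern DEFAULT = Pattern.compile(REGEX);
    // UNICODE_CHARACTER_CLASS: \S also excludes Unicode whitespace such as U+00A0
    static final Pattern UNICODE = Pattern.compile(REGEX, Pattern.UNICODE_CHARACTER_CLASS);
    // ASCII only: every \S replaced by [\x21-\x7E] (printable ASCII without the space)
    static final Pattern ASCII = Pattern.compile(REGEX.replace("\\S", "[\\x21-\\x7E]"));

    public static void main(String[] args) {
        String[] samples = {"Password1@", "Pass\u00A0word1@", "P\u00e4ssword1@"};
        for (String s : samples) {
            // prints: true true true / true false false / true true false
            System.out.println(DEFAULT.matcher(s).matches() + " "
                    + UNICODE.matcher(s).matches() + " " + ASCII.matcher(s).matches());
        }
    }
}
```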

You should not use overly complex Regex (if you can avoid them) because they are
hard to read (at least for everyone but yourself)
hard to extend
hard to debug
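Although there might be a small performance overhead in using many small regular expressions, the points above easily outweigh it.
I would implement it like this (shown in Java; each check uses find(), so the pattern may match anywhere in the string):
import java.util.regex.Pattern;
static boolean matchesPolicy(String pwd) {
    if (pwd.length() < 8) return false;
    if (!Pattern.compile("[0-9]").matcher(pwd).find()) return false;
    if (!Pattern.compile("[a-z]").matcher(pwd).find()) return false;
    if (!Pattern.compile("[A-Z]").matcher(pwd).find()) return false;
    if (!Pattern.compile("[%#$^]").matcher(pwd).find()) return false;
    if (Pattern.compile("\\s").matcher(pwd).find()) return false;
    return true;
}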

Thanks for all the answers. Based on all of them, but extending the special characters:
@SuppressWarnings({"regexp", "RegExpUnexpectedAnchor", "RegExpRedundantEscape"})
String PASSWORD_SPECIAL_CHARS = "@#$%^`<>&+=\"!ºª·#~%&'¿¡€,:;*/+-.=_\\[\\]\\(\\)\\|\\_\\?\\\\";
int PASSWORD_MIN_SIZE = 8;
String PASSWORD_REGEXP = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[" + PASSWORD_SPECIAL_CHARS + "])(?=\\S+$).{"+PASSWORD_MIN_SIZE+",}$";
Unit tested:
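When the special-character list is configurable, it can be safer to escape the characters that have a meaning inside a Java character class (\, [, ], ^, - and &) before concatenating them into [...]. In the list above, +-. is parsed as the range from + to ., which happens to cover only + , - . (all of which are wanted anyway), but a different list could silently widen the class. A sketch with a hypothetical helper toCharClass (not part of any library):

```java
import java.util.regex.Pattern;

class SpecialCharClass {
    // Hypothetical helper: wraps a non-empty list of characters in [...],
    // escaping the characters that are special inside a Java character class
    static String toCharClass(String chars) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : chars.toCharArray()) {
            if ("\\[]^-&".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append(']').toString();
    }

    static Pattern passwordPattern(String specialChars, int minSize) {
        return Pattern.compile("^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*" + toCharClass(specialChars)
                + ")(?=\\S+$).{" + minSize + ",}$");
    }

    public static void main(String[] args) {
        Pattern p = passwordPattern("+-.", 8);
        System.out.println(p.matcher("Abcdefg1-").matches()); // true: '-' is listed
        System.out.println(p.matcher("Abcdefg1,").matches()); // false: ',' is not listed
    }
}
```

Without the escaping, "+-." would become the range [+-.], and the comma in the second example would be accepted as a special character.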

Password Requirement :
Password should be at least eight (8) characters in length where the system can support it.
Passwords must include characters from at least two (2) of these groupings: alpha, numeric, and special characters.
^.*(?=.{8,})(?=.*\d)(?=.*[a-zA-Z])|(?=.{8,})(?=.*\d)(?=.*[!@#$%^&])|(?=.{8,})(?=.*[a-zA-Z])(?=.*[!@#$%^&]).*$
I tested it and it works
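As written above, ^ belongs only to the first alternative and $ only to the last, so with String.matches() (which needs the whole input to match) only the letter-plus-special combination can succeed. A sketch that puts the length check first and groups the three alternatives, assuming the same special set [!@#$%^&] (the class name is illustrative):

```java
import java.util.regex.Pattern;

class TwoOfThreeGroups {
    // Length check first, then one of three pairs: letter+digit, digit+special, letter+special
    static final Pattern POLICY = Pattern.compile(
            "^(?=.{8,}$)(?:(?=.*\\d)(?=.*[a-zA-Z])|(?=.*\\d)(?=.*[!@#$%^&])|(?=.*[a-zA-Z])(?=.*[!@#$%^&])).*$");

    static boolean isValid(String password) {
        return POLICY.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abcd1234")); // true: letters + digits
        System.out.println(isValid("1234567!")); // true: digits + special
        System.out.println(isValid("abcdefgh")); // false: letters only
        System.out.println(isValid("abc12"));    // false: too short
    }
}
```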

For anyone interested in minimum requirements for each type of character, I would suggest making the following extension over Tomalak's accepted answer:
^(?=(.*[0-9]){%d,})(?=(.*[a-z]){%d,})(?=(.*[A-Z]){%d,})(?=(.*[^0-9a-zA-Z]){%d,})(?=\S+$).{%d,}$
Notice that this is a formatting string and not the final regex pattern. Just substitute each %d with the minimum required occurrences of, respectively: digits, lowercase letters, uppercase letters, special (non-alphanumeric) characters, and the length of the entire password. Maximum occurrences are unlikely (unless you want a maximum of 0, effectively rejecting any such characters), but those could easily be added as well. Notice the extra grouping around each type, so that the min/max constraints allow for non-consecutive matches. This worked wonders for a system where we could centrally configure how many of each type of character we required and then have the website as well as two different mobile platforms fetch that information in order to construct the regex pattern based on the above formatting string.
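A sketch of filling in that formatting string in Java (the class and method names are illustrative). It passes Locale.ROOT to String.format, because Formatter can localize the digits produced by %d for some default locales, which would break the quantifiers.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class ConfigurablePolicy {
    // The formatting string from above; \S is written \\S inside a Java string literal
    static final String FORMAT =
            "^(?=(.*[0-9]){%d,})(?=(.*[a-z]){%d,})(?=(.*[A-Z]){%d,})(?=(.*[^0-9a-zA-Z]){%d,})(?=\\S+$).{%d,}$";

    static Pattern build(int digits, int lower, int upper, int special, int length) {
        return Pattern.compile(String.format(Locale.ROOT, FORMAT, digits, lower, upper, special, length));
    }

    public static void main(String[] args) {
        Pattern p = build(2, 1, 1, 1, 8); // require at least two digits
        System.out.println(p.matcher("abcDEF1!2").matches()); // true
        System.out.println(p.matcher("abcDEF1!x").matches()); // false: only one digit
    }
}
```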

This one accepts any special character (anything that is not a letter or digit):
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[^A-Za-z0-9])(?=\S+$).{8,}$

Java method ready for you, with parameters
Just copy and paste it and set your desired parameters.
If you don't want a module, just comment it out, or add an "if" as I did for the special character.
/**
 * Validates a password against the configured modules.
 */
private static boolean validation_Password(final String PASSWORD_Arg) {
    if (PASSWORD_Arg == null) {
        return false;
    }
    // Parameters
    final int MIN_LENGTH = 8;
    final int MAX_LENGTH = 20;
    final boolean SPECIAL_CHAR_NEEDED = true;
    // Modules
    final String ONE_DIGIT = "(?=.*[0-9])";  // a digit must occur at least once
    final String LOWER_CASE = "(?=.*[a-z])"; // a lower case letter must occur at least once
    final String UPPER_CASE = "(?=.*[A-Z])"; // an upper case letter must occur at least once
    final String NO_SPACE = "(?=\\S+$)";     // no whitespace allowed in the entire string
    // .{8,20}: between MIN_LENGTH and MAX_LENGTH characters (use ".{" + MIN_LENGTH + ",}" for no maximum)
    final String MIN_MAX_CHAR = ".{" + MIN_LENGTH + "," + MAX_LENGTH + "}";
    // Optional module: a special character must occur at least once
    final String SPECIAL_CHAR = SPECIAL_CHAR_NEEDED ? "(?=.*[@#$%^&+=])" : "";
    // Pattern
    final String PATTERN = ONE_DIGIT + LOWER_CASE + UPPER_CASE + SPECIAL_CHAR + NO_SPACE + MIN_MAX_CHAR;
    return PASSWORD_Arg.matches(PATTERN);
}

You can also do it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public boolean isPasswordValid(String password) {
    String regExpn =
            "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{8,}$";
    // No CASE_INSENSITIVE flag: it would let [a-z] and [A-Z] match either case,
    // so a password without any lower-case (or upper-case) letter would pass
    Pattern pattern = Pattern.compile(regExpn);
    Matcher matcher = pattern.matcher(password);
    return matcher.matches();
}
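Passing Pattern.CASE_INSENSITIVE to this kind of pattern defeats the separate lower-case and upper-case checks, because [a-z] then also matches upper-case letters. A small sketch (the class name is illustrative) that runs the same pattern with and without the flag:

```java
import java.util.regex.Pattern;

class CaseInsensitivePitfall {
    static final String REGEX = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{8,}$";

    static boolean check(String password, int flags) {
        return Pattern.compile(REGEX, flags).matcher(password).matches();
    }

    public static void main(String[] args) {
        String allUpper = "ABCDEFG1@"; // contains no lower-case letter
        System.out.println(check(allUpper, 0));                         // false
        System.out.println(check(allUpper, Pattern.CASE_INSENSITIVE)); // true: [a-z] now matches 'A'..'Z' too
    }
}
```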

Use the Passay library, which provides a powerful API for password policies.

I think this can do it too (as a simpler version):
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])[^\s]{8,}$

Easy one:
"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[\\W_])[\\S]{8,10}$"
(?=anything) is a positive lookahead: it looks ahead through the input and requires the condition to hold. For example, (?=.*[0-9]) ensures that at least one digit occurs somewhere in the string; if none does, the match fails.
(?!anything) is the opposite, a negative lookahead: if the condition holds, the match fails.
So the overall structure is ^(condition)(condition)(condition)(condition)[\S]{8,10}$

String s = pwd;
int n = 0;
// n becomes 5 if the string contains at least one digit
for (int i = 0; i < s.length(); i++) {
    if (Character.isDigit(s.charAt(i))) {
        n = 5;
        break;
    }
}
// n grows by another 5 if the string contains at least one letter
for (int i = 0; i < s.length(); i++) {
    if (Character.isLetter(s.charAt(i))) {
        n += 5;
        break;
    }
}
if (n == 10) {
    out.print("Password format correct <b>Accepted</b><br>");
} else {
    out.print("Password must be alphanumeric <b>Declined</b><br>");
}
Explanation:
First store the password in a string and initialize an integer n to 0.
Then check each character with a for loop.
If the first loop finds a digit in the string (Character.isDigit(s.charAt(i))), n is set to 5 and the loop stops.
The second loop checks whether any letter occurs in the string (Character.isLetter(s.charAt(i))). If it finds one, it adds another 5 to n.
Finally, check n with an if condition: if n == 10, the string contains both letters and digits; otherwise it does not.
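The two loops and the n counter can also be expressed as two boolean checks over the string's code points. A sketch (the class and method names are illustrative), assuming the same definitions of digit and letter as Character.isDigit and Character.isLetter:

```java
class AlphanumericCheck {
    // True if the string has at least one digit and at least one letter
    static boolean hasDigitAndLetter(String s) {
        boolean hasDigit = s.codePoints().anyMatch(Character::isDigit);
        boolean hasLetter = s.codePoints().anyMatch(Character::isLetter);
        return hasDigit && hasLetter;
    }

    public static void main(String[] args) {
        for (String pwd : new String[] {"abc123", "abcdef", "123456"}) {
            System.out.println(pwd + ": " + (hasDigitAndLetter(pwd) ? "Accepted" : "Declined"));
        }
    }
}
```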

Sample regex for a strong password:
(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])(?=\\S+$).{6,18}
at least 6 characters
up to 18 characters
at least one number
at least one lowercase letter
at least one uppercase letter
at least one special character (any character that is not a letter or digit)

RegEx is:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])[^\s]{8,}$
at least 8 characters {8,}
at least one number (?=.*\d)
at least one lowercase letter (?=.*[a-z])
at least one uppercase letter (?=.*[A-Z])
at least one special character (?=.*[@#$%^&+=])
no whitespace [^\s]

A more general answer, which accepts all special characters including _, would be slightly different:
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])(?=\S+$).{8,}$
The difference, (?=.*[\W_]), translates to "at least one special (non-word) character or underscore".
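A short check of that character class (the class name is illustrative): the underscore counts as a special character, since [\W_] adds _ to everything \W matches.

```java
import java.util.regex.Pattern;

class UnderscoreAsSpecial {
    static final Pattern POLICY =
            Pattern.compile("^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[\\W_])(?=\\S+$).{8,}$");

    static boolean isValid(String password) {
        return POLICY.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abcdefg1_")); // true: '_' counts as special
        System.out.println(isValid("Abcdefg1~")); // true: '~' is a non-word character
        System.out.println(isValid("Abcdefg12")); // false: no special character
    }
}
```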

Related

Regular expression special characters not working at the starting of the string in java

After trying other variations, I use this regular expression in Java to validate a password:
PatternCompiler compiler = new Perl5Compiler();
PatternMatcher matcher = new Perl5Matcher();
pattern = compiler.compile("^(?=.*?[a-zA-Z])(?![\\\\\\\\_\-])(?=.*?[0-9])([A-Za-z0-9-/-~][^\\\\\\\\_\-]*)$");
But it still doesn't match my test cases as expected:
Apr#2017 match
$$Apr#2017 no match, but it should match
!!Apr#2017 no match, but it should match
!#ap#2017 no match, but it should match
-Apr#2017 it should not match
_Apr#2017 it should not match
\Apr#2017 it should not match
Apart from the three special characters -, _ and \, all special characters should be allowed at the start of the string.
Rules:
It should accept all special characters, any number of times, except the three symbols above.
It must contain at least one number and one capital letter, anywhere in the string.
You have two rules, why not create more than one regular expression?
It should accept all special characters, any number of times, except the three symbols above.
For this one, make sure it does not match [-\\_] (note that the - is the first character in the character class, or it will be interpreted as a range).
It must contain at least one number and one capital letter, anywhere in the string.
For this one, make sure it matches [A-Z] and [0-9].
To make it easy to modify and extend, do some abstraction:
import java.util.regex.Pattern;
class PasswordRule
{
    private final Pattern pattern;
    // If true, the string must contain a match; if false, it must not
    private final boolean shouldMatch;
    PasswordRule(String patternString, boolean shouldMatch)
    {
        this.shouldMatch = shouldMatch;
        this.pattern = Pattern.compile(patternString);
    }
    boolean match(String passwordString)
    {
        // find() looks for the pattern anywhere in the string
        return pattern.matcher(passwordString).find() == shouldMatch;
    }
}
This uses java.util.regex.Pattern rather than the Perl5 classes from the question, but the idea carries over to either API. Then your rules go in an array:
PasswordRule[] rules =
{
    new PasswordRule("[-\\\\_]", false),
    new PasswordRule("[A-Z]", true),
    new PasswordRule("[0-9]", true)
};
boolean passwordIsOk(String password)
{
    for (PasswordRule rule : rules)
    {
        if (!rule.match(password))
        {
            return false;
        }
    }
    return true;
}
Using the above, your rules are far more flexible and modifiable than one monstrous regular expression.
Here's an alternative solution - reverse the condition. This regex
^(?:[^0-9]*|[^A-Z]*|[_\\-].*)$
matches non conforming passwords. This makes it much simpler to understand.
It matches either
a string free from digits
a string free from capital letters
a string starting with _, \ or -
There are some unclear issues remaining in your question, though (such as the restriction on the starting character, which I mentioned in a comment), so it may have to be adjusted.
You seem to need
"^(?=[^a-zA-Z]*[a-zA-Z])(?=[^0-9]*[0-9])[^\\\\_-]*$"
^ - start of string
(?=[^a-zA-Z]*[a-zA-Z]) - a positive lookahead that requires at least 1 ASCII letter ([a-zA-Z]) to appear after 0+ chars other than letters ([^a-zA-Z]*)
(?=[^0-9]*[0-9])- at least 1 ASCII digit (same principle of contrast as above is used here)
[^\\\\_-]* - 0+ chars other than \ (inside a Java string literal, \ should be doubled to denote 1 literal backslash, and to match a single backslash with a regex, we need double literal backslash), _, -
$ - end of string (\\z might be better though as it matches at the very end of the string).
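A sketch that runs the question's test cases through this regex (the class name is illustrative). Note that it rejects -, _ and \ anywhere in the string, not only at the start, and that it requires a letter of either case rather than a capital letter.

```java
import java.util.regex.Pattern;

class StartCharRules {
    // Regex from the answer above, as a Java string literal
    static final Pattern RULES = Pattern.compile("^(?=[^a-zA-Z]*[a-zA-Z])(?=[^0-9]*[0-9])[^\\\\_-]*$");

    static boolean accepts(String s) {
        return RULES.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"Apr#2017", "$$Apr#2017", "!!Apr#2017", "!#ap#2017",
                "-Apr#2017", "_Apr#2017", "\\Apr#2017"};
        for (String s : inputs) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```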

Java regular expression for number starts with code

I am not a Java developer but I am interfacing with a Java system.
Please help me with a regular expression that would detect all numbers starting with 25678 or 25677.
For example, in Rails it would be:
^(25677|25678)
Sample input is 256776582036 and 256782405036
^(25678|25677)
or
^2567[78]
If you use ^(25678|25677)[0-9]*$, it guarantees that the rest of the string consists only of digits and no other characters.
That should do the trick for you: it looks for either number, followed by any digits after it.
In Java the regex would be the same, assuming that the number takes up the entire line. You could further simplify it to
^2567[78]
If you need to match a number anywhere in the string, use \b anchor (double the backslash if you are making a string literal in Java code).
\b2567[78]
How about if there is a possibility of a + at the beginning of the number?
Add an optional +, like this [+]? or like this \+? (again, double the backslash for inclusion in a string literal).
Note that it is important to know what Java API is used with the regular expression, because some APIs will require the regex to cover the entire string in order to declare it a match.
Try something like:
String number = ...;
if (number.matches("^2567[78].*$")) {
//yes it starts with your number
}
Regex ^2567[78].*$ Means:
The number starts with 2567, followed by either 7 or 8, and then followed by any characters.
If you need only digits after 25677 or 25678, the regex should be ^2567[78]\\d*$ (as a Java string literal), which means the prefix is followed by zero or more digits.
The regex syntax of Java is pretty close to that of Rails (Ruby), especially for something this simple. The trick is in using the correct API calls. If you need to do more than one search, it's worthwhile to compile the pattern once and reuse it. Something like this should work (here strings stands for whatever collection of strings you are checking):
Pattern p = Pattern.compile("^2567[78]");
for (String s : strings) {
    if (p.matcher(s).find()) {
        // string starts with 25677 or 25678
    } else {
        // string starts with something else
    }
}
If it's a one-shot deal, then you can simplify all this by changing the pattern to cover the entire string:
if (someString.matches("2567[78].*")) {
// string starts with 25677 or 25678
}
The matches() method tests whether the entire string matches the pattern; hence the leading ^ anchor is unnecessary but the trailing .* is needed.
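If you need to account for an optional leading + (as you indicated in a comment to another answer), include an escaped, optional + at the start of the pattern (or after the ^ if that's used): \+? in the regex, written "\\+?" in a Java string literal. A bare +? at the start would be a quantifier with nothing to repeat and fails to compile.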
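To make the difference between the Matcher methods concrete, here is a small sketch (the class name and the second sample string are illustrative). Besides matches() and find(), Matcher.lookingAt() is anchored at the start of the input but not at the end, which fits a "starts with" check without ^ or a trailing .*.

```java
import java.util.regex.Pattern;

class PrefixMatching {
    private static final Pattern PREFIX = Pattern.compile("2567[78]");

    // lookingAt() is anchored at the start of the input but not at the end
    static boolean startsWithPrefix(String number) {
        return PREFIX.matcher(number).lookingAt();
    }

    public static void main(String[] args) {
        String number = "256776582036";
        System.out.println(PREFIX.matcher(number).matches());     // false: the whole input must match
        System.out.println(PREFIX.matcher(number).find());        // true: a match may start anywhere
        System.out.println(startsWithPrefix(number));             // true
        System.out.println(PREFIX.matcher("99256776").find());    // true, even though it starts with 99
        System.out.println(startsWithPrefix("99256776"));         // false
    }
}
```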

Need help for writing regular expression

I am weak at writing regular expressions, so I'm going to need some help on this one. I need a regular expression that can validate that a string is a set of letters (the letters must be unique) delimited by commas.
Only one character and after that a comma
Examples:
A,E,R
R,A
E,R
Thanks
You can use a repeated group to validate it's a comma separated string.
^[AER](?:,[AER])*$
To also require unique characters, you would do something like:
^([AER])(?:,(?!\1)([AER])(?!.*\2))*$
If I understand it correctly, a valid string will be a series (possibly empty) of two-character patterns, where each pattern is a letter followed by a comma, finally followed at the end by one letter.
Thus:
"^([A-Za-z],)*[A-Za-z]$"
EDIT: Since you've clarified that the letters have to be A, E, or R:
"^([AER],)*[AER]$"
Something like this "^([AER],)*[AER]$"
Edit: regarding the uniqueness, if you can drop the "last character cannot be a comma" requirement (which can be checked before the regex anyway, in constant time), then this should work:
"^(?:([AER],?)(?!.*\\1))*$"
This will also match "A,E,R," (with a trailing comma), hence you need that check before performing the regex. I do not take responsibility for the performance, but since it's only 3 letters anyway...
The above is a Java string literal, obviously; if you want a "pure" regex: ^(?:([AER],?)(?!.*\1))*$
Edit 2: sorry, I missed one thing: this actually requires that check, and then you need to add a comma at the end, since otherwise it will also match A,E,E. Kind of limited, I know.
My own ugly but extensible solution, which will disallow leading and trailing commas, and checks that the characters are unique.
It uses a forward-declared backreference: note how the second capturing group comes after the reference made to it, (?!.*\2). On the first repetition, since the second capturing group hasn't captured anything yet, Java treats any attempt to reference the text matched by the second capturing group as a failure.
^([AER])(?!.*\1)(?:,(?!.*\2)([AER]))*+$
Demo on regex101 (PCRE flavor has the same behavior for this case)
Test cases:
A,E,R
A,R,E
E,R,A
A
R,E
R
E
A,
A,R,
A,A,R
E,A,E
A,E,E
X,R,E
R,A,E,
,A
AA,R,E
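The test cases above can be run against the regex with a short sketch like this one (the class name is illustrative). Tracing the pattern by hand, the first seven inputs in the list are accepted and the remaining ones rejected.

```java
import java.util.regex.Pattern;

class UniqueLetterList {
    // Regex from the answer above; \1 and \2 are written \\1 and \\2 in a Java string literal
    static final Pattern UNIQUE = Pattern.compile("^([AER])(?!.*\\1)(?:,(?!.*\\2)([AER]))*+$");

    static boolean isValid(String s) {
        return UNIQUE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"A,E,R", "A", "R,E", "A,", "A,A,R", "A,E,E", "X,R,E", ",A", "AA,R,E"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```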
Note: I'm going to answer the original question. That is, I don't care if the elements repeat.
We've had several suggestions for this regex:
^([AER],)*[AER]$
Which does indeed work. However, to match a String, it first has to back up one character because it will find that there is no , at the end. So we switch it for this to increase performance:
^[AER](,[AER])*$
Notice that this will match a correct String the very first time it attempts to. But also note that we don't need to worry about the ( )* backing up at all; it will either match the first time, or it won't match the String at all. So we can further improve performance by using a possessive quantifier:
^[AER](,[AER])*+$
This will take the whole String and attempt to match it. If it fails, then it stops, saving time by not doing useless backing up.
If I were trying to ensure the String had no repeated elements, I would not use regex; it just complicates things. You end up with less-readable code (sadly, most people don't understand regex) and, oftentimes, slower code. So I would build my own validator:
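import java.util.HashSet;
import java.util.Set;
public static boolean isCommaDelimitedSet(String toValidate, Set<Character> toMatch) {
    // letter, comma, letter, ...: a valid string is non-empty and has odd length (no trailing comma)
    if (toValidate.isEmpty() || toValidate.length() % 2 == 0) return false;
    Set<Character> seen = new HashSet<>();
    for (int index = 0; index < toValidate.length(); index++) {
        if (index % 2 == 0) {
            // even positions: an allowed letter that has not been seen before
            char c = toValidate.charAt(index);
            if (!toMatch.contains(c) || !seen.add(c)) return false;
        } else {
            // odd positions: a comma
            if (toValidate.charAt(index) != ',') return false;
        }
    }
    return true;
}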
This assumes that you want to be able to pass in a set of characters that are allowed. If you don't want that and have explicit chars you want to match, change the contents of the if (index % 2 == 0) block to:
char c = toValidate.charAt(index);
if (c != 'A' && c != 'E' && c != 'R' /* && and so on */) return false;

Password matching with regex

I'm using java.util.regex.Pattern to match passwords that meet the following criteria:
At least 7 characters
Must consist of only letters and digits
At least one letter and at least one digit
I have 1 & 2 covered, but I can't think of how to do 3.
1 & 2 - [\\w]{7,}
Any ideas?
You can use this. This basically uses lookahead for achieving the 3rd requirement.
(?=.*\d)(?=.*[a-zA-Z])\w{7,}
or the Java string
"(?=.*\\d)(?=.*[a-zA-Z])\\w{7,}"
Explanation
"(?=" + // Assert that the regex below can be matched, starting at this position (positive lookahead)
"." + // Match any single character
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"\\d" + // Match a single digit 0..9
")" +
"(?=" + // Assert that the regex below can be matched, starting at this position (positive lookahead)
"." + // Match any single character
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
"[a-zA-Z]" + // Match a single character present in the list below
// A character in the range between “a” and “z”
// A character in the range between “A” and “Z”
")" +
"\\w" + // Match a single character that is a “word character” (letters, digits, and underscores)
"{7,}" // Between 7 and unlimited times, as many times as possible, giving back as needed (greedy)
Edit
If you want to include unicode letter support, then use this
(?=.*\d)(?=.*\pL)[\pL\d]{7,}
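The difference between the \w version and the \pL version shows up with non-ASCII letters and with the underscore. A sketch (the class name is illustrative); by default Java's \w is limited to [a-zA-Z_0-9], while \pL matches any Unicode letter.

```java
import java.util.regex.Pattern;

class UnicodeLetters {
    static final Pattern WORD_CHARS = Pattern.compile("(?=.*\\d)(?=.*[a-zA-Z])\\w{7,}");
    static final Pattern UNICODE_LETTERS = Pattern.compile("(?=.*\\d)(?=.*\\pL)[\\pL\\d]{7,}");

    public static void main(String[] args) {
        String umlaut = "P\u00e4sswort1"; // P, a-umlaut, sswort1
        System.out.println(WORD_CHARS.matcher(umlaut).matches());          // false: \w is ASCII-only by default
        System.out.println(UNICODE_LETTERS.matcher(umlaut).matches());     // true
        System.out.println(WORD_CHARS.matcher("abc_1234").matches());      // true: \w includes '_'
        System.out.println(UNICODE_LETTERS.matcher("abc_1234").matches()); // false
    }
}
```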
Doing this with only Regex will very easily become convoluted and very difficult to understand/read if you ever need to change the credentials for a password.
Instead iterate over the password in a loop and count the different types of characters and then do simple if-checks.
Such as (untested):
if (password.length() < 7) return false;
int countDigit = 0;
int countLetter = 0;
for (int i = 0; i < password.length(); i++) {
    char c = password.charAt(i);
    if (Character.isDigit(c)) {
        countDigit++;
    } else if (Character.isLetter(c)) {
        countLetter++;
    } else {
        return false; // only letters and digits are allowed (requirement 2)
    }
}
return countDigit > 0 && countLetter > 0;
You won't need a character class for using \w, it is a character class by itself. However it also matches underscore which you didn't mention. So it might be better to use a custom character class.
To the "at least one" part, use look aheads:
/(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{7,}/
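In Java, drop the surrounding slashes and double the backslashes in the string literal: "(?=.*\\d)(?=.*[A-Za-z])[A-Za-z0-9]{7,}". With String.matches() or Matcher.matches(), the whole string must match, so no ^ or $ is needed.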
It's possible to do this in a single regexp, but I wouldn't as it'll be hard to maintain.
I would just do:
if (pass.matches("[a-zA-Z0-9]{7,}") &&
pass.matches(".*[a-zA-Z].*") &&
pass.matches(".*\\d.*"))
{
// password is OK
}
It then becomes obvious how to apply additional constraints to the password - they just get added on with additional && ... clauses.
NB: I've deliberately used [a-z] rather than \w because I'm unsure what happens to \w if you use it in alternate locales where other characters might be considered "letters".
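Note that String.matches() (and Matcher.matches()) must match the whole string, so single-character checks such as [a-zA-Z] only work with find() or with .* around them. A sketch with precompiled patterns that uses find() for the "contains" checks (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

class SevenCharPolicy {
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9]{7,}"); // used with matches()
    private static final Pattern LETTER = Pattern.compile("[a-zA-Z]");         // used with find()
    private static final Pattern DIGIT = Pattern.compile("[0-9]");             // used with find()

    static boolean isValid(String pass) {
        return ALLOWED.matcher(pass).matches()  // whole string: 7+ letters/digits only
                && LETTER.matcher(pass).find()  // at least one letter somewhere
                && DIGIT.matcher(pass).find();  // at least one digit somewhere
    }

    public static void main(String[] args) {
        for (String p : new String[] {"abc1234", "abcdefg", "1234567", "abc123", "abc_1234"}) {
            System.out.println(p + " -> " + isValid(p));
        }
    }
}
```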
I would add another regex to cover the 3rd criterion (you don't have to nail them all in one regex, but may want to combine them). I would go with something like ^(?=.*\d)(?=.*[a-zA-Z])
taken from here:
http://www.mkyong.com/regular-expressions/10-java-regular-expression-examples-you-should-know/

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
System.out.println(someData); //someData contains gottenData
certain spaces plus an unknown symbol somehow slip through the filter in gottenData.
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] As far as I can tell from the (very large) input, the [?]s are either whitespace or nothing at all; maybe there's some sort of encoding issue, or perhaps something to do with #text nodes (the input is XML).
The * quantifier matches "zero or more", which means it also matches an empty string, one that contains none of the characters in your class. Try the + quantifier, which means "one or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.find() instead of Matcher.matches(), it will not require a full-string match and you can use the regex [a-zA-Z0-9]+ (Matcher.lookingAt() also avoids the full match, but only succeeds if the string starts with an alphanumeric character).
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-z0-9]*");
if (!p.matcher(gottenData).matches())
System.out.println("doesn't match.");
this prints "doesn't match."
The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extra (specifically [, \, ], ^, _ and `).
A second potential problem, as Martin mentioned, is that you may need to add the start and end anchors if you want the string to consist only of letters and numbers (with Matcher.matches() they are implied).
Finally, you use the * operator, which means 0 or more, so a string of zero characters matches and matches() returns true for an empty input; without the anchors and used with find(), the pattern would match any input at all. What you need is the + quantifier. So I will submit that the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
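The effect of * versus +, and of the A-z typo, can be checked directly with String.matches(), which needs a full match and therefore no anchors. A sketch with illustrative names:

```java
class QuantifierAndRangeDemo {
    static final String STAR = "[a-zA-Z0-9]*"; // zero or more: also accepts ""
    static final String PLUS = "[a-zA-Z0-9]+"; // one or more
    static final String TYPO = "[a-zA-z0-9]+"; // A-z also covers [ \ ] ^ _ `

    public static void main(String[] args) {
        System.out.println("".matches(STAR));    // true
        System.out.println("".matches(PLUS));    // false
        System.out.println("a_b".matches(TYPO)); // true: '_' lies between 'Z' and 'a'
        System.out.println("a_b".matches(PLUS)); // false
    }
}
```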
You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding a space to the regex: [a-zA-Z0-9 ]*? This should match any normal text made of letters, digits and spaces. If you want quotes and other special chars, add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/
You can check whether the input value contains only letters and digits by using the regex ^[a-zA-Z0-9]*$.
If the value contains only letters and digits, it matches (e.g. riz99, riz99z);
otherwise it does not match (e.g. 99z., riz99.z, riz99.9).
Example code (JavaScript):
if (e.target.value.match('^[a-zA-Z0-9]*$')) {
  console.log('match')
}
else {
  console.log('not match')
}
