Figuring out regex for the mentioned condition - java

I came across the concept of regex recently and wanted to solve the problem using only a regex inside matches() and the length() method of the String class. The problem is about password validation. Here are the three conditions that need to be considered:
A password must have at least eight characters.
A password consists of only letters and digits.
A password must contain at least two digits.
I was able to solve this problem using various other String and Character class methods, but I need to do it using only regex. What I have tried passes most of the test cases, but some of them are still failing. Since I am still learning regex, please help me with what I am missing or doing wrong. Below is what I tried:
import java.util.Scanner;

public class CheckPassword {
public static void main(String[]args){
Scanner sc = new Scanner(System.in);
System.out.println("Enter your password:\n");
String str1 = sc.next();
//String dig2 = "\\d{2}";
//String letter = ".*[A-Z].*";
//String letter1 = ".*[a-z].*";
//if(str1.length() >= 8 && str1.matches(dig2) &&(str1.matches(letter) || str1.matches(letter1)) )
if(str1.length() >= 8 && str1.matches("^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d{2,})(?=.*[0-9])[A-Z0-9a-z]+$"))
System.out.println("Valid Password");
else
System.out.println("Invalid Password");
}
}
EDIT
Okay, so I figured out the first and second conditions. I am only having trouble appending the third one to them, i.e. the password must contain at least two digits.
if(str1.length() >= 8 && str1.matches("[a-zA-Z0-9]*"))
//works exclusive of the third criterion

You may actually use a single regex inside matches() to validate all 3 conditions:
A password must have at least eight characters and
A password consists of only letters and digits - use \p{Alnum}{8,} in the consuming part
A password must contain at least two digits - use the (?=(?:[a-zA-Z]*\d){2}) positive lookahead anchored at the start
Combining all three:
.matches("(?=(?:[a-zA-Z]*\\d){2})\\p{Alnum}{8,}")
Since matches() method anchors the pattern by default (i.e. it requires a full string match) no ^ and $ anchors are necessary.
Details
^ - implicit in matches() - start of string
(?=(?:[a-zA-Z]*\d){2}) - a positive lookahead ((?=...)) that requires, right at the start of the string, two consecutive occurrences of the following sequence (so the string contains at least two digits):
[a-zA-Z]* - zero or more ASCII letters
\d - an ASCII digit
\p{Alnum}{8,} - 8 or more alphanumeric chars (ASCII only)
$ - implicit in matches() - end of string.
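As a minimal sketch of the combined pattern in use (the class name PasswordCheck and the sample strings are made up for illustration, not part of the question):

```java
final class PasswordCheck {
    // Lookahead: at least two digits. Consuming part: 8+ ASCII letters or digits.
    // String#matches() requires a full match, so ^ and $ are not needed.
    private static final String RULE = "(?=(?:[a-zA-Z]*\\d){2})\\p{Alnum}{8,}";

    static boolean isValid(String password) {
        return password != null && password.matches(RULE);
    }

    public static void main(String[] args) {
        String[] samples = {"abcdef12", "a1b2cdef", "abcdefg1", "abc12", "abcd_123"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Only the first two samples satisfy all three conditions; the others fail on digit count, length, and the underscore respectively.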

Okay, thank you @TDG and M.Aroosi for your precious time. I have figured out the solution, and it satisfies all cases:
// answer edited based on OP's working comment.
String dig2 = "^(?=.*?\\d.*\\d)[a-zA-Z0-9]{8,}$";
if(str1.matches(dig2))
{
//body
}

Related

How to use Pattern and Matcher? [duplicate]

I have two simple questions about Pattern.
First one is reading the given name(s) and surname. I need to tell whether they contain numbers or punctuation characters. If not, it's a valid name. Whatever I input, the output is
This is not a valid name.
What am I doing wrong?
Scanner input = new Scanner(System.in);
System.out.print("Enter: ");
String name = input.next();
Pattern p = Pattern.compile("[A-Za-z]");
Matcher m = p.matcher(name);
boolean n = m.matches();
if (n == true) {
System.out.println(name);
}
else {
System.out.println("This is not a valid name.");
}
The second question: I read a list of salary amounts that start with a dollar sign $ followed by a non-negative number, and save the valid salaries into an array. My program can output an array, but it can't recognize the $.
Scanner sc = new Scanner(System.in);
System.out.print("Enter Salary: ");
String salary = sc.nextLine();
Pattern pattern = Pattern.compile("($+)(\\d)");
Matcher matcher = pattern.matcher(salary);
String[] slArray=pattern.split(salary);
System.out.print(Arrays.toString(slArray));
I wouldn't even use a formal matcher for these simple use cases. Java's String#matches() method can just as easily handle this. To check for a valid name using your rules, you could try this:
String name = input.next();
if (name.matches("[A-Za-z]+")) {
System.out.println(name);
}
else {
System.out.println("This is not a valid name.");
}
And to check salary amounts, you could use:
String salary = sc.nextLine();
if (salary.matches("\\$\\d+(?:\\.\\d+)?")) {
System.out.println("Salary is valid.");
}
A note on the second pattern, \$\d+(?:\.\d+)?: we need to escape the dollar sign because it is a regex metacharacter. Also, I did not use the ^ and $ anchors in either of the two patterns, because String#matches() applies the pattern to the entire string by default.
Edit:
If you have multiple currency amounts in a given line, then split by whitespace to get an array of currency strings:
String input = "$23 $24.50 $25.10";
String[] currencies = input.split("\\s+");
Then, use the above matching logic to check each entry.
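A rough sketch of that split-then-validate step, assuming the amounts are separated by whitespace (the class SalaryFilter and the method validSalaries are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

final class SalaryFilter {
    // A dollar sign, one or more digits, and an optional fractional part.
    private static final String SALARY = "\\$\\d+(?:\\.\\d+)?";

    // Returns only the whitespace-separated tokens that look like valid salaries.
    static List<String> validSalaries(String line) {
        List<String> valid = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.matches(SALARY)) {
                valid.add(token);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(validSalaries("$23 $24.50 25.10 $abc"));
    }
}
```

The trim() call avoids an empty first token when the line starts with whitespace.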
Explanation
Your regex pattern is wrong. You are missing the symbol to repeat the pattern.
Currently you have [A-Za-z] which matches only one letter. You can repeat using
* - 0 to infinite repetitions
? - 0 to 1 repetitions
+ - 1 to infinite repetitions
{x,y} - x to y repetitions
So you probably wanted a pattern like [A-Za-z]+. You can use sites like regex101.com to test your regex patterns (it also explains the pattern in detail). See regex101/n6OZGp for an example of your pattern.
Here is a tutorial on the regex repetition symbols.
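To see the effect of each repetition symbol, here is a small sketch (matchesWhole is a hypothetical helper that simply wraps Pattern.matches, which requires the whole input to match):

```java
import java.util.regex.Pattern;

final class QuantifierDemo {
    // Hypothetical helper: true only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Exactly one letter is allowed, so a four-letter name fails.
        System.out.println(matchesWhole("[A-Za-z]", "John"));
        // One or more letters.
        System.out.println(matchesWhole("[A-Za-z]+", "John"));
        // Zero or more letters, so the empty string is accepted as well.
        System.out.println(matchesWhole("[A-Za-z]*", ""));
        // Between 3 and 5 letters.
        System.out.println(matchesWhole("[A-Za-z]{3,5}", "ab"));
    }
}
```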
For the second problem you need to know that $ is a special symbol in regex. It stands for the end of the input (or the end of a line in MULTILINE mode). If you want to match the $ symbol itself, you need to escape it by adding a backslash:
"\\$\\d+"
Note that you need to add two backslashes because the backslash itself has a special meaning in Java. So you first need to escape the backslash using a backslash so that the string itself contains a backslash:
\$\d+
which then is passed to the regex engine. The same if you want to match a + sign, you need to escape it.
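The difference between the escaped and unescaped dollar sign can be sketched as follows (EscapeDemo and its method names are made up for illustration; Pattern.quote is a standard alternative to escaping by hand):

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // "\\$" in Java source becomes \$ in the regex, i.e. a literal dollar sign.
    static boolean isDollarAmount(String s) {
        return s.matches("\\$\\d+");
    }

    // Pattern.quote turns arbitrary text into a literal pattern.
    static boolean isDollarAmountQuoted(String s) {
        return s.matches(Pattern.quote("$") + "\\d+");
    }

    // Unescaped: $ means "end of input" here, so "$5" cannot match.
    static boolean unescapedAttempt(String s) {
        return s.matches("$\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isDollarAmount("$5"));
        System.out.println(isDollarAmountQuoted("$42"));
        System.out.println(unescapedAttempt("$5"));
    }
}
```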
Notes
If you just want to check a given String against a pattern you can use the String#matches method:
String name = "John";
if (name.matches("[A-Za-z]+")) {
// Do something
}
Also note that there are shorthand character classes like \w (word character) which is short for [A-Za-z0-9_].
Code like
if (n == true) { ... }
can be simplified to just
if (n) { ... }
Because n already is a boolean, you don't need to test it against true anymore.
To parse currency values you should consider using already given methods like
NumberFormat format = NumberFormat.getCurrencyInstance();
Number num = format.parse("$5.34");
See the documentation of the class for examples.
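A hedged sketch of the NumberFormat approach: the default currency instance depends on the machine's locale, so this version pins Locale.US (an assumption, since the question only mentions the $ sign) and handles the checked ParseException. The class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.OptionalDouble;

final class CurrencyParse {
    // Parses text such as "$5.34" using US currency conventions.
    // Note: parse(String) reads from the start of the text and may ignore trailing characters.
    static OptionalDouble parseUsd(String text) {
        NumberFormat format = NumberFormat.getCurrencyInstance(Locale.US);
        try {
            return OptionalDouble.of(format.parse(text).doubleValue());
        } catch (ParseException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseUsd("$5.34"));
        System.out.println(parseUsd("abc"));
    }
}
```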

Regular Expression to match spaces between numbers and operators but no spaces between numbers

I've searched many posts on this forum and, to my surprise, I haven't found anyone with a problem like mine.
I have to make a simple calculator for string values read from the console. Right now, I'm trying to write some regexes to validate the input.
My calculator has to accept numbers with spaces between them and the operators (only + and - are allowed), but not input with spaces between digits. To sum up:
2 + 2 = 4 is correct, but
2 2 + 2 --> this should raise an error and inform the user on the console that they put a space between digits.
I've come up with this:
static String properExpression = "([0-9]+[+-]?)*[0-9]+$";
static String noInput = "";
static String numbersFollowedBySpace = "[0-9]+[\\s]+[0-9]";
static String numbersWithSpaces = "\\d+[+-]\\d+";
//I've tried also "[\\d\\s+\\d]";
void validateUserInput() {
Scanner sc = new Scanner(System.in);
System.out.println("Enter a calculation.");
input = sc.nextLine();
if(input.matches(properExpression)) {
calculator.calculate();
} else if(input.matches(noInput)) {
System.out.print(0);
} else if(input.matches(numbersFollowedBySpace)) {
input.replaceAll(" ", "");
calculator.calculate();
} else if(input.matches(numbersWithSpaces))
{
System.out.println("Check the numbers. It seems that there is a space between the digits");
}
else System.out.println("sth else");
Can you give me a hint about the regex I should use?
To match a complete expression, like 2+3=24 or 6 - 4 = 2, a regex like
^\d+\s*[+-]\s*\d+\s*=\s*\d+$
will do. Look at example 1 where you can play with it.
If you want to match longer expressions like 2+3+4+5=14 then you can use:
^\d+\s*([+-]\s*\d+\s*)+=\s*\d+$
Explanation:
^\d+ # first operand
\s* # 0 or more spaces
( # start repeating group
[+-]\s* # the operator (+/-) followed by 0 or more spaces
\d+\s* # 2nd (3rd,4th) operand followed by 0 or more spaces
)+ # end repeating group. Repeat 1 or more times.
=\s*\d+$ # equal sign, followed by 0 or more spaces and result.
Now, you might want to accept an expression like 2=2 as a valid expression. In that case the repeating group could be absent, so change + into *:
^\d+\s*([+-]\s*\d+\s*)*=\s*\d+$
Look at example 2 for that one.
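A short sketch of how the last pattern might be used from Java (EquationSyntax is a hypothetical class name; the check is purely syntactic and does not verify the arithmetic):

```java
final class EquationSyntax {
    // First operand, zero or more "operator operand" pairs, then "= result".
    // String#matches() anchors the pattern, so ^ and $ are omitted.
    private static final String EQUATION = "\\d+\\s*([+-]\\s*\\d+\\s*)*=\\s*\\d+";

    static boolean isWellFormed(String input) {
        return input.matches(EQUATION);
    }

    public static void main(String[] args) {
        String[] samples = {"2+3=24", "6 - 4 = 2", "2=2", "2 2 + 2 = 6", "2+=4"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWellFormed(s));
        }
    }
}
```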
Try:
^(?:\d+\s*[+-]\s*)*\d+$
Demo
Explanation:
The ^ and $ anchor the regex to match the whole string.
I have added \s* to allow whitespace between each number/operator.
I have replaced [0-9] with \d just to simplify it slightly; the two are equivalent.
I'm a little unclear whether you wanted to allow/disallow including = <digits> at the end, since your question mentions this but your attempted properExpression expression doesn't attempt it. If this is the case, it should be fairly easy to see how the expression can be modified to support it.
Note that I've not attempted to solve any potential issues arising out of anything other than regex issues.
I tried to keep your logical flow as much as possible. There are other answers that are more efficient, but you would have to alter your logical flow a lot to use them.
Please see the below and let me know if you have any questions.
static String properExpression = "\\s*(\\d+\\s*[+-]\\s*)*\\d+\\s*";
static String noInput = "";
static String numbersWithSpaces = ".*\\d[\\s]+\\d.*";
//I've tried also "[\\d\\s+\\d]";
static void validateUserInput() {
Scanner sc = new Scanner(System.in);
System.out.println("Enter a calculation.");
String input = sc.nextLine();
if(input.matches(properExpression)) {
input=input.replaceAll(" ", ""); //You've to assign it back to input.
calculator.calculate(); //Hope you have a way to pass input to calculator object
} else if(input.matches(noInput)) {
System.out.print(0);
} else if(input.matches(numbersWithSpaces)) {
System.out.println("Check the numbers. It seems that there is a space between the digits");
} else {
System.out.println("sth else");
}
}
Sample working version here
Explanation
The pattern below allows optional spaces, which replaceAll removes afterwards:
\\s* //Allow any optional leading spaces if any
( //Start number+operator sequence
\\d+ //Number
\\s* //Optional space
[+-] //Operator
\\s* //Optional space after operator
)* //End number+operator sequence(repeated)
\\d+ //Last number in expression
\\s* //Allow any optional space.
Numbers with spaces
.* //Any beginning expression
\\d //Digit
[\\s]+ //Followed by one or more spaces
\\d //Followed by another digit
.* //Rest of the expression
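The two patterns above could be combined into a small classifier, sketched here under the assumption that only these outcomes matter (ExpressionCheck and the Kind enum are hypothetical names):

```java
import java.util.regex.Pattern;

final class ExpressionCheck {
    enum Kind { VALID, EMPTY, SPACE_BETWEEN_DIGITS, OTHER }

    // Numbers separated by + or -, with optional whitespace around each token.
    private static final Pattern PROPER =
            Pattern.compile("\\s*(\\d+\\s*[+-]\\s*)*\\d+\\s*");
    // A digit, one or more whitespace characters, then another digit.
    private static final Pattern DIGIT_SPACE_DIGIT = Pattern.compile("\\d\\s+\\d");

    static Kind classify(String input) {
        if (input.isEmpty()) {
            return Kind.EMPTY;
        }
        if (PROPER.matcher(input).matches()) {
            return Kind.VALID;
        }
        // find() looks for the offending sequence anywhere in the input.
        if (DIGIT_SPACE_DIGIT.matcher(input).find()) {
            return Kind.SPACE_BETWEEN_DIGITS;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2 + 2", "2 2 + 2", "", "2 +"}) {
            System.out.println("\"" + s + "\" -> " + classify(s));
        }
    }
}
```

Using find() for the second check makes the leading and trailing .* unnecessary.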

Java Regex hung on a long string

I am trying to write a regex to validate a string. The string may contain only:
uppercase and lowercase English letters (a to z, A to Z) (ASCII: 65 to 90, 97 to 122) and/or digits 0 to 9 (ASCII: 48 to 57);
the characters - _ ~ (ASCII: 45, 95, 126), provided that they are not the first or last character;
the character . (dot, period, full stop) (ASCII: 46), provided that it is not the first or last character and does not appear two or more times consecutively.
I have tried using the following:
Pattern.compile("^[^\\W_*]+((\\.?[\\w\\~-]+)*\\.?[^\\W_*])*$");
It works fine for short strings, but for long strings I am seeing hung threads and huge CPU spikes. Please help.
Test cases for invalid strings:
"aB78."
"aB78..ab"
"aB78,1"
"aB78 abc"
".Abc12"
Test cases for valid strings:
"abc-def"
"a1b2c~3"
"012_345"
Your regex suffers from catastrophic backtracking, which leads to O(2^n) (i.e. exponential) solution time.
Although following the link will provide a far more thorough explanation, briefly the problem is that when the input doesn't match, the engine backtracks the first * term to try different combinations of the quantities of the terms. Because all groups match more or less the same thing, the number of ways to group the input grows exponentially with the length of the backtracked text, which in the case of non-matching input is the entire input.
The solution is to rewrite the regex so it won't catastrophically backtrack:
don't use groups of groups
use possessive quantifiers eg .*+ (which never backtrack)
fail early on non-match (eg using an anchored negative look ahead)
limit the number of times terms may appear using {n,m} style quantifiers
Or otherwise mitigate the problem
Problem
It is due to catastrophic backtracking. Let me show where it happens, by simplifying the regex to a regex which matches a subset of the original regex:
^[^\W_*]+((\.?[\w\~-]+)*\.?[^\W_*])*$
Since [^\W_*] and [\w\~-] can match [a-z], let us replace them with [a-z]:
^[a-z]+((\.?[a-z]+)*\.?[a-z])*$
Since \.? are optional, let us remove them:
^[a-z]+(([a-z]+)*[a-z])*$
You can see ([a-z]+)*, which is the classic example of a regex that causes catastrophic backtracking, (A*)*. The fact that the outermost repetition (([a-z]+)*[a-z])* can expand to ([a-z]+)*[a-z]([a-z]+)*[a-z]([a-z]+)*[a-z] further exacerbates the problem (imagine the number of ways to split the input string across all the expansions your regex can have). And this is not even mentioning the [a-z]+ in front, which adds insult to injury, since it is of the form A*A*.
Solution
You can use this regex to validate the string according to your conditions:
^(?=[a-zA-Z0-9])[a-zA-Z0-9_~-]++(\.[a-zA-Z0-9_~-]++)*+(?<=[a-zA-Z0-9])$
As Java string literal:
"^(?=[a-zA-Z0-9])[a-zA-Z0-9_~-]++(\\.[a-zA-Z0-9_~-]++)*+(?<=[a-zA-Z0-9])$"
Breakdown of the regex:
^ # Assert beginning of the string
(?=[a-zA-Z0-9]) # Must start with alphanumeric, no special
[a-zA-Z0-9_~-]++(\.[a-zA-Z0-9_~-]++)*+
(?<=[a-zA-Z0-9]) # Must end with alphanumeric, no special
$ # Assert end of the string
Since . can't appear consecutively, and can't start or end the string, we can consider it a separator between strings of [a-zA-Z0-9_~-]+. So we can write:
[a-zA-Z0-9_~-]++(\.[a-zA-Z0-9_~-]++)*+
All quantifiers are made possessive to reduce stack usage in Oracle's implementation and make the matching faster. Note that it is not appropriate to use them everywhere. Due to the way my regex is written, there is only one way to match a particular string to begin with, even without possessive quantifier.
Shorthand
Since this is Java and the regex runs in default mode, you can shorten a-zA-Z0-9_ to \w and [a-zA-Z0-9] to [^\W_] (though the second one is a bit harder for other programmers to read):
^(?=[^\W_])[\w~-]++(\.[\w~-]++)*+(?<=[^\W_])$
As Java string literal:
"^(?=[^\\W_])[\\w~-]++(\\.[\\w~-]++)*+(?<=[^\\W_])$"
If you use the regex with String.matches(), the anchors ^ and $ can be removed.
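As a sketch of the possessive pattern in use, checked against the question's test cases plus one long non-matching input (IdentifierCheck is a hypothetical class name, and the 100,000-character length is an arbitrary choice for illustration):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class IdentifierCheck {
    // Runs of allowed characters separated by single dots; the lookarounds
    // force the first and last character to be a letter or digit.
    private static final Pattern VALID = Pattern.compile(
            "(?=[a-zA-Z0-9])[a-zA-Z0-9_~-]++(\\.[a-zA-Z0-9_~-]++)*+(?<=[a-zA-Z0-9])");

    static boolean isValid(String s) {
        // matches() requires the whole string to match, so no ^ or $ is needed.
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        char[] letters = new char[100_000];
        Arrays.fill(letters, 'a');
        // A long input that fails only at the very last character.
        String longInvalid = new String(letters) + "!";

        System.out.println(isValid("abc-def"));
        System.out.println(isValid("aB78..ab"));
        System.out.println(isValid(longInvalid));
    }
}
```

Because the quantifiers are possessive, the long input is rejected in a single pass instead of triggering backtracking.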
As @MarounMaroun already commented, you don't really have a pattern. It might be better to iterate over the string as in the following method:
public static boolean validate(String string) {
    // An empty string has no first or last character to check
    if (string == null || string.isEmpty())
        return false;
    char[] chars = string.toCharArray();
    // The first and last characters must be a letter or a digit
    if (!isLetterOrDigit(chars[0]) || !isLetterOrDigit(chars[chars.length - 1]))
        return false;
    for (int i = 1; i < chars.length - 1; ++i) {
        // A dot must not be directly followed by another dot
        if (chars[i] == '.' && chars[i + 1] == '.')
            return false;
        if (chars[i] != '.' && !isLetterOrDigit(chars[i]) && !isSpecial(chars[i]))
            return false;
    }
    return true;
}
public static boolean isSpecial(char c) {
    return c == '-' || c == '_' || c == '~';
}
public static boolean isLetterOrDigit(char c) {
    // ASCII letters and digits only, as the question requires
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
}
Test code:
public static void main(String[] args) {
    System.out.println(validate("aB78."));
    System.out.println(validate("aB78..ab"));
    System.out.println(validate("abcdef"));
    System.out.println(validate("aB78,1"));
    System.out.println(validate("aB78 abc"));
}
Output:
false
false
true
false
false
A solution should try and find negatives rather than try and match a pattern over the entire string.
Pattern bad = Pattern.compile( "[^-\\w.~]|\\.\\.|^\\.|\\.$" );
for( String str: new String[]{ "aB78.", "aB78..ab", "abcdef",
"aB78,1", "aB78 abc" } ){
Matcher mat = bad.matcher( str );
System.out.println( mat.find() );
}
(It is remarkable to see how the initial statement "string...should have only" leads programmers to try and create positive assertions by parsing or matching valid characters over the full length rather than the much simpler search for negatives.)

Regex for password matching

I have searched the site and not finding exactly what I am looking for.
Password Criteria:
Must be at least 6 characters, 50 max
Must include 1 alpha character
Must include 1 numeric or special character
Here is what I have in java:
public static Pattern p = Pattern.compile(
"((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])|(?=.*[\\d~!@#$%^&*\\(\\)_+\\{\\}\\[\\]\\?<>|_]).{6,50})"
);
The problem is that a password of 1234567 matches (it is reported as valid), which it should not.
Any help would be great.
I wouldn't try to use a single regular expression to do that. Regular expressions tend not to perform well when they get long and complicated.
boolean valid(String password){
return password != null &&
password.length() >= 6 &&
password.length() <= 50 &&
password.matches(".*[A-Za-z].*") &&
password.matches(".*[0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_+\\{\\}\\[\\]\\?<>|_].*");
}
Make sure you use the Matcher.matches() method, which asserts that the whole string matches the pattern.
Your current regex:
"((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])|(?=.*[\\d~!@#$%^&*\\(\\)_+\\{\\}\\[\\]\\?<>|_]).{6,50})"
means:
The string must contain at least one digit (?=.*\\d), one lowercase English letter (?=.*[a-z]), and one uppercase letter (?=.*[A-Z])
OR | The string must contain at least 1 character that is a digit or a special character (?=.*[\\d~!@#$%^&*\\(\\)_+\\{\\}\\[\\]\\?<>|_])
Either of the conditions above holds, and the string must be between 6 and 50 characters long and must not contain any line separator.
The correct regex is:
"(?=.*[a-zA-Z])(?=.*[\\d~!@#$%^&*()_+{}\\[\\]?<>|]).{6,50}"
This will check:
The string must contain an English letter (either upper case or lower case) (?=.*[a-zA-Z]), and a character that is either a digit or a special character (?=.*[\\d~!@#$%^&*()_+{}\\[\\]?<>|])
The string must be between 6 and 50 characters long and must not contain any line separator.
Note that I removed the escaping for most characters, except for [], since {}?() lose their special meaning inside a character class.
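A minimal sketch of the two-lookahead pattern applied with String#matches() (PasswordPolicy is a hypothetical class name, and the special-character set is the one from the question, assuming it was meant to include @ and #):

```java
final class PasswordPolicy {
    // One letter somewhere, one digit or special character somewhere,
    // and 6 to 50 characters in total (no line separators, since . excludes them).
    private static final String RULE =
            "(?=.*[a-zA-Z])(?=.*[\\d~!@#$%^&*()_+{}\\[\\]?<>|]).{6,50}";

    static boolean isValid(String password) {
        return password != null && password.matches(RULE);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1234567", "abc123", "abcdef", "abcde!", "a1"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The password 1234567 from the question is now rejected because it contains no letter.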
In theory, a regular expression can only match languages that can be recognized by a deterministic finite automaton. Your rules are still within that class, since they only need a bounded amount of counting, but packing them into a single pattern makes it hard to read. Your rules are simple enough that you could just scan the password, determine its length, and ensure that the required characters are present.
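I'd suggest separating the character and length validation:
boolean checkPassword(String password) {
// Assumption: any non-letter character counts as "numeric or special"
return password.length() >= 6 && password.length() <= 50
&& Pattern.compile("[a-zA-Z]").matcher(password).find()
&& Pattern.compile("[^a-zA-Z]").matcher(password).find();
}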
I would suggest splitting into separate regular expressions
$re_numbers = "/[0-9]/";
$re_letters = "/[a-zA-Z]/";
both of them must match and the length is tested separately, too.
The code then looks much cleaner and is easier to understand and change.
This is way too complex for such a simple task:
Validate length using String#length()
password.length() >= 6 && password.length() <= 50
Validate each group using Matcher#find()
Pattern alpha = Pattern.compile("[a-zA-Z]");
boolean hasAlpha = alpha.matcher(password).find();
Pattern digit = Pattern.compile("\\d");
boolean hasDigit = digit.matcher(password).find();
Pattern special = Pattern.compile("[\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_+\\{\\}\\[\\]\\?<>|_]");
boolean hasSpecial = special.matcher(password).find();
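Putting the length check and the three find() calls together might look like the sketch below (PasswordRules is a hypothetical class name; the patterns are compiled once as constants, and the special-character set assumes @ and # were intended):

```java
import java.util.regex.Pattern;

final class PasswordRules {
    private static final Pattern ALPHA = Pattern.compile("[a-zA-Z]");
    private static final Pattern DIGIT = Pattern.compile("\\d");
    private static final Pattern SPECIAL = Pattern.compile("[~!@#$%^&*()_+{}\\[\\]?<>|]");

    static boolean isValid(String password) {
        if (password == null || password.length() < 6 || password.length() > 50) {
            return false;
        }
        boolean hasAlpha = ALPHA.matcher(password).find();
        // The criteria ask for a numeric OR a special character.
        boolean hasDigitOrSpecial = DIGIT.matcher(password).find()
                || SPECIAL.matcher(password).find();
        return hasAlpha && hasDigitOrSpecial;
    }

    public static void main(String[] args) {
        System.out.println(isValid("1234567"));
        System.out.println(isValid("abc123"));
    }
}
```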

Regex to get first number in string with other characters

I'm new to regular expressions, and was wondering how I could get only the first number in a string like 100 2011-10-20 14:28:55. In this case, I'd want it to return 100, but the number could also be shorter or longer.
I was thinking about something like [0-9]+, but it matches every number separately (100, 2011, 10, ...).
Thank you.
/^[^\d]*(\d+)/
This will start at the beginning, skip any non-digits, and match the first sequence of digits it finds.
EDIT:
This regex will match the first group of digits but, as pointed out in other answers, parseInt is a better solution if you know the number is at the beginning of the string.
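In Java, the same idea could be sketched as follows (FirstNumber is a hypothetical class name; \D is used as the equivalent of [^\d]):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstNumber {
    // Skip any non-digits from the start, then capture the first run of digits.
    private static final Pattern FIRST = Pattern.compile("^\\D*(\\d+)");

    static Optional<String> find(String text) {
        Matcher m = FIRST.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(find("100 2011-10-20 14:28:55"));
        System.out.println(find("no digits here"));
    }
}
```

Returning an Optional makes the "no number found" case explicit instead of relying on null.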
Try this to match the first standalone number in the string (which need not be at the beginning of the string):
String s = "2011-10-20 525 14:28:55 10";
Pattern p = Pattern.compile("(^|\\s)([0-9]+)($|\\s)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(2));
}
Just
([0-9]+) .*
If you always have the space after the first number, this will work
Assuming there's always a space between the first two numbers, then
preg_match('/^(\d+)/', $number_string, $matches);
$number = $matches[1]; // 100
But for something like this, you'd be better off using simple string operations:
$space_pos = strpos($number_string, ' ');
$number = substr($number_string, 0, $space_pos);
Regexes are computationally expensive and are best avoided where simple string operations suffice.
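The code below would do the trick, provided the number is followed by a space:
// Integer.parseInt throws NumberFormatException on the full string, so take the first token only
Integer num = Integer.parseInt("100 2011-10-20 14:28:55".split(" ")[0]);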
[0-9] matches any digit from 0 to 9, and + means one or more times. Using [0-9]{3} instead matches exactly 3 digits.
Try ^(?'num'[0-9]+).*$ which forces it to start at the beginning, read a number, store it to 'num' and consume the remainder without binding.
This string extension works even when the string does not start with a number.
It returns 1234 in each case: "1234asdfwewf", "%sdfsr1234", "## # 1234"
public static string GetFirstNumber(this string source)
{
if (string.IsNullOrEmpty(source) == false)
{
// take non digits from string start
string notNumber = new string(source.TakeWhile(c => Char.IsDigit(c) == false).ToArray());
if (string.IsNullOrEmpty(notNumber) == false)
{
//replace non digit chars from string start
source = source.Replace(notNumber, string.Empty);
}
//take digits from string start
source = new string(source.TakeWhile(char.IsDigit).ToArray());
}
return source;
}
NOTE: In Java, when you define the patterns as string literals, do not forget to use double backslashes to define a regex escaping backslash (\. = "\\.").
To get the number that appears at the start of a string you may consider using
^[0-9]*\.?[0-9]+ # Float or integer, leading digit may be missing (e.g, .35)
^-?[0-9]*\.?[0-9]+ # Optional - before number (e.g. -.55, -100)
^[-+]?[0-9]*\.?[0-9]+ # Optional + or - before number (e.g. -3.5, +30)
See this regex demo.
If you want to also match numbers with scientific notation at the start of the string, use
^[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Just number
^-?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional -
^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional - or +
See this regex demo.
To make sure there is no other digit on the right, add a \b word boundary, or a (?!\d)
or (?!\.?\d) negative lookahead that will fail the match if there is any digit (or . and a digit) on the right.
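A small sketch using the signed integer-or-float variant from the list above (LeadingNumber is a hypothetical class name; the trailing lookahead is left out to keep the example short):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingNumber {
    // Optional sign, optional integer part, optional dot, then at least one digit.
    private static final Pattern NUMBER = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+");

    static Optional<String> extract(String text) {
        Matcher m = NUMBER.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"100 2011-10-20 14:28:55", "-3.5abc", ".35x", "abc"}) {
            System.out.println(s + " -> " + extract(s));
        }
    }
}
```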
public static void main(String []args){
Scanner s=new Scanner(System.in);
String str=s.nextLine();
Pattern p=Pattern.compile("[0-9]+");
Matcher m=p.matcher(str);
while(m.find()){
System.out.println(m.group()+" ");
}
}
\d+
\d stands for any decimal digit, while + extends the match to any digits that directly follow, until a non-digit character such as a space or letter is reached.
