Using this as a guide to attempt to emulate an if-else Java regex, I came up with:
[0-2]?(?:(?<=2)(?![6-9])|(?<!2)(?=[0-9])) to do the following:
The leftmost digit is optional and must be between 0 and 2 inclusive. However, if the first digit is a 2, the next digit to the right can be at most 5. If it is a 0 or 1, or left blank, then 0-9 is valid. Ultimately, I am trying to allow a user to enter only the numbers 0-255.
Testing the regular expression on both Regex101 and in Java fails on my test cases, even though the Regex101 explanation is consistent with what I want.
When I test the regex:
System.out.println("0".matches("[0-2]?(?:(?<=2)(?![6-9])|(?<!2)(?=[0-9]))")); ---> false
System.out.println("2".matches("[0-2]?(?:(?<=2)(?![6-9])|(?<!2)(?=[0-9]))")); ----> true
System.out.println("25".matches("[0-2]?(?:(?<=2)(?![6-9])|(?<!2)(?=[0-9]))")); ----> false
System.out.println("22".matches("[0-2]?(?:(?<=2)(?![6-9])|(?<!2)(?=[0-9]))")); ----> false
System.out.println("1".matches("[0-2]?(?:(?<=2)(?![6-9])|(?<!2)(?=[0-9]))")); ----> false
So far, from these few test cases, it appears that 2 is the only input the regex accepts.
For reference, here is my initial regex, using if-else that limits a number to the range of 0-255: [0-2]?(?(?<=2)[0-5]|[0-9])(?(?<=25)[0-5]|[0-9])
I don't see why you would mimic if-else to check a range. It's just a matter of putting some patterns together.
^(?:[1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$
^ start anchor
(?: opens a non capture group for alternation
[1-9]?\d matches 0-99
1\d\d matches 100-199
2[0-4]\d matches 200-249
25[0-5] matches 250-255
$ end anchor
See demo at regex101
If you allow leading zeros, you can reduce it to ^(?:[01]?\d\d?|2[0-4]\d|25[0-5])$
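As a minimal sketch of how the strict alternation above could be used from Java (the class name ByteRangeRegex and the method isByte are hypothetical, not from the question), the pattern can be compiled once and reused:

```java
import java.util.regex.Pattern;

class ByteRangeRegex {
    // Strict form from the answer above: leading zeros are rejected.
    private static final Pattern BYTE_RANGE =
            Pattern.compile("^(?:[1-9]?\\d|1\\d\\d|2[0-4]\\d|25[0-5])$");

    // Returns true only when the whole input is a number from 0 to 255.
    static boolean isByte(String input) {
        return BYTE_RANGE.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "99", "255", "256", "007"}) {
            System.out.println(s + " -> " + isByte(s));
        }
    }
}
```

Because matches() already requires the whole input to match, the ^ and $ anchors are redundant here, but keeping them lets the same pattern be pasted into find()-based tools unchanged.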
As you are trying to only allow a range of numbers (0-255), why use regex at all? Instead, parse the string as an int and check if it falls within the range.
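public static boolean isInRange(String input, int min, int max) {
    try {
        int val = Integer.parseInt(input);
        // Both bounds are inclusive, so isInRange(s, 0, 255) accepts 255.
        return val >= min && val <= max;
    } catch (NumberFormatException e) {
        // Not a parsable integer at all.
        return false;
    }
}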
I need to write a regex to validate phone numbers with the following criteria:
Return the input as-is if it's fewer than 7 digits. Otherwise, remove the first character if it is a 1 or 0. If we haven't returned yet and the number is < 10 digits, return it. If it's >= 10 digits, return the last 7.
This is performance-critical code converted from coded conditional statements so ideally it can be done in a single regex. I managed to hack together something that got me close but I'm having some trouble meeting all criteria without further breaking things.
(Spaces are just to break things up since there's a lot here).
var pattern = Pattern.compile("(?<=\A[01]?) ([0-9]{1,9}) (?![0-9]) | (?:[01]?) (?<=\A[01]?) (?:[0-9]{3,}) ([0-9]{7}) (.*)", "$1$2");
return pattern.replaceAll(phoneNum);
This passes all the test strings I gave it except it doesn't remove the 0 or 1 like it should if they exist as the first character of strings of length 7+.
// Returns input as-is if fewer than 7 digits
555123 --> 555123 Success
// If 7+ digits remove the first character if it is a 1 or 0
1234567 --> 234567 Failure, returned 1234567
// If we haven't returned yet and the number is < 10 digits, return
5551212 --> 5551212 Success
// If it's >= 10 digits, return the last 7
5551234567 --> 1234567 Success
Java isn't my forte, but as people have mentioned regex might not be the right solution to your question. Just in case you are still interested in a regular expression, I think the following covers all your criteria:
^(?:(?=\d{7,9}$)[01]?|\d*(?=\d{7}$)|)(\d+$)
See the online demo
^ - Start-of-string anchor.
(?: - Open non-capturing group.
(?=\d{7,9}$) - A positive lookahead asserting that 7-9 digits remain up to the end-of-string anchor.
[01]? - Optionally match a zero or one.
| - Or:
\d* - Match as many digits as possible, but only until:
(?=\d{7}$) - Positive lookahead for exactly 7 digits remaining before the end-of-string anchor.
| - Or: Match nothing.
) - Close non-capturing group.
(\d+$) - Capture all remaining digits in the 1st capture group, up to the end-of-string anchor.
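A minimal sketch of how this pattern might be applied in Java, assuming the goal is simply to keep whatever the first capture group holds. The class name PhoneRegexSketch and the method normalize are hypothetical, and returning non-matching input unchanged is an assumption rather than something the question specifies:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneRegexSketch {
    // The pattern from the answer above, escaped as a Java string literal.
    private static final Pattern PHONE = Pattern.compile(
            "^(?:(?=\\d{7,9}$)[01]?|\\d*(?=\\d{7}$)|)(\\d+$)");

    static String normalize(String phoneNum) {
        Matcher m = PHONE.matcher(phoneNum);
        // Group 1 holds the digits to keep; anything that is not
        // all digits does not match and is returned unchanged.
        return m.matches() ? m.group(1) : phoneNum;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"555123", "1234567", "5551212", "5551234567"}) {
            System.out.println(s + " --> " + normalize(s));
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which matters if the code is as performance-critical as the question suggests.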
A replaceAll with a lambda might be sufficient. The disadvantage is that the lambda is a bit slower, though the regex itself is faster. It is certainly more maintainable for real-world business logic. Just time the result in a micro-benchmark.
var pattern = Pattern.compile("\\b(\\d+)\\b");
return pattern.matcher(phoneNum).replaceAll(mr -> {
    String digits = mr.group(1);
    if (digits.length() < 7) { // Or better: use \\d{7,20} in the pattern
        return digits;
    }
    if (digits.startsWith("0") || digits.startsWith("1")) { // Can be optimized
        digits = digits.substring(1);
    }
    if (digits.length() >= 10) {
        // Keep only the last 7 digits.
        digits = digits.substring(digits.length() - 7);
    }
    return digits;
});
Your test cases should be kept as unit tests, as such business rules tend to change "slightly" - especially if you prefer a single regex.
Here's the if version, as suggested in the comments. I've also added your tests as unit tests:
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
public class SomeClass {
    public String correctPhoneNumber(String number) {
        if (number.length() >= 7 && (number.startsWith("0") || number.startsWith("1"))) {
            return number.substring(1);
        }
        if (number.length() >= 10) {
            return number.substring(number.length() - 7);
        }
        return number;
    }
    @Test
    void correctPhoneNumberTest() {
        SomeClass objectToTest = new SomeClass();
        assertEquals("555123", objectToTest.correctPhoneNumber("555123"));
        assertEquals("234567", objectToTest.correctPhoneNumber("1234567"));
        assertEquals("5551212", objectToTest.correctPhoneNumber("5551212"));
        assertEquals("1234567", objectToTest.correctPhoneNumber("5551234567"));
    }
}
I've searched many posts on this forum and, to my surprise, I haven't found anyone with a problem like mine.
I have to make a simple calculator for string values from the console. Right now, I'm trying to write some regexes to validate the input.
My calculator has to accept numbers with spaces around the operators (only + and - are allowed), but not numbers with spaces between their digits. To sum up:
2 + 2 = 4 is correct, but
2 2 + 2 --> this should produce an error and inform the user on the console that they put a space between numbers.
I've come up with this:
static String properExpression = "([0-9]+[+-]?)*[0-9]+$";
static String noInput = "";
static String numbersFollowedBySpace = "[0-9]+[\\s]+[0-9]";
static String numbersWithSpaces = "\\d+[+-]\\d+";
//I've tried also "[\\d\\s+\\d]";
void validateUserInput() {
Scanner sc = new Scanner(System.in);
System.out.println("Enter a calculation.");
input = sc.nextLine();
if(input.matches(properExpression)) {
calculator.calculate();
} else if(input.matches(noInput)) {
System.out.print(0);
} else if(input.matches(numbersFollowedBySpace)) {
input.replaceAll(" ", "");
calculator.calculate();
} else if(input.matches(numbersWithSpaces))
{
System.out.println("Check the numbers. It seems that there is a space between the digits");
}
else System.out.println("sth else");
Can you give me a hint about the regex I should use?
To match a complete expression, like 2+3=24 or 6 - 4 = 2, a regex like
^\d+\s*[+-]\s*\d+\s*=\s*\d+$
will do. Look at example 1 where you can play with it.
If you want to match longer expressions like 2+3+4+5=14 then you can use:
^\d+\s*([+-]\s*\d+\s*)+=\s*\d+$
Explanation:
^\d+ # first operand
\s* # 0 or more spaces
( # start repeating group
[+-]\s* # the operator (+/-) followed by 0 or more spaces
\d+\s* # 2nd (3rd,4th) operand followed by 0 or more spaces
)+ # end repeating group. Repeat 1 or more times.
=\s*\d+$ # equal sign, followed by 0 or more spaces and result.
Now, you might want to accept an expression like 2=2 as a valid expression. In that case the repeating group could be absent, so change + into *:
^\d+\s*([+-]\s*\d+\s*)*=\s*\d+$
Look at example 2 for that one.
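As a rough sketch, the last pattern could be wrapped in Java like this. The class name ExpressionFormatSketch and the method isWellFormed are hypothetical. Note that the pattern only checks the shape of the input, not whether the arithmetic is correct:

```java
import java.util.regex.Pattern;

class ExpressionFormatSketch {
    // Operand, then zero or more "operator operand" pairs, then "= result".
    private static final Pattern EXPRESSION =
            Pattern.compile("^\\d+\\s*([+-]\\s*\\d+\\s*)*=\\s*\\d+$");

    static boolean isWellFormed(String input) {
        return EXPRESSION.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"2+3+4+5=14", "6 - 4 = 2", "2=2", "2 2 + 2 = 4"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWellFormed(s));
        }
    }
}
```

The input "2 2 + 2 = 4" is rejected because, after the first operand and its trailing space, the pattern expects either an operator or the equals sign, never another digit.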
Try:
^(?:\d+\s*[+-]\s*)*\d+$
Demo
Explanation:
The ^ and $ anchor the regex to match the whole string.
I have added \s* to allow whitespace between each number/operator.
I have replaced [0-9] with \d just to simplify it slightly; the two are equivalent.
I'm a little unclear whether you wanted to allow/disallow including = <digits> at the end, since your question mentions this but your attempted properExpression expression doesn't attempt it. If this is the case, it should be fairly easy to see how the expression can be modified to support it.
Note that I've not attempted to solve any potential issues arising out of anything other than regex issues.
I tried to keep your logical flow as much as possible. There are other answers that are more efficient, but you would have to alter your logical flow a lot to use them.
Please see the below and let me know if you have any questions.
static String properExpression = "\\s*(\\d+\\s*[+-]\\s*)*\\d+\\s*";
static String noInput = "";
static String numbersWithSpaces = ".*\\d[\\s]+\\d.*";
//I've tried also "[\\d\\s+\\d]";
static void validateUserInput() {
Scanner sc = new Scanner(System.in);
System.out.println("Enter a calculation.");
String input = sc.nextLine();
if(input.matches(properExpression)) {
input=input.replaceAll(" ", ""); //You've to assign it back to input.
calculator.calculate(); //Hope you have a way to pass input to calculator object
} else if(input.matches(noInput)) {
System.out.print(0);
} else if(input.matches(numbersWithSpaces)) {
System.out.println("Check the numbers. It seems that there is a space between the digits");
} else
System.out.println("sth else");
Sample working version here
Explanation
The pattern below allows spaces that can be stripped out afterwards:
\\s* //Allow any optional leading spaces if any
( //Start number+operator sequence
\\d+ //Number
\\s* //Optional space
[+-] //Operator
\\s* //Optional space after operator
)* //End number+operator sequence(repeated)
\\d+ //Last number in expression
\\s* //Allow any optional space.
Numbers with spaces
.* //Any beginning expression
\\d //Digit
[\\s]+ //Followed by one or more spaces
\\d //Followed by another digit
.* //Rest of the expression
String t always consists of two distinct alternating characters. For example, if string t's two distinct characters are x and y, then t could be xyxyx or yxyxy but not xxyy or xyyx.
But a.matches() always returns false and output becomes 0. Help me understand what's wrong here.
public static int check(String a) {
char on = a.charAt(0);
char to = a.charAt(1);
if(on != to) {
if(a.matches("["+on+"("+to+""+on+")*]|["+to+"("+on+""+to+")*]")) {
return a.length();
}
}
return 0;
}
Use regex (.)(.)(?:\1\2)*\1?.
(.) Match any character, and capture it as group 1
(.) Match any character, and capture it as group 2
\1 Match the same characters as was captured in group 1
\2 Match the same characters as was captured in group 2
(?:\1\2)* Match 0 or more pairs of group 1+2
\1? Optionally match a dangling group 1
Input must be at least two characters long. Empty string and one-character string will not match.
As java code, that would be:
if (a.matches("(.)(.)(?:\\1\\2)*\\1?")) {
See regex101.com for working examples [1].
[1] Note that regex101 requires use of ^ and $, which are implied by the matches() method. It also requires use of flags g and m to showcase multiple examples at the same time.
UPDATE
As pointed out by Austin Anderson:
fails on yyyyyyyyy or xxxxxx
To prevent that, we can add a zero-width negative lookahead, to ensure input doesn't start with two of the same character:
(?!(.)\1)(.)(.)(?:\2\3)*\2?
See regex101.com.
Or you can use Austin Anderson's simpler version:
(.)(?!\1)(.)(?:\1\2)*\1?
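Plugged back into the method from the question, the simpler version might look like the sketch below. The class name AlternatingSketch is hypothetical; check mirrors the question's method. No length guard is needed, because the pattern itself requires at least two characters:

```java
class AlternatingSketch {
    // Returns the length of the string if it consists of two distinct
    // alternating characters, otherwise 0.
    static int check(String a) {
        // (.)      first character, captured as group 1
        // (?!\1)   the next character must differ from group 1
        // (.)      second character, captured as group 2
        // (?:\1\2)* zero or more further pairs, \1? an optional trailing group 1
        return a.matches("(.)(?!\\1)(.)(?:\\1\\2)*\\1?") ? a.length() : 0;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"xyxyx", "yxyxy", "xxyy", "xyyx", "xxxxxx"}) {
            System.out.println(s + " -> " + check(s));
        }
    }
}
```

One caveat: by default . does not match line terminators, so input containing newlines would need the DOTALL flag.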
Actually, your regex is almost correct. The problem is that you have enclosed it in two character classes, and you also need to match an optional trailing character at the end.
You just need to use this regex:
public static int check(String a) {
if (a.length() < 2)
return 0;
char on = a.charAt(0);
char to = a.charAt(1);
if(on != to) {
String re = on+"("+to+on+")*"+to+"?|"+to+"("+on+to+")*"+on+"?";
System.out.println("re: " + re);
if(a.matches(re)) {
return a.length();
}
}
return 0;
}
Code Demo
I would like to mask the last 4 digits of the identity number (hkid)
A123456(7) -> A123***(*)
I can do this by below:
hkid.replaceAll("\\d{3}\\(\\d\\)", "***(*)")
However, can a regular expression match just the last 4 digits and replace each one with "*"?
hkid.replaceAll(regex, "*")
Please help, thanks.
Jessie
Personally, I wouldn't do it with regular expressions:
char[] cs = hkid.toCharArray();
for (int i = cs.length - 1, d = 0; i >= 0 && d < 4; --i) {
if (Character.isDigit(cs[i])) {
cs[i] = '*';
++d;
}
}
String masked = new String(cs);
This goes from the end of the string, looking for digit characters, which it replaces with a *. Once it's found 4 (or reaches the start of the string), it stops iterating, and builds a new string.
While I agree that a non-regex solution is probably the simplest and fastest, here's a regex that catches the last 4 digits regardless of whether there is a grouping or not: \d(?=(?:\D*\d){0,3}\D*$)
This expression is meant to match any digit that is followed by 0 to 3 digits before hitting the end of the input.
A short breakdown of the expression:
\d matches a single digit
\D matches a single non-digit
(?=...) is a positive look-ahead that contributes to the match but isn't consumed
(?:...){0,3} is a non-capturing group with a quantity of 0 to 3 occurrences given.
$ matches the end of the input
So you could read the expression as follows: "match a single digit if it is followed by a sequence of 0 to 3 times any number of non-digits which are followed by a single digit and that sequence is followed by any number of non-digits and the end of the input" (sounds complicated, no?).
Some results when using input.replaceAll( "\\d(?=(?:\\D*\\d){0,3}\\D*$)", "*" ):
input = "A1234567" -> output = "A123****"
input = "A123456(7)" -> output = "A123***(*)"
input = "A12345(67)" -> output = "A123**(**)"
input = "A1(234567)" -> output = "A1(23****)"
input = "A1234B567" -> output = "A123*B***"
As you can see in the last example the expression will match digits only. If you want to match letters as well either replace \d and \D with \w and \W (note that \w matches underscores as well) or use custom character classes, e.g. [02468] and [^02468] to match even digits only.
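For completeness, here is a small self-contained sketch that wraps the expression in a method and reproduces the results listed above. The class name MaskSketch and the method maskLastFourDigits are hypothetical:

```java
class MaskSketch {
    // Replaces each of the last four digits with '*', leaving all
    // non-digit characters (letters, parentheses) in place.
    static String maskLastFourDigits(String input) {
        return input.replaceAll("\\d(?=(?:\\D*\\d){0,3}\\D*$)", "*");
    }

    public static void main(String[] args) {
        String[] samples = {"A1234567", "A123456(7)", "A12345(67)", "A1(234567)", "A1234B567"};
        for (String s : samples) {
            System.out.println(s + " -> " + maskLastFourDigits(s));
        }
    }
}
```

Since each match is a single digit, the single-character replacement keeps the string length and the position of any brackets intact.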
I am trying to write a regex to validate a string. The requirement is that the string may contain only:
uppercase and lowercase English letters (a to z, A to Z) (ASCII: 65 to 90, 97 to 122) and/or digits 0 to 9 (ASCII: 48 to 57);
the characters - _ ~ (ASCII: 45, 95, 126), provided that they are not the first or last character;
the character . (dot, period, full stop) (ASCII: 46), provided that it is not the first or last character and does not appear two or more times consecutively.
I have tried using the following:
Pattern.compile("^[^\\W_*]+((\\.?[\\w\\~-]+)*\\.?[^\\W_*])*$");
It works fine for smaller strings, but not for long strings: I am experiencing hung threads and huge spikes in CPU usage. Please help.
Test cases for invalid strings:
"aB78."
"aB78..ab"
"aB78,1"
"aB78 abc"
".Abc12"
Test cases for valid strings:
"abc-def"
"a1b2c~3"
"012_345"
Your regex suffers from catastrophic backtracking, which leads to O(2^n) (i.e. exponential) solution time.
Although following the link will provide a far more thorough explanation, briefly the problem is this: when the input doesn't match, the engine backtracks the first * term to try different combinations of the quantities of the terms. Because all the groups match more or less the same thing, the number of ways to group the input grows exponentially with the length of the backtracking, which in the case of non-matching input is the entire input.
The solution is to rewrite the regex so it won't catastrophically backtrack:
don't use groups of groups
use possessive quantifiers eg .*+ (which never backtrack)
fail early on non-match (eg using an anchored negative look ahead)
limit the number of times terms may appear using {n,m} style quantifiers
Or otherwise mitigate the problem
Problem
It is due to catastrophic backtracking. Let me show where it happens, by simplifying the regex to a regex which matches a subset of the original regex:
^[^\W_*]+((\.?[\w\~-]+)*\.?[^\W_*])*$
Since [^\W_*] and [\w\~-] can match [a-z], let us replace them with [a-z]:
^[a-z]+((\.?[a-z]+)*\.?[a-z])*$
Since \.? are optional, let us remove them:
^[a-z]+(([a-z]+)*[a-z])*$
You can see ([a-z]+)*, which is the classic example of a regex that causes catastrophic backtracking, (A*)*. The fact that the outermost repetition (([a-z]+)*[a-z])* can expand to ([a-z]+)*[a-z]([a-z]+)*[a-z]([a-z]+)*[a-z] further exacerbates the problem (imagine the number of ways to split the input string to match all the expansions that your regex can have). And this is not to mention the [a-z]+ in front, which adds insult to injury, since it is of the form A*A*.
Solution
You can use this regex to validate the string according to your conditions:
^(?=[a-zA-Z0-9])[a-zA-Z0-9_~-]++(\.[a-zA-Z0-9_~-]++)*+(?<=[a-zA-Z0-9])$
As Java string literal:
"^(?=[a-zA-Z0-9])[a-zA-Z0-9_~-]++(\\.[a-zA-Z0-9_~-]++)*+(?<=[a-zA-Z0-9])$"
Breakdown of the regex:
^ # Assert beginning of the string
(?=[a-zA-Z0-9]) # Must start with alphanumeric, no special
[a-zA-Z0-9_~-]++(\.[a-zA-Z0-9_~-]++)*+
(?<=[a-zA-Z0-9]) # Must end with alphanumeric, no special
$ # Assert end of the string
Since . can't appear consecutively, and can't start or end the string, we can consider it a separator between strings of [a-zA-Z0-9_~-]+. So we can write:
[a-zA-Z0-9_~-]++(\.[a-zA-Z0-9_~-]++)*+
All quantifiers are made possessive to reduce stack usage in Oracle's implementation and make the matching faster. Note that it is not appropriate to use them everywhere. Due to the way my regex is written, there is only one way to match a particular string to begin with, even without possessive quantifier.
Shorthand
Since this is Java and in default mode, you can shorten a-zA-Z0-9_ to \w and [a-zA-Z0-9] to [^\W_] (though the second one is a bit hard for other programmers to read):
^(?=[^\W_])[\w~-]++(\.[\w~-]++)*+(?<=[^\W_])$
As Java string literal:
"^(?=[^\\W_])[\\w~-]++(\\.[\\w~-]++)*+(?<=[^\\W_])$"
If you use the regex with String.matches(), the anchors ^ and $ can be removed.
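As a hedged sketch, the shorthand pattern can be checked against the test cases from the question, plus a long non-matching input of the kind that made the original regex hang. The class name PossessiveValidatorSketch, the method isValid and the 10,000-character length are all illustrative assumptions:

```java
import java.util.regex.Pattern;

class PossessiveValidatorSketch {
    // Shorthand form of the pattern above, with possessive quantifiers.
    private static final Pattern VALID = Pattern.compile(
            "^(?=[^\\W_])[\\w~-]++(\\.[\\w~-]++)*+(?<=[^\\W_])$");

    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"aB78.", "aB78..ab", "aB78,1", "aB78 abc", ".Abc12",
                            "abc-def", "a1b2c~3", "012_345"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }

        // A long input that fails only at the very end.
        StringBuilder longInput = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            longInput.append('a');
        }
        longInput.append("..");
        System.out.println("long input -> " + isValid(longInput.toString()));
    }
}
```

Because every quantifier is possessive and the segments are separated by a literal dot, there is only one way to consume a given prefix, so the failure on the long input is detected in a single pass.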
As @MarounMaroun already commented, you don't really have a pattern. It might be better to iterate over the string as in the following method:
public static boolean validate(String string) {
    char chars[] = string.toCharArray();
    if (!isSpecial(chars[0]) && !isLetterOrDigit(chars[0]))
        return false;
    if (!isSpecial(chars[chars.length - 1])
            && !isLetterOrDigit(chars[chars.length - 1]))
        return false;
    for (int i = 1; i < chars.length - 1; ++i)
        if (!isPunctuation(chars[i]) && !isLetterOrDigit(chars[i])
                && !isSpecial(chars[i]))
            return false;
    return true;
}
public static boolean isPunctuation(char c) {
    return c == '.' || c == ',';
}
public static boolean isSpecial(char c) {
    return c == '-' || c == '_' || c == '~';
}
public static boolean isLetterOrDigit(char c) {
    return (Character.isDigit(c) || (Character.isLetter(c) && (Character
            .getType(c) == Character.UPPERCASE_LETTER || Character
            .getType(c) == Character.LOWERCASE_LETTER)));
}
Test code:
public static void main(String[] args) {
System.out.println(validate("aB78."));
System.out.println(validate("aB78..ab "));
System.out.println(validate("abcdef"));
System.out.println(validate("aB78,1"));
System.out.println(validate("aB78 abc"));
}
Output:
false
false
true
true
false
A solution should try and find negatives rather than try and match a pattern over the entire string.
Pattern bad = Pattern.compile( "[^-\\w.~]|\\.\\.|^\\.|\\.$" );
for( String str: new String[]{ "aB78.", "aB78..ab", "abcdef",
"aB78,1", "aB78 abc" } ){
Matcher mat = bad.matcher( str );
System.out.println( mat.find() );
}
(It is remarkable to see how the initial statement "string...should have only" leads programmers to try and create positive assertions by parsing or matching valid characters over the full length rather than the much simpler search for negatives.)