Length of some characters in regex - java

I have the following regex:
\+?[0-9\.,()\-\s]+$
which allows:
optional + at the beginning
then numbers, dots, commas, round brackets, dashes and white spaces.
In addition, I need to make sure that the number of digits plus the + symbol (if present) is between 9 and 15, so no special characters other than + are counted.
This last condition is the one I'm having a problem with.
valid inputs:
+358 (9) 1234567
+3 5 8.9,1-2(3)4..5,6.7 (25 characters, but only 12 that count: digits and the plus symbol)
invalid input:
+3 5 8.9,1-2(3)4..5,6.777777777 (33 characters, and 20 that count (digits and the plus symbol), which is too many)
It is important to use regex if possible because it's used in javax.validation.constraints.Pattern annotation as:
@Pattern(regexp = REGEX)
private String number;
where my REGEX is what I'm looking for here.
And if regex cannot be provided then it means that I need to rewrite my entity validation implementation. So is it possible to add such condition to regex or do I need a function to validate such pattern?

You may use
^(?=(?:[^0-9+]*[0-9+]){9,15}[^0-9+]*$)\+?[0-9.,()\s-]+$
Details
^ - start of string
(?=(?:[^0-9+]*[0-9+]){9,15}[^0-9+]*$) - a positive lookahead whose pattern must match for the regex to find a match:
(?:[^0-9+]*[0-9+]){9,15} - 9 to 15 repetitions of
[^0-9+]* - any 0+ chars other than digits and + symbol
[0-9+] - a digit or +
[^0-9+]* - 0+ chars other than digits and +
$ - end of string
\+? - an optional + symbol
[0-9.,()\s-]+ - 1 or more digits, ., ,, (, ), whitespace and - chars
$ - end of string.
In Java, when used with matches(), the ^ and $ anchors may be omitted:
s.matches("(?=(?:[^0-9+]*[0-9+]){9,15}[^0-9+]*$)\\+?[0-9.,()\\s-]+")
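As a quick check, the pattern above can be run against the sample inputs from the question. This is only a minimal sketch: the class name PhoneRegexDemo and the method isValid are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

class PhoneRegexDemo {
    // The lookahead counts digits and '+' (9 to 15 of them);
    // the remainder of the pattern restricts which characters are allowed.
    private static final Pattern NUMBER = Pattern.compile(
            "(?=(?:[^0-9+]*[0-9+]){9,15}[^0-9+]*$)\\+?[0-9.,()\\s-]+");

    // Hypothetical helper: matches() requires the whole input to match,
    // so the outer ^ and $ anchors are not needed.
    static boolean isValid(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("+358 (9) 1234567"));                // true
        System.out.println(isValid("+3 5 8.9,1-2(3)4..5,6.7"));         // true
        System.out.println(isValid("+3 5 8.9,1-2(3)4..5,6.777777777")); // false
    }
}
```

The same string literal can be used as the REGEX constant for the @Pattern annotation.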

Not using regex, you could simply loop and count the numbers and +s:
int count = 0;
for (int i = 0; i < str.length(); i++) {
    if (Character.isDigit(str.charAt(i)) || str.charAt(i) == '+') {
        count++;
    }
}

Since you're using Java, I wouldn't rely solely on a regex here:
String input = "+123,456.789";
int count = input.replaceAll("[^0-9+]", "").length();
if (input.matches("^\\+?[0-9.,()\\-\\s]+$") && count >= 9 && count <= 15) {
    System.out.println("PASS");
}
else {
    System.out.println("FAIL");
}
This approach lets us use your original regex as it is. We handle the length requirement for the digits (and the optional plus) using Java string calls.

Related

Regex for negative number with leading zeros and words with apostrophe

Hey, I need a regex that removes leading zeros.
Right now I am using this code. It works, but it doesn't keep the negative sign.
String regex = "^0+(?!$)";
String numbers = txaTexte.getText().replaceAll(regex, "");
After that I split numbers so that the values end up in an array.
input :
-0005
0003
-87
output :
-5
3
-87
I was also wondering what regex I could use to get this.
the words before the arrow are input and after is the output
the text is in french. And right now I am using this it works but not with the apostrophe.
String [] tab = txaTexte.getText().split("(?:(?<![a-zA-Z])'|'(?![a-zA-Z])|[^a-zA-Z'])+")
Un beau JOUR. —> Un/beau/JOUR
La boîte crânienne —> La/boîte/crânienne
C’était mieux aujourd’hui —> C’/était/mieux/aujourd’hui
qu’autrefois —> qu’/autrefois
D’hier jusqu’à demain! —> D’/hier/jusqu’/à/demain
Dans mon sous-sol—> Dans/mon/sous-sol
You might capture an optional hyphen in group 1, then match one or more zeros, and capture one or more digits starting with a digit 1-9 in group 2:
^(-?)0+([1-9]\d*)$
^ Start of string
(-?) Capture group 1, match optional hyphen
0+ Match 1+ zeroes
([1-9]\d*) Capture group 2, match 1+ digits starting with a digit 1-9
$ End of string
In the replacement use group 1 and group 2.
String regex = "^(-?)0+([1-9]\\d*)$";
String text = "-0005";
String numbers = txaTexte.getText().replaceAll(regex, "$1$2");
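Because the question's input arrives as several lines from a text area, ^ and $ have to match at every line rather than only at the start and end of the whole text. In Java that needs the MULTILINE flag, written inline as (?m). A minimal sketch, assuming newline-separated input; the class and method names are illustrative only.

```java
class LeadingZerosDemo {
    // (?m) makes ^ and $ match at the start and end of each line.
    // Group 1 keeps the optional hyphen, group 2 keeps the digits after the zeros.
    static String stripLeadingZeros(String text) {
        return text.replaceAll("(?m)^(-?)0+([1-9]\\d*)$", "$1$2");
    }

    public static void main(String[] args) {
        // Prints -5, 3 and -87 on separate lines
        System.out.println(stripLeadingZeros("-0005\n0003\n-87"));
    }
}
```

Lines without leading zeros, such as -87, do not match at all and are therefore left unchanged.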
Here is one way. This preserves the sign.
capture the optional sign.
check for 0 or more leading zeros
followed by 1 or more digits.
String regex = "^([+-])?0*(\\d+)";
String [] data = {"-1415", "+2924", "-0000123", "000322", "+000023"};
for (String num : data) {
    String after = num.replaceAll(regex, "$1$2");
    System.out.printf("%8s --> %s%n", num , after);
}
prints
-1415 --> -1415
+2924 --> +2924
-0000123 --> -123
000322 --> 322
+000023 --> +23
If you want to keep -000, 000, +0000 etc. as just 0, try this regex:
`^[-+]?0*(0)$|^([-+])?0*(\d+)$`
Break down:
^...$ means the entire string should match (^ is the start of the string, $ is the end)
...|... is an alternative
[-+] is a character class that contains only the plus and minus characters. Note that - has a special meaning ("range") in character classes if it's not the first or last character
(...) is a capturing group which can be referenced in the replacement string by $number where number is the 1-based and 1-digit position of the group within the regex (the first group to start is no. 1 etc.)
?, * and + are quantifiers when used outside character classes, meaning "0 or 1 occurrence" (?), "any number of occurrences, including none" (*) and "at least one occurrence" (+)
^[-+]?0*(0)$ thus means: the entire string must be an optional sign, followed by any number of zeros and ending with a single zero which is captured as group 1.
alternatively ^([-+])?0*(\d+)$ means the entire string must be an optional sign which is captured as group 2, followed by any number of zeros and ending in at least one digit which is captured as group 3.
This regex can then be used with String.replaceAll(regex, "$1$2$3") in order to keep only the single 0 from group 1 or the optional sign and the number without leading zeros from groups 2 and 3. Any group that did not take part in the match is replaced by an empty string, which is why this works.
However, regular expressions can be slow, especially if you have to process a lot of strings.
One thing to improve this would be to compile the pattern only once:
//compile the pattern once and reuse it
Pattern p = Pattern.compile("^[+-]?0*(0)$|^([+-])?0*(\\d+)$");
//build a matcher from the pattern and the input string, and do the replacement
String number = p.matcher(txaTexte.getText()).replaceAll("$1$2$3");
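To see how the three groups interact, here is a small sketch that applies the compiled pattern to a few single-line inputs. The class name ZeroAwareStripDemo and the method strip are assumptions made for this example.

```java
import java.util.regex.Pattern;

class ZeroAwareStripDemo {
    // Compiled once and reused, as suggested above.
    private static final Pattern P =
            Pattern.compile("^[+-]?0*(0)$|^([+-])?0*(\\d+)$");

    // Only one alternative can match, so either group 1 or groups 2 and 3 are set;
    // groups that did not participate contribute nothing to the replacement.
    static String strip(String s) {
        return P.matcher(s).replaceAll("$1$2$3");
    }

    public static void main(String[] args) {
        String[] samples = {"-000", "+0000", "-0005", "0003", "-87"};
        for (String sample : samples) {
            System.out.println(sample + " --> " + strip(sample));
        }
    }
}
```

Inputs that match neither alternative, such as text that is not a number, are returned unchanged.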
If you're working on a large number of strings (> 10000) you might want to use some specialized plain parsing without regex. Consider something like this, which on my machine is about 10x faster than the regex approach with reused pattern:
public static String stripLeadingZeros(String s) {
    //nothing to do, return the string as is
    if( s == null || s.isEmpty() ) {
        return s;
    }
    char[] chars = s.toCharArray();
    int usedChars = 0;
    //check if the first character is the sign
    boolean hasSign = false;
    if(chars[0] == '-' || chars[0] == '+') {
        hasSign = true;
        usedChars++;
        //special case: just a sign
        if(chars.length == 1) {
            return s;
        }
    }
    //process the rest of the characters
    boolean stripZeros = true;
    for( int i = usedChars; i < chars.length; i++) {
        //not a digit, this isn't a simple integer, stop processing and keep the original string
        if( chars[i] < '0' || chars[i] > '9') {
            return s;
        }
        //are we still in zero-stripping mode
        if( stripZeros) {
            if( chars[i] == '0') {
                continue; //check next char
            }
            //we've found a non-zero char, keep it and end zero-stripping mode
            if(chars[i] >= '1' && chars[i] <= '9') {
                stripZeros = false;
            }
        }
        //since we are ignoring leading zeros, we just move all digits of the actual number to the left
        chars[usedChars++] = chars[i];
    }
    //handle special case of number 0 (with optional sign)
    if( usedChars == (hasSign ? 1 : 0)) {
        chars[0] = '0';
        usedChars = 1;
    }
    return new String(chars,0, usedChars);
}

Mask mobile number in Java [duplicate]

I would like to mask the last 4 digits of the identity number (hkid)
A123456(7) -> A123***(*)
I can do this by below:
hkid.replaceAll("\\d{3}\\(\\d\\)", "***(*)")
However, can a regular expression match just the last 4 digits, so that each one is replaced by "*"?
hkid.replaceAll(regex, "*")
Please help, thanks.
Jessie
Personally, I wouldn't do it with regular expressions:
char[] cs = hkid.toCharArray();
for (int i = cs.length - 1, d = 0; i >= 0 && d < 4; --i) {
    if (Character.isDigit(cs[i])) {
        cs[i] = '*';
        ++d;
    }
}
String masked = new String(cs);
This goes from the end of the string, looking for digit characters, which it replaces with a *. Once it's found 4 (or reaches the start of the string), it stops iterating, and builds a new string.
While I agree that a non-regex solution is probably the simplest and fastest, here's a regex that catches the last 4 digits regardless of whether there is a grouping or not: \d(?=(?:\D*\d){0,3}\D*$)
This expression is meant to match any digit that is followed by 0 to 3 digits before hitting the end of the input.
A short breakdown of the expression:
\d matches a single digit
\D matches a single non-digit
(?=...) is a positive look-ahead that contributes to the match but isn't consumed
(?:...){0,3} is a non-capturing group that is repeated 0 to 3 times
$ matches the end of the input
So you could read the expression as follows: "match a single digit if it is followed by a sequence of 0 to 3 times any number of non-digits which are followed by a single digit and that sequence is followed by any number of non-digits and the end of the input" (sounds complicated, no?).
Some results when using input.replaceAll( "\\d(?=(?:\\D*\\d){0,3}\\D*$)", "*" ):
input = "A1234567" -> output = "A123****"
input = "A123456(7)" -> output = "A123***(*)"
input = "A12345(67)" -> output = "A123**(**)"
input = "A1(234567)" -> output = "A1(23****)"
input = "A1234B567" -> output = "A123*B***"
As you can see in the last example the expression will match digits only. If you want to match letters as well either replace \d and \D with \w and \W (note that \w matches underscores as well) or use custom character classes, e.g. [02468] and [^02468] to match even digits only.
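For completeness, the expression can be wrapped in a small helper and checked against the results listed above. This is a sketch only; MaskDemo and maskLastFourDigits are names invented for the example.

```java
import java.util.regex.Pattern;

class MaskDemo {
    // A digit is masked if at most 3 further digits follow it before the end of the input.
    private static final Pattern LAST_FOUR_DIGITS =
            Pattern.compile("\\d(?=(?:\\D*\\d){0,3}\\D*$)");

    static String maskLastFourDigits(String input) {
        return LAST_FOUR_DIGITS.matcher(input).replaceAll("*");
    }

    public static void main(String[] args) {
        System.out.println(maskLastFourDigits("A123456(7)")); // A123***(*)
        System.out.println(maskLastFourDigits("A1234B567"));  // A123*B***
    }
}
```

If the input contains fewer than four digits, all of them are masked.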

Regex for detecting repeating symbols

I'm looking for a regular expression that will detect repeating symbols in a String. So far I haven't found a solution that fits all my requirements.
Requirements are pretty simple:
detect any repeating symbol in a String;
to be able to setup repeating count (eg. more than twice)
Examples of required detection (of symbol 'a', more than 2 times, true if detects, false otherwise)
"Abcdefg" - false
"AbcdaBCD" - false
"abcd_ab_ab" - true (symbol 'a' used three times)
"aabbaabb" - true (symbols 'a' used four times)
Since I'm not a pro in regex and usage of them - code snippet and explanation would be appreciated!
Thanks!
I think that
(.).*\1
would work:
(.) match a single character and capture
.* match any intervening characters
\1 match the captured group again.
(You'd need to compile with the DOTALL flag, or replace . with [\s\S] or similar if the string contains characters not ordinarily matched by .)
and if you want to require that it is found at least 3 times, just change the quantifier of the second two bullets:
(.)(.*\1){2}
etc.
This is going to be pretty inefficient, though, because it's going to have to do the "search for the next matching character" between every character in the string and the end of the string, making it at least quadratic.
You might be as well off not using regular expressions, e.g.
char[] cs = str.toCharArray();
Arrays.sort(cs);
int n = numOccurrencesRequired - 1;
for (int i = n; i < cs.length; ++i) {
    boolean allSame = true;
    for (int j = 1; j <= n && allSame; ++j) {
        allSame = cs[i] == cs[i - j];
    }
    if (allSame) return true;
}
return false;
This sorts all of the same characters together, allowing you just to pass over the string once looking for adjacent equal characters.
Note that this doesn't quite work for any symbol: it will split up multi-char codepoints like 🍕. You can adapt the code above to work with codepoints, rather than chars.
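One possible way to do that adaptation is to sort the code points instead of the chars. The following is a rough sketch rather than a drop-in replacement; the class name and method name are made up here, and it relies on String.codePoints() from Java 8 onwards.

```java
class RepeatedCodePoints {
    // Returns true if any code point occurs at least numOccurrencesRequired times.
    static boolean hasRepeated(String str, int numOccurrencesRequired) {
        if (numOccurrencesRequired < 1) {
            throw new IllegalArgumentException("numOccurrencesRequired must be at least 1");
        }
        // Sorting groups equal code points together, surrogate pairs included.
        int[] cps = str.codePoints().sorted().toArray();
        int n = numOccurrencesRequired - 1;
        for (int i = n; i < cps.length; i++) {
            // In a sorted array, equal ends imply that everything in between is equal too.
            if (cps[i] == cps[i - n]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasRepeated("abcd_ab_ab", 3)); // true
        System.out.println(hasRepeated("AbcdaBCD", 3));   // false
    }
}
```

Because the array is sorted, comparing only the two ends of each window is enough, so the inner loop of the char-based version is not needed.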
Try this regex: (.)(?:.*\1)
It matches any character (.) that is followed by anything (.*) and then by itself (\1). If you want to check for 2 or more repeats, just add {n,} at the end, with n being the number of repeats you want to check for.
Yea, such regex exists but just because the set of characters is finite.
regex: .*(a.*a|b.*b|c.*c|...|y.*y|z.*z).*
It makes no sense. Use another approach:
String string = "something";
// one counter per possible char value, so no index can fall outside the array
int[] count = new int[Character.MAX_VALUE + 1];
for (int i = 0; i < string.length(); i++) {
    count[string.charAt(i)]++;
}
Now you have all characters counted and you can use them as you wish.

java regular expression examples for match without length limitation

I'm trying to write a regular expression that matches a string starting with the letter "G" whose second character is any digit (0-9); the rest of the string can contain anything and be of any length.
I'm stuck with the following code:
String[] array = { "DA4545", "G121", "G8756942", "N45", "4578", "#45565" };
String regExp = "^[G]\\d[0-9]";
for(int i = 0; i < array.length; i++)
{
    if(Pattern.matches(regExp, array[i]))
    {
        System.out.println(array[i] + " - Successful");
    }
}
output:
G12 - Successful
Why does it not match the third element, "G8756942"?
G - the letter G
[0-9] - a digit
.* - any sequence of characters
So the expression
G[0-9].*
will match a letter G followed by a digit followed by any sequence of characters.
when you write \d it already means [0-9]
so when you say \d[0-9] that means two digits exactly
better use :
^G\\d*
which will match all words starting with G and having zero or more digits
"^[G]\\d[0-9]"
This regex matches "G" followed by \\d, then another number.
Use one of these:
"^G\\d"
"^G[0-9]"
Also note that you don't need a character class since it only contains one letter, so it's redundant.
Try this regex; the .* will match any characters after the digit:
^G\\d.*
http://regex101.com/r/uE4tX1/1
Why does it not match the third element, "G8756942"?
Because you match for a string starting with G, followed by a \, a d and exactly one digit. Solution:
^[G]\d
This regex would be fine.
"G\\d.*"
Because the matches method tries to match the whole input, you need to add .* at the end of your pattern; you also don't need to include anchors.
String[] array = { "DA4545", "G121", "G8756942", "N45", "4578", "#45565" };
String regExp = "G\\d.*";
for(int i = 0; i < array.length; i++)
{
    if(Pattern.matches(regExp, array[i]))
    {
        System.out.println(array[i] + " - Successful");
    }
}
Output:
G121 - Successful
G8756942 - Successful
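The difference between matching the whole input and finding a match inside it can be shown side by side. A minimal sketch with made-up class and method names: matches() must consume the entire string, while Matcher.find() only needs the pattern to match somewhere, so a short anchored prefix pattern is enough.

```java
import java.util.regex.Pattern;

class MatchVsFindDemo {
    private static final Pattern PREFIX = Pattern.compile("^G\\d");

    // find() looks for the pattern anywhere; the ^ anchor pins it to the start.
    static boolean startsWithGDigit(String s) {
        return PREFIX.matcher(s).find();
    }

    // matches() has to cover the whole input, hence the trailing .*
    static boolean wholeMatch(String s) {
        return s.matches("G\\d.*");
    }

    public static void main(String[] args) {
        String[] array = { "DA4545", "G121", "G8756942", "N45", "4578", "#45565" };
        for (String s : array) {
            System.out.println(s + ": find=" + startsWithGDigit(s) + ", matches=" + wholeMatch(s));
        }
    }
}
```

Both approaches accept G121 and G8756942 and reject the other entries. One difference worth knowing: without the DOTALL flag, .* does not cross line terminators, so the matches() variant rejects input containing a line break while the find() variant does not care what follows the prefix.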

Is there a way to stop a RegEx before a character value and start another one after that character?

I am trying to remove numbers before a character such as a-z or *, /, +, -, and then remove any numbers following that character but before a different character. Here is what I have.
s= s.replaceAll("(\\d+)", "");
s= s.replace("*", r.toString());
Where s is the string that I need to read, and r is the result of the operation.
The * is arbitrary; it could be any of the characters mentioned previously.
The problem with this is that it removes every number in the string.
If I were to iterate once with the input of:
26 + 4 - 2
The program returns this:
30 -
It deletes all three numbers and then replaces the "+" with 30.
I would like to change it to resemble this (with one iteration):
26 + 4 - 2
The first RegEx would delete the first set of numbers
+ 4 - 2
The second would remove the numbers after the operator, but before the next operator
+ - 2
The next statement would replace the operator with the result of the expression
30 - 2
I would like the same for problems with other functions such as sine, cosine, etc.
Note: Sine is 'a'
"Sin pi" is the same as "a pi"
After one iteration it should look like
a pi + 2
a + 2
0 + 2
Here is a sample of the code.
This is the Multiply "case"
case '*':
{
    int m = n + 1;
    while (m < result.length) {
        if (result[m] != '*' && result[m] != '/' && result[m] != '+' && result[m] != '-') { //checks the item to see if it is numeric
            char ch2 = result[m]; //makes the number a character
            number3 += new String(new char[]{ch2}); //combines the character into a string. For example: '2' + '3' = "23".
            ++m;
        }
        else {
            break;
        }
    }
    resultNumber = (Double.parseDouble(number2) * Double.parseDouble(number3)); //"number2" holds the value of the numbers before the operator. Example: This number ----> "3" '*' "23"
    equation = equation.replaceAll("(\\d+)", ""); // <---- Line I pulled out earlier that I want to change.
    equation = equation.replace("*", resultNumber.toString()); // <----- Line I pulled out earlier
    result = equation.toCharArray();
    number3 = ""; //erases any number held
    number2 = ""; //erases any number held
    ++n;
    break;
}
I'll first suggest two alternate approaches, then answer your question as it stands.
Perhaps better without regular expressions
I have many doubts about your application. A proper tokenizer (lexer), together with a very simple parser would likely do a better job and give clearer error messages than your code.
Matching all operands
Even if you were to use regular expressions, it might make more sense to match both operands in a single pass. I.e. match (\d+)\s*\*\s*(\d+) to match a multiplication of exactly two numbers. You could first search for a match, then extract the operands from the capturing groups, then compute the resulting value and finally glue together substrings including the result:
// Multiplication of unparenthesized integers
Pattern p = Pattern.compile("(\\d+)\\s*\\*\\s*(\\d+)");
Matcher m = p.matcher(s);
while (m.find()) {
    int a = Integer.parseInt(m.group(1));
    int b = Integer.parseInt(m.group(2));
    s = s.substring(0, m.start(1)) + (a*b) + s.substring(m.end(2));
    m.reset(s);
}
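Wrapped into a method, the loop above can be tried on a few expressions. This is a sketch under the same assumptions (unparenthesized, non-negative integers only); the class name MultiplyDemo and the method reduceProducts are hypothetical, and long is used here instead of int only to leave more headroom for the products.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiplyDemo {
    private static final Pattern PRODUCT =
            Pattern.compile("(\\d+)\\s*\\*\\s*(\\d+)");

    // Repeatedly replaces the leftmost "a * b" with its value until no product is left.
    static String reduceProducts(String s) {
        Matcher m = PRODUCT.matcher(s);
        while (m.find()) {
            long a = Long.parseLong(m.group(1));
            long b = Long.parseLong(m.group(2));
            s = s.substring(0, m.start(1)) + (a * b) + s.substring(m.end(2));
            // The string changed, so the matcher has to restart on the new text.
            m.reset(s);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(reduceProducts("2 * 3 * 4 + 5")); // 24 + 5
        System.out.println(reduceProducts("26 + 4 - 2"));    // 26 + 4 - 2
    }
}
```

Resetting the matcher after every replacement is what lets chained products such as 2 * 3 * 4 collapse step by step.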
Answer to the question as it was phrased in the title
Regarding the exact formulation of your question:
Is there a way to stop a RegEx before a character value and start another one after that character?
If you want a regex to not match after a given character in the input, you can achieve that by a negative look-behind assertion. Likewise, to only match after a given character, you can use a positive look-behind assertion.
So a regex starting in (?<!\*.*) would only match up to the first occurrence of '*', whereas a regex starting in (?<=\*.*) would only match after the first occurrence of that character. Both would have to be compiled using DOTALL, or in a more complicated form like (?<!\*(?:\n|.*)*).
But ensuring that these matches correspond to the math you have in mind would likely be very tricky.
