Regex to check for a plus sign and digits - java

I want to check a string contains a "+" and digits only:
public boolean checkPhoneNumberIsValid(String arg) {
return Pattern.compile("\\+[0-9]").matcher(arg).find();
}
When I try "xxx", this correctly fails. When I try "+3531234567" it correctly passes. But when I try "+35312ccc34567" it incorrectly passes. Why is this?

It incorrectly passes because you are matching only a single digit after the +. You need to match the entire string up to the end and check that it contains only digits. Try the following regex:
\\+[0-9]+$
or
\\+\\d+$
The + after [0-9] means: match one or more occurrences of [0-9]. If nothing may precede the plus sign, also add ^ at the start, since find() otherwise accepts a match that begins in the middle of the string.
See this: https://regex101.com/r/0ufZPi/1
+3245edsfv //fail
+86569653386 //pass
+xxxx //fail
+234fsvfb7890 //fail

I think it is because the find method looks for a single matching substring in your argument. I would recommend the matches method, which checks that the full string matches:
public boolean checkPhoneNumberIsValid(String arg) {
return Pattern.matches("\\+[0-9]+", arg);
}
[0-9]+ means one or more digits. You were missing the + so you were only matching a plus sign followed by a single digit, and the find method accepts that match anywhere in the string.
You can also see the Pattern.matches method here.
However, if you really want to use the find method you would have to use "^\\+[0-9]+$" for the regular expression to force the find method to match the full string. ^ means that the match must start at the beginning of the string and $ means that it must stop at the end.
Additionally, if you are planning on calling checkPhoneNumberIsValid often, you should precompile the regular expression, as it is more efficient, as stated here:
private static final Pattern PHONE_NUMBER_REGEX = Pattern.compile("\\+[0-9]+");
public boolean checkPhoneNumberIsValid(String arg) {
return PHONE_NUMBER_REGEX.matcher(arg).matches();
}
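To see the difference between find() and matches() side by side, here is a minimal sketch. The class and method names (PhoneNumberDemo, looseFind, strictMatches) are made up for illustration; the two patterns are the ones discussed above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern strings come from the thread.
class PhoneNumberDemo {
    // Pattern from the question: a plus sign followed by a single digit.
    private static final Pattern LOOSE = Pattern.compile("\\+[0-9]");
    // Plus sign followed by one or more digits, meant to be used with matches().
    private static final Pattern STRICT = Pattern.compile("\\+[0-9]+");

    // find() succeeds as soon as any substring of the input matches.
    static boolean looseFind(String s) {
        return LOOSE.matcher(s).find();
    }

    // matches() succeeds only if the whole input matches the pattern.
    static boolean strictMatches(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"xxx", "+3531234567", "+35312ccc34567"}) {
            System.out.println(s + " find=" + looseFind(s) + " matches=" + strictMatches(s));
        }
    }
}
```

The third input is the interesting one: find() accepts it because "+3" occurs inside it, while matches() rejects it because of the letters.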

Related

Regex detect if entire string is a placeholder

I am trying to write a regex which should detect
"Is the entire string a placeholder".
An example of a valid placeholder here is ${var}
An example of an invalid placeholder here is ${var}-sometext as the placeholder is just a part of the text
The regex I have currently is ^\$\{(.+)\}$
This works for normal cases.
For example:
1. ${var}: regex matches (expected ✅)
2. ${var} txt: regex does not match (expected ✅)
It even works for nested placeholders:
3. ${var-${nestedVar}}: regex matches (expected ✅)
Where this fails is if the string begins and ends with a placeholder, e.g.:
4. ${var1}-txt-${var2}: regex matches (NOT expected ❌)
Basically even though the entire string is not a placeholder, the regex treats it as one as it begins with ${ and ends with }
I can try solving it by replacing .+ with something like [^$]+ to exclude dollar, but that will break the nested use case in example 3.
How do I solve this?
EDIT
Adding some code for context
public static final Pattern PATTERN = Pattern.compile("^\\$\\{(.+)\\}$");
Matcher matcher = PATTERN.matcher(placeholder);
boolean isMatch = matcher.find();
From your example, I think you need to avoid the greedy quantifier:
\$\{(.+?)\}
Notice the ? after the +, which makes it a reluctant quantifier: https://docs.oracle.com/javase/tutorial/essential/regex/quant.html
Used with find(), that matches ${var1} and ${var2} separately in ${var1}-txt-${var2}
Now, if you use ^ and $ as well, this will fail: the engine backtracks, .+? grows until the final } is reached, and the whole string still matches.
Note that you could also use StringSubstitutor from commons-text to perform a similar job (it will handle the parsing, and you may use a lookup that captures the variable).
Edit for comment: given that Java regex doesn't support recursion, you would have to hard-code part of the recursion here if you wanted to match all 4 of your cases:
\$\{([^{}-]+)(?:|-\$\{([^{}-]+)\})\}
The first part matches a variable name, excluding {, } and -. The other part matches either nothing (no default value) or a - followed by a nested placeholder.
If you need to catch ${a-${b-${c}}} you would have to add another layer, which you should avoid: writing a complex regex for the sake of it will simply be a maintenance headache (even with only one level of recursion, the regex above is hard to read).
If you need to handle recursion, I think you have no alternative but to do it yourself in code, as below:
void parse(String pattern) {
    if (pattern.startsWith("${") && pattern.endsWith("}")) {
        // remove the leading "${" and the trailing "}"
        var content = pattern.substring(2, pattern.length() - 1);
        var n = content.indexOf('-');
        String leftVar = content;
        if (n != -1) {
            leftVar = content.substring(0, n);
            // recurse into the default value after the '-'
            parse(content.substring(n + 1));
        }
        // return whatever you need
    }
}
Or use something that already exists.
static boolean isPlaceHolder(String s) {
return s.matches("\\$\\{[^}]*\\}");
}
or optimized for several uses:
private static final Pattern PLACE_HOLDER_PATTERN =
Pattern.compile("\\$\\{[^}]*\\}");
static boolean isPlaceHolder(String s) {
return PLACE_HOLDER_PATTERN.matcher(s).matches();
}
matches() does a match from beginning to end, so there is no need for ^...$, as opposed to find().
It still is tricky to detect as false: "${x}, ${y}". It would be best when the placeholder is just for a variable, \\w+.
It is not possible to match arbitrarily deep nested structures using regular expressions. The most you can do with a single regex is match a finite number of nested parts, though your pattern will probably be pretty ugly.
Another approach is to apply a simpler pattern many times, until you have an answer. For example:
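Replace everything that matches \$\{[^{}]*\} (an innermost placeholder, with no braces inside it) with nothing (the empty string)
Repeat until the pattern no longer matches
If the string is now empty, then the value was "valid".
If the string is not empty, then the value is "invalid".
Note that \$\{[^}]*\} or \$\{.*?\} would not work here: on ${var-${nestedVar}} they remove ${var-${nestedVar} first and leave a stray }. Also be aware that this approach accepts back-to-back placeholders such as ${a}${b}.
private static final Pattern PATTERN = Pattern.compile("\\$\\{[^{}]*\\}");
public boolean isValid(String value) {
    while (true) {
        // strip innermost placeholders; nested ones take several passes
        String newValue = PATTERN.matcher(value).replaceAll("");
        if (newValue.equals(value))
            break;
        value = newValue;
    }
    return value.isEmpty();
}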
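If a regex turns out to be the wrong tool, one alternative is a single pass that tracks brace depth. This is only a sketch under my own assumptions, not something from the answers above: the class and method names are hypothetical, a placeholder is taken to open with ${ and close with }, and an empty ${} is rejected to mirror the .+ in the original pattern.

```java
// Hypothetical helper: checks that the whole string is exactly one
// (possibly nested) ${...} placeholder by tracking nesting depth.
class PlaceholderCheck {
    static boolean isSinglePlaceholder(String s) {
        // Needs "${", at least one character of content, and a closing "}".
        if (s.length() < 4 || !s.startsWith("${") || !s.endsWith("}")) {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '$' && i + 1 < s.length() && s.charAt(i + 1) == '{') {
                depth++;
                i++; // skip the '{' that belongs to this opener
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return false; // closing brace without an opener
                }
                if (depth == 0 && i < s.length() - 1) {
                    return false; // outer placeholder closed before the end
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        String[] samples = {"${var}", "${var} txt", "${var-${nestedVar}}", "${var1}-txt-${var2}"};
        for (String s : samples) {
            System.out.println(s + " -> " + isSinglePlaceholder(s));
        }
    }
}
```

Because the outer placeholder must stay open until the last character, inputs like ${var1}-txt-${var2} and ${a}${b} are rejected, while nesting to any depth is accepted.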

Regular Expressions: How do I single out a chain of characters with a precise length at the end of any string?

I have a string which can be any string, and I want to single it out if it ends in exactly two "s"
for example, "foo", "foos", "foosss", should all return false but "fooss" and "barss" should return true
The problem is if I use .*s{2} for a string like "foosss", java sees "foos" as the .* and "ss" as the s{2}
And if I use .*?s{2} for "foosss" java will see "foo" as the .* for a bit, which is what I want, but once it checks the rest of the string to see if it matches s{2} and it fails, it iterates to the next try
It's important that the beginning string can contain the letter "s", too. "sometexts" and "sometextsss" should return false but "sometextss" should return true
Simply checking whether (^|[^s])s{2}$ matches part of the string should work. $ asserts that the end of the string has been reached whereas ^ asserts that you're still at the beginning of the string. The ^ alternative is necessary so that a string consisting of just ss, with no preceding non-s character, still matches.
You can use
Boolean result = text.matches(".*(?<!s)s{2}");
String#matches requires a full string match, so there is no need for anchors (\A/^ and $/\z).
The (?<!s) part is a negative lookbehind that fails the match if there is an s char immediately to the left of the current location.
See the regex demo.
However, you do not really need a regex here if you can use Java code:
Boolean result = !text.endsWith("sss") && text.endsWith("ss");
which literally means "if text does not end with sss and text ends with ss" the result is true.
.*[^s]ss
How I tested it:
public static void main(String[] args) {
String[] strings = new String[]{"fooss", "barss", "foosss", "foos", "foo", "barssss"};
for (String s : strings) {
if (s.matches(".*[^s]ss")) {
System.out.println(s);
}
}
}
Output: fooss barss
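The three suggestions in this thread agree on the sample inputs but differ on one edge case: the string "ss" on its own. The sketch below puts them next to each other; the class and method names are made up for the comparison.

```java
// Hypothetical comparison of the three approaches suggested above.
class DoubleSuffixDemo {
    // Negative lookbehind: "ss" at the end, not preceded by another 's'.
    static boolean withLookbehind(String text) {
        return text.matches(".*(?<!s)s{2}");
    }

    // Character class: requires an actual non-'s' character before "ss".
    static boolean withCharClass(String text) {
        return text.matches(".*[^s]ss");
    }

    // Plain string methods, no regex involved.
    static boolean withEndsWith(String text) {
        return text.endsWith("ss") && !text.endsWith("sss");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"fooss", "foosss", "foos", "sometextss", "ss"}) {
            System.out.println(s + ": lookbehind=" + withLookbehind(s)
                    + " charClass=" + withCharClass(s)
                    + " endsWith=" + withEndsWith(s));
        }
    }
}
```

The [^s] version needs a character in front of the final ss, so it rejects "ss" itself, whereas the lookbehind and endsWith versions accept it. Which behavior is right depends on whether a bare "ss" should count.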

How to match two string using java Regex

String 1= abc/{ID}/plan/{ID}/planID
String 2=abc/1234/plan/456/planID
How can I match these two strings using Java regex so that it returns true? Basically {ID} can contain anything. Java regex should match abc/{anything here}/plan/{anything here}/planID
If your "{anything here}" may be empty, you can use .*. The . matches any character, and * means the preceding element is repeated any number of times, including zero. So .* means "match a string of any length, composed of any characters". If {anything here} should include at least one character, you can use + instead of *, which means almost the same but requires at least one character.
My suggestion: abc/.+/plan/.+/planID
If {ID} can contain anything I assume it can also be empty.
So this regex should work :
str.matches("^abc.*plan.*planID$");
^abc at the beginning
.* Zero or more of any Character
planID$ at the end
Here is a small program; check it and adapt it to your requirements. It works for this input; test it against your other cases and comment with any that fail. I am using regex specifically because you asked for a Java regex match.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class MatchUsingRejex
{
public static void main(String args[])
{
// Create a pattern to be searched
Pattern pattern = Pattern.compile("abc/.+/plan/.+/planID");
// checking, Is pattern match or not
Matcher isMatch = pattern.matcher("abc/1234/plan/456/planID");
if (isMatch.find())
System.out.println("Yes");
else
System.out.println("No");
}
}
If the line always starts with 'abc' and ends with 'planID', then the following will work:
String s1 = "abc/{ID}/plan/{ID}/planID";
String s2 = "abc/1234/plan/456/planID";
String pattern = "(?i)abc(?:/\\S+)+planID$";
boolean b1 = s1.matches(pattern);
boolean b2 = s2.matches(pattern);
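Instead of writing the regex by hand, the template itself can be converted into a pattern. The following is a rough sketch under two assumptions of mine: the placeholder is always spelled {ID}, and it stands for one non-empty path segment (no slash inside). TemplateMatcher and compileTemplate are hypothetical names; Pattern.quote is used so that any regex metacharacters in the literal parts are matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper that turns a template such as
// "abc/{ID}/plan/{ID}/planID" into a compiled Pattern.
class TemplateMatcher {
    static Pattern compileTemplate(String template) {
        // Split on the literal text "{ID}"; limit -1 keeps trailing empty parts.
        String[] literals = template.split("\\{ID\\}", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                // Assumption: an ID is one or more characters other than '/'.
                regex.append("[^/]+");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = compileTemplate("abc/{ID}/plan/{ID}/planID");
        System.out.println(p.matcher("abc/1234/plan/456/planID").matches());
        System.out.println(p.matcher("abc/1234/other/456/planID").matches());
    }
}
```

If an ID may be empty or may contain slashes, swap [^/]+ for .* or .+ as described in the answers above.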

Regular expression not matching on first and last word of string

I am trying to write a Java program that will look for a specific word in a string. I have it working for the most part, but it doesn't seem to match if the word is the first or last word in the string. Here is an example:
"trying to find the first word".matches(".*[^a-z]find[^a-z].*") //returns true
"trying to find the first word".matches(".*[^a-z]trying[^a-z].*") //returns false
"trying to find the first word".matches(".*[^a-z]word[^a-z].*") //returns false
Any idea how to make this match on any word in the string?
Thanks in advance,
Craig
The problem is the character class [^a-z] before and after the word: it requires an actual non-letter character on each side, and there is none at the start or end of the string. I think that what you actually want is a word boundary \b (as per ColinD's comment) as opposed to a character outside the a-z range. As pointed out in the comments (thanks), you'll also need to handle the start and end of string cases.
So try, e.g.:
"(?:^|.*\\b)trying(?:\\b.*|$)"
You can make the surrounding characters optional with ?. Check the link below and test more cases to see whether this gives the proper output:
https://regex101.com/r/oP5zB8/1
(.*[^a-z]?trying[^a-z]?.*)
I think (^|^.*[^a-z])trying([^a-z].*$|$) just fits your need.
Or (?:^|^.*[^a-z])trying(?:[^a-z].*$|$) for non capturing parentheses.
You can try the following program to check for words at the start and end of a string:
package com.ajsodhi.utilities;
import java.util.regex.Pattern;
public class RegExStartEndWordCheck {
public static final String stringToMatch = "StartingsomeWordsEndWord";
public static void main(String[] args) {
String regEx = "Starting[A-Za-z0-9]{0,}EndWord";
Pattern patternOriginalSign = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
boolean OriginalStringMatchesPattern = patternOriginalSign.matcher(stringToMatch).matches();
System.out.println(OriginalStringMatchesPattern);
}
}
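For the word-boundary approach suggested in this thread, a small helper built on find() avoids the leading and trailing .* altogether. This is a sketch, not the original poster's code: WholeWordDemo and containsWord are hypothetical names, and it assumes the word starts and ends with a word character (letter, digit or underscore), since \b only marks a boundary next to such characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper for whole-word search using \b and find().
class WholeWordDemo {
    static boolean containsWord(String text, String word) {
        // Pattern.quote treats the word literally, even if it holds regex metacharacters.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // find() looks for the word anywhere, including the very start or end.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "trying to find the first word";
        for (String w : new String[] {"trying", "find", "word", "fin"}) {
            System.out.println(w + " -> " + containsWord(text, w));
        }
    }
}
```

Note the doubled backslash: in a Java string literal "\b" is a backspace character, so the regex boundary has to be written as "\\b".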
You should use the word boundary \b, which marks the beginning or end of a word, instead of [^a-z], which does not match at the start or end of the string.
Just something like
".*\\bfind\\b.*"

How to check if a string contains only digits in Java

In Java, the String class has a method called matches. How can I use this method with a regular expression to check whether my string contains only digits? I tried the examples below, but both of them returned false.
String regex = "[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));
String regex = "^[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));
Try
String regex = "[0-9]+";
or
String regex = "\\d+";
As per Java regular expressions, the + means "one or more times" and \d means "a digit".
Note: the "double backslash" is an escape sequence to get a single backslash - therefore, \\d in a java String gives you the actual result: \d
References:
Java Regular Expressions
Java Character Escape Sequences
Edit: due to some confusion in other answers, I am writing a test case and will explain some more things in detail.
Firstly, if you are in doubt about the correctness of this solution (or others), please run this test case:
String regex = "\\d+";
// positive test cases, should all be "true"
System.out.println("1".matches(regex));
System.out.println("12345".matches(regex));
System.out.println("123456789".matches(regex));
// negative test cases, should all be "false"
System.out.println("".matches(regex));
System.out.println("foo".matches(regex));
System.out.println("aa123bb".matches(regex));
Question 1:
Isn't it necessary to add ^ and $ to the regex, so it won't match "aa123bb" ?
No. In java, the matches method (which was specified in the question) matches a complete string, not fragments. In other words, it is not necessary to use ^\\d+$ (even though it is also correct). Please see the last negative test case.
Please note that if you use an online "regex checker" then this may behave differently. To match fragments of a string in Java, you can use the find method instead, described in detail here:
Difference between matches() and find() in Java Regex
Question 2:
Won't this regex also match the empty string, ""?
No. A regex \\d* would match the empty string, but \\d+ does not. The star * means zero or more, whereas the plus + means one or more. Please see the first negative test case.
Question 3:
Isn't it faster to compile a regex Pattern?
Yes. It is indeed faster to compile a regex Pattern once, rather than on every invocation of matches, and so if performance implications are important then a Pattern can be compiled and used like this:
Pattern pattern = Pattern.compile(regex);
System.out.println(pattern.matcher("1").matches());
System.out.println(pattern.matcher("12345").matches());
System.out.println(pattern.matcher("123456789").matches());
You can also use NumberUtils.isNumber(String str) from Apache Commons Lang.
Using regular expressions is costly in terms of performance. Trying to parse the string as a long value is inefficient and unreliable, and may not be what you need.
What I suggest is to simply check whether each character is a digit, which can be done efficiently using Java 8 lambda expressions:
boolean isNumeric = someString.chars().allMatch(x -> Character.isDigit(x));
One more solution, that hasn't been posted, yet:
String regex = "\\p{Digit}+"; // uses POSIX character class
You must allow for more than a digit (the + sign) as in:
String regex = "[0-9]+";
String data = "23343453";
System.out.println(data.matches(regex));
Long.parseLong(data)
and catch the exception; this also handles a minus sign.
Although the number of digits is limited, this actually creates a variable holding the value, which can then be used, which is, I would imagine, the most common use-case.
We can use either Pattern.compile("[0-9]+(\\.[0-9]+)?") or Pattern.compile("\\d+(\\.\\d+)?"). They have the same meaning.
The pattern [0-9] means a digit, the same as \d.
+ means the preceding element appears one or more times.
\. is a literal decimal point (an unescaped . would match any character), and the optional group (\.[0-9]+)? accepts both integers and decimals.
Try the following code:
import java.util.regex.Pattern;
public class PatternSample {
public boolean containNumbersOnly(String source){
boolean result = false;
Pattern pattern = Pattern.compile("[0-9]+(\\.[0-9]+)?"); //correct pattern for both float and integer.
pattern = Pattern.compile("\\d+(\\.\\d+)?"); //correct pattern for both float and integer.
result = pattern.matcher(source).matches();
if(result){
System.out.println("\"" + source + "\"" + " is a number");
}else
System.out.println("\"" + source + "\"" + " is a String");
return result;
}
public static void main(String[] args){
PatternSample obj = new PatternSample();
obj.containNumbersOnly("123456.a");
obj.containNumbersOnly("123456 ");
obj.containNumbersOnly("123456");
obj.containNumbersOnly("0123456.0");
obj.containNumbersOnly("0123456a.0");
}
}
Output:
"123456.a" is a String
"123456 " is a String
"123456" is a number
"0123456.0" is a number
"0123456a.0" is a String
According to Oracle's Java Documentation:
private static final Pattern NUMBER_PATTERN = Pattern.compile(
"[\\x00-\\x20]*[+-]?(NaN|Infinity|((((\\p{Digit}+)(\\.)?((\\p{Digit}+)?)" +
"([eE][+-]?(\\p{Digit}+))?)|(\\.((\\p{Digit}+))([eE][+-]?(\\p{Digit}+))?)|" +
"(((0[xX](\\p{XDigit}+)(\\.)?)|(0[xX](\\p{XDigit}+)?(\\.)(\\p{XDigit}+)))" +
"[pP][+-]?(\\p{Digit}+)))[fFdD]?))[\\x00-\\x20]*");
boolean isNumber(String s){
return NUMBER_PATTERN.matcher(s).matches();
}
Refer to org.apache.commons.lang3.StringUtils
public static boolean isNumeric(CharSequence cs) {
if (cs == null || cs.length() == 0) {
return false;
} else {
int sz = cs.length();
for(int i = 0; i < sz; ++i) {
if (!Character.isDigit(cs.charAt(i))) {
return false;
}
}
return true;
}
}
In Java, the String class has a method called matches(). With the help of this method you can validate your string against a regex.
String regex = "^[\\d]{4}$";
String value = "1234";
System.out.println(value.matches(regex));
The explanation for the above regex is:
^ - Indicates the start of the input.
[] - A character class; inside it you describe which characters are allowed.
\\d - Only allows digits. You can use \\d or 0-9 inside the brackets; both are the same.
{4} - This condition allows exactly 4 digits. You can change the number according to your need.
$ - Indicates the end of the input.
Note: You can remove the {4} and specify + which means one or more times, or * which means zero or more times, or ? which means once or none.
For more reference please go through this website: https://www.rexegg.com/regex-quickstart.html
Official regex way
I would use this regex for integers:
^[-1-9]\d*$
Note that, as written, it rejects 0 and accepts a lone -; ^(0|-?[1-9]\d*)$ avoids both.
The explicit ^ and $ anchors mean it will also work in other programming languages, whose regex APIs may search for a partial match by default.
Also works in Java
\\d+
Questions regarding ^ and $
As @vikingsteve has pointed out, in Java the matches method matches a complete string, not parts of a string. In other words, it is unnecessary to use ^\d+$ (even though it is the official way of regex).
Online regex checkers usually search for a match anywhere in the input, like the find method, and therefore they will behave differently from matches in Java.
Try this piece of code:
void containsOnlyNumbers(String str)
{
try {
Integer num = Integer.valueOf(str);
System.out.println("is a number");
} catch (NumberFormatException e) {
// TODO: handle exception
System.out.println("is not a number");
}
}
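The approaches in this thread do not accept exactly the same inputs, which is easy to miss. The sketch below (hypothetical class and method names) compares three of them. A few things it relies on: in Java, \d matches only ASCII 0-9 unless the UNICODE_CHARACTER_CLASS flag is set, Character.isDigit also accepts digits from other scripts, allMatch on an empty stream returns true (hence the extra isEmpty check), and Integer.parseInt accepts a sign but fails on values outside the int range.

```java
// Hypothetical comparison of "digits only" checks from the answers above.
class DigitsOnlyDemo {
    // Regex: one or more ASCII digits, whole string.
    static boolean regexDigits(String s) {
        return s.matches("\\d+");
    }

    // Per-character check; the isEmpty guard is needed because
    // allMatch returns true for an empty stream.
    static boolean charDigits(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isDigit);
    }

    // Parsing: accepts a leading sign, rejects values that overflow int.
    static boolean parsesAsInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\u0663" is ARABIC-INDIC DIGIT THREE, a non-ASCII digit.
        String[] samples = {"23343453", "", "-5", "99999999999", "\u0663"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" regex=" + regexDigits(s)
                    + " chars=" + charDigits(s) + " int=" + parsesAsInt(s));
        }
    }
}
```

So the regex is the strictest about what counts as a digit, while the parsing approach answers a slightly different question: whether the text is a usable int.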
