How to check if a string contains only digits in Java - java

The String class in Java has a method called matches. How can I use it with a regular expression to check whether my string contains only digits? I tried the examples below, but both of them returned false.
String regex = "[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));
String regex = "^[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));

Try
String regex = "[0-9]+";
or
String regex = "\\d+";
As per Java regular expressions, the + means "one or more times" and \d means "a digit".
Note: the "double backslash" is an escape sequence that produces a single backslash. Therefore, \\d in a Java String literal gives you the actual regex: \d
References:
Java Regular Expressions
Java Character Escape Sequences
Edit: due to some confusion in other answers, I am writing a test case and will explain some more things in detail.
Firstly, if you are in doubt about the correctness of this solution (or others), please run this test case:
String regex = "\\d+";
// positive test cases, should all be "true"
System.out.println("1".matches(regex));
System.out.println("12345".matches(regex));
System.out.println("123456789".matches(regex));
// negative test cases, should all be "false"
System.out.println("".matches(regex));
System.out.println("foo".matches(regex));
System.out.println("aa123bb".matches(regex));
Question 1:
Isn't it necessary to add ^ and $ to the regex, so it won't match "aa123bb" ?
No. In Java, the matches method (which was specified in the question) matches a complete string, not fragments. In other words, it is not necessary to use ^\\d+$ (even though it is also correct). Please see the last negative test case.
Please note that if you use an online "regex checker" then this may behave differently. To match fragments of a string in Java, you can use the find method instead, described in detail here:
Difference between matches() and find() in Java Regex
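To make the difference concrete, here is a minimal sketch that runs the same compiled \d+ pattern through matches() and find(). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class MatchesVsFind {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // True only if the whole input consists of digits.
    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // True if a run of digits occurs anywhere in the input.
    static boolean containsDigits(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("aa123bb"));    // false
        System.out.println(containsDigits("aa123bb")); // true
    }
}
```

An online checker that reports a match for "aa123bb" is most likely doing what containsDigits does here.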
Question 2:
Won't this regex also match the empty string, ""?
No. A regex \\d* would match the empty string, but \\d+ does not. The star * means zero or more, whereas the plus + means one or more. Please see the first negative test case.
Question 3:
Isn't it faster to compile a regex Pattern?
Yes. It is indeed faster to compile a regex Pattern once, rather than on every invocation of matches, and so if performance implications are important then a Pattern can be compiled and used like this:
Pattern pattern = Pattern.compile(regex);
System.out.println(pattern.matcher("1").matches());
System.out.println(pattern.matcher("12345").matches());
System.out.println(pattern.matcher("123456789").matches());

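You can also use NumberUtils.isNumber(String str) from Apache Commons Lang. Note that it checks for a valid Java number, so it also accepts hexadecimal values with a 0x prefix, scientific notation, and type qualifiers such as 123L, not only plain digits.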

Using regular expressions is costly in terms of performance. Trying to parse the string as a long value is inefficient and unreliable, and may not be what you need.
What I suggest is to simply check that each character is a digit, which can be done efficiently using Java 8 streams and a lambda expression:
boolean isNumeric = someString.chars().allMatch(x -> Character.isDigit(x));
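Two details of the one-liner are worth knowing: allMatch returns true for an empty stream, so an empty string passes, and Character.isDigit accepts any Unicode decimal digit, not only 0 to 9. If that is not wanted, a stricter variant could look like the sketch below. The class and method names are hypothetical, and rejecting null and empty input is an assumption about what the caller needs.

```java
// Hypothetical helper; a sketch of an ASCII-only digit check.
final class AsciiDigits {
    // Returns true only for non-empty strings made of the characters '0'..'9'.
    static boolean isAsciiDigits(String s) {
        if (s == null || s.isEmpty()) {
            return false; // assumption: null and "" are not "only digits"
        }
        return s.chars().allMatch(c -> c >= '0' && c <= '9');
    }
}
```

For example, isAsciiDigits("23343453") is true, while a string containing the Arabic-Indic digit U+0663 is rejected even though Character.isDigit accepts that character.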

One more solution that hasn't been posted yet:
String regex = "\\p{Digit}+"; // uses POSIX character class

You must allow for more than one digit (the + quantifier), as in:
String regex = "[0-9]+";
String data = "23343453";
System.out.println(data.matches(regex));

Call Long.parseLong(data)
and catch the NumberFormatException; it also handles a leading minus sign.
Although the number of digits is limited by the range of long, this actually gives you the value as a variable you can use, which is, I would imagine, the most common use case.
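A minimal sketch of that approach follows. The wrapper class and the choice of OptionalLong as the return type are assumptions made for the example, not part of the answer.

```java
import java.util.OptionalLong;

// Hypothetical helper wrapping Long.parseLong in a try/catch.
final class LongParsing {
    // Returns the parsed value, or an empty OptionalLong if s is not a valid long.
    static OptionalLong tryParseLong(String s) {
        try {
            return OptionalLong.of(Long.parseLong(s));
        } catch (NumberFormatException e) {
            // Thrown for null, empty, non-numeric, and out-of-range input.
            return OptionalLong.empty();
        }
    }
}
```

Keep in mind that this accepts "-42", so it answers "is this a long?" rather than "does this contain only digits?".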

We can use either Pattern.compile("[0-9]+(\\.[0-9]+)?") or Pattern.compile("\\d+(\\.\\d+)?"). They have the same meaning.
The pattern [0-9] means a digit, the same as \d.
+ means the preceding item appears one or more times.
\. is a literal decimal point (an unescaped . would match any character). The optional group (\.[0-9]+)? lets the pattern accept both integers and floats.
Try the following code:
import java.util.regex.Pattern;
public class PatternSample {
public boolean containNumbersOnly(String source){
boolean result = false;
Pattern pattern = Pattern.compile("[0-9]+(\\.[0-9]+)?"); // matches both floats and integers
pattern = Pattern.compile("\\d+(\\.\\d+)?"); // equivalent pattern written with \d
result = pattern.matcher(source).matches();
if(result){
System.out.println("\"" + source + "\"" + " is a number");
}else
System.out.println("\"" + source + "\"" + " is a String");
return result;
}
public static void main(String[] args){
PatternSample obj = new PatternSample();
obj.containNumbersOnly("123456.a");
obj.containNumbersOnly("123456 ");
obj.containNumbersOnly("123456");
obj.containNumbersOnly("0123456.0");
obj.containNumbersOnly("0123456a.0");
}
}
Output:
"123456.a" is a String
"123456 " is a String
"123456" is a number
"0123456.0" is a number
"0123456a.0" is a String

According to Oracle's Java Documentation:
private static final Pattern NUMBER_PATTERN = Pattern.compile(
"[\\x00-\\x20]*[+-]?(NaN|Infinity|((((\\p{Digit}+)(\\.)?((\\p{Digit}+)?)" +
"([eE][+-]?(\\p{Digit}+))?)|(\\.((\\p{Digit}+))([eE][+-]?(\\p{Digit}+))?)|" +
"(((0[xX](\\p{XDigit}+)(\\.)?)|(0[xX](\\p{XDigit}+)?(\\.)(\\p{XDigit}+)))" +
"[pP][+-]?(\\p{Digit}+)))[fFdD]?))[\\x00-\\x20]*");
boolean isNumber(String s){
return NUMBER_PATTERN.matcher(s).matches();
}

Refer to org.apache.commons.lang3.StringUtils
public static boolean isNumeric(CharSequence cs) {
if (cs == null || cs.length() == 0) {
return false;
} else {
int sz = cs.length();
for(int i = 0; i < sz; ++i) {
if (!Character.isDigit(cs.charAt(i))) {
return false;
}
}
return true;
}
}

The String class in Java has a method called matches(). With it you can check your string against a regular expression.
String regex = "^[\\d]{4}$";
String value = "1234";
System.out.println(value.matches(regex));
The explanation of the above regular expression is:
^ - Anchors the match at the start of the input.
[] - A character class; inside it you describe the characters you allow.
\\d - Only allows digits. You can use \\d or 0-9 inside the brackets; both are the same.
{4} - Allows exactly 4 digits. You can change the number according to your need.
$ - Anchors the match at the end of the input.
Note: You can replace {4} with + (one or more times), * (zero or more times), or ? (once or not at all).
For more reference please go through this website: https://www.rexegg.com/regex-quickstart.html
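As a rough illustration of how the quantifiers differ, the sketch below applies each one to \d through String.matches. The class and method names are invented for the example.

```java
// Hypothetical demo of the quantifiers {4}, +, * and ? applied to \d.
final class QuantifierDemo {
    static boolean exactlyFour(String s) { return s.matches("\\d{4}"); }
    static boolean oneOrMore(String s)   { return s.matches("\\d+"); }
    static boolean zeroOrMore(String s)  { return s.matches("\\d*"); }
    static boolean atMostOne(String s)   { return s.matches("\\d?"); }

    public static void main(String[] args) {
        System.out.println(exactlyFour("1234")); // true
        System.out.println(oneOrMore(""));       // false
        System.out.println(zeroOrMore(""));      // true
        System.out.println(atMostOne("77"));     // false
    }
}
```

The ^ and $ anchors are left out here because matches() already requires the whole string to match.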

Official regex way
I would use this regex for integers:
^(0|-?[1-9]\d*)$
This will also work in other programming languages because it's more specific and doesn't make any assumptions about how different programming languages may interpret or handle regex.
Also works in Java
\\d+
Questions regarding ^ and $
As @vikingsteve has pointed out, in Java the matches method matches a complete string, not parts of a string. In other words, it is unnecessary to use ^\d+$ (even though it is the official way of regex).
Online regex checkers usually search for the pattern anywhere in the input, as find does, so they behave differently from Java's matches.

Try this part of code:
void containsOnlyNumbers(String str)
{
try {
Integer num = Integer.valueOf(str);
System.out.println("is a number");
} catch (NumberFormatException e) {
// TODO: handle exception
System.out.println("is not a number");
}
}

Related

Regex detect if entire string is a placeholder

I am trying to write a regex which should detect
"Is the entire string a placeholder".
An example of a valid placeholder here is ${var}
An example of an invalid placeholder here is ${var}-sometext, as the placeholder is just a part of the text
The regex I have currently is ^\$\{(.+)\}$
This works for normal cases.
For example:
1. ${var}: regex matches (expected ✅)
2. ${var} txt: regex does not match (expected ✅)
It even works for nested placeholders:
3. ${var-${nestedVar}}: regex matches (expected ✅)
Where this fails is if the string begins and ends with a placeholder, e.g.:
4. ${var1}-txt-${var2}: regex matches (NOT expected ❌)
Basically even though the entire string is not a placeholder, the regex treats it as one as it begins with ${ and ends with }
I can try solving it by replacing .+ with something like [^$]+ to exclude dollar, but that will break the nested use case in example 3.
How do I solve this?
EDIT
Adding some code for context
public static final Pattern PATTERN = Pattern.compile("^\\$\\{(.+)\\}$");
Matcher matcher = PATTERN.matcher(placeholder);
boolean isMatch = matcher.find();
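From your example, I think you need to avoid the greedy quantifier:
\$\{(.+?)\}
Notice the ? after +, which makes the quantifier reluctant: https://docs.oracle.com/javase/tutorial/essential/regex/quant.html
Used with find, that matches ${var1} and ${var2} separately within ${var1}-txt-${var2}
Now, if you use ^ and $ as well, this will fail: the anchored pattern still matches the whole string, because .+? is extended until the final }.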
Note that you could also use StringSubstitutor from commons-text to perform a similar job (it will handle the parsing and you may use a Lookup that capture the variable).
Edit for comment: given that Java regex doesn't support recursion, you would have to hard-code one level of recursion if you wanted to match all 4 of your cases:
\$\{([^{}-]+)(?:|-\$\{([^{}-]+)\})\}
The first part matches a variable name, excluding {, } and -. The second part matches either an empty default value or a nested interpolation.
If you need to catch ${a-${b-${c}}} you would have to add another layer, which you should avoid: writing a complex regex for its own sake just becomes a maintenance headache (even with only one level of recursion the regex above is hard to read).
If you need to handle recursion, I think you have no alternative but to do it yourself in code, as below:
void parse(String pattern) {
    if (pattern.startsWith("${") && pattern.endsWith("}")) {
        // remove the leading "${" and the trailing "}"
        var content = pattern.substring(2, pattern.length() - 1);
        var n = content.indexOf('-');
        String leftVar = content;
        if (n != -1) {
            leftVar = content.substring(0, n);
            // perform recursion on the default value after the '-'
            parse(content.substring(n + 1));
        }
        // return whatever you need
    }
}
Or use something that already exists.
static boolean isPlaceHolder(String s) {
return s.matches("\\$\\{[^}]*\\}");
}
or optimized for several uses:
private static final Pattern PLACE_HOLDER_PATTERN =
Pattern.compile("\\$\\{[^}]*\\}");
static boolean isPlaceHolder(String s) {
return PLACE_HOLDER_PATTERN.matcher(s).matches();
}
matches checks the whole input from beginning to end, so there is no need for ^...$, as opposed to find.
It is still tricky to correctly reject input such as "${x}, ${y}". It would be best if the placeholder were just a variable name, \\w+.
It is not possible to match arbitrarily deep nested structures using regular expressions. The most you can do with a single regex is match a finite number of nested parts, though your pattern will probably be pretty ugly.
Another approach is to apply a simpler pattern many times, until you have an answer. For example:
Replace everything that matches \$\{[^{}]*\} (an innermost placeholder, one that contains no braces) with nothing (the empty string)
Repeat until the pattern no longer matches
If the string is now empty, then the value was "valid".
If the string is not empty, then the value is "invalid".
private static final Pattern PATTERN = Pattern.compile("\\$\\{[^{}]*\\}");
public boolean isValid(String value) {
while (true) {
String newValue = PATTERN.matcher(value).replaceAll("");
if (newValue.equals(value))
break;
value = newValue;
}
return value.isEmpty();
}
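Since nesting is really a counting problem, another option is to skip regular expressions and track the nesting depth directly. The sketch below is one possible way to do that, under stated assumptions: only "${" opens a placeholder, only "}" closes one, there is no escaping, and an empty "${}" is rejected (mirroring the .+ in the question's pattern). The class and method names are hypothetical.

```java
// Hypothetical helper: checks whether the whole string is one placeholder.
final class PlaceholderCheck {
    static boolean isSinglePlaceholder(String s) {
        // "${x}" is the shortest accepted input (assumption: "${}" is invalid).
        if (s == null || s.length() < 4 || !s.startsWith("${") || !s.endsWith("}")) {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '$' && i + 1 < s.length() && s.charAt(i + 1) == '{') {
                depth++;
                i++; // skip the '{' that belongs to this "${"
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return false; // closing brace without an opener
                }
                if (depth == 0 && i < s.length() - 1) {
                    return false; // outer placeholder ended before the string did
                }
            }
        }
        return depth == 0;
    }
}
```

With this, ${var} and ${var-${nestedVar}} are accepted, while ${var} txt and ${var1}-txt-${var2} are rejected.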

How to match two string using java Regex

String 1= abc/{ID}/plan/{ID}/planID
String 2=abc/1234/plan/456/planID
How can I match these two strings using Java regex so that it returns true? Basically {ID} can contain anything. Java regex should match abc/{anything here}/plan/{anything here}/planID
If your "{anything here}" may be empty, you can use .*. The . matches any character, and * means the preceding item may repeat any number of times, including zero. So .* means "match a string of any length, composed of any characters". If {anything here} should contain at least one character, use + instead of *, which means almost the same but requires at least one character.
My suggestion: abc/.+/plan/.+/planID
If {ID} can contain anything I assume it can also be empty.
So this regex should work :
str.matches("^abc.*plan.*planID$");
^abc at the beginning
.* Zero or more of any Character
planID$ at the end
Here is a small program; adapt it to your requirements. It works for the example in the question. Check it against your other test cases, and if one of them fails, please comment with that test case. I am using a regex specifically because you asked to match with a Java regex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class MatchUsingRejex
{
public static void main(String args[])
{
// Create a pattern to be searched
Pattern pattern = Pattern.compile("abc/.+/plan/.+/planID");
// checking, Is pattern match or not
Matcher isMatch = pattern.matcher("abc/1234/plan/456/planID");
if (isMatch.find())
System.out.println("Yes");
else
System.out.println("No");
}
}
If the line always starts with 'abc' and ends with 'planID' (compared case-insensitively), the following will work:
String s1 = "abc/{ID}/plan/{ID}/planID";
String s2 = "abc/1234/plan/456/planID";
String pattern = "(?i)abc(?:/\\S+)+planID$";
boolean b1 = s1.matches(pattern);
boolean b2 = s2.matches(pattern);
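If the template string itself is available at runtime, the regex can also be derived from it instead of being written by hand. The sketch below is one way to do that. It assumes each {ID} stands for exactly one non-empty path segment (no '/'), which is narrower than "anything"; replace [^/]+ with .* if empty or multi-segment values should match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper that turns a template with {ID} slots into a Pattern.
final class TemplateMatcher {
    static Pattern compileTemplate(String template) {
        // Split on the literal text "{ID}"; limit -1 keeps trailing empty parts.
        String[] literals = template.split("\\{ID\\}", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("[^/]+"); // assumption: one non-empty path segment
            }
            // Quote the fixed text so regex metacharacters in it are literal.
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matchesTemplate(String template, String candidate) {
        return compileTemplate(template).matcher(candidate).matches();
    }
}
```

For the strings in the question, matchesTemplate("abc/{ID}/plan/{ID}/planID", "abc/1234/plan/456/planID") returns true.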

Regular Expression problem in Java

I am trying to create a regular expression for the replaceAll method in Java. The test string is abXYabcXYZ and the pattern is abc. I want to replace any symbol except the pattern with +. For example the string abXYabcXYZ and pattern [^(abc)] should return ++++abc+++, but in my case it returns ab++abc+++.
public static String plusOut(String str, String pattern) {
pattern= "[^("+pattern+")]" + "".toLowerCase();
return str.toLowerCase().replaceAll(pattern, "+");
}
public static void main(String[] args) {
String text = "abXYabcXYZ";
String pattern = "abc";
System.out.println(plusOut(text, pattern));
}
When I try to replace the pattern with + there is no problem - abXYabcXYZ with pattern (abc) returns abxy+xyz. Pattern (^(abc)) returns the string without replacement.
Is there any other way to write NOT(regex) or group symbols as a word?
What you are trying to achieve is pretty tough with regular expressions, since there is no way to express “replace strings not matching a pattern”. You will have to use a “positive” pattern, telling what to match instead of what not to match.
Furthermore, you want to replace every character with a replacement character, so you have to make sure that your pattern matches exactly one character. Otherwise, you will replace whole strings with a single character, returning a shorter string.
For your toy example, you can use negative lookaheads and lookbehinds to achieve the task, but this may be more difficult for real-world examples with longer or more complex strings, since you will have to consider each character of your string separately, along with its context.
Here is the pattern for “not ‘abc’”:
[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c
It consists of five sub-patterns, connected with “or” (|), each matching exactly one character:
[^abc] matches every character except a, b or c
a(?!bc) matches a if it is not followed by bc
(?<!a)b matches b if it is not preceded with a
b(?!c) matches b if it is not followed by c
(?<!ab)c matches c if it is not preceded with ab
The idea is to match every character that is not in your target word abc, plus every word character that, according to the context, is not part of your word. The context can be examined using negative lookaheads (?!...) and lookbehinds (?<!...).
You can imagine that this technique will fail once you have a target word containing one character more than once, like example. It is pretty hard to express “match e if it is not followed by x and not preceded by l”.
Especially for dynamic patterns, it is by far easier to do a positive search and then replace every character that did not match in a second pass, as others have suggested here.
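For the fixed word abc, the lookaround pattern above can be tried out directly with replaceAll. This is only a sketch for that one word; the class name is made up, and lower-casing the input follows the question's code.

```java
// Hypothetical demo of the "not abc" pattern from the answer above.
final class NotAbcDemo {
    // Each alternative matches exactly one character that is not part of "abc".
    static final String NOT_ABC = "[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c";

    static String plusOut(String text) {
        return text.toLowerCase().replaceAll(NOT_ABC, "+");
    }

    public static void main(String[] args) {
        System.out.println(plusOut("abXYabcXYZ")); // ++++abc+++
    }
}
```

Because every alternative consumes a single character, the result always has the same length as the input.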
[^ ... ] will match one character that is not any of ...
So your pattern "[^(abc)]" is saying "match one character that is not a, b, c or the left or right bracket"; and indeed that is what happens in your test.
It is hard to say "replace all characters that are not part of the string 'abc'" in a single trivial regular expression. What you might do instead to achieve what you want could be some nasty thing like
while the input string still contains "abc"
find the next occurrence of "abc"
append to the output a string containing as many "+"s as there are characters before the "abc"
append "abc" to the output string
skip, in the input string, to a position just after the "abc" found
append to the output a string containing as many "+"s as there are characters left in the input
or possibly if the input alphabet is restricted you could use regular expressions to do something like
replace all occurrences of "abc" with a single character that does not occur anywhere in the existing string
replace all other characters with "+"
replace all occurrences of the target character with "abc"
which will be more readable but may not perform as well
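The first procedure (find each occurrence, pad with "+" in between) might be sketched in Java as follows. The class name is hypothetical, lower-casing both arguments follows the question's code, and rejecting an empty word is an assumption added to avoid an endless loop.

```java
// Hypothetical implementation of the indexOf-based procedure described above.
final class PlusOut {
    static String plusOut(String str, String word) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        String text = str.toLowerCase();
        String target = word.toLowerCase();
        StringBuilder out = new StringBuilder(text.length());
        int pos = 0;
        int hit;
        // While the input still contains the word, find its next occurrence.
        while ((hit = text.indexOf(target, pos)) != -1) {
            for (int i = pos; i < hit; i++) {
                out.append('+'); // one '+' per character before the occurrence
            }
            out.append(target);
            pos = hit + target.length(); // skip to just after the occurrence
        }
        for (int i = pos; i < text.length(); i++) {
            out.append('+'); // one '+' per remaining character
        }
        return out.toString();
    }
}
```

Since the word is treated as plain text rather than as a regex, no escaping of metacharacters is needed here.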
Negating regexps is usually troublesome. I think you might want to use negative lookahead. Something like this might work:
String pattern = "(?<!ab).(?!abc)";
I didn't test it, so it may not really work for degenerate cases. And the performance might be horrible too. It is probably better to use a multistep algorithm.
Edit: No I think this won't work for every case. You will probably spend more time debugging a regexp like this than doing it algorithmically with some extra code.
Try to solve it without regular expressions:
String out = "";
int i;
for(i=0; i<text.length() - pattern.length() + 1; ) {
if (text.substring(i, i + pattern.length()).equals(pattern)) {
out += pattern;
i += pattern.length();
}
else {
out += "+";
i++;
}
}
for(; i<text.length(); i++) {
out += "+";
}
Rather than a single replaceAll, you could always try something like:
@Test
public void testString() {
final String in = "abXYabcXYabcHIH";
final String expected = "xxxxabcxxabcxxx";
String result = replaceUnwanted(in);
assertEquals(expected, result);
}
private String replaceUnwanted(final String in) {
final Pattern p = Pattern.compile("(.*?)(abc)([^a]*)");
final Matcher m = p.matcher(in);
final StringBuilder out = new StringBuilder();
while (m.find()) {
out.append(m.group(1).replaceAll(".", "x"));
out.append(m.group(2));
out.append(m.group(3).replaceAll(".", "x"));
}
return out.toString();
}
Instead of using replaceAll(...), I'd go for a Pattern/Matcher approach:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static String plusOut(String str, String pattern) {
StringBuilder builder = new StringBuilder();
String regex = String.format("((?:(?!%s).)++)|%s", pattern, pattern);
Matcher m = Pattern.compile(regex).matcher(str.toLowerCase());
while(m.find()) {
builder.append(m.group(1) == null ? pattern : m.group().replaceAll(".", "+"));
}
return builder.toString();
}
public static void main(String[] args) {
String text = "abXYabcXYZ";
String pattern = "abc";
System.out.println(plusOut(text, pattern));
}
}
Note that you'll need to use Pattern.quote(...) if your String pattern contains regex meta-characters.
Edit: I didn't see a Pattern/Matcher approach was already suggested by toolkit (although slightly different)...
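To illustrate the note about Pattern.quote(...), here is a variant of the method above with the word quoted before it is spliced into the regex. It is a sketch, not the answer's code: the class name is hypothetical, and unlike the original it compares case-sensitively to keep the example short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of plusOut that treats the word as literal text.
final class QuotedPlusOut {
    static String plusOut(String str, String word) {
        // \Q...\E quoting makes characters such as '.' or '(' match literally.
        String quoted = Pattern.quote(word);
        String regex = String.format("((?:(?!%s).)++)|%s", quoted, quoted);
        Matcher m = Pattern.compile(regex).matcher(str);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Group 1 is set for a run of non-word characters, null for the word.
            out.append(m.group(1) == null ? word : m.group().replaceAll(".", "+"));
        }
        return out.toString();
    }
}
```

With the word "a.c", the input "xa.cyabc" keeps only the literal a.c; without quoting, the dot would also let abc through.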

How do I know if a regexp has more than one possible match?

I am writing Java code that has to distinguish regular expressions with more than one possible match from regular expressions that have only one possible match.
For example:
"abc." can have several matches ("abc1", "abcf", ...),
while "abcd" can only match "abcd".
Right now my best idea was to look for all unescaped regexp special characters.
I am convinced that there is a better way to do it in Java. Ideas?
(Late addition):
To make things clearer - there is NO specific input to test against. A good solution for this problem will have to test the regex itself.
In other words, I need a method whose signature may look something like this:
boolean isSingleResult(String regex)
This method should return true only if there is exactly one possible String s1 for which s1.matches(regex) returns true. (See examples above.)
This sounds dirty, but it might be worth having a look at the Pattern class in the Java source code.
Taking a quick peek, it seems like it 'normalize()'s the given regex (Line 1441), which could turn the expression into something a little more predictable. I think reflection can be used to tap into some private resources of the class (use caution!). It is possible that, while tokenizing the regex pattern, there are specific indications that it has reached some kind of "multi-matching" element in the pattern.
Update
After having a closer look, there is some data within package scope that you can use to leverage the work of the Pattern tokenizer to walk through the nodes of the regex and check for multiple-character nodes.
After compiling the regular expression, iterate through the compiled "Node"s starting at Pattern.root. Starting at line 3034 of the class, there are the generalized types of nodes. For example class Pattern.All is multi-matching, while Pattern.SingleI or Pattern.SliceI are single-matching, and so on.
All these token classes appear to be in package scope, so it should be possible to do this without using reflection, but instead creating a java.util.regex.PatternHelper class to do the work.
Hope this helps.
If it can only have one possible match it isn't reeeeeally an expression, now, is it? I suspect your best option is to use a different tool altogether, because this does not at all sound like a job for regular expressions, but if you insist, well, no, I'd say your best option is to look for unescaped special characters.
The only regular expression that can ONLY match one input string is one that specifies the string exactly. So you need to match expressions with no wildcard characters or character groups AND that specify a start "^" and end "$" anchor.
"the quick" matches:
"the quick brownfox"
"the quick brown dog"
"catch the quick brown fox"
"^the quick brown fox$" matches ONLY:
"the quick brown fox"
Now I understand what you mean. I live in Belgium...
So here is something that works for most expressions. I wrote it myself, so I may have forgotten some rules.
public static final boolean isSingleResult(String regexp) {
// Check the exceptions on the exceptions.
String[] exconexc = "\\d \\D \\w \\W \\s \\S".split(" ");
for (String s : exconexc) {
int index = regexp.indexOf(s);
if (index != -1) // Forbidden char found
{
return false;
}
}
// Then remove all exceptions:
String regex = regexp.replaceAll("\\\\.", "");
// Now, all the strings that can mean more than one match
String[] mtom = "+ . ? | * { [:alnum:] [:word:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]".split(" ");
// iterate all mtom-Strings
for (String s : mtom) {
int index = regex.indexOf(s);
if (index != -1) // Forbidden char found
{
return false;
}
}
return true;
}
Martijn
The only way I see is to check whether the regexp matches multiple times for a particular input.
package com;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class AAA {
public static void main(String[] args) throws Exception {
String input = "123 321 443 52134 432";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(input);
int i = 0;
while (matcher.find()) {
++i;
}
System.out.printf("Matched %d times%n", i);
}
}

How to check a string starts with numeric number?

I have a string which contains alphanumeric characters.
I need to check whether the string starts with a number.
Thanks,
See the isDigit(char ch) method:
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html
and pass it the first character of the String, obtained with the String.charAt() method.
Character.isDigit(myString.charAt(0));
Sorry I didn't see your Java tag, was reading question only. I'll leave my other answers here anyway since I've typed them out.
Java
String myString = "9Hello World!";
if ( Character.isDigit(myString.charAt(0)) )
{
System.out.println("String begins with a digit");
}
C++:
string myString = "2Hello World!";
if (isdigit( myString[0]) )
{
printf("String begins with a digit");
}
Regular expression:
^[0-9]
Some proof my regex works: Unless my test data is wrong?
I think you ought to use a regex:
import java.util.regex.*;
public class Test {
public static void main(String[] args) {
String neg = "-123abc";
String pos = "123abc";
String non = "abc123";
/* I'm not sure if this regex is too verbose, but it should be
* clear. It checks that the string starts with either a series
* of one or more digits... OR a negative sign followed by 1 or
* more digits. Anything can follow the digits. Update as you need
* for things that should not follow the digits or for floating
* point numbers.
*/
Pattern pattern = Pattern.compile("^(\\d+.*|-\\d+.*)");
Matcher matcher = pattern.matcher(neg);
if(matcher.matches()) {
System.out.println("matches negative number");
}
matcher = pattern.matcher(pos);
if (matcher.matches()) {
System.out.println("positive matches");
}
matcher = pattern.matcher(non);
if (!matcher.matches()) {
System.out.println("letters don't match :-)!!!");
}
}
}
You may want to adjust this to accept floating point numbers, but this will work for negatives. Other answers won't work for negatives because they only check the first character! Be more specific about your needs and I can help you adjust this approach.
This should work:
String s = "123foo";
System.out.println(Character.isDigit(s.charAt(0)));
EDIT: I searched the Java docs, looked at the methods on the String class that can get me the first character, and looked at the methods on the Character class to see if it has one that checks such a thing.
I think you could do the same before asking.
EDIT 2: What I mean is: try things, read and search, and if you can't find anything, ask.
I made a mistake when posting it for the first time. isDigit is a static method on the Character class.
Use a regex like ^\d
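One thing the charAt(0) answers above leave out is that charAt(0) throws StringIndexOutOfBoundsException on an empty string. A guarded version might look like the sketch below. The class and method names are hypothetical, and treating null and "" as "does not start with a digit" is an assumption.

```java
// Hypothetical helper: a null-safe and empty-safe leading-digit check.
final class LeadingDigit {
    static boolean startsWithDigit(String s) {
        // Check for null and empty first so charAt(0) is always valid.
        return s != null && !s.isEmpty() && Character.isDigit(s.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(startsWithDigit("9Hello World!")); // true
        System.out.println(startsWithDigit("abc123"));        // false
        System.out.println(startsWithDigit(""));              // false
    }
}
```

A string such as "-123abc" returns false here, since only the first character is examined.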
