Regex to find an integer within a string

I'd like to use regex with Java.
What I want to do is find the first integer in a string.
Example:
String = "the 14 dogs ate 12 bones"
Would return 14.
String = "djakld;asjl14ajdka;sdj"
Would also return 14.
This is what I have so far.
Pattern intsOnly = Pattern.compile("\\d*");
Matcher makeMatch = intsOnly.matcher("dadsad14 dssaf jfdkasl;fj");
makeMatch.find();
String inputInt = makeMatch.group();
System.out.println(inputInt);
What am I doing wrong?

You're asking for 0 or more digits. You need to ask for 1 or more:
"\\d+"

It looks like the other solutions fail to handle a leading +/- sign, which java.lang.Integer.parseInt(String) accepts, and exponent forms like 2e3, which Integer.parseInt rejects but Double.parseDouble accepts, so I'll have a go at the problem. I'm somewhat inexperienced with regex, so I may have made a few mistakes, used something that Java's regex engine doesn't support, or made it overly complicated, but the expressions seemed to work in Kiki 0.5.6.
All regular expressions are provided in both an unescaped format for reading, and an escaped format that you can use as a string literal in Java.
To get a byte, short, int, or long from a string:
unescaped: ([\+-]?\d+)([eE][\+-]?\d+)?
escaped: ([\\+-]?\\d+)([eE][\\+-]?\\d+)?
...and for bonus points...
To get a double or float from a string:
unescaped: ([\+-]?\d+(\.\d*)?|\.\d+)([eE][\+-]?(\d+(\.\d*)?|\.\d+))?
escaped: ([\\+-]?\\d+(\\.\\d*)?|\\.\\d+)([eE][\\+-]?(\\d+(\\.\\d*)?|\\.\\d+))?

Use one of them:
Pattern intsOnly = Pattern.compile("[0-9]+");
or
Pattern intsOnly = Pattern.compile("\\d+");
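To tie this back to the examples in the question, here is a minimal sketch that wraps the \d+ pattern in a helper. The class name FirstIntegerFinder and the method firstInteger are made up for illustration, and returning an Optional for the "no digits at all" case is an assumption, not something the question asked for.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class FirstIntegerFinder {
    // One or more consecutive digits; compiled once and reused.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in the input, or an empty Optional if there is none.
    static Optional<String> firstInteger(String input) {
        Matcher matcher = DIGITS.matcher(input);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstInteger("the 14 dogs ate 12 bones")); // Optional[14]
        System.out.println(firstInteger("djakld;asjl14ajdka;sdj"));   // Optional[14]
        System.out.println(firstInteger("no digits here"));           // Optional.empty
    }
}
```

Checking the result of find() before calling group() matters: calling group() after a failed find() throws an IllegalStateException.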

Here's a handy one I made for C# with generics. It will match based on your regular expression and return the types you need:
public T[] GetMatches<T>(string Input, string MatchPattern) where T : IConvertible
{
List<T> MatchedValues = new List<T>();
Regex MatchInt = new Regex(MatchPattern);
MatchCollection Matches = MatchInt.Matches(Input);
foreach (Match m in Matches)
MatchedValues.Add((T)Convert.ChangeType(m.Value, typeof(T)));
return MatchedValues.ToArray<T>();
}
Then, if you wanted to grab only the numbers and return them in a string[] array:
string Test = "22$data44abc";
string[] Matches = this.GetMatches<string>(Test, "\\d+");
Hopefully this is useful to someone...

In addition to what PiPeep said, if you are trying to match integers within an expression, so that 1 + 2 - 3 will only match 1, 2, and 3, rather than 1, + 2 and - 3, you need a conditional construct, (?(1)...), and the part you want will be in capture group 2 rather than in the whole match. Note that java.util.regex does not support conditionals, so the patterns below only work in engines that do, such as PCRE or .NET.
unescaped: ([0-9])?((?(1)(?:[\+-]?\d+)|)(?:[eE][\+-]?\d+)?)
escaped: ([0-9])?((?(1)(?:[\\+-]?\\d+)|)(?:[eE][\\+-]?\\d+)?)
Also, for things like someNumber - 3, where someNumber is a variable name or something like that, you can use
unescaped: (\w)?((?(1)(?:[\+-]?\d+)|)(?:[eE][\+-]?\d+)?)
escaped: (\\w)?((?(1)(?:[\\+-]?\\d+)|)(?:[eE][\\+-]?\\d+)?)
Although of course that won't work if you are parsing a string like "The net change to blahblah was +4".
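For Java specifically, a similar effect can be sketched with a negative lookbehind instead. This is an alternative technique, not a translation of the patterns above: it keeps a sign only when the sign is not directly preceded by a word character or a closing parenthesis, and it drops the exponent part for brevity. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class ExpressionIntegers {
    // Optional sign, allowed only when it cannot be a binary operator,
    // i.e. when the character before it is not a word character or ')'.
    private static final Pattern INTEGER =
            Pattern.compile("(?:(?<![\\w)])[+-])?\\d+");

    // Collects every integer found in the expression, in order of appearance.
    static List<String> integers(String expression) {
        List<String> found = new ArrayList<>();
        Matcher matcher = INTEGER.matcher(expression);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(integers("1+2-3"));  // [1, 2, 3]
        System.out.println(integers("x = -3")); // [-3]
        System.out.println(integers("The net change to blahblah was +4")); // [+4]
    }
}
```

A sign that follows whitespace is still ambiguous (in "someNumber -3" the -3 is reported as signed), so treat this as a heuristic rather than an expression parser.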

The Java documentation (the Javadoc for Double.valueOf) actually gives this monster of a regex for validating doubles.
However, although it is sometimes considered bad practice, just trying to parse with the intended type and catching the exception tends to be slightly more readable.
private static final Pattern DOUBLE_PATTERN = Pattern
.compile("[\\x00-\\x20]*[+-]?(NaN|Infinity|((((\\p{Digit}+)(\\.)?((\\p{Digit}+)?)"
+ "([eE][+-]?(\\p{Digit}+))?)|(\\.((\\p{Digit}+))([eE][+-]?(\\p{Digit}+))?)|"
+ "(((0[xX](\\p{XDigit}+)(\\.)?)|(0[xX](\\p{XDigit}+)?(\\.)(\\p{XDigit}+)))"
+ "[pP][+-]?(\\p{Digit}+)))[fFdD]?))[\\x00-\\x20]*");

Related

"Safe way" to use java matcher.replaceAll() / appendReplacement()

In most cases where we replace string segments using a regular expression, the replacement text is a variable, so its content is not known to the programmer.
However, we always forget that the behavior of Java's matcher.replaceAll() depends very much on the replacement itself. To be treated literally, the replacement must not contain any '$' or '\' characters.
E.g. the following code throws "java.lang.IndexOutOfBoundsException: No group 2" when the variable salary equals "$2".
String salary = "$2";
Pattern p = Pattern.compile("SAL");
Matcher m = p.matcher("Salary: SAL");
String s = m.replaceAll(salary);
System.out.println(s);
I know that if the '$' sign is escaped with '\', then we get the expected result. But then again, the '\' has to be escaped with '\' as well. So the proper solution would be:
String salary = "$2";
Pattern p = Pattern.compile("SAL");
Matcher m = p.matcher("Salary: SAL");
String s = m.replaceAll(salary.replace("\\", "\\\\").replace("$", "\\$"));
System.out.println(s);
First of all, this is not very convenient to use, and it is also not great performance-wise. (The same goes for the appendReplacement() method.)
So can you please recommend a more generic solution to the problem?

If you only want to replace a specific substring with a literal replacement sequence, you can simply use String.replace(). Something like this:
String source = "Salary: SAL";
String target = "SAL";
String salary = "$2";
String result = source.replace(target, salary);
System.out.println(result); // prints "Salary: $2"
Note that it only replaces literal substrings and won't work if target is a regex.
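If the target does have to be a regex, the standard library's Matcher.quoteReplacement(String) escapes '$' and '\' in the replacement so that it is taken literally. A minimal sketch, with a hypothetical class and method name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class LiteralReplacement {
    // The pattern from the question; compiled once and reused.
    private static final Pattern PLACEHOLDER = Pattern.compile("SAL");

    // Replaces every match with the given text, treating '$' and '\' in it literally.
    static String fillSalary(String template, String salary) {
        return PLACEHOLDER.matcher(template)
                .replaceAll(Matcher.quoteReplacement(salary));
    }

    public static void main(String[] args) {
        System.out.println(fillSalary("Salary: SAL", "$2")); // Salary: $2
    }
}
```

The same quoting works for appendReplacement(), which interprets its replacement argument the same way.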

How to add a space after certain characters using regex Java

I have a string consisting of 18 characters, e.g. 'abcdefghijklmnopqr'. I need to add a blank space after the 5th character, then after the 9th character, and after the 15th character, making it look like 'abcde fghi jklmno pqr'. Can I achieve this using a regular expression?
Regular expressions are not my cup of tea, so I need help from the regex gurus out here. Any help is appreciated.
Thanks in advance

A regex finds a match in a string; it can't perform a replacement by itself. You could, however, use a regex to find a certain matching substring and replace that, but you would still need a separate method for the replacement (making it a two-step algorithm).
Since you're not looking for a pattern in your string, but rather just the n-th character, a regex wouldn't be of much use; it would make the solution unnecessarily complex.
Here are some ideas on how you could implement a solution:
Use an array of characters to avoid creating redundant strings: create a character array and copy the characters from the string before the given position, put the new character at the position, copy the rest of the characters from the string, and continue until you reach the end of the string. After that, construct the final string from that array.
Use the substring() method: concatenate the substring before the position, the new character, the substring after the position and before the next position, and so on, until reaching the end of the original string.
Use a StringBuilder and its insert() method.
Note that:
The first idea might not be a suitable solution for very large strings. It needs an auxiliary array, using additional space.
The second idea creates redundant strings. Strings are immutable in Java, so every concatenation allocates a new String object. Creating temporary strings should be avoided.
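The third idea (StringBuilder.insert()) could look roughly like the sketch below. The class and method names are hypothetical. The one design choice worth noting is that the spaces are inserted from right to left, so earlier insertions do not shift the offsets of later ones.

```java
import java.util.Arrays;

// Hypothetical helper class, named for illustration only.
class SpaceInserter {
    // Inserts a space after the given character counts, e.g. after the 5th, 9th and 15th character.
    static String insertSpacesAfter(String text, int... counts) {
        int[] sorted = counts.clone();
        Arrays.sort(sorted);
        StringBuilder builder = new StringBuilder(text);
        // Walk from the largest offset to the smallest so that offsets stay valid.
        for (int i = sorted.length - 1; i >= 0; i--) {
            builder.insert(sorted[i], ' ');
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertSpacesAfter("abcdefghijklmnopqr", 5, 9, 15));
        // abcde fghi jklmno pqr
    }
}
```

An offset larger than the string length makes insert() throw a StringIndexOutOfBoundsException, so input of the wrong length fails loudly rather than being silently mangled.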

Yes, you can use regex groups to achieve that. Something like this:
final Pattern pattern = Pattern.compile("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})");
final Matcher matcher = pattern.matcher("abcdefghijklmnopqr");
if (matcher.matches()) {
    // group(0) is the whole match; the capturing groups are numbered from 1.
    String first = matcher.group(1);
    String second = matcher.group(2);
    String third = matcher.group(3);
    String fourth = matcher.group(4);
    return first + " " + second + " " + third + " " + fourth;
} else {
    throw new IllegalArgumentException("Input does not match the expected format");
}
Note that pattern should be a constant, I used a local variable here to make it easier to read.
Compared to substrings, which would also work to achieve the desired result, a regex also allows you to validate the format of your input data. In the provided example you check that it is an 18-character string composed of only lowercase letters.
If you had a more interesting example, with for instance a mix of letters and digits, you could check that each group contains the correct type of data with the regex.
You can also do a simpler version where you just replace with:
"abcdefghijklmnopqr".replaceAll("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})", "$1 $2 $3 $4")
But you don't get the benefit of checking, because if the string doesn't match the format it will simply not be replaced, and this is less efficient than substrings.
Here is an example solution using a StringBuilder, which would be more efficient if you don't care about checking:
final Set<Integer> breaks = Set.of(5, 9, 15);
final String str = "abcdefghijklmnopqr";
final StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
if (breaks.contains(i)) {
stringBuilder.append(' ');
}
stringBuilder.append(str.charAt(i));
}
return stringBuilder.toString();

String MUST contain a hexadecimal value - Regex for this? [duplicate]

I have never used regex before, and I have seen that they are very useful for working with strings. I saw a few tutorials, but I still cannot understand how to write a simple Java regex check for hexadecimal characters in a string.
The user will type something like 0123456789ABCDEF into the text box, and I would like to confirm that the input is correct, and return false for something like XTYSPG456789ABCDEF.
Is it possible to do that with a regex or did I misunderstand how they work?

Yes, you can do that with a regular expression:
^[0-9A-F]+$
Explanation:
^ Start of line.
[0-9A-F] Character class: Any character in 0 to 9, or in A to F.
+ Quantifier: One or more of the above.
$ End of line.
To use this regular expression in Java you can for example call the matches method on a String:
boolean isHex = s.matches("[0-9A-F]+");
Note that matches only succeeds if the entire string matches, so you don't need the start and end of line anchors in this case.
You may also want to allow both upper and lowercase A-F, in which case you can use this regular expression:
^[0-9A-Fa-f]+$

Maybe you want to use the POSIX character class \p{XDigit}, so:
^\p{XDigit}+$
Additionally, if you plan to use the regular expression very often, it is recommended to store the compiled pattern in a constant in order to avoid recompiling it each time, e.g.:
private static final Pattern REGEX_PATTERN =
Pattern.compile("^\\p{XDigit}+$");
public static void main(String[] args) {
String input = "0123456789ABCDEF";
System.out.println(
REGEX_PATTERN.matcher(input).matches()
); // prints "true"
}

Actually, the given answer is not totally correct. The problem arises because the numbers 0-9 are also decimal values. PART of what you have to do is to test for 00-99 instead of just 0-9 to ensure that the lower values are not decimal numbers. Like so:
^([0-9A-Fa-f]{2})+$
To say these have to come in pairs! Otherwise - the string is something else! :-)
Example:
(Pick one)
var a = "1e5";
var a = "10";
var a = "314159265";
If I used the accepted answer in a regular expression it would return TRUE.
var re1 = new RegExp( /^[0-9A-Fa-f]+$/ );
var re2 = new RegExp( /^([0-9A-Fa-f]{2})+$/ );
if( re1.test(a) ){ alert("#1 = This is a hex value!"); }
if( re2.test(a) ){ alert("#2 = This IS a hex string!"); }
else { alert("#2 = This is NOT a hex string!"); }
Note that "10" returns TRUE in both cases. If an incoming string contains only 0-9, you can NOT easily tell whether it is a hex value or a decimal value UNLESS there is a missing zero in front of odd-length strings (hex values always come in pairs, i.e. low byte/high byte). But values like "34" are perfectly valid as both decimal and hexadecimal numbers. They just mean two different things.
Also note that "3.14159265" is not a hex value no matter which test you do because of the period. But with the addition of the "{2}" you at least ensure it really is a hex string rather than something that LOOKS like a hex string.
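Since the question is about Java, here is a rough Java equivalent of the two JavaScript tests above. The class and method names are hypothetical, and whether the stricter "pairs only" rule is appropriate depends on whether the input is supposed to be hex-encoded bytes.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class HexChecks {
    // Any non-empty run of hex digits.
    private static final Pattern HEX_DIGITS = Pattern.compile("[0-9A-Fa-f]+");
    // Hex digits in pairs only, i.e. an even number of them.
    private static final Pattern HEX_PAIRS = Pattern.compile("(?:[0-9A-Fa-f]{2})+");

    static boolean isHexDigits(String s) {
        return HEX_DIGITS.matcher(s).matches();
    }

    static boolean isHexPairs(String s) {
        return HEX_PAIRS.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String a : new String[] {"1e5", "10", "314159265", "3.14159265"}) {
            System.out.println(a + ": digits=" + isHexDigits(a) + ", pairs=" + isHexPairs(a));
        }
    }
}
```

With these inputs, "1e5" and "314159265" pass the first check but fail the second because their length is odd, "10" passes both, and "3.14159265" fails both.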

How to check if a string contains only digits in Java

In Java, the String class has a method called matches. How can I use this method with a regular expression to check whether my string contains only digits? I tried the examples below, but both of them returned false.
String regex = "[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));
String regex = "^[0-9]";
String data = "23343453";
System.out.println(data.matches(regex));

Try
String regex = "[0-9]+";
or
String regex = "\\d+";
As per Java regular expressions, the + means "one or more times" and \d means "a digit".
Note: the "double backslash" is an escape sequence to get a single backslash; therefore, \\d in a Java string literal gives you the actual regex: \d
References:
Java Regular Expressions
Java Character Escape Sequences
Edit: due to some confusion in other answers, I am writing a test case and will explain some more things in detail.
Firstly, if you are in doubt about the correctness of this solution (or others), please run this test case:
String regex = "\\d+";
// positive test cases, should all be "true"
System.out.println("1".matches(regex));
System.out.println("12345".matches(regex));
System.out.println("123456789".matches(regex));
// negative test cases, should all be "false"
System.out.println("".matches(regex));
System.out.println("foo".matches(regex));
System.out.println("aa123bb".matches(regex));
Question 1:
Isn't it necessary to add ^ and $ to the regex, so it won't match "aa123bb" ?
No. In Java, the matches method (which was specified in the question) matches a complete string, not fragments. In other words, it is not necessary to use ^\\d+$ (even though it is also correct). Please see the last negative test case.
Please note that an online "regex checker" may behave differently, because such tools usually search for a match anywhere in the input. To match fragments of a string in Java, you can use the find method instead, described in detail here:
Difference between matches() and find() in Java Regex
Question 2:
Won't this regex also match the empty string, ""?
No. A regex \\d* would match the empty string, but \\d+ does not. The star * means zero or more, whereas the plus + means one or more. Please see the first negative test case.
Question 3:
Isn't it faster to compile a regex Pattern?
Yes. It is indeed faster to compile a regex Pattern once, rather than on every invocation of matches, and so if performance implications are important then a Pattern can be compiled and used like this:
Pattern pattern = Pattern.compile(regex);
System.out.println(pattern.matcher("1").matches());
System.out.println(pattern.matcher("12345").matches());
System.out.println(pattern.matcher("123456789").matches());
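The difference between matches() and find() mentioned under Question 1 can be seen side by side in a small sketch. The class and method names here are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class MatchesVersusFind {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // True only if the entire string consists of digits.
    static boolean wholeStringIsDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // True if a run of digits occurs anywhere in the string.
    static boolean containsDigits(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringIsDigits("aa123bb")); // false
        System.out.println(containsDigits("aa123bb"));      // true
        System.out.println(wholeStringIsDigits("12345"));   // true
    }
}
```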

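You can also use NumberUtils.isNumber(String str) from Apache Commons Lang. Note that it accepts any valid Java number, including signs, decimal points, and hex notation, not only strings of digits.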

Using regular expressions is costly in terms of performance. Trying to parse the string as a long value is inefficient and unreliable, and may not be what you need.
What I suggest is to simply check whether each character is a digit, which can be done efficiently using Java 8 lambda expressions:
boolean isNumeric = someString.chars().allMatch(x -> Character.isDigit(x));
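Two details of the one-liner above are easy to miss: allMatch returns true for an empty string, and Character.isDigit also accepts non-ASCII digits such as Arabic-Indic numerals. If only a non-empty run of 0-9 should pass, a variant could look like this sketch (the class and method names are hypothetical):

```java
// Hypothetical helper class, named for illustration only.
class DigitCheck {
    // True only for a non-empty string made up entirely of the ASCII digits 0-9.
    static boolean isAsciiDigits(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(isAsciiDigits("23343453"));     // true
        System.out.println(isAsciiDigits(""));             // false
        System.out.println(isAsciiDigits("12a"));          // false
        System.out.println(isAsciiDigits("\u0661\u0662")); // false (Arabic-Indic digits)
    }
}
```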

One more solution that hasn't been posted yet:
String regex = "\\p{Digit}+"; // uses POSIX character class

You must allow for more than a digit (the + sign) as in:
String regex = "[0-9]+";
String data = "23343453";
System.out.println(data.matches(regex));

Use Long.parseLong(data)
and catch the exception; it also handles a minus sign.
Although the number of digits is limited to what fits in a long, this actually gives you a value you can use, which is, I would imagine, the most common use case.
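A minimal sketch of that approach, assuming it is acceptable to report failure as an empty OptionalLong (the class and method names are hypothetical):

```java
import java.util.OptionalLong;

// Hypothetical helper class, named for illustration only.
class LongParsing {
    // Returns the parsed value, or an empty OptionalLong if the text is not a valid long.
    static OptionalLong tryParseLong(String text) {
        try {
            return OptionalLong.of(Long.parseLong(text));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseLong("23343453")); // OptionalLong[23343453]
        System.out.println(tryParseLong("-42"));      // OptionalLong[-42]
        System.out.println(tryParseLong("12a"));      // OptionalLong.empty
    }
}
```

Keep in mind that this accepts a leading sign and rejects digit strings too large for a long, so it is not equivalent to a pure "digits only" check.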

You can use either Pattern.compile("[0-9]+(\\.[0-9]+)?") or Pattern.compile("\\d+(\\.\\d+)?"). They have the same meaning.
The pattern [0-9] means a digit, the same as '\d'.
'+' means it appears one or more times.
'\.' is a literal decimal point (an unescaped '.' would match any character), and the optional group '(\.\d+)?' allows a fractional part, so both integers and floats match.
Try the following code:
import java.util.regex.Pattern;
public class PatternSample {
    public boolean containNumbersOnly(String source) {
        // Either pattern works for both floats and integers; they are equivalent.
        Pattern pattern = Pattern.compile("[0-9]+(\\.[0-9]+)?");
        pattern = Pattern.compile("\\d+(\\.\\d+)?");
        boolean result = pattern.matcher(source).matches();
        if (result) {
            System.out.println("\"" + source + "\"" + " is a number");
        } else {
            System.out.println("\"" + source + "\"" + " is a String");
        }
        return result;
    }
    public static void main(String[] args) {
        PatternSample obj = new PatternSample();
        obj.containNumbersOnly("123456.a");
        obj.containNumbersOnly("123456 ");
        obj.containNumbersOnly("123456");
        obj.containNumbersOnly("0123456.0");
        obj.containNumbersOnly("0123456a.0");
    }
}
Output:
"123456.a" is a String
"123456 " is a String
"123456" is a number
"0123456.0" is a number
"0123456a.0" is a String

According to Oracle's Java Documentation:
private static final Pattern NUMBER_PATTERN = Pattern.compile(
"[\\x00-\\x20]*[+-]?(NaN|Infinity|((((\\p{Digit}+)(\\.)?((\\p{Digit}+)?)" +
"([eE][+-]?(\\p{Digit}+))?)|(\\.((\\p{Digit}+))([eE][+-]?(\\p{Digit}+))?)|" +
"(((0[xX](\\p{XDigit}+)(\\.)?)|(0[xX](\\p{XDigit}+)?(\\.)(\\p{XDigit}+)))" +
"[pP][+-]?(\\p{Digit}+)))[fFdD]?))[\\x00-\\x20]*");
boolean isNumber(String s){
return NUMBER_PATTERN.matcher(s).matches();
}

Refer to org.apache.commons.lang3.StringUtils
public static boolean isNumeric(CharSequence cs) {
if (cs == null || cs.length() == 0) {
return false;
} else {
int sz = cs.length();
for(int i = 0; i < sz; ++i) {
if (!Character.isDigit(cs.charAt(i))) {
return false;
}
}
return true;
}
}

In Java's String class there is a method called matches(). With the help of this method you can validate your string against a regular expression.
String regex = "^[\\d]{4}$";
String value = "1234";
System.out.println(value.matches(regex));
The explanation of the above regular expression is:
^ - Matches the start of the input.
[] - A character class; inside it you list the characters that are allowed.
\\d - Only allows digits. You can use '\\d' or '0-9' inside the brackets; both are the same.
{4} - Allows exactly 4 digits. You can change the number according to your need.
$ - Matches the end of the input.
Note: You can remove the {4} and specify + which means one or more times, or * which means zero or more times, or ? which means once or not at all.
For more reference please go through this website: https://www.rexegg.com/regex-quickstart.html

Official regex way
I would use this regex for integers:
^-?(0|[1-9]\d*)$
Because it is explicitly anchored, this will also work in other programming languages and tools, without relying on how each of them handles partial matches.
Also works in Java
\\d+
Questions regarding ^ and $
As @vikingsteve has pointed out, in Java the matches method matches a complete string, not parts of a string. In other words, it is unnecessary to use ^\d+$ (even though it is the official way of regex).
Online regex checkers usually search for a match anywhere in the input, so they behave differently from Java's matches method.

Try this piece of code:
void containsOnlyNumbers(String str)
{
    try {
        // Accepts an optional sign and only values that fit in an int.
        Integer.parseInt(str);
        System.out.println("is a number");
    } catch (NumberFormatException e) {
        System.out.println("is not a number");
    }
}
