Regex - how to extract integers only not float from text - java

Given a text:
Why should the number 12.8 be rounded to 13. It must be rather 11
What regex would extract only the integer values:
13
11
I tried this: \d+(?!\\.)
But still no luck.

You need to use lookarounds (lookbehind, lookahead) to check what happens before and after the digits you match:
a naive approach:
(?<![0-9]|[0-9]\.)[0-9]+(?!\.?[0-9])
an efficient approach:
[0-9](?<![0-9][0-9]|[0-9]\.[0-9])[0-9]*+(?!\.[0-9])
(Because this one quickly discards positions where there is not a digit)
Note: don't forget to escape the backslashes in the java string.
You can also write it like this:
\b[0-9](?<![0-9]\.[0-9])[0-9]*+(?!\.[0-9])
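As a rough sketch of how the first ("naive") pattern above might be used from Java, with the backslashes doubled as the note says. The class and method names here (IntegerExtractor, extractIntegers) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntegerExtractor {
    // The "naive" pattern from the answer; each regex backslash is doubled in the Java literal.
    private static final Pattern INTEGER_ONLY =
            Pattern.compile("(?<![0-9]|[0-9]\\.)[0-9]+(?!\\.?[0-9])");

    // Collects every run of digits that is not part of a decimal number.
    static List<String> extractIntegers(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = INTEGER_ONLY.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Why should the number 12.8 be rounded to 13. It must be rather 11";
        System.out.println(extractIntegers(text)); // prints [13, 11]
    }
}
```

The trailing dot after 13 is not followed by a digit, so the lookahead lets 13 through, while both halves of 12.8 are rejected.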

I solved it by applying two regexes. The command line below shows how they work:
echo "Why number 12.8 be rounded to 13. It must be rather 11" | grep -Po '\b\d+\.?\d*\b' | grep -Po '^\d+$'
The first regex selects all numbers, including floating-point ones, one per line. The second regex keeps only the lines that are integers. (The trailing \d* rather than \d lets single-digit numbers through as well.)
In Java, use "\\b\\d+\\.?\\d*\\b" to select all numbers, and "^\\d+$" to select only integers.

Related

Need efficient way to get particular value from given string [duplicate]

I need a regular expression that validates a number, but doesn't require a digit after the decimal.
ie.
123
123.
123.4
would all be valid
123..
would be invalid
Any help would be greatly appreciated!
Use the following:
/^\d*\.?\d*$/
^ - Beginning of the line;
\d* - 0 or more digits;
\.? - An optional dot (escaped, because in regex, . is a special character);
\d* - 0 or more digits (the decimal part);
$ - End of the line.
This allows for .5 decimal rather than requiring the leading zero, such as 0.5
/\d+\.?\d*/
One or more digits (\d+), optional period (\.?), zero or more digits (\d*).
Depending on your usage or regex engine you may need to add start/end line anchors:
/^\d+\.?\d*$/
You need a regular expression like the following to do it properly:
/^[+-]?((\d+(\.\d*)?)|(\.\d+))$/
The same expression with whitespace, using the extended modifier (as supported by Perl):
/^ [+-]? ( (\d+ (\.\d*)?) | (\.\d+) ) $/x
or with comments:
/^ # Beginning of string
[+-]? # Optional plus or minus character
( # Followed by either:
( # Start of first option
\d+ # One or more digits
(\.\d*)? # Optionally followed by: one decimal point and zero or more digits
) # End of first option
| # or
(\.\d+) # One decimal point followed by one or more digits
) # End of grouping of the OR options
$ # End of string (i.e. no extra characters remaining)
/x # Extended modifier (allows whitespace & comments in regular expression)
For example, it will match:
123
23.45
34.
.45
-123
-273.15
-42.
-.45
+516
+9.8
+2.
+.5
And will reject these non-numbers:
. (single decimal point)
-. (negative decimal point)
+. (plus decimal point)
(empty string)
The simpler solutions can incorrectly reject valid numbers or match these non-numbers.
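For Java users, a minimal sketch of the same expression, checked against a few of the inputs listed above. The class and method names (NumberValidator, isNumber) are only illustrative:

```java
import java.util.regex.Pattern;

class NumberValidator {
    // Same expression as above, without the Perl delimiters; backslashes doubled for Java.
    private static final Pattern NUMBER =
            Pattern.compile("^[+-]?((\\d+(\\.\\d*)?)|(\\.\\d+))$");

    // matches() requires the whole input to fit the pattern.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"123", "34.", ".45", "-.45", "+2.", ".", "-.", "+.", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isNumber(s));
        }
    }
}
```

The first five samples are accepted and the last four are rejected, in line with the lists above.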
This matches integers and complete decimals, but note that it rejects 123. (a trailing dot), which the question wants to accept:
^\d+(\.\d+)?$
Try this regex:
\d+\.?\d*
\d+ one or more digits before the optional decimal point
\.? optional decimal point (optional due to the ? quantifier)
\d* optional digits after the decimal point
I ended up using the following:
^\d*\.?\d+$
This makes the following invalid:
.
3.
This is what I did. It's more strict than any of the above (and more correct than some):
^0$|^[1-9]\d*$|^\.\d+$|^0\.\d*$|^[1-9]\d*\.\d*$
Strings that pass:
0
0.
1
123
123.
123.4
.0
.0123
.123
0.123
1.234
12.34
Strings that fail:
.
00000
01
.0.
..
00.123
02.134
you can use this:
^\d+(\.\d)?\d*$
matches:
11
11.1
0.2
does not match:
.2
2.
2.6.9
^[+-]?(([1-9][0-9]*)?[0-9](\.[0-9]*)?|\.[0-9]+)$
should reflect what people usually think of as a well formed decimal number.
The digits before the decimal point can be either a single digit, in which case it can be from 0 to 9, or more than one digit, in which case the first one cannot be 0.
If there are any digits present before the decimal sign, then the decimal and the digits following it are optional. Otherwise, a decimal has to be present followed by at least one digit. Note that multiple trailing 0's are allowed after the decimal point.
grep -E '^[+-]?(([1-9][0-9]*)?[0-9](\.[0-9]*)?|\.[0-9]+)$'
correctly matches the following:
9
0
10
10.
0.
0.0
0.100
0.10
0.01
10.0
10.10
.0
.1
.00
.100
.001
as well as their signed equivalents, whereas it rejects the following:
.
00
01
00.0
01.3
and their signed equivalents, as well as the empty string.
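The same check could be done in Java roughly as follows. This is only a sketch; the names WellFormedDecimal and isWellFormed are hypothetical:

```java
import java.util.regex.Pattern;

class WellFormedDecimal {
    // The pattern passed to grep -E above; only the backslashes need doubling in Java.
    private static final Pattern DECIMAL =
            Pattern.compile("^[+-]?(([1-9][0-9]*)?[0-9](\\.[0-9]*)?|\\.[0-9]+)$");

    static boolean isWellFormed(String s) {
        return DECIMAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Accepted: a single leading digit, or several digits not starting with 0.
        System.out.println(isWellFormed("10."));   // true
        System.out.println(isWellFormed(".001"));  // true
        System.out.println(isWellFormed("-0.10")); // true
        // Rejected: superfluous leading zeros and a bare decimal point.
        System.out.println(isWellFormed("01"));    // false
        System.out.println(isWellFormed("00.0"));  // false
        System.out.println(isWellFormed("."));     // false
    }
}
```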
What language? In Perl style: ^\d+(\.\d*)?$
What you asked is already answered, so this is just additional info for those who want exactly 2 decimal digits when the optional decimal point is entered:
^\d+(\.\d{2})?$
^ : start of the string
\d : a digit (equal to [0-9])
+ : between one and unlimited times
Capturing Group (\.\d{2})?
? : between zero and one times
\. : the character .
\d : a digit (equal to [0-9])
{2} : exactly 2 times
$ : end of the string
1 : match
123 : match
123.00 : match
123. : no match
123.. : no match
123.0 : no match
123.000 : no match
123.00.00 : no match
try this. ^[0-9]\d{0,9}(\.\d{1,3})?%?$ it is tested and worked for me.
Regular expression:
^\d+((.)|(.\d{0,1})?)$
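Use \d+ instead of \d{0,1} if you want to allow more than one digit after the decimal point, or \d{0,2} if you want to allow up to two. Note that the dot in these patterns is unescaped and therefore matches any character; write \. to match only a literal decimal point. See the example below for reference: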
or
^\d+((.)|(.\d{0,2})?)$
or
^\d+((.)|(.\d+)?)$
Explanation
(These are generated by regex101)
^ asserts position at start of a line
\d matches a digit (equivalent to [0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
1st Capturing Group ((.)|(.\d{0,1})?)
1st Alternative (.)
2nd Capturing Group (.)
. matches any character (except for line terminators)
2nd Alternative (.\d{0,1})?
3rd Capturing Group (.\d{0,1})?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
. matches any character (except for line terminators)
\d matches a digit (equivalent to [0-9])
{0,1} matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
Sandbox
Play with regex here: https://regex101.com/
(?<![^d])\d+(?:\.\d+)?(?![^d])
clean and simple.
This uses Suffix and Prefix, RegEx features.
It directly returns true - false for IsMatch condition
^\d+(()|(\.\d+)?)$
Came up with this. Allows both integer and decimal, but forces a complete decimal (leading and trailing numbers) if you decide to enter a decimal.
In Perl, use Regexp::Common which will allow you to assemble a finely-tuned regular expression for your particular number format. If you are not using Perl, the generated regular expression can still typically be used by other languages.
Printing the result of generating the example regular expressions in Regexp::Common::Number:
$ perl -MRegexp::Common=number -E 'say $RE{num}{int}'
(?:(?:[-+]?)(?:[0123456789]+))
$ perl -MRegexp::Common=number -E 'say $RE{num}{real}'
(?:(?i)(?:[-+]?)(?:(?=[.]?[0123456789])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[-+]?)(?:[0123456789]+))|))
$ perl -MRegexp::Common=number -E 'say $RE{num}{real}{-base=>16}'
(?:(?i)(?:[-+]?)(?:(?=[.]?[0123456789ABCDEF])(?:[0123456789ABCDEF]*)(?:(?:[.])(?:[0123456789ABCDEF]{0,}))?)(?:(?:[G])(?:(?:[-+]?)(?:[0123456789ABCDEF]+))|))
For those who wanna match the same thing as JavaScript does:
[-+]?(\d+\.?\d*|\.\d+)
Matches:
1
+1
-1
0.1
-1.
.1
+.1
Drawing: https://regexper.com/#%5B-%2B%5D%3F%28%5Cd%2B%5C.%3F%5Cd*%7C%5C.%5Cd%2B%29

Replace everything except positive/negative numbers

There are many questions already answered about replacing everything except positive numbers. But I couldn't find any answer which preserves both positive and negative numbers. I want to replace everything that isn't a number (positive or negative). The output should be the following, e.g.
0 | success. id - 1234| -> 0 1234
and
-10 | failure. id - 2345| -> -10 2345
Apparently this answers for the positive part.
You can use this regex to match positive/negative integers:
[+-]?\b\d+\b
to match positive/negative numbers including decimals:
[+-]?\b\d+(?:\.\d+)?\b
Please note that rather than using replace you would be better off using above regex in Pattern and Matcher APIs and just get your matched data.
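A minimal sketch of that Pattern/Matcher approach, using the integer regex above and joining the matches with a space to get the output asked for in the question. The names SignedIntegerExtractor and keepNumbers are made up for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedIntegerExtractor {
    // Optional sign, then a whole run of digits delimited by word boundaries.
    private static final Pattern SIGNED_INT = Pattern.compile("[+-]?\\b\\d+\\b");

    // Returns the matched integers joined by single spaces.
    static String keepNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = SIGNED_INT.matcher(input);
        while (m.find()) {
            numbers.add(m.group());
        }
        return String.join(" ", numbers);
    }

    public static void main(String[] args) {
        System.out.println(keepNumbers("0 | success. id - 1234|"));   // prints 0 1234
        System.out.println(keepNumbers("-10 | failure. id - 2345|")); // prints -10 2345
    }
}
```

The lone "-" before the id is not picked up as a sign because it is followed by a space rather than a digit.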
In case you can only use replace then use:
str = str.replaceAll( "([+-]?\\b\\d+\\b)|\\S+[ \\t]*", "$1" );
I used this in Kotlin to replace all non-Double characters before parsing to a Double:
val double = str.replace("[^0-9.-]".toRegex(), "").toDoubleOrNull()

Regex for thousand separated numbers

How can I verify with regex in Java if a number is thousand separated (for example with dot)?
Of course it doesn't have to accept any negative number. I've already Googled all around and so far the best I found was [1-9]?\.[0-9]*. However, it's not perfect. For example it accepts 1.000000000 which is not correct.
How can I verify a positive number with a dot thousand separator? For example the number: 1.024.553 or 100.000
It should accept:
123
123.123
0
12.111
But not:
00
kukac
0.111
1...1
1.1
You could use this pattern:
^(?:\d+|\d{1,3}(?:\.\d{3})*)$
This will match any simple sequence of digits without thousands separators, or any sequence with . separators between every 3 digits. The non-capturing group makes both anchors apply to both alternatives. If you also want to support a comma as a thousands separator, use this:
^(?:\d+|\d{1,3}(?:[,.]\d{3})*)$
Of course, to use any of these in Java, you'll need to escape the \ characters:
String pattern = "^(?:\\d+|\\d{1,3}(?:\\.\\d{3})*)$";
Update: given your updated specs, I'd recommend this pattern:
^(?:0|[1-9][0-9]{0,2}(?:\.[0-9]{3})*)$
You can test it here: Regex Tester
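As a quick check in Java, here is a small sketch that runs the updated pattern against the examples from the question. The class and method names (ThousandSeparatorCheck, isValid) are illustrative only:

```java
import java.util.regex.Pattern;

class ThousandSeparatorCheck {
    // Either a lone 0, or 1-3 digits without a leading zero followed by groups of ".ddd".
    private static final Pattern THOUSANDS =
            Pattern.compile("^(?:0|[1-9][0-9]{0,2}(?:\\.[0-9]{3})*)$");

    static boolean isValid(String s) {
        return THOUSANDS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] accepted = {"123", "123.123", "0", "12.111", "1.024.553", "100.000"};
        String[] rejected = {"00", "kukac", "0.111", "1...1", "1.1"};
        for (String s : accepted) {
            System.out.println(s + " -> " + isValid(s)); // true for each
        }
        for (String s : rejected) {
            System.out.println(s + " -> " + isValid(s)); // false for each
        }
    }
}
```

Note that with this pattern a number of four or more digits without separators, such as 1234, is rejected as well.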

Regex to match any number (Real, rational along with signs)

I've written a regex to match any number:
Positive and Negative
Decimal
Real Numbers
The following regex does well but there's one drawback
([\+\-]{1}){0,1}?[\d]*(\.{1})?[\\d]*
It is positive for inputs such as + or - as well. Any pointers will be greatly appreciated. Thanks.
The regex should work with the following inputs
5, +5, -5, 0.5, +0.5, -0.5, .5, +.5, -.5
and shouldn't match the following inputs
+
-
+.
-.
.
Here is the answer by tchrist, works perfectly.
(?:(?i)(?:[+-]?)(?:(?=[.]?[0-9])(?:[0-9]*)(?:(?:[.])(?:[0-9]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0-9]+))|))
If you want something that looks like a C float, here’s how to tickle Perl into coughing out a regex that does that, using the Regexp::Common module from CPAN:
$ perl -MRegexp::Common -le 'print $RE{num}{real}'
(?:(?i)(?:[+-]?)(?:(?=[.]?[0123456789])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0123456789]+))|))
You can tune that a bit if you want, but that gives you the basic idea.
It’s really remarkably flexible. For example, this spits out a pattern for base-2 real numbers that allow commas every three places:
$ perl -MRegexp::Common -le 'print $RE{num}{real}{-base => 2}{-sep => ","}{-group => 3}'
(?:(?i)(?:[+-]?)(?:(?=[.]?[01])(?:[01]{1,3}(?:(?:[,])[01]{3})*)(?:(?:[.])(?:[01]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[01]+))|))
The documentation shows that the full possible syntax for the numeric patterns it can spit out for you is:
$RE{num}{int}{-base}{-sep}{-group}{-places}
$RE{num}{real}{-base}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{dec}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{oct}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{bin}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{hex}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{decimal}{-base}{-radix}{-places}{-sep}{-group}
$RE{num}{square}
$RE{num}{roman}
Making it really easy to customize it for whatever you want. And yes, of course you can use these patterns in Java.
Enjoy.
You need to require at least one digit, i.e. using + instead of * for the \d.
I think you can also drop the {1} in several places since this is implied by default
Similarly {0,1} can be dropped when followed by ?
Giving us:
regex = "[+-]?(\\d+|\\d*\\.?\\d+)";
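A small sketch that runs this regex over the inputs from the question, assuming the whole string must match (which is what Matcher.matches() checks). The class name SignedNumberCheck and the method isNumber are hypothetical:

```java
import java.util.regex.Pattern;

class SignedNumberCheck {
    // Optional sign, then either plain digits or optional digits, optional dot, digits.
    private static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+|\\d*\\.?\\d+)");

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] shouldMatch = {"5", "+5", "-5", "0.5", "+0.5", "-0.5", ".5", "+.5", "-.5"};
        String[] shouldNotMatch = {"+", "-", "+.", "-.", "."};
        for (String s : shouldMatch) {
            System.out.println(s + " -> " + isNumber(s)); // true for each
        }
        for (String s : shouldNotMatch) {
            System.out.println(s + " -> " + isNumber(s)); // false for each
        }
    }
}
```

Because at least one digit is now required, a bare sign or a bare dot can no longer match.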
I think this should do it:
[+-]?\d*(\.\d+)?
EDIT:
I've improved it so it will not match the dot on -123. but it will for 123.456
EDIT2:
So it doesn't match only + or -, you can check that such a sign must precede either a dot or a number, the dot being optional.
[+-]?(?=\.?\d)\d*(\.\d+)?

RegEx - Testing for 9 adjacent identical numbers

I am trying to test a string (in java) for 9 adjacent identical numbers ... I can test for adjacent identical numbers - but only the next adjacent number ....
boolean result = string.matches("/([0-9])\1/g");
I want to match 9 characters - anyone able to help me ?
Thanks
EDIT : Some examples
"1111111111" should match
"222222222" should match
"3311111111133" should match
"1234567890" should not match
Try this regex: ([0-9])\1{8}
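In Java, String.matches() only succeeds when the entire string fits the pattern, so for the examples in the question something like Matcher.find() is probably what you want. A minimal sketch, with an invented class name (NineIdentical) and method name (hasNineIdenticalDigits):

```java
import java.util.regex.Pattern;

class NineIdentical {
    // One captured digit followed by eight repetitions of that same digit.
    // The backreference is written \\1 in the Java string literal.
    private static final Pattern NINE_IDENTICAL = Pattern.compile("([0-9])\\1{8}");

    // find() looks for the run anywhere inside the input.
    static boolean hasNineIdenticalDigits(String s) {
        return NINE_IDENTICAL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNineIdenticalDigits("1111111111"));    // true
        System.out.println(hasNineIdenticalDigits("222222222"));     // true
        System.out.println(hasNineIdenticalDigits("3311111111133")); // true
        System.out.println(hasNineIdenticalDigits("1234567890"));    // false
    }
}
```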
Java greedy quantifier:
X{n} matches X exactly n times
Here X would be any digit, [0-9], and n would be 9:
([0-9]{9})
Edit:
The pattern above matches any 9 digits, not 9 identical ones. This will match 9 identical digits:
([0-9])\1{8}
[0-9] matches any digit
\1 refers to whatever the first group captured
\1{8} matches that captured digit 8 more times
I found Kirill Polishchuk's answer fine and good. However, in the rare case where you have to print 9 adjacent characters which are separated by spaces (like in a text file), you can do the following:
Input:
1111111111 4444444444 4455555555558899 567833333333339
The regex may be modified as follows:
'([0-9]) ?\1{8}' If you have only one space between columns.
or
'([0-9]) *\1{8}' If you have an uneven number of spaces between columns.
If you want to use with grep you can do it this way:
grep -E '([0-9]) ?\1{8}'
or
grep -E '([0-9]) *\1{8}'
Hope this helps.
