Regex to match any number (Real, rational along with signs) - java

I've written a regex to match any number:
Positive and Negative
Decimal
Real Numbers
The following regex does well, but there's one drawback:
([\+\-]{1}){0,1}?[\d]*(\.{1})?[\d]*
It also matches inputs such as + or - on their own. Any pointers will be greatly appreciated. Thanks.
The regex should work with the following inputs:
5, +5, -5, 0.5, +0.5, -0.5, .5, +.5, -.5
and shouldn't match the following inputs:
+
-
+.
-.
.
Here is the answer by tchrist, which works perfectly:
(?:(?i)(?:[+-]?)(?:(?=[.]?[0-9])(?:[0-9]*)(?:(?:[.])(?:[0-9]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0-9]+))|))

If you want something that looks like a C float, here’s how to tickle Perl into coughing out a regex that does that, using the Regexp::Common module from CPAN:
$ perl -MRegexp::Common -le 'print $RE{num}{real}'
(?:(?i)(?:[+-]?)(?:(?=[.]?[0123456789])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0123456789]+))|))
You can tune that a bit if you want, but that gives you the basic idea.
It’s really remarkably flexible. For example, this spits out a pattern for base-2 real numbers that allow commas every three places:
$ perl -MRegexp::Common -le 'print $RE{num}{real}{-base => 2}{-sep => ","}{-group => 3}'
(?:(?i)(?:[+-]?)(?:(?=[.]?[01])(?:[01]{1,3}(?:(?:[,])[01]{3})*)(?:(?:[.])(?:[01]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[01]+))|))
The documentation shows that the full possible syntax for the numeric patterns it can spit out for you is:
$RE{num}{int}{-base}{-sep}{-group}{-places}
$RE{num}{real}{-base}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{dec}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{oct}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{bin}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{hex}{-radix}{-places}{-sep}{-group}{-expon}
$RE{num}{decimal}{-base}{-radix}{-places}{-sep}{-group}
$RE{num}{square}
$RE{num}{roman}
This makes it really easy to customize for whatever you want. And yes, of course you can use these patterns in Java.
Enjoy.
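A quick way to check the Regexp::Common pattern from Java is to paste it into a Pattern and run the inputs from the question through it. This is a minimal sketch using the [0-9] form quoted at the top of the thread; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class: wraps the $RE{num}{real} pattern (in its [0-9] form) for Java.
class RegexpCommonReal {
    // The pattern contains no backslashes, so it can be pasted into a Java string unchanged.
    static final Pattern REAL = Pattern.compile(
            "(?:(?i)(?:[+-]?)(?:(?=[.]?[0-9])(?:[0-9]*)(?:(?:[.])(?:[0-9]{0,}))?)"
            + "(?:(?:[E])(?:(?:[+-]?)(?:[0-9]+))|))");

    // matches() requires the whole string to be a number, not just some substring of it.
    static boolean isReal(String s) {
        return REAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("5", "+5", "-5", "0.5", "+0.5", "-0.5", ".5", "+.5", "-.5",
                "+", "-", "+.", "-.", ".", "5.", "1e10");
        for (String input : inputs) {
            System.out.println(input + " -> " + isReal(input));
        }
    }
}
```

The first nine inputs match and the five unwanted ones do not. It also accepts 5. and 1e10, since the generated pattern allows a trailing dot and an exponent, which goes beyond what the question asked for.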

You need to require at least one digit, i.e. use + instead of * for the \d.
I think you can also drop the {1} in several places, since it is implied by default.
Similarly, {0,1}? can be replaced with a plain ?, since {0,1} already means "optional".
That gives us:
String regex = "[+-]?(\\d+|\\d*\\.?\\d+)";
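A small test harness makes it easy to run this pattern against the inputs from the question. It is a sketch with a hypothetical class name; matches() is used so that the whole input has to be a number rather than merely containing one.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class SignedDecimal {
    // Optional sign, then either digits only, or optional digits, an optional dot and at least one digit.
    static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+|\\d*\\.?\\d+)");

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] good = {"5", "+5", "-5", "0.5", "+0.5", "-0.5", ".5", "+.5", "-.5"};
        String[] bad = {"+", "-", "+.", "-.", "."};
        for (String s : good) System.out.println(s + " -> " + isNumber(s)); // all true
        for (String s : bad) System.out.println(s + " -> " + isNumber(s));  // all false
    }
}
```

Unlike the Regexp::Common pattern, this one rejects a trailing dot such as 5., because every alternative has to end in a digit.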

I think this should do it:
[+-]?\d*(\.\d+)?
EDIT:
I've improved it so it will not match the dot in -123. but will match it in 123.456.
EDIT2:
So that it doesn't match a lone + or -, you can check that the sign is followed by a digit, with an optional dot in between:
[+-]?(?=\.?\d)\d*(\.\d+)?
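The effect of the lookahead is easiest to see by running the first version and the EDIT2 version side by side. In this sketch the lookahead is written as (?=\.?\d), an optional dot followed by a digit, which is how we read "the dot being optional"; the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class SignLookahead {
    // First version: every part is optional, so "+", "-" and even "" match.
    static final Pattern WITHOUT_LOOKAHEAD = Pattern.compile("[+-]?\\d*(\\.\\d+)?");
    // EDIT2 idea: after the optional sign, require (without consuming) an optional dot and a digit.
    static final Pattern WITH_LOOKAHEAD = Pattern.compile("[+-]?(?=\\.?\\d)\\d*(\\.\\d+)?");

    public static void main(String[] args) {
        for (String s : new String[] {"+", "-", "", "-.5", "123.456", "."}) {
            System.out.printf("%-10s without: %-5b with: %b%n", "\"" + s + "\"",
                    WITHOUT_LOOKAHEAD.matcher(s).matches(),
                    WITH_LOOKAHEAD.matcher(s).matches());
        }
    }
}
```

Because a lookahead consumes nothing, the rest of the pattern is unchanged; it only rules out positions where no digit follows the sign (or the sign and a dot).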

Related

Java regex that handles several possibilities

I am trying to find a regex for the following user generated possibilities:
÷2%3%x#4%2$#
OR
÷2%x#4%2$#
OR
÷2%x#4$#
OR
÷2%x#
To understand the expression: it is a fraction whose numerator lies between the ÷ and the first %, and whose denominator runs from the first % to the #.
The denominator also has an exponent, which lies between the # and the $.
The user can input whatever number he/she desires, but the structure stays the same. Notice that the number can also be a decimal.
The structure is as follows: ÷(a number; if it's two or more digits, a % will be in between the digits)x(a group that consists of one or more numbers, plus the symbols #, $ and one or more %, which can also alternate between the digits)#
Remember, the number can be a decimal number.
I am trying to use the following regex with no success:
"[÷]-?\\d+(\\.\\d*)?[%](-?\\d+(\\.\\d*)?){0,1}[x]([#]-?\\d+(\\.\\d*)?[$]){0,1}[#]"
I think that the group (-?\d+(\.\d*)?){0,1} is complicating things.
Also, I have not accounted for the % that could occur within this group.
Any suggestions? Thank you.
Edit: Deleted the old post content.
According to your new testcases I improved your regex to match all cases and simplified the regex:
÷[0-9%]+?x(#[0-9%]+?\$)?# OR ÷[\d%]+?x(#[\d%]+?\$)?#
Note:
The [] brackets define a character class (a set of allowed characters), so there is no need for the parentheses.
Also, [÷][0-9]+[0-9[%]]+? is just the same as ÷[0-9]+[0-9%]+?. The first part of your example matches a digit 0-9 one or more times, and then you check for either a digit or % one or more times (non-greedy). So instead you can just use the second check for the whole thing.
By wrapping the exponent in a group, (), we can make the whole exponent optional with ?; this makes your 4th test case work.
You could also substitute 0-9 with \d (any digit) if you want.
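Here is a quick check of the simplified pattern against the four sample inputs. It is a sketch with a hypothetical class name; the ÷ is written as the Unicode escape \u00F7 so that the source file's encoding does not matter.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class FractionPattern {
    // Division sign, digits and % (lazy), then x, then an optional "#...$" exponent, then the closing #.
    static final Pattern FRACTION = Pattern.compile("\u00F7[\\d%]+?x(#[\\d%]+?\\$)?#");

    static boolean isFraction(String s) {
        return FRACTION.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"\u00F72%3%x#4%2$#", "\u00F72%x#4%2$#", "\u00F72%x#4$#", "\u00F72%x#"};
        for (String input : inputs) {
            System.out.println(input + " -> " + isFraction(input)); // all true
        }
    }
}
```

Note that ÷2.5%x# does not match, because neither character class contains a dot; decimal numbers would need \. added to both classes.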
I found a regex that works; I tested it from the bottom up.
Here it is:
[÷][0-9[%][\\.]]+?[x][0-9[%][\\.][#][$]]*?[#]
This regex works for all types of cases, even those that include decimal numbers or have no exponent.
The character class [0-9[%][\.][#][$]]*? lets the regex search for the exponent, which can occur zero (that's why the * is there) or more times; the ? makes the * lazy, so it matches as few characters as possible. I followed the same idea for the coefficient of x (read the post if you don't know where the coefficient lies) and the numerator. Thank you to everyone who put effort into brainstorming this problem. I have chosen to use my answer in my program.
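Here is the final pattern dropped into a Java test harness. Java supports character-class unions such as [0-9[%]], which is what makes the nested brackets legal here. The class name is hypothetical, and the ÷ is written as \u00F7.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class FractionWithDecimals {
    // [0-9[%][\\.]] is a union: a digit, % or a literal dot. The second class also allows # and $.
    // +? and *? are lazy quantifiers; matches() still forces the whole input to be consumed.
    static final Pattern FRACTION = Pattern.compile(
            "[\u00F7][0-9[%][\\.]]+?[x][0-9[%][\\.][#][$]]*?[#]");

    static boolean isFraction(String s) {
        return FRACTION.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"\u00F72%3%x#4%2$#", "\u00F72%x#", "\u00F72.5%x#4$#", "\u00F7%x#"};
        for (String input : inputs) {
            System.out.println(input + " -> " + isFraction(input)); // all true
        }
    }
}
```

The last input, ÷%x#, also matches: because % (and, in the second class, # and $) may appear anywhere, the pattern checks which characters are used rather than the exact structure of the fraction.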

Regex - how to extract integers only not float from text

Given a text:
Why should the number 12.8 be rounded to 13. It must be rather 11
What regex would extract only the integer values:
13
11
I tried this: \d+(?!\\.)
But still no luck.
You need to use lookarounds (lookbehind, lookahead) to check what happens before and after the digits you match:
a naive approach:
(?<![0-9]|[0-9]\.)[0-9]+(?!\.?[0-9])
an efficient approach:
[0-9](?<![0-9][0-9]|[0-9]\.[0-9])[0-9]*+(?!\.[0-9])
(Because this one quickly discards positions where there is not a digit)
Note: don't forget to escape the backslashes in the Java string.
You can also write it like this:
\b[0-9](?<![0-9]\.[0-9])[0-9]*+(?!\.[0-9])
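In Java, the last pattern could be used with Matcher.find() roughly like this (a sketch; the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class IntegersOnly {
    // \b[0-9] starts at the first digit of a number; the lookbehind rejects a digit that directly
    // follows "<digit>." (a fractional part); [0-9]*+ takes the remaining digits possessively;
    // the lookahead rejects numbers followed by "." and a digit.
    static final Pattern INTEGER = Pattern.compile("\\b[0-9](?<![0-9]\\.[0-9])[0-9]*+(?!\\.[0-9])");

    static List<String> integers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = INTEGER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [13, 11]
        System.out.println(integers("Why should the number 12.8 be rounded to 13. It must be rather 11"));
    }
}
```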
I solved it by applying two regexes. The command line below shows how they work:
echo "Why number 12.8 be rounded to 13. It must be rather 11" | grep -Po '\b\d+(\.\d+)?\b' | grep -Po '^\d+$'
The first regex selects all numbers, including floating-point ones. The second regex selects only integers.
In Java, use "\\b\\d+(\\.\\d+)?\\b" to select all numbers, and "^\\d+$" to select only integers.
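The same two-step idea can be written in Java without grep. This sketch uses \b\d+(\.\d+)?\b for the first step, which needs only one digit, so numbers such as 7 are found too, and a full-string \d+ check for the second step; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class TwoStepIntegers {
    // Step 1: every number, with an optional fractional part.
    static final Pattern ANY_NUMBER = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");

    static List<String> integers(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = ANY_NUMBER.matcher(text);
        while (m.find()) {
            String number = m.group();
            // Step 2: keep only matches made entirely of digits, like the second grep.
            if (number.matches("\\d+")) {
                result.add(number);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [13, 11]
        System.out.println(integers("Why number 12.8 be rounded to 13. It must be rather 11"));
    }
}
```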

I need a regex command that isolates all numbers not adjacent to a caret (^)

I am having a lot of trouble figuring out this regex, and can't seem to find the right combination to fit what I want.
Example:
Input: 1x^3+5x^2+6x+2
Output: 1 5 6 2
I need to isolate those values, as they are the coefficients of my polynomial. The input is a String so I figured the best way to do this was by using the .split() function with a custom Regex command.
You can use this regular expression:
(?<!\^)\d+(?!\^)
This uses a negative lookbehind and a negative lookahead to exclude digits next to ^.
Since you want to extract coefficients, it finds one or more digits. Modify the middle part if needed.
You can use it this way in Java, for example:
Matcher m = Pattern.compile("(?<!\\^)\\d+(?!\\^)").matcher("1x^3+5x^2+6x+2");
while (m.find()) {
    System.out.println("Coefficient: " + m.group());
}
EDIT:
If you also want to detect negative coefficients, you can check for an optional - before digits:
(?<!\^)-?\d+(?!\^)
Keep in mind that as you try to capture more complicated patterns, regular expressions become less suitable, because the number of cases to cover quickly gets out of hand.
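Here is the snippet as a complete program, with one change of our own that is not part of the answer: the lookarounds below also reject a neighbouring digit. With (?<!\^)\d+(?!\^), the input 3x^12+4 yields 3, 2 and 4, because the 2 of the exponent is preceded by 1 rather than ^. The class name is hypothetical, and the sketch handles only non-negative coefficients.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class Coefficients {
    // Digits not preceded by ^ or another digit, and not followed by another digit or ^.
    static final Pattern COEFFICIENT = Pattern.compile("(?<![\\^\\d])\\d+(?![\\d^])");

    static List<String> coefficients(String polynomial) {
        List<String> result = new ArrayList<>();
        Matcher m = COEFFICIENT.matcher(polynomial);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(coefficients("1x^3+5x^2+6x+2")); // [1, 5, 6, 2]
        System.out.println(coefficients("3x^12+4"));        // [3, 4]
    }
}
```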

Regex expression to validate a formula

I am new to regex and am currently building a web application in Java. I have the following requirements to validate a formula:
Formula must start with “T”
A formula can contain the following set of characters:
Digit: 0 - 9
Alpha: A - Z
Operators: *, /, +, -
Separator: ;
An operator must always be followed by a digit
The character “T” must always be followed by a digit or an alpha.
The separator must always be followed by “T”.
The character “M” must always be followed by an operator.
I managed to build the following expression:
^[T][A-Z0-9 -- \\+*;]*
But I don't know how to add the following validations to the regex above:
An operator must always be followed by a digit
The character “T” must always be followed by a digit or an alpha.
The separator must always be followed by “T”
The character “M” must always be followed by an operator.
Valid sample: TA123;T1*2/32M+
Invalid Sample: T+qMg;Y
^(?!.*[*+/-]\\D)(?!.*T\\W)(?!.*[;:][^T])(?!.*M[^*+/-])[T][A-Z0-9 +/*;:-]*$
You can use this. See demo:
https://regex101.com/r/sS2dM8/7
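A small harness for the pattern above, run against the two samples from the question (a sketch; the class name is hypothetical). The string is already Java-escaped, so it is used exactly as given.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class FormulaValidator {
    // Each (?!...) rejects one forbidden pair anywhere in the input:
    // operator + non-digit, T + non-word character, separator + non-T, M + non-operator.
    static final Pattern FORMULA = Pattern.compile(
            "^(?!.*[*+/-]\\D)(?!.*T\\W)(?!.*[;:][^T])(?!.*M[^*+/-])[T][A-Z0-9 +/*;:-]*$");

    static boolean isValid(String formula) {
        return FORMULA.matcher(formula).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("TA123;T1*2/32M+")); // true
        System.out.println(isValid("T+qMg;Y"));         // false
    }
}
```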
We lack a bit of information to fully understand what you want. A couple examples would help.
For now, a small regexp:
^(T[A-LN-Z0-9]*M[+\-/*][0-9];?)*
EDIT:
From my understanding, this should be close to what you're looking for :
^(T([A-LN-Z0-9]*M?[+\-/*]?[0-9]?)*;?)+
https://regex101.com/r/hT7aP2/1
This regexp forces the line to begin with a T, followed by zero or more characters from the [A-LN-Z0-9] range, meaning all your letters and digits except M.
Then it needs an M followed by an operator from the class [+\-/*], that is +, -, / or *. The - is escaped because inside a character class it would otherwise define a range: in [+-/*], +-/ is the range from + to /, which also includes the comma and the dot.
Then it continues with a digit, and ends with a ";" that may or may not be there.
Everything in the parentheses can be repeated zero or more times.
I would have liked examples of what you want to validate... For example, we don't know if the line HAS to end with a ";".
Depending on what you want, splitting the string on the character ";" and validating each of the resulting strings with that regexp might work.

Regular expression for UK postal codes

I'm making an application that asks the user to enter a postcode and outputs the postcode if it is valid.
I found the following pattern, which works correctly:
String pattern = "^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})";
I don't know much about regex and it would be great if someone could talk me through this statement. I mainly don't understand the ? and use of ().
Your regex has the following:
^ and $ - anchors for indicating start and end of matching input.
[A-PR-UWYZ] - Any character among A to P or R to U or W,Y,Z. Characters enclosed in square brackets form a character class, which allows any of the enclosed characters and - is for indicating a sequence of characters like [A-D] allowing A,B,C or D.
([0-9]|[A-HJKSTUW])? - An optional character any of 0-9 or characters indicated by [A-HJKSTUW]. ? makes the preceding part optional. | is for an OR. The () combines the two parts to be ORed. Here you may use [0-9A-HJKSTUW] instead of this.
[ABD-HJLNP-UW-Z]{2} - Sequence of length 2 formed by characters allowed by the character class. {2} indicates the length 2. So [ABD-HJLNP-UW-Z]{2} is equivalent to [ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]
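To see these pieces in action, the pattern can be run against a few inputs. This is a sketch; the class name and the test inputs are our own. The pattern as quoted has a ^ but no closing $; using matches() requires the whole string to match anyway.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class UkPostcode {
    // The pattern from the question, split over three lines but otherwise unchanged.
    static final Pattern POSTCODE = Pattern.compile(
            "^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)"
            + "|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?))"
            + " [0-9][ABD-HJLNP-UW-Z]{2})");

    static boolean isValid(String postcode) {
        return POSTCODE.matcher(postcode).matches();
    }

    public static void main(String[] args) {
        for (String p : new String[] {"M1 1AA", "SW1A 1AA", "EC1A 1BB", "Q1 1AA", "sw1a 1aa"}) {
            System.out.println(p + " -> " + isValid(p));
        }
    }
}
```

Q1 1AA is rejected because Q is not in the first class [A-PR-UWYZ], and lowercase input fails because every class lists uppercase letters only.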
The ? means "occurs 0 or 1 times", and the brackets () do grouping as you might expect; quantifiers such as ? apply to whole groups. A regex tutorial is probably the best thing here:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
I had a brief look and it seems reasonable. Also, for practice and play, see this applet:
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
A simple example: (ab)? means 'ab' once or not at all.
