Java regex that handles several possibilities - java

I am trying to find a regex for the following user generated possibilities:
÷2%3%x#4%2$#
OR
÷2%x#4%2$#
OR
÷2%x#4$#
OR
÷2%x#
To understand the expression, it is a fraction whose numerator lies between
the ÷ and the first %, and the denominator lies from first % to the #.
But, the denominator has an exponent, which lies from the # to $.
The user can input whatever number he/she desires, but the structure stays the same. Notice that the number can also be a decimal.
The structure is as follows: ÷(a number; if it has two or more digits, a % appears between the digits)x(a group consisting of one or more numbers plus the symbols # and $, where a % can also appear between the digits)#
Remember, the number can be a decimal number.
I am trying to use the following regex with no success:
"[÷]-?\\d+(\\.\\d*)?[%](-?\\d+(\\.\\d*)?){0,1}[x]([#]-?\\d+(\\.\\d*)?[$]){0,1}[#]"
I think that the group (-?\d+(\.\d*)?){0,1} is complicating things.
Also, I have not accounted for the % that could occur within this group.
Any suggestions? Thank you.

Edit: Deleted the old post content.
According to your new testcases I improved your regex to match all cases and simplified the regex:
÷[0-9%]+?x(#[0-9%]+?\$)?# OR ÷[\d%]+?x(#[\d%]+?\$)?#
Note:
[] denotes a character class (a set of allowed characters), so there is no need to wrap it in extra parentheses.
Also, [÷][0-9]+[0-9[%]]+? is just the same as ÷[0-9]+[0-9%]+?. The first part of your example matches a digit 0-9 one or more times, and then you check for either (0-9 or %) one or more times (non-greedy). So instead you can just use the second check for the whole thing.
By wrapping the exponent in a group, (), we can make the whole exponent optional with ?, which makes your 4th test case work.
You could also substitute 0-9 with \d (any digit) if you want.
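As a quick check of the simplified pattern, here is a minimal sketch that runs it against the four inputs from the question. The class and method names (FractionRegexDemo, isFraction) are made up for illustration, and the division sign is written as the escape \u00F7 only to avoid source-encoding problems. Note that this pattern does not allow a decimal point, so decimal inputs would need . added to the character classes.

```java
import java.util.regex.Pattern;

public class FractionRegexDemo {
    // \u00F7 is the division sign; the rest is the simplified pattern from the answer.
    static final Pattern FRACTION =
            Pattern.compile("\u00F7[\\d%]+?x(#[\\d%]+?\\$)?#");

    // Hypothetical helper: true only if the whole input has the expected structure.
    static boolean isFraction(String input) {
        return FRACTION.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u00F72%3%x#4%2$#", // exponent with two digits
            "\u00F72%x#4%2$#",
            "\u00F72%x#4$#",
            "\u00F72%x#"         // no exponent at all
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isFraction(s));
        }
    }
}
```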

I found a regex that works, I tested from the bottom up:
Here it is:
[÷][0-9[%][\\.]]+?[x][0-9[%][\\.][#][$]]*?[#]
This regex works for all of the cases, including those with decimal numbers and those without exponents.
The class [0-9[%][\.][#][$]]*? lets the regex match the exponent: the * allows zero or more characters from the class (so the exponent may be absent), and the trailing ? makes the quantifier reluctant, so it matches as few characters as possible. I followed the same idea for the coefficient of x (read the post if you don't know where the coefficient lies) and for the numerator. Thank you to everyone who put effort into brainstorming this problem. I have chosen to use my own answer in my program.
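For comparison, a small sketch that exercises this character-class-based pattern. The names SelfAnswerRegexDemo and looksLikeFraction are hypothetical. It accepts the inputs from the question and a decimal numerator, but because it only checks which characters appear, not their order, it also accepts malformed input such as ÷%x$#, which may or may not matter for the use case.

```java
import java.util.regex.Pattern;

public class SelfAnswerRegexDemo {
    // The pattern from the post; Java allows nested character classes as unions.
    static final Pattern LOOSE =
            Pattern.compile("[\u00F7][0-9[%][\\.]]+?[x][0-9[%][\\.][#][$]]*?[#]");

    // Hypothetical helper: whole-string match against the loose pattern.
    static boolean looksLikeFraction(String input) {
        return LOOSE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u00F72%3%x#4%2$#", // from the question
            "\u00F72%x#",        // no exponent
            "\u00F72.5%x#",      // decimal numerator
            "\u00F7%x$#"         // malformed, yet still accepted
        };
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeFraction(s));
        }
    }
}
```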

Related

Java - Regex for oracle NUMBER(10,8) field

I have an Oracle column of data type NUMBER(10,8). I need to validate the input data via a Java regex before storing the data in tables. As per Oracle's data type, valid values include:
10 digits
2 digits . 8 digits
3 digits . 7 digits
4 digits . 6 digits
no digits . 8 digits (is saved in Oracle as 0.12345678 but the input value can be like .12345678)
and so on. Negative values of these cases are also valid.
I can write regex for one case at a time. i.e we can check for 1234567891 with one regex. Then with changes in the range, we can write respective regex for all the possible combinations of the scale.
My sample regex : ^-?\\d{0,2}(?>\\.\\d{1,8})?$ : checks for 2 digits . 8 digits case.
Now I want to know, is there any easier way of checking all such values in one regex? One can always use a '|' operator but then the total number of such OR regex would be equal to the scale part of the data type.
Is there any elegant possible solution? Any pointers, suggestions are welcome!
UPDATE :
After @Andreas pointed out the actual meaning of (10,8), the question does seem to be misguided. Removing the invalid cases from the above mentioned, the valid cases are :
(0/1/2 digits).(0/1/2/../8 digits)
0/1/2 digits
negative cases
You've misunderstood the meaning of NUMBER(10,8):
Specify a fixed-point number using the following form:
NUMBER(p,s)
where:
p is the precision, or the maximum number of significant decimal digits, where the most significant digit is the left-most nonzero digit, and the least significant digit is the right-most known digit. [...]
s is the scale, or the number of digits from the decimal point to the least significant digit. [...]
It means a maximum of 10 significant decimal digits, with 8 digits after the decimal point, i.e. 2.8 only. The scale is not floating. Sure, you can have fewer on each side of the decimal point, but no more than 2 on the left and 8 on the right.
Oracle names this a fixed-point number, and it is very distinct from a floating-point number, which uses the same keyword but without limits, i.e. NUMBER.
As for Oracle Database number literal format, the format is:
If you exclude scientific notation, that means a regex of:
^[+-]?(?:\d{1,2}(?:\.\d{0,8})?|\.\d{1,8})$
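A minimal sketch of how this regex could be used from Java, with the backslashes doubled for the string literal. The class and method names (Number108Regex, isValid) are invented for the example.

```java
import java.util.regex.Pattern;

public class Number108Regex {
    // Up to 2 integer digits with up to 8 decimals, or a bare fraction such as .5
    static final Pattern NUMBER_10_8 =
            Pattern.compile("^[+-]?(?:\\d{1,2}(?:\\.\\d{0,8})?|\\.\\d{1,8})$");

    // Hypothetical helper: true if the text fits NUMBER(10,8) per the regex above.
    static boolean isValid(String text) {
        return NUMBER_10_8.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"12.12345678", "-.5", "99", "123", "1.123456789", "."};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```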
EDIT
Please note that this answer was provided prior to the OP's update and is no longer correct. I will leave this here in case it helps any future readers, but at the time of writing this edit Andreas' answer seems to be correct.
This may be easier to achieve by simply splitting the string on . and then testing each side's length, but this can still be achieved using regex alone.
^(?=[\d.]{1,10}$)(?:\d+(?:\.\d+)?|\.\d{1,8})$
^ Assert position at the start of the line
(?=[\d.]{1,10}$) Positive lookahead to ensure the line has a maximum of 10 characters and is composed of only digits and ..
(?:\d+(?:\.\d+)?|\.\d{1,8}) Match either of the following options
\d+(?:\.\d+)? Match any digit one or more times, optionally followed by a decimal point . and more digits (one or more). You can change \.\d+ to \.\d* to match numbers like 1. that don't have decimal numbers specified.
\.\d{1,8} Match . followed by a digit between 1-8 times. Since the 0 is implied here, it's actually the 9th digit for this number (the dot being the tenth).
$ Assert position at the end of the line
For matching the possibility of + or - at the start of the number, the following may be used.
^(?=[-+\d.]{1,10}$)(?:[-+]?\d+(?:\.\d+)?|\.\d{1,8}|[+-]\.\d{1,7})$
Regular expressions describe Chomsky type-3 (regular) languages, which cannot perform calculations. You can only OR-combine all possible formats, which results in a long and unmaintainable regex. So the easiest solution is a functional check.
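One way such a functional check could look, as a sketch rather than a definitive implementation: parse the input with BigDecimal and compare its precision and scale against the column definition. The class name Number108Check and the method fits are hypothetical. The sketch assumes that values with more than 8 decimals should be rejected (Oracle itself would round them), and it accepts anything BigDecimal can parse, including scientific notation.

```java
import java.math.BigDecimal;

public class Number108Check {
    static final int PRECISION = 10; // p in NUMBER(p,s)
    static final int SCALE = 8;      // s in NUMBER(p,s)

    // Hypothetical helper: true if the text parses and fits NUMBER(10,8).
    static boolean fits(String text) {
        BigDecimal value;
        try {
            // Trailing zeros carry no information, so "1.50" is treated like "1.5".
            value = new BigDecimal(text).stripTrailingZeros();
        } catch (NumberFormatException e) {
            return false; // not a number at all
        }
        // Digits to the left of the decimal point (zero or negative for pure fractions).
        int integerDigits = value.precision() - value.scale();
        return value.scale() <= SCALE && integerDigits <= PRECISION - SCALE;
    }

    public static void main(String[] args) {
        String[] samples = {"12.12345678", ".5", "-99.5", "123", "0.123456789", "abc"};
        for (String s : samples) {
            System.out.println(s + " -> " + fits(s));
        }
    }
}
```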

RegExp Password Checker for Specific Lengths/Digit Counts

Here's my RegExp checker which currently doesn't work:
String pattern = "(?=.*[0-9]{3})(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=]{1})(?=\\S+$).{5,15}";
Here are the parameters I cannot meet yet:
5-15 characters {5,15}
exactly 3 digits (?=.*[0-9]{3})
Neither the character limit nor the digit check is working, and I can't find any examples for some reason. Where am I going wrong? Clearly it's a placement issue, as I'm a total novice. Any help would be appreciated. The others (at least one upper/lowercase/special) I can meet, but these two simple pieces I'm still struggling with.
For three digits checking add this anywhere in your regex as you are using positive lookahead.
(?=^([^0-9]*[0-9]){3}[^0-9]*$)
For the 5-15 character check add this one:
(?=^.{5,15}$)
You can use the regex on the site https://regex101.com/ and it will give you the explanation on the right hand side.
[0-9]{3} is 3 consecutive digits. To allow exactly three digits anywhere in the string, you need to check for each digit separately, with non-digits allowed in between.
(?=^[^0-9]*[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*$)
.{5,15} is 5 to 15 characters but that is anywhere in the string, to have it affect the whole string that needs to be anchored. So your full expression should be:
^(?=^[^0-9]*[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*$)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=]{1})(?=\\S+$).{5,15}$
Demo: https://regex101.com/r/UVK7ev/1
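Putting the pieces together, the combined check might be used from Java roughly as follows. This is a sketch under assumptions: the class and method names (PasswordCheck, isValid) are made up, the three-digit lookahead is written in the compact {3} form from the first answer, and the special-character set is assumed to be @#$%^&+=.

```java
import java.util.regex.Pattern;

public class PasswordCheck {
    static final Pattern PASSWORD = Pattern.compile(
            "^(?=(?:[^0-9]*[0-9]){3}[^0-9]*$)" // exactly three digits, anywhere
            + "(?=.*[a-z])"                    // at least one lowercase letter
            + "(?=.*[A-Z])"                    // at least one uppercase letter
            + "(?=.*[@#$%^&+=])"               // at least one special (assumed set)
            + "(?=\\S+$)"                      // no whitespace
            + ".{5,15}$");                     // total length 5 to 15

    // Hypothetical helper: true if the candidate meets every rule above.
    static boolean isValid(String candidate) {
        return PASSWORD.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"aB1c2d3#", "aB1c2d34#", "aB123#", "ab123#", "aB1 23#"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```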

Regex for numbers between 0 and 180 and decimals places in Javacc

So I'm creating a token in JavaCC by using regex.
I'm trying to allow only numbers of up to 3 digits that lie between 0 and 180.
Also, I'm trying to allow (in a separate token) only numbers with up to 2 integer digits, between 0 and 59.9999 (4 decimal places).
I have no idea how to create the regex for these two tokens in JavaCC...
Any help with an explanation would be awesome, thanks :)
For the first case, your pattern needs to allow 1-digit numbers, 2-digit numbers, 3-digit numbers whose first digit is 1 and whose second digit is in the range 0-7, and the special case 180. The regex would look like
[0-9]{1,2}|1[0-7][0-9]|180
(I don't know javacc, so I don't know how this regex would be used, or whether you need something else to prevent something like 1800 from being parsed as a number, or as two numbers. You might need \b on the ends to indicate a word boundary, but I have no idea how javacc works.)
For the second case, the part to the left of the decimal point is either one digit, or two digits where the first digit is in the range 0-5. Your requirements aren't clear, but if the token is required to have a decimal point and one to four digits to the right of the decimal point, the regex would be
([0-9]|[0-5][0-9])\.[0-9]{1,4}
Again, I don't know how javacc handles the word boundaries.
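The two patterns can be sanity-checked with java.util.regex, as sketched below. This is plain Java, not JavaCC token syntax, and the names RangeTokens, isDegrees and isMinutes are hypothetical. Using matches() requires the whole input to match, which sidesteps the word-boundary question for this test.

```java
import java.util.regex.Pattern;

public class RangeTokens {
    // 0-99, 100-179, or exactly 180
    static final Pattern DEGREES = Pattern.compile("[0-9]{1,2}|1[0-7][0-9]|180");
    // 0-59 followed by a mandatory decimal point and 1 to 4 decimals
    static final Pattern MINUTES = Pattern.compile("([0-9]|[0-5][0-9])\\.[0-9]{1,4}");

    static boolean isDegrees(String text) {
        return DEGREES.matcher(text).matches();
    }

    static boolean isMinutes(String text) {
        return MINUTES.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "99", "179", "180", "181", "1800"}) {
            System.out.println("degrees " + s + " -> " + isDegrees(s));
        }
        for (String s : new String[] {"5.5", "59.9999", "60.0", "59.99999"}) {
            System.out.println("minutes " + s + " -> " + isMinutes(s));
        }
    }
}
```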
Note that if this were a Java program, I would recommend (in the first case) just parsing it as an integer and comparing it to 0 and 180. Too many questioners try to use regexes to solve every problem, but they are not suited for every problem. Since this is for javacc, it may be a context in which regexes are simple to use and numeric comparisons are not--as I've mentioned, I don't know anything about javacc.
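In a plain Java program, the parse-and-compare approach mentioned above could look like this sketch. The class and method names are invented, and note that Integer.parseInt is more lenient than the regex: it also accepts a leading + sign and leading zeros.

```java
public class DegreeRangeCheck {
    // Hypothetical helper: true if the text is an integer between 0 and 180 inclusive.
    static boolean isDegrees(String text) {
        try {
            int value = Integer.parseInt(text);
            return value >= 0 && value <= 180;
        } catch (NumberFormatException e) {
            return false; // not an integer at all
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "180", "181", "-1", "abc"}) {
            System.out.println(s + " -> " + isDegrees(s));
        }
    }
}
```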

This regex line exceeds my understanding "(?=(?:\d{3})++(?!\d))"

I am pretty OK with basic regex, but this line of code, used to insert thousands separators into large numbers, exceeds my knowledge, and quite a bit of googling did not satisfy my curiosity either. Can one of you please take a minute to explain the following line of code to me?
someString.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
I especially don't understand the regex structure "(?=(?:\d{3})++(?!\d))".
Thanks a lot in advance.
"(?=(?:\d{3})++(?!\d))" is a lookahead assertion.
It means "only match if followed by ((three digits that we don't need to capture) repeated one or more times (and again repeated one or more times) (not followed by a digit))". See this explanation about (?:...) notation. It's called non-capturing group and means you don't need to reference this group after the match.
"(\\G-?\\d{1,3})" is the part that should actually match (but only if the above-described conditions are met).
Edit: I think + must be a special character, otherwise it's just a plus. If it's a special character (and quick search suggests that it is in Java, too), the second one is redundant.
Edit 2: Thanks to Alan Moore, it's now clear. The second + makes the quantifier possessive: if, after matching as many 3-digit groups as possible, the engine finds that they are followed by another digit, it gives up immediately instead of stepping one 3-digit group back.
This expression has some advanced stuff in it.
First, the easiest: \d{3} means exactly three digits. These are your thousands.
Then: the ++ is a variant of + (which means one or more), but possessive, which means it will eat all of the thousands and not give any back. I'm not completely sure why this is necessary.
?: means it is a non-capturing group. I think this is just there for performance reasons and could be omitted.
?= is a positive lookahead. I think this means it only checks whether that group exists, but the group does not count towards the matched string, meaning it won't be replaced.
?! is a negative lookahead. I don't quite understand that, but I think it means it must NOT match, which in turn means there cannot be another digit at the end of the matched sequence. This makes sure the first group gets the right digits. E.g. 10000 can only be matched as 10(000) but not 1(000)0, if you see what I mean.
Through the lookaheads, if I understand it correctly (I haven't tested it), only the first group would actually be replaced, as it is the one that matches.
To me, the most interesting part of that regex is the \G. It took me a while to remember what it's for: to prevent adding commas to the fraction part if there is one. If the regex were simply:
(-?\d{1,3})(?=(?:\d{3})++(?!\d))
...this number:
12345.67890
...would end up as:
12,345.67,890
But adding \G to the beginning means a match can only start at the beginning of the string or at the position where the previous match ended. So it doesn't match 345 because of the . following it, and it doesn't match 67 because it would have to skip over some of the string to do so. And so it correctly returns:
12,345.67890
I know this isn't an answer to the question, but I thought it was worth a mention.
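The effect of \G described above can be reproduced with a small sketch. The class and method names (GAnchorDemo, withoutG, withG) are made up for the example; the two patterns are the ones quoted in this answer.

```java
public class GAnchorDemo {
    // Same pattern without \G: matches may start anywhere, including in the fraction.
    static String withoutG(String number) {
        return number.replaceAll("(-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
    }

    // With \G: each match must start where the previous one ended (or at the start).
    static String withG(String number) {
        return number.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
    }

    public static void main(String[] args) {
        String number = "12345.67890";
        System.out.println(withoutG(number)); // 12,345.67,890
        System.out.println(withG(number));    // 12,345.67890
    }
}
```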

codingBat separateThousands using regex (and unit testing how-to)

This question is a combination of regex practice and unit testing practice.
Regex part
I authored this problem separateThousands for personal practice:
Given a number as a string, introduce commas to separate thousands. The number may contain an optional minus sign, and an optional decimal part. There will not be any superfluous leading zeroes.
Here's my solution:
String separateThousands(String s) {
    return s.replaceAll(
        String.format("(?:%s)|(?:%s)",
            "(?<=\\G\\d{3})(?=\\d)",
            "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"
        ),
        ","
    );
}
The way it works is that it classifies commas into two types: the first, and the rest. In the above regex, the rest subpattern actually appears before the first. A match will always be zero-length, and replaceAll replaces it with ",".
The rest basically looks behind to see if there was a match followed by 3 digits, and looks ahead to see if there's a digit. It's some sort of a chain reaction mechanism triggered by the previous match.
The first basically looks behind for ^ anchor, followed by an optional minus sign, and between 1 to 3 digits. The rest of the string from that point must match triplets of digits, followed by a nondigit (which could either be $ or \.).
My question for this part is:
Can this regex be simplified?
Can it be optimized further?
Ordering rest before first is deliberate, since first is only needed once
No capturing group
Unit testing part
As I've mentioned, I'm the author of this problem, so I'm also the one responsible for coming up with testcases for them. Here they are:
INPUT, OUTPUT
"1000", "1,000"
"-12345", "-12,345"
"-1234567890.1234567890", "-1,234,567,890.1234567890"
"123.456", "123.456"
".666666", ".666666"
"0", "0"
"123456789", "123,456,789"
"1234.5678", "1,234.5678"
"-55555.55555", "-55,555.55555"
"0.123456789", "0.123456789"
"123456.789", "123,456.789"
I haven't had much experience with industrial-strength unit testing, so I'm wondering if others can comment whether this is a good coverage, whether I've missed anything important, etc (I can always add more tests if there's a scenario I've missed).
This works for me:
return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
The first time through, \G acts the same as ^, and the lookahead forces \d{1,3} to consume only as many characters as necessary to leave the match position at a three-digit boundary. After that, \d{1,3} consumes the maximum three digits every time, with \G to keep it anchored to the end of the previous match.
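As a rough illustration of how the test cases from the question could be run against this one-liner without any test framework, here is a table-driven sketch. The class name SeparateThousandsCheck and the array layout are assumptions for the example; in a real project the same table would more likely feed a JUnit parameterized test.

```java
public class SeparateThousandsCheck {
    static String separateThousands(String s) {
        return s.replaceAll("(\\G-?\\d{1,3})(?=(?:\\d{3})++(?!\\d))", "$1,");
    }

    // Each row is {input, expected output}, copied from the question.
    static final String[][] CASES = {
        {"1000", "1,000"},
        {"-12345", "-12,345"},
        {"-1234567890.1234567890", "-1,234,567,890.1234567890"},
        {"123.456", "123.456"},
        {".666666", ".666666"},
        {"0", "0"},
        {"123456789", "123,456,789"},
        {"1234.5678", "1,234.5678"},
        {"-55555.55555", "-55,555.55555"},
        {"0.123456789", "0.123456789"},
        {"123456.789", "123,456.789"},
    };

    public static void main(String[] args) {
        int failures = 0;
        for (String[] c : CASES) {
            String actual = separateThousands(c[0]);
            if (!actual.equals(c[1])) {
                failures++;
                System.out.println("FAIL " + c[0] + ": expected " + c[1] + ", got " + actual);
            }
        }
        System.out.println(failures == 0 ? "all cases pass" : failures + " case(s) failed");
    }
}
```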
As for your unit tests, I would just make it clear in the problem description that the input will always be a valid number, with at most one decimal point.
When you state the requirements are you intending for them to be enforced by your method?
The number may contain an optional
minus sign, and an optional decimal
part. There will not be any
superfluous leading zeroes.
If your intent is to have the method detect when those constraints are violated, you will need to write additional unit tests to ensure that contract is being enforced.
What about testing for 1234.5678.91011?
Do you expect your method to return 1,234.5678.91011 or just ignore the whole thing?
Best to write a test to verify your expectations
