Java - Regex for Oracle NUMBER(10,8) field

I have an Oracle column of data type NUMBER(10,8). I need to validate the input data via a Java regex before storing the data in the tables. As per Oracle's data type, valid values include:
10 digits
2 digits . 8 digits
3 digits . 7 digits
4 digits . 6 digits
no digits . 8 digits (is saved in Oracle as 0.12345678 but the input value can be like .12345678)
and so on. Negative values of these cases are also valid.
I can write a regex for one case at a time, i.e. we can check for 1234567891 with one regex. Then, by changing the ranges, we can write a separate regex for each possible combination of the scale.
My sample regex : ^-?\\d{0,2}(?>\\.\\d{1,8})?$ : checks for 2 digits . 8 digits case.
Now I want to know: is there any easier way of checking all such values in one regex? One can always use the '|' operator, but then the number of OR branches would equal the scale part of the data type.
Is there any elegant possible solution? Any pointers, suggestions are welcome!
UPDATE :
After @Andreas pointed out the actual meaning of (10,8), the question does seem to be misguided. Removing the invalid cases from those mentioned above, the valid cases are:
(0/1/2 digits).(0/1/2/../8 digits)
0/1/2 digits
negative cases

You've misunderstood the meaning of NUMBER(10,8):
Specify a fixed-point number using the following form:
NUMBER(p,s)
where:
p is the precision, or the maximum number of significant decimal digits, where the most significant digit is the left-most nonzero digit, and the least significant digit is the right-most known digit. [...]
s is the scale, or the number of digits from the decimal point to the least significant digit. [...]
It means a maximum of 10 significant decimal digits, with 8 digits after the decimal point, i.e. 2.8 only. The scale is not floating. Sure, you can have fewer on each side of the decimal point, but no more than 2 on the left and 8 on the right.
Oracle names this a fixed-point number, and it is very distinct from a floating-point number, which uses the same keyword but without limits, i.e. NUMBER.
As for Oracle Database number literal format, the format is:
If you exclude scientific notation, that means a regex of:
^[+-]?(?:\d{1,2}(?:\.\d{0,8})?|\.\d{1,8})$
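A quick way to try this regex from Java, as a sketch (the class and method names are made up for illustration). Inside a Java string literal every backslash has to be doubled, and Matcher.matches() already requires the whole input to match:

```java
import java.util.regex.Pattern;

public class Number10x8Regex {
    // Optional sign, then either up to 2 integer digits with an optional
    // fraction of up to 8 digits, or a bare fraction such as .12345678
    static final Pattern NUMBER_10_8 =
            Pattern.compile("^[+-]?(?:\\d{1,2}(?:\\.\\d{0,8})?|\\.\\d{1,8})$");

    static boolean isValid(String s) {
        return NUMBER_10_8.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"12.34567891", "-.12345678", "7", "1.", "123", "1.123456789", "."};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Here "123" is rejected (3 integer digits), "1.123456789" is rejected (9 fraction digits), and "1." is accepted because of \d{0,8}.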

EDIT
Please note that this answer was provided prior to the OP's update and is no longer correct. I will leave this here in case it helps any future readers, but at the time of writing this edit Andreas' answer seems to be correct.
This may be easier to achieve by simply splitting the string on . and then testing each side's length, but this can still be achieved using regex alone.
^(?=[\d.]{1,10}$)(?:\d+(?:\.\d+)?|\.\d{1,8})$
^ Assert position at the start of the line
(?=[\d.]{1,10}$) Positive lookahead to ensure the line has a maximum of 10 characters and is composed of only digits and ..
(?:\d+(?:\.\d+)?|\.\d{1,8}) Match either of the following options
\d+(?:\.\d+)? Match any digit one or more times, optionally followed by a decimal point . and more digits (one or more). You can change \.\d+ to \.\d* to match numbers like 1. that don't have decimal numbers specified.
\.\d{1,8} Match . followed by a digit between 1-8 times. Since the 0 is implied here, it's actually the 9th digit for this number (the dot being the tenth).
$ Assert position at the end of the line
For matching the possibility of + or - at the start of the number, the following may be used.
^(?=[-+\d.]{1,10}$)(?:[-+]?\d+(?:\.\d+)?|\.\d{1,8}|[+-]\.\d{1,7})$

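Regular expressions describe Chomsky type-3 (regular) languages, which cannot calculate anything; for example, they cannot add up the number of digits on both sides of the decimal point. You can only OR-combine all possible formats, which results in a long and unmaintainable regex. So the easiest solution is a functional check.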
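A minimal sketch of such a functional check, assuming java.math.BigDecimal is acceptable for parsing (the class name NumberFieldCheck and method fits are hypothetical). It counts integer and fraction digits after stripping trailing zeros, so 12.50000000 passes. Note that BigDecimal also accepts exponent notation such as 1E-3, and that this check rejects values with too many fraction digits, whereas Oracle itself would round them:

```java
import java.math.BigDecimal;

public class NumberFieldCheck {
    // Returns true if s fits NUMBER(precision, scale) without rounding.
    static boolean fits(String s, int precision, int scale) {
        BigDecimal d;
        try {
            d = new BigDecimal(s.trim()); // accepts "-.5", "+12", "1.", "1E-3"
        } catch (NumberFormatException e) {
            return false;
        }
        if (d.signum() == 0) {
            return true;
        }
        d = d.stripTrailingZeros(); // 12.50000000 -> 12.5, 100 -> 1E+2
        int fractionDigits = Math.max(d.scale(), 0);
        int integerDigits = Math.max(d.precision() - d.scale(), 0);
        return fractionDigits <= scale && integerDigits <= precision - scale;
    }

    public static void main(String[] args) {
        System.out.println(fits("12.34567891", 10, 8)); // 2 + 8 digits
        System.out.println(fits("100", 10, 8));         // 3 integer digits
    }
}
```

The same method works for other column definitions, e.g. fits(s, 5, 2) for NUMBER(5,2).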

Related

Java regex that handles several possibilities

I am trying to find a regex for the following user generated possibilities:
÷2%3%x#4%2$#
OR
÷2%x#4%2$#
OR
÷2%x#4$#
OR
÷2%x#
To understand the expression: it is a fraction whose numerator lies between the ÷ and the first %, and whose denominator lies from the first % to the #.
But, the denominator has an exponent, which lies from the # to $.
The user can input whatever number he/she desires, but the structure stays the same. Notice that the number can also be a decimal.
The structure is as follows: ÷(a number; if it's two or more digits, a % will be in between the digits)x(a group that consists of a number(s), the symbols # and $, and a %(s) which can also alternate between the digits)#
Remember, the number can be a decimal number.
I am trying to use the following regex with no success:
"[÷]-?\\d+(\\.\\d*)?[%](-?\\d+(\\.\\d*)?){0,1}[x]([#]-?\\d+(\\.\\d*)?[$]){0,1}[#]"
I think that the group (-?\d+(\.\d*)?){0,1} is complicating things.
Also, I have not accounted for the % that could occur within this group.
Any suggestions? Thank you.
Edit: Deleted the old post content.
According to your new test cases, I improved your regex to match all cases and simplified it:
÷[0-9%]+?x(#[0-9%]+?\$)?# OR ÷[\d%]+?x(#[\d%]+?\$)?#
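A small sketch to check the second pattern against the four example inputs from the question (class and method names are illustrative). The ÷ is written as \u00F7 so the source file's encoding doesn't matter. Note that [\d%] contains no '.', so as written this pattern does not cover decimal numbers such as ÷2.5%x#; if you need those, add \. to both character classes:

```java
import java.util.regex.Pattern;

public class FractionPatternDemo {
    // \u00F7 is the division sign; the $ is escaped because it is a regex anchor
    static final Pattern FRACTION = Pattern.compile("\u00F7[\\d%]+?x(#[\\d%]+?\\$)?#");

    static boolean isFraction(String s) {
        return FRACTION.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] cases = {"\u00F72%3%x#4%2$#", "\u00F72%x#4%2$#", "\u00F72%x#4$#", "\u00F72%x#"};
        for (String c : cases) {
            System.out.println(c + " -> " + isFraction(c));
        }
    }
}
```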
Note:
The [] brackets define a character class, i.e. a set of allowed characters, so there is no need to wrap them in parentheses.
Also, [÷][0-9]+[0-9[%]]+? is just the same as ÷[0-9]+[0-9%]+?. The first part of your example matches a digit 0-9 one or more times, and then you check for either a digit or % one or more times (non-greedy). So instead you can just use the second check for the whole thing.
By wrapping the exponent in a group, (), we can make the whole exponent optional with ?; this makes your 4th test case work.
You could also substitute 0-9 with \d (any digit) if you want.
I found a regex that works, I tested from the bottom up:
Here it is:
[÷][0-9[%][\\.]]+?[x][0-9[%][\\.][#][$]]*?[#]
This regex works for all types of cases, even those that include decimal numbers or no exponents.
The character class [0-9[%][\.][#][$]]*? allows the regex to search for the exponent, which can occur zero or more times (that's why the * is there); the ? makes the quantifier reluctant, so it matches as few characters as possible. I followed the same idea for the coefficient of x (read the post if you don't know where the coefficient lies) and the numerator. Thank you to everyone who put effort into brainstorming this problem. I have chosen to use my answer for my programming.

Regex for numbers between 0 and 180 and decimals places in Javacc

So I'm creating a token in JavaCC by using regex.
I'm trying to only allow numbers of up to 3 digits that are between 0 and 180.
Also, I'm trying to only allow (in a separate token) 2 digit numbers between 0 and 59.9999 (4 decimal places).
I have no idea how to create the regex for these two tokens in JavaCC...
Any help with an explanation would be awesome, thanks :)
For the first case, your pattern needs to allow 1-digit numbers, 2-digit numbers, 3-digit numbers whose first digit is 1 and whose second digit is in the range 0-7, and the special case 180. The regex would look like
[0-9]{1,2}|1[0-7][0-9]|180
(I don't know javacc, so I don't know how this regex would be used, or whether you need something else to prevent something like 1800 from being parsed as a number, or as two numbers. You might need \b on the ends to indicate a word boundary, but I have no idea how javacc works.)
For the second case, the part to the left of the decimal point is either one digit, or two digits where the first digit is in the range 0-5. Your requirements aren't clear, but if the token is required to have a decimal point and one to four digits to the right of the decimal point, the regex would be
([0-9]|[0-5][0-9])\.[0-9]{1,4}
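Outside JavaCC, both patterns can be sanity-checked with plain java.util.regex, as in the sketch below (class and method names are made up). Matcher.matches() anchors the whole string, so 1800 is rejected without needing \b; JavaCC token definitions use their own syntax (e.g. ["0"-"9"]), so the patterns would still need translating there:

```java
import java.util.regex.Pattern;

public class RangeTokens {
    // 0-180: one or two digits, 100-179, or exactly 180
    static final Pattern DEGREES = Pattern.compile("[0-9]{1,2}|1[0-7][0-9]|180");
    // 0.0-59.9999: one digit or 00-59, a mandatory decimal point, then 1 to 4 digits
    static final Pattern MINUTES = Pattern.compile("([0-9]|[0-5][0-9])\\.[0-9]{1,4}");

    static boolean isDegrees(String s) {
        return DEGREES.matcher(s).matches();
    }

    static boolean isMinutes(String s) {
        return MINUTES.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDegrees("180") + " " + isDegrees("1800"));
        System.out.println(isMinutes("59.9999") + " " + isMinutes("60.0"));
    }
}
```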
Again, I don't know how javacc handles the word boundaries.
Note that if this were a Java program, I would recommend (in the first case) just parsing it as an integer and comparing it to 0 and 180. Too many questioners try to use regexes to solve every problem, but they are not suited for every problem. Since this is for javacc, it may be a context in which regexes are simple to use and numeric comparisons are not--as I've mentioned, I don't know anything about javacc.
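In a plain Java program, the parse-and-compare approach could look like this sketch (the class name is hypothetical): a trivial regex only guarantees the input is 1 to 3 digits, and the range check is an ordinary comparison:

```java
public class RangeByParsing {
    // Accept 1 to 3 ASCII digits, then compare the value numerically
    static boolean isDegrees(String s) {
        if (!s.matches("[0-9]{1,3}")) {
            return false;
        }
        int value = Integer.parseInt(s);
        return value >= 0 && value <= 180;
    }

    public static void main(String[] args) {
        System.out.println(isDegrees("180") + " " + isDegrees("181"));
    }
}
```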

Regular expression to check if a String is a positive natural number

I want to check if a string is a positive natural number but I don't want to use Integer.parseInt() because the user may enter a number larger than an int. Instead I would prefer to use a regex to return false if a numeric String contains all "0" characters.
if(val.matches("[0-9]+")){
// We know that it will be a number, but what if it is "000"?
// what should I change to make sure
// "At Least 1 character in the String is from 1-9"
}
Note: the string must contain only 0-9 and it must not contain all 0s; in other words it must have at least 1 character in [1-9].
You'd be better off using BigInteger if you're trying to work with an arbitrarily large integer, however the following pattern should match a series of digits containing at least one non-zero character.
\d*[1-9]\d*
The pattern can be tried out on Debuggex, although Debuggex's unit tests seem a little buggy. It's simple enough that it should be reasonably cross-language compatible, but in Java you'd need to escape it.
Pattern positiveNumber = Pattern.compile("\\d*[1-9]\\d*");
Note the above (intentionally) matches strings we wouldn't normally consider "positive natural numbers", as a valid string can start with one or more 0s, e.g. 000123. If you don't want to match such strings, you can simplify the pattern further.
[1-9]\d*
Pattern exactPositiveNumber = Pattern.compile("[1-9]\\d*");
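Putting both patterns to work, plus the BigInteger route mentioned above, as a sketch (class and method names are made up). Matcher.matches() checks the whole string, so no ^...$ anchors are needed; the BigInteger variant first restricts the input to ASCII digits because new BigInteger(...) would otherwise also accept a sign:

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

public class PositiveNumberCheck {
    // At least one non-zero digit; leading zeros allowed ("000123" passes)
    static final Pattern POSITIVE = Pattern.compile("\\d*[1-9]\\d*");
    // No leading zero allowed
    static final Pattern EXACT_POSITIVE = Pattern.compile("[1-9]\\d*");

    static boolean isPositive(String s) {
        return POSITIVE.matcher(s).matches();
    }

    static boolean isExactPositive(String s) {
        return EXACT_POSITIVE.matcher(s).matches();
    }

    // Arbitrarily large values, no int overflow
    static boolean isPositiveViaBigInteger(String s) {
        if (s.isEmpty() || !s.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return false;
        }
        return new BigInteger(s).signum() > 0;
    }

    public static void main(String[] args) {
        System.out.println(isPositive("000123") + " " + isExactPositive("000123"));
    }
}
```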
If you want to match positive natural numbers, written in the standard way, without a leading zero, the regular expression you want is
[1-9]\d*
which matches any string of characters consisting only of digits, where the first digit is not zero. Don't forget to double the backslash ("[1-9]\\d*") if you write it as a Java String literal.
I made the following regex for only positive natural numbers:
^[1-9]\d*$
This checks that the number starts with a digit from 1 to 9 (so there can't be any leading zeros) and that the rest of the characters are digits from 0 to 9. You can test it at https://regex101.com

Degrees Minutes Seconds (DMS) RegEx

I have a regular expression that I want to match a latitude/longitude pair in a variety of fashions, e.g.
123 34 42
-123* 34' 42"
123* 34' 42"
+123* 34' 42"
45* 12' 22"N
45 12' 22"S
90:00:00.0N
I want to be able to match these in a pair such that
90:00:00.0N 180:00:00.0E is a latitude/longitude pair.
or
45* 12' 22"N 46* 12' 22"E is a latitude/longitude pair (1 degree by 1 degree cell).
or
123* 34' 42" 124* 34' 42" is a latitude/longitude pair
etc
Using the below regular expression, when I type in 123, it matches. I suppose this is true since 123 00 00 is a valid coordinate. However, I want to use this regular expression to match pairs in the same format above
"([-|\\+]?\\d{1,3}[d|D|\u00B0|\\s](\\s*\\d{1,2}['|\u2019|\\s])?"
+ "(\\s*\\d{1,2}[\"|\u201d|\\s])?\\s*([N|n|S|s|E|e|W|w])?\\s?)"
I am using Java.
* denotes a degree.
What am I doing wrong in my regular expression?
Well, for one thing, you're filling your character sets with a bunch of unnecessary pipe characters - alternation is implied in a [] pair. Additional cleanup: + doesn't need to be escaped in a character class. Your regular expression seems to be addressing a bigger problem statement than you gave us - you make no mention of d or D as matchable character. And you've made pretty much the entire back half of your RegEx optional. Going off of what I think your original problem statement is, I built the following regular expression:
^\s*([+-]?\d{1,3}\*?\s+\d{1,2}'?\s+\d{1,2}+"?[NSEW]?|\d{1,3}(:\d{2}){2}\.\d[NSEW]\s*){1,2}$
It's a bit of a doozy, but I'll break it down for you, or anyone who happens across this in the future (Hello, future!).
^
Start of string, simple.
\s*
Any amount of whitespace - even none.
(
Denotes the beginning of a group - we'll get back to that.
[-+]?
An optional sign
\d{1,3}
One to three digits
\*?
An optional Asterisk - the escape here is key for an asterisk, but if you want to replace this with the unicode codepoint for an actual degree, you won't need it.
\s+
At least one character of whitespace
\d{1,2}
One or two digits.
'?
Optional apostrophe
\s+\d{1,2}+
You've seen these before, but there's a new curveball - there's a plus after the {1,2} quantifier! This makes it a possessive quantifier, meaning that the matcher won't give up its matches for this group to make another one possible. This is almost exclusively here to prevent 1 1 11 1 1 from matching, but can be used to increase speed anywhere you're 100% sure you don't need to be able to backtrack.
"?
Optional double quote. You'll have to escape this in Java.
[NSEW]?
An optional cardinal direction, designated by letter
|
OR - you can match everything in the group before this, or everything in the group after this.
\d{1,3}
Old news.
(:\d{2})
A colon, followed by two digits...
{2}
twice!
\.\d
Decimal point, followed by a single digit.
[NSEW]
Same as before, but this time it's mandatory.
\s*)
Some space, and finally the end of the group. Now, the first group has matched an entire longitude/latitude denotation, with an arbitrary amount of space at the end. Followed closely by:
{1,2}
Do that one, or two times - to match a single or a pair, then finally:
$
The end of the string.
This isn't perfect, but it's pretty close, and I think it answers the original problem statement. Plus, I feel my explanation has demystified it enough that you can edit it to further suit your needs. The one thing it doesn't (and won't) do, is enforce that the first coordinate matches the second in style. That's just too much to ask of Regular Expressions.
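One thing worth checking in Java: because | has the lowest precedence, the trailing \s* before the closing parenthesis only belongs to the colon alternative, not to the whole group as the breakdown intends. As far as I can tell, that means a pair such as 45* 12' 22"N 46* 12' 22"E does not match, while 90:00:00.0N 180:00:00.0E does. The sketch below (class and field names are made up) compares the pattern with the possessive \d{1,2}+ the breakdown describes against a variant that wraps the two alternatives in a non-capturing group (?:...), so the whitespace applies to both:

```java
import java.util.regex.Pattern;

public class DmsPairPattern {
    // The answer's pattern, with the possessive \d{1,2}+ from the breakdown
    static final Pattern AS_WRITTEN = Pattern.compile(
            "^\\s*([+-]?\\d{1,3}\\*?\\s+\\d{1,2}'?\\s+\\d{1,2}+\"?[NSEW]?"
            + "|\\d{1,3}(:\\d{2}){2}\\.\\d[NSEW]\\s*){1,2}$");

    // Same alternatives inside (?:...), so the trailing \s* follows either format
    static final Pattern GROUPED = Pattern.compile(
            "^\\s*((?:[+-]?\\d{1,3}\\*?\\s+\\d{1,2}'?\\s+\\d{1,2}+\"?[NSEW]?"
            + "|\\d{1,3}(:\\d{2}){2}\\.\\d[NSEW])\\s*){1,2}$");

    public static void main(String[] args) {
        String pair = "45* 12' 22\"N 46* 12' 22\"E";
        System.out.println(AS_WRITTEN.matcher(pair).matches());
        System.out.println(GROUPED.matcher(pair).matches());
    }
}
```

The possessive quantifier still does its job in the grouped version: 1 1 11 1 1 is rejected.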
Generally, I don't think that this is a good approach.
In your interface try to have DMS coordinates in one specific format.
The User should enter this in 3 separate text fields.
Further this regex is not very maintainable.
There are many more ways to write a DMS coordinate
than you can imagine. Humans are creative.
Eg:
Put N,S in front
or: North, 157 deg 50 min 55.796 sec
or: from wiki: The NGS now says in 1993 that point was 21-18-02.54891 N 157-50-45.90280 W
I'm not a RE wizard but with your formats you'd need to have some kind of convention for which pair comes first (probably latitude) if you're doing parsing from a single text box.
From there, you have six numeric fields (deg, min, sec for each, possibly with a decimal point), two signs (+ or - for each) and up to two hemispheres (one for each).
As far as I can see, parsing these 8-10 fields from your input would occur in the same order each time if you demanded only that the latitude is first, and the longitude second. The rest of the symbols (save the decimal point(s)) can be treated essentially as separators.
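A rough sketch of that idea (the class and method names are hypothetical): instead of validating the whole layout, pull out the signed numbers and N/S/E/W letters in order and treat everything else as a separator. It assumes the latitude comes first and only targets the formats shown in the question, not every notation people might type:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DmsFieldExtractor {
    // A signed number with an optional decimal part, or a hemisphere letter
    private static final Pattern TOKEN = Pattern.compile("[-+]?\\d+(?:\\.\\d+)?|[NSEW]");

    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group()); // *, ', ", :, and spaces are simply skipped
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("45* 12' 22\"N 46* 12' 22\"E"));
        System.out.println(fields("90:00:00.0N 180:00:00.0E"));
    }
}
```

The caller then assigns the first three numbers (plus an optional letter) to latitude and the rest to longitude, and range-checks each field numerically.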
Does that make it easier?

Is there Any End of Underscores in Numeric Literals?

I just read about the enhancements in Java 7 (http://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html).
There I saw underscores in numeric literals and tried something like:
int i=9_000; its OK.
But the rules seem to also allow something like:
int i=9____________________________________________________________________________________000;
Is there any limit to the number of underscores?
There's no limit. Why should there be? On the other hand, the only reason I see to use any number of underscores is to be able to do fancy stuff like in the following piece of code (created by Joshua Bloch, if I'm not mistaken):
private static final int BOND =
0000_____________0000________0000000000000000__000000000000000000+
00000000_________00000000______000000000000000__0000000000000000000+
000____000_______000____000_____000_______0000__00______0+
000______000_____000______000_____________0000___00______0+
0000______0000___0000______0000___________0000_____0_____0+
0000______0000___0000______0000__________0000___________0+
0000______0000___0000______0000_________0000__0000000000+
0000______0000___0000______0000________0000+
000______000_____000______000________0000+
000____000_______000____000_______00000+
00000000_________00000000_______0000000+
0000_____________0000________000000007;
No, there's no limit. Java allows any amount of underscores although, depending on how your compiler is implemented, you may run into problems for bizarre edge cases like several billion of them :-)
In those places where you can have underscores, the language specification does not limit the quantity. I emphasise "can" there because there are places where they're not allowed, such as before the first digit, after the last, next to the decimal point and so on. But that's a different issue.
However, rather than ask if it's possible, you should instead ask what would be the point of more than one consecutive underscore.
One underscore aids readability by naturally grouping the numbers:
1_000_000
4072_1199_6645_1234
whereas more than one tends to reduce readability:
1_0_0________000_0
4072___________________________11_9_9_6641234
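A small compilable illustration of the placement rules (the class and constant names are made up): one underscore per group reads well, any run of underscores between digits is legal, and the card-style number needs a long literal because it exceeds the int range:

```java
public class UnderscoreLiterals {
    static final int MILLION = 1_000_000;          // one underscore per group
    static final long CARD = 4072_1199_6645_1234L; // too large for int, so a long literal
    static final int ODD = 9____000;               // legal: any number of underscores between digits

    // Not allowed (compile errors): 9000_ (trailing), 1_.5 or 1._5 (next to the decimal point).
    // _9000 is parsed as an identifier, not as a number.

    public static void main(String[] args) {
        System.out.println(MILLION + " " + CARD + " " + ODD);
    }
}
```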
Here is the definition of a decimal literal from the JLS:
DecimalNumeral:
    0
    NonZeroDigit Digits_opt
    NonZeroDigit Underscores Digits
Digits:
    Digit
    Digit DigitsAndUnderscores_opt Digit
Digit:
    0
    NonZeroDigit
NonZeroDigit: one of
    1 2 3 4 5 6 7 8 9
DigitsAndUnderscores:
    DigitOrUnderscore
    DigitsAndUnderscores DigitOrUnderscore
DigitOrUnderscore:
    Digit
    _
Underscores:
    _
    Underscores _
Notice the recursive definition for Underscores, fun!
No. Here is the document:
http://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html
In Java SE 7 and later, any number of underscore characters (_) can appear anywhere between digits in a numerical literal.
