Regular expression to check if a String is a positive natural number - java

I want to check if a string is a positive natural number but I don't want to use Integer.parseInt() because the user may enter a number larger than an int. Instead I would prefer to use a regex to return false if a numeric String contains all "0" characters.
if(val.matches("[0-9]+")){
// We know that it will be a number, but what if it is "000"?
// what should I change to make sure
// "At Least 1 character in the String is from 1-9"
}
Note: the string must contain only 0-9 and it must not contain all 0s; in other words it must have at least 1 character in [1-9].

You'd be better off using BigInteger if you're trying to work with an arbitrarily large integer. However, the following pattern should match a series of digits containing at least one non-zero digit.
\d*[1-9]\d*
Debuggex's unit tests seem a little buggy, but you can play with the pattern there. The pattern is simple enough that it should be reasonably cross-language compatible, but in Java you need to escape the backslashes when you write it as a string literal.
Pattern positiveNumber = Pattern.compile("\\d*[1-9]\\d*");
Note the above (intentionally) matches strings we wouldn't normally consider "positive natural numbers", as a valid string can start with one or more 0s, e.g. 000123. If you don't want to match such strings, you can simplify the pattern further.
[1-9]\d*
Pattern exactPositiveNumber = Pattern.compile("[1-9]\\d*");
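To make the two suggestions above concrete, here is a minimal sketch. The class and method names are hypothetical and not part of the answer. It applies the first pattern with matches(), which requires the whole string to match, and shows a BigInteger-based check that should give the same result for digit-only input.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class PositiveNumberCheck {
    // Digits only, with at least one non-zero digit somewhere (leading zeros allowed).
    private static final Pattern POSITIVE = Pattern.compile("\\d*[1-9]\\d*");

    static boolean isPositiveByRegex(String s) {
        // matches() succeeds only if the entire string fits the pattern.
        return POSITIVE.matcher(s).matches();
    }

    static boolean isPositiveByBigInteger(String s) {
        // Reject anything that is not a plain run of ASCII digits first,
        // so that signs, blanks and empty strings never reach BigInteger.
        if (!s.matches("[0-9]+")) {
            return false;
        }
        return new BigInteger(s).signum() > 0;
    }
}
```

Both methods reject "000" and accept "000123" as well as numbers far larger than an int can hold.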

If you want to match positive natural numbers, written in the standard way, without a leading zero, the regular expression you want is
[1-9]\d*
which matches any string of characters consisting only of digits, where the first digit is not zero. Don't forget to double the backslash ("[1-9]\\d*") if you write it as a Java String literal.

I made the following regex for only positive natural numbers:
^[1-9]\d*$
This checks that the number starts with a digit from 1 to 9 (so there can't be any leading zeros) and that the rest of the characters are digits from 0 to 9. You can test it at https://regex101.com


How to extract all integers from a string and store them in an int array in java

I want to take a string and isolate all integer numbers in it then store them up in an array.
The input string will only ever contain letters a-z(both upper and lower case), digits 0-9 and "-" (read as minus sign).
So far I've written:
String str = readString();
String[] arr = str.split("[^0-9-]+");
If my input string is, for example, "15abc-59abc31abc100", the code above works fine and I just need to convert each element of the string array to int. However, if the input string has no letters between the numbers to separate them, it won't work properly. For example, abc59-12abc56-10abc10 will produce an array that only has 3 elements: 59-12, 56-10, 10. How do I make it recognize the minus sign as the start of a new element in the array without losing the sign itself?
Ideally I want the input "abc59-12abc56-10abc10" to look like this after the split:
String[] arr = {"59","-12","56","-10","10"};
The readString() method will always provide the type of string I described above, by the way.
What you really want here is that - is a valid 'number symbol', but only at the start of any given number block.
However, by attempting to match the negative space in between them, you've made it hard on yourself: That is tricky to put in terms of regexes.
But, had you gone the positive route (write a regexp that describes a number), that'd be trivial: Pattern.compile("-?\\d+") describes it perfectly: An optional minus sign followed by 1 or more digits. Simple enough. That'll even 'work' if your input is "aa----5", which has one matching sequence (-5).
So.. do that, then. You're abusing split here. Don't abuse systems, it tends to go badly once you make things even a tiny little bit more complicated.
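import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// An optional minus sign followed by one or more digits.
private static final Pattern NUMBER = Pattern.compile("-?\\d+");
public List<Integer> getNumbers(String in) {
    Matcher m = NUMBER.matcher(in);
    var out = new ArrayList<Integer>();
    // find() advances to the next match; group(0) is the whole matched text.
    while (m.find()) out.add(Integer.parseInt(m.group(0)));
    return out;
}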
A few tricks are being used here:
m.find() finds the next subsequence in the input string that matches the provided regexp.
Regexps are 'greedy' by default. The string "1234" could legally be interpreted in many ways: is that a single item (1234), or several shorter ones? Just the '1' also matches "-?\d+", after all. "Greedy" means that a regexp will match the longest sequence it can, which is exactly what you want, no doubt.
m.group(0) gets you the match. 0 is a special group comprising the entire found sequence. If you use parentheses in regexes, you make groups, and you can get those too, e.g. if you want to exclude the minus sign you could have done "-?(\\d+)" - note the parentheses. Now you can do m.group(1).
Note that if you must have an int[], toArray can't produce one from a list of integers (Integer is not int, and List<int> is, for now, not legal Java). You need a loop, or a stream that unboxes each element.
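As a rough sketch of that last step, the following combines the find() loop with an explicit copy into an int[]. The class and method names are made up for illustration. It assumes every match fits in an int, since Integer.parseInt throws NumberFormatException otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class NumberExtractor {
    // An optional minus sign followed by one or more digits.
    private static final Pattern NUMBER = Pattern.compile("-?\\d+");

    static int[] extract(String in) {
        List<Integer> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(in);
        while (m.find()) {
            found.add(Integer.parseInt(m.group()));
        }
        // Unbox element by element; List.toArray cannot produce an int[].
        int[] out = new int[found.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = found.get(i);
        }
        return out;
    }
}
```

For the input "abc59-12abc56-10abc10" this yields 59, -12, 56, -10, 10. The copy loop could also be written as found.stream().mapToInt(Integer::intValue).toArray().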
It's a one-liner: strip leading and trailing non-digits, split on non-digit/minus, convert to int, then collect to an array:
int[] numbers = Arrays.stream(str.replaceAll("^\\D+|\\D+$", "").split("(?=-\\d)|[^-\\d]+"))
.mapToInt(Integer::parseInt).toArray();
The key here is the split, which does a zero-width split where the following characters are a minus then a digit, or splits on runs of characters that are neither minus nor digit. Note that \D also matches the minus sign, so the initial strip removes a minus at the very start of the input: "-5abc" yields 5, not -5.

Is there a wildcard for integers to use in regex?

I want to know if there is a way to check if a string contains a certain pattern for a regex.
For example:
string.matches("something[0-9]x") would check if the string contains a substring of "something" with any single-digit integer following it, followed by "x". But let's say I want to check the same thing with no limit on the size of that int, i.e. it could be 1000000. Is there something like a wildcard for an int that I can use?
Just use the + quantifier after your character class; it matches the preceding token one or more times:
string.matches("something[0-9]+x")
Regular expressions work on characters; they have no semantic understanding of those characters. So it doesn't make sense to talk about "integers" here; the best that you can do is to talk about "digits". The number "1" is one digit; "1234" is four.
In a regular expression, you can match one or more of the preceding pattern using "+", so the regex "something[0-9]+x" should do what you want. If you want an upper bound on the number of digits, then you can try something like "something[0-9]{1,5}x"
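A small sketch of the two variants, with hypothetical class and method names. Keep in mind that String.matches tests the whole string, so "something" has to be at the very start and "x" at the very end.

```java
// Hypothetical helper, for illustration only.
final class DigitRunDemo {
    // "something", then one or more digits, then "x".
    static boolean unbounded(String s) {
        return s.matches("something[0-9]+x");
    }

    // Same, but with between one and five digits.
    static boolean upToFiveDigits(String s) {
        return s.matches("something[0-9]{1,5}x");
    }
}
```

With these, "something1000000x" passes the unbounded check but fails the bounded one, and "somethingx" fails both because at least one digit is required.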
Yes, simply use +, so in your example: string.matches("something[0-9]+x")
This matches the string something followed by a digit from 0 to 9 that has to occur at least once. * means zero or more times, while + means at least once, with no upper limit.
With [0-9]{n,m} you can specify the range of repetitions using n and m. For example:
[0-9]{2,3} matches a digit that has to occur 2 or 3 times. If you put only one number in the braces, [0-9]{2}, the digit has to occur exactly 2 times; [0-9]{2,} means at least 2 times.
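The difference between a single number in the braces and an open-ended range is easy to get wrong, so here is a quick sketch that puts the quantifiers side by side. The class and method names are invented for the example. Note that in Java's regex syntax {n} means exactly n times and {n,} means at least n times.

```java
// Hypothetical helper, for illustration only.
final class QuantifierDemo {
    static boolean zeroOrMore(String s) { return s.matches("[0-9]*"); }     // any count, including none
    static boolean oneOrMore(String s)  { return s.matches("[0-9]+"); }     // at least one digit
    static boolean twoOrThree(String s) { return s.matches("[0-9]{2,3}"); } // two or three digits
    static boolean exactlyTwo(String s) { return s.matches("[0-9]{2}"); }   // exactly two digits
    static boolean atLeastTwo(String s) { return s.matches("[0-9]{2,}"); }  // two or more digits
}
```

For example, "123" passes twoOrThree and atLeastTwo but fails exactlyTwo, and the empty string passes only zeroOrMore.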
But at last: simply learn to use google ... there are so many regexp sites with tutorials and stuff.

Regex for numbers between 0 and 180 and decimals places in Javacc

So I'm creating a token in JavaCC by using regex.
I'm trying to allow only numbers of up to 3 digits, between 0 and 180.
Also, I'm trying to only allow (in a separate token) 2 digit numbers between 0 and 59.9999 (4 decimal places).
I have no idea how to create the regex for these two tokens in JavaCC...
Any help with an explanation would be awesome, thanks :)
For the first case, your pattern needs to allow 1-digit numbers, 2-digit numbers, 3-digit numbers whose first digit is 1 and whose second digit is in the range 0-7, and the special case 180. The regex would look like
[0-9]{1,2}|1[0-7][0-9]|180
(I don't know javacc, so I don't know how this regex would be used, or whether you need something else to prevent something like 1800 from being parsed as a number, or as two numbers. You might need \b on the ends to indicate a word boundary, but I have no idea how javacc works.)
For the second case, the part to the left of the decimal point is either one digit, or two digits where the first digit is in the range 0-5. Your requirements aren't clear, but if the token is required to have a decimal point and one to four digits to the right of the decimal point, the regex would be
([0-9]|[0-5][0-9])\.[0-9]{1,4}
Again, I don't know how javacc handles the word boundaries.
Note that if this were a Java program, I would recommend (in the first case) just parsing it as an integer and comparing it to 0 and 180. Too many questioners try to use regexes to solve every problem, but they are not suited for every problem. Since this is for javacc, it may be a context in which regexes are simple to use and numeric comparisons are not--as I've mentioned, I don't know anything about javacc.
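For plain Java (not JavaCC), the comparison between the two approaches could look like the sketch below. The class and method names are hypothetical. The parsing variant first checks for one to three ASCII digits, an added guard that keeps Integer.parseInt from ever seeing a sign, letters, or an overlong value.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class DegreeRange {
    // 1- or 2-digit numbers, 100-179, or exactly 180.
    private static final Pattern RANGE =
            Pattern.compile("[0-9]{1,2}|1[0-7][0-9]|180");

    static boolean inRangeByRegex(String s) {
        return RANGE.matcher(s).matches();
    }

    static boolean inRangeByParsing(String s) {
        // Guard: only 1 to 3 ASCII digits are handed to parseInt.
        if (!s.matches("[0-9]{1,3}")) {
            return false;
        }
        int value = Integer.parseInt(s);
        return value >= 0 && value <= 180;
    }
}
```

The two agree on numbers written without padding, such as "0", "99", "180" and "181". They differ on zero-padded three-digit input: "007" is rejected by the regex but accepted by the parsing variant, so which one fits depends on whether padding should be allowed.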

Set minimum and maximum characters in a regular expression

I've written a regular expression that matches any number of letters with any number of single spaces between the letters. I would like that regular expression to also enforce a minimum and maximum number of characters, but I'm not sure how to do that (or if it's possible).
My regular expression is:
[A-Za-z](\s?[A-Za-z])+
I realized it was only matching two sets of letters surrounding a single space, so I modified it slightly to fix that. The original question is still the same though.
Is there a way to enforce a minimum of three characters and a maximum of 30?
Yes
Just like + means one or more you can use {3,30} to match between 3 and 30
For example [a-z]{3,30} matches between 3 and 30 lowercase alphabet letters
From the documentation of the Pattern class
X{n,m} X, at least n but not more than m times
In your case, matching 3-30 letters followed by spaces could be accomplished with:
([a-zA-Z]\s){3,30}
That is if you require trailing whitespace. If you don't, you can use the following (letter+space 2 to 29 times, then a letter):
([a-zA-Z]\s){2,29}[a-zA-Z]
If you'd like whitespaces to count as characters you need to divide that number by 2 to get
([a-zA-Z]\s){1,14}[a-zA-Z]
You can add \s? to that last one if the trailing whitespace is optional. These were all tested on RegexPlanet
If you'd like the entire string altogether to be between 3 and 30 characters you can use lookaheads adding (?=^.{3,30}$) at the beginning of the RegExp and removing the other size limitations
All that said, in all honesty I'd probably just test the String's length (the length() method in Java). It's more readable.
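A minimal sketch of that idea, assuming Java and assuming that spaces count towards the 3 to 30 character limit. The class and method names are made up. The shape pattern is the one from the question with + relaxed to *, so that the length check alone decides the minimum.

```java
// Hypothetical helper, for illustration only.
final class NameValidator {
    static boolean isValid(String s) {
        // Length is checked in plain Java; the regex only checks the shape:
        // letters, with at most one whitespace character between two letters.
        return s.length() >= 3
                && s.length() <= 30
                && s.matches("[A-Za-z](\\s?[A-Za-z])*");
    }
}
```

With this, "abc" and "a b" are accepted, while "ab" (too short), "a  b" (two consecutive spaces) and "a b " (trailing space) are rejected.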
This is what you are looking for
^[a-zA-Z](\s?[a-zA-Z]){2,29}$
^ is the start of string
$ is the end of string
(\s?[a-zA-Z]){2,29} would match (\s?[a-zA-Z]) 2 to 29 times..
Actually Benjamin's answer will lead to the complete solution to the OP's question.
Using lookaheads it is possible to restrict the total number of characters AND restrict the match to a set combination of letters and (optional) single spaces.
The regex that solves the entire problem would become
(?=^.{3,30}$)^([A-Za-z][\s]?)+$
This will match "AAA" and "A A", and it will fail to match "AA  A" (written with two spaces between AA and A), since two consecutive spaces are not allowed.
I tested this at http://regexpal.com/ and it does the trick.
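For reference, this is roughly how the lookahead version behaves when compiled in Java. The wrapper class and method name are hypothetical, and the pattern is copied from the answer unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class LookaheadLength {
    // The lookahead (?=^.{3,30}$) constrains the total length;
    // the rest constrains the shape (letters, each optionally followed by one space).
    private static final Pattern NAME =
            Pattern.compile("(?=^.{3,30}$)^([A-Za-z][\\s]?)+$");

    static boolean isValid(String s) {
        return NAME.matcher(s).matches();
    }
}
```

One detail worth knowing: because every letter may be followed by an optional space, this pattern also accepts a single trailing space, as in "AA ". Whether that is acceptable depends on the requirements.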
You should use
[a-zA-Z ]{20}
[For allowed characters]{for limiting of the number of characters}

Why does "3.5".matches("[0-9]+") return false?

I use the method String.matches(String regex) to find if a string matches the regex expression
From my point of view the regular expression regex="[0-9]+" means a String that contains at least one figure between 0 and 9
But when I debug "3.5".matches("[0-9]+") it returns false.
So what is wrong ?
matches determines if the regex matches the whole string. It won't return true if the string contains a match.
To test if the string contains a match to a given regex, use Pattern.compile(regex).matcher(string).find().
(Your regex, [0-9]+, will match any string that contains only digits from 0 to 9, and at least one digit. It doesn't magically match against any real number. If you want something matching any real number, look at e.g. the Javadoc for Double.valueOf(String), which specifies a regex used in validating doubles. That regex allows hexadecimal input, NaNs, and infinities, but it should give you a better idea of what's required.)
Alternately, edit the regex so it directly matches any string containing one or more digits, e.g. .*[0-9]+.* would do the job.
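The difference between the two calls can be seen in a short sketch. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class MatchVsFind {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // True only if the entire string consists of digits.
    static boolean wholeStringIsDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // True if any part of the string is a run of digits.
    static boolean containsDigits(String s) {
        return DIGITS.matcher(s).find();
    }
}
```

For "3.5", wholeStringIsDigits returns false because of the dot, while containsDigits returns true because the "3" alone is a match.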
If you want to match decimal numbers, your reg ex needs to be \d*\.?\d+. If you want negatives as well, then \-?\d*\.?\d+.
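As a quick check of what that second pattern accepts, here is a small sketch with a hypothetical class and method name. The pattern is written as a Java string literal, so each backslash is doubled.

```java
// Hypothetical helper, for illustration only.
final class DecimalCheck {
    static boolean isDecimal(String s) {
        // Optional minus, optional integer digits, optional dot, at least one digit.
        return s.matches("\\-?\\d*\\.?\\d+");
    }
}
```

This accepts "3.5", "-3.5", "35" and ".5", and rejects "3." (no digit after the dot) and "3.5.1". It does not handle exponents or a leading plus sign.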
. is not 0-9 and matches tests the entire string.
