How to capture a period within an optional block - java

I have the following strings and want to capture only the numbers, including any decimal part.
The numbers I want to capture look like this:
1
10
100
10.20
This is the regex that works for the strings at the end:
(\$|£|$|£)(\d+(?:\.\d+)?)\b(?!\.)
It works for all of my other strings except the following one:
$0 text $99.<sup>¤</sup>
This is because the $99 is followed by a period that is not a decimal point, so I don't want to capture it. The period is also optional and will not always occur. How can I modify the regex so that I still capture the value 99 in the above string as group 2?

You can use
(\$|£|$|£)(\d+(?:\.\d+)?)(?!\.?\d)
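The (?!\.?\d) negative lookahead fails the match only if the current position is immediately followed by a digit, or by a dot and then a digit. A trailing dot with no digit after it, as in $99., is allowed, so 99 is still captured in group 2.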
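For illustration, here is a minimal Java sketch of how the lookahead version behaves. The class and method names are made up for this example, and the currency alternation is shortened to a dollar or pound sign (written as \u00A3 to avoid source-encoding problems), which is an assumption rather than the exact alternation from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class AmountExtractor {
    // Currency sign, then digits with an optional decimal part. The negative
    // lookahead rejects a match that is directly followed by a digit or by
    // a dot and a digit, but tolerates a trailing dot.
    private static final Pattern AMOUNT =
            Pattern.compile("(\\$|\u00A3)(\\d+(?:\\.\\d+)?)(?!\\.?\\d)");

    // Returns the contents of group 2 for every match in the input.
    static List<String> amounts(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = AMOUNT.matcher(input);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(amounts("$0 text $99."));       // [0, 99]
        System.out.println(amounts("\u00A310.20 and $100")); // [10.20, 100]
    }
}
```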

Related

Regex format for a particular Match

I am trying to write a regex for the following format
PA-123456-067_TY
It's always PA, followed by a dash, 6 digits, another dash, then 3 digits, and ends with _TY
Apparently, when I write this regex to match the above format it shows the output correctly
^[^[PA]-]+-(([^-]+)-([^_]+))_([^.]+)
with all the negation symbols (^).
This does not work if I write the regex in the format below, without the negation symbols:
[[PA]-]+-(([-]+)-([_]+))_([.]+)
Can someone explain to me why this is so?
The negation symbol means that the character cannot be anything within the specified class. Your regex is much more complicated than it needs to be and is therefore obfuscating what you really want.
You probably want something like this:
^PA-(\d+)-(\d+)_TY$
... which matches anything that starts with PA-, then includes two groups of numbers separated by a dash, then an underscore and the letters TY. If you want everything after the PA to be what you capture, but separated into the three groups, then it's a little more abstract:
^PA-(.+)-(.+)_(.+)$
This matches:
PA-
a capture group of any characters
a dash
another capture group of any characters
an underscore
all the remaining characters until end-of-line
A character class [...] matches any single character in the list, so in your first regex the group (([^-]+)-([^_]+)) looks for one or more characters that aren't a dash, followed by a dash (which is fine), followed by one or more characters that aren't an underscore (again fine). The extra set of parentheses around that creates another capture group (group 1, since groups are numbered by the position of their opening parenthesis). That part is OK, but it probably makes interpreting the result less intuitive in this case.
In the rewrite, however, the corresponding group (([-]+)-([_]+)) matches [-]+, which means "one or more dashes", followed by a dash, followed by one or more underscores, followed by an underscore. Since your input does not have a dash immediately following PA-, the entire regex fails to find anything.
Putting the PA inside nested character classes also makes things complicated. I'm not actually sure how [^[PA]-]+ is interpreted in practice, but I suspect the first regex starts with something like "one or more characters that are not a P, an A, or a dash". The second one is looking for the opposite, I think. But you don't want any of that; you just want to start with the actual sequence of characters you care about, which is PA-.
Update: As per the clarifications in the comments on the original question, knowing you want fixed-size groups of digits, it would look like this:
^PA-(\d{6})-(\d{3})_TY$
That matches PA-, then a 6-digit number, then a dash, then a 3-digit number, then _TY. The 6-digit and 3-digit numbers will be in capture groups 1 and 2, respectively.
If the sizes of those numbers could ever change, replace {6} and {3} with + to capture numbers of any length.
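As a rough sketch, reading the two groups in Java could look like this. The class and method names are hypothetical and only serve the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class PaCode {
    private static final Pattern CODE =
            Pattern.compile("^PA-(\\d{6})-(\\d{3})_TY$");

    // Returns {sixDigits, threeDigits}, or null when the input has another shape.
    static String[] parts(String input) {
        Matcher m = CODE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] p = parts("PA-123456-067_TY");
        System.out.println(p[0] + " / " + p[1]); // 123456 / 067
    }
}
```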
According to your comment, this would be appropriate: PA-\d{6}-\d{3}_TY
EDIT: if you want to match a whole line, use it with anchors: ^PA-\d{6}-\d{3}_TY$

Can you fix this Java Regex to match currency such as -10 USD, 12.35 AUD ... (Java)?

I need to validate a currency string as follows:
1. The currency unit must be uppercase and must contain exactly 3 characters from A to Z.
2. The number can have a negative (-) or positive (+) sign.
3. The number can contain a decimal fraction, but if the number contains a decimal fraction, then the fraction must be 2 decimals only.
4. There is no space in the number part.
See these examples:
10 USD ------> match
+10 USD ------> match
-10 USD ------> match
10.23 AUD ------> match
-12.11 FRC ------> match
- 11.11 USD ------> NOT match because there is a space between the negative sign and the number
10  AUD ------> NOT match because there are 2 spaces between the number and the currency unit
135.1 AUD ------> NOT match because there is only 1 decimal in the fraction
126.33 YE ------> NOT match because the currency unit must contain 3 Uppercase characters
So here is what I tried but failed
if(text != null && text.matches("^[+-]\\d+[\\.\\d{2}] [A-Z]{3}$")){
return true;
}
The "^\\d+ [A-Z]{3}$" regex only matches numbers without any sign or decimal part.
So can you fix this Java regex to match currency strings that meet the above requirements?
Other questions I found on the internet do not match my requirements.
It seems you don't know about the ? quantifier. It means that the element it applies to can appear zero times or once, which makes that element optional.
So to say that the string can contain an optional - or + at the start, just add [-+]?.
To say that it can contain an optional decimal part of the form .XX, where X is a digit, just add (\\.\\d{2})?
So try "^[-+]?\\d+(\\.\\d{2})? [A-Z]{3}$"
BTW, if you are using yourString.matches(regex), you don't have to add ^ or $ to the regex. This method returns true only if the entire string matches the regex, so these anchors are not necessary.
BTW2: normally you should escape - in a character class [...] because it denotes a range of characters, as in [A-Z]. In this case, however, - can't be read that way because it is at the start of the character class, so there is no first character for the range, and you don't have to escape it. The same goes if - is the last character, as in [..-]: there it can't denote a range either, so it is a plain literal.
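To check this against the examples from the question, a small sketch could look like the following. The class name and the sample array are assumptions made for the example, and the anchors are left out because String.matches() already requires the whole string to match.

```java
// Hypothetical class for illustration only.
class CurrencyCheck {
    // Optional sign, digits, optional ".dd", one space, three uppercase letters.
    static boolean isValid(String text) {
        return text != null && text.matches("[-+]?\\d+(\\.\\d{2})? [A-Z]{3}");
    }

    public static void main(String[] args) {
        String[] samples = {
            "10 USD", "+10 USD", "-10 USD", "10.23 AUD", "-12.11 FRC",
            "- 11.11 USD", "10  AUD", "135.1 AUD", "126.33 YE"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

The first five samples should be accepted and the last four rejected.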
Try with:
text.matches("[+-]?\\d+(\\.\\d\\d)? [A-Z]{3}")
Note that since you use .matches(), the regex is automatically anchored (blame the Java API designers for that: .matches() is woefully misnamed).
You could start your regex with
^(\\+|\\-)?
Which means that it will accept either one + sign, one - sign or nothing at all before the digit. But that's only one of your problems.
Now the decimal point:
"3. The number can contain a decimal fraction, but if the number contains a decimal fraction, then the fraction must be 2 decimals only."
So after the digits \\d+, the next part should be wrapped in ( )? to indicate that it is optional (meaning one time or never). Either there is exactly one dot followed by two digits, or nothing:
(\\.\\d{2})?
Here you can find a reference for regex and test them. Just have a look at what else you could use to identify the 3 Letters for the currency. E.g. the \s could help you to identify a whitespace
This will match all your cases:
^[-+]?\d+(\.\d{2})?\s[A-Z]{3}$
(Demo at regex101)
To use it in Java you have to escape the \:
text.matches("^[-+]?\\d+(\\.\\d{2})?\\s[A-Z]{3}$")
Your regex wasn't far from the goal, but it contains several mistakes.
The most important one is: [] denotes a character class while () is a capturing group. So when you specify a character class like [\\.\\d{2}], it matches a single character that is a dot, a digit, {, or }, while you want to match the pattern \.\d{2}, i.e. a dot followed by two digits.
The other answers already taught you the ? quantifier, so I won't repeat this.
On a sidenote: regular-expressions.info is a great source to learn these things!
Explanation of the regex used above:
^ #start of the string/line
[-+]? #optionally a - or a + (but not both; only one character)
\d+ #one or more digits
( #start of optional capturing group
\.\d{2} #the character . followed by exactly two digits
)? #end of capturing group; the ? makes the whole group optional
\s #a whitespace character
[A-Z]{3} #three characters in the range from A-Z (no lowercase)
$ #end of the string/line
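If you would like to keep such a commented layout in the Java source itself, one option is the Pattern.COMMENTS flag, which ignores whitespace in the pattern and treats # as the start of a comment that runs to the end of the line. The following is only a sketch; the class name is invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration only.
class CommentedCurrencyPattern {
    // With Pattern.COMMENTS, the spaces and the "# ..." parts below are
    // not part of the expression that is matched.
    static final Pattern CURRENCY = Pattern.compile(
            "^            # start of the string\n"
          + "[-+]?        # optional sign\n"
          + "\\d+         # one or more digits\n"
          + "(\\.\\d{2})? # optional dot followed by exactly two digits\n"
          + "\\s          # one whitespace character\n"
          + "[A-Z]{3}     # three uppercase letters\n"
          + "$            # end of the string",
            Pattern.COMMENTS);

    static boolean isValid(String text) {
        return CURRENCY.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("-12.11 FRC")); // true
        System.out.println(isValid("135.1 AUD"));  // false
    }
}
```

Note that \s also accepts a tab or another whitespace character between the number and the currency unit, which is slightly looser than a literal space.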

Match a possibly negative number that may or may not be double

I am building a temperature converter in Java. The user will input a number into a JTextField, pick their starting label using a JComboBox, and pick their ending label using JRadioButtons in a ButtonGroup. The number that the user enters can take several forms. It can be
A single-digit integer, such as 5
A multi-digit integer, such as 55
A double, such as 5.5
A negative version of any of the above, such as -5, -55, or -5.5
The JTextField has a method getText(), which returns the value of the string. This is then converted to a double, and finally converted to the desired ending label. Because the String has to be converted to a double, alpha characters can't be allowed in the JTextField. So I am using regex to solve this. I currently have
String tempV = startTempValInput.getText();
if (tempV.matches("-?[0-9]+\\.[0-9]+")) {
// Code Here
}
However, this doesn't recognize single or multiple integers. How can I modify this to include integers?
tempV.matches("-?[0-9]+(\\.[0-9]+)?")
Breaking it down:
-? - This will match an optional minus sign (0 or 1 times)
[0-9]+ - This will match a numeric character 1 or more times
(\\.[0-9]+)? - This will match an optional decimal part of any length
\\. - This will match a period. The double backslash is needed because Java regexes are written as normal string literals, so the backslash itself has to be escaped
[0-9]+ - This will match a numeric character 1 or more times
This is all wrapped in ()? because the decimal part is optional. If you were to try -?[0-9]+\\.?[0-9]+ instead, it would not recognize a single-digit integer. It would see the minus sign and the period as optional, but since each + requires 1 or more characters, it would require at least two digits.
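The difference between the two patterns can be seen in a short sketch like this one. The class and method names are hypothetical.

```java
// Hypothetical class for illustration only.
class TemperatureInput {
    // The whole decimal part is optional.
    static boolean isNumber(String s) {
        return s.matches("-?[0-9]+(\\.[0-9]+)?");
    }

    // Only the dot is optional, so two digit runs are always required.
    static boolean isNumberDotOnlyOptional(String s) {
        return s.matches("-?[0-9]+\\.?[0-9]+");
    }

    public static void main(String[] args) {
        String[] samples = { "5", "55", "5.5", "-5", "-55", "-5.5" };
        for (String s : samples) {
            System.out.println(s + " -> " + isNumber(s)
                    + " / " + isNumberDotOnlyOptional(s));
        }
    }
}
```

With these inputs the first method accepts every sample, while the second one rejects 5 and -5.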
An alternative would be
tempV.matches("-?([0-9]+\\.)?[0-9]+")
Because the String has to be converted to a double, alpha characters can't be allowed in the JTextField
What about 2.0E3?
I would just use Double.parseDouble, and catch the NumberFormatException:
try {
Double.parseDouble(inStr);
} catch (NumberFormatException e) {
// string doesn't represent a double
}
You could do it yourself with a regex, but there are various edge cases to consider: scientific notation (as shown above), a leading +, etc. It's certainly manageable, but why write code to accomplish something that's already done for you?
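Wrapped into a reusable helper, the parse-and-catch approach might look like the sketch below. The class name and the use of OptionalDouble are choices made for this example, not something from the question. Keep in mind that Double.parseDouble also accepts inputs such as "NaN", "Infinity" and strings with surrounding whitespace, which may or may not be what you want for a text field.

```java
import java.util.OptionalDouble;

// Hypothetical helper for illustration only.
class DoubleParsing {
    // Returns the parsed value, or an empty OptionalDouble when the
    // input is null or does not represent a double.
    static OptionalDouble tryParse(String s) {
        if (s == null) {
            return OptionalDouble.empty();
        }
        try {
            return OptionalDouble.of(Double.parseDouble(s));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("2.0E3"));
        System.out.println(tryParse("-5.5"));
        System.out.println(tryParse("abc").isPresent()); // false
    }
}
```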
A pattern that catches a broad range of double input is:
^(?![+-]$|[-+]E|E|$|\.$)[+-]?\d*(\.?\d*)?(E-?\d+)?$
which matches edge cases of:
1
1.
.1
4E7
3.4E-5
+7.4
-5
See a live demo.
Because every term is optional, the negative look ahead is needed to catch degenerate cases that would otherwise match.
Note that this regex does not prevent over/under flow, eg 1E9999, which is too large for double to represent.
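A quick way to try the pattern against the listed edge cases is a sketch like the following. The class and method names are made up for the example. The last line illustrates the overflow remark: the pattern accepts 1E9999, and Double.parseDouble turns it into Infinity.

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration only.
class BroadDoublePattern {
    static final Pattern DOUBLE = Pattern.compile(
            "^(?![+-]$|[-+]E|E|$|\\.$)[+-]?\\d*(\\.?\\d*)?(E-?\\d+)?$");

    static boolean looksLikeDouble(String s) {
        return DOUBLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] accepted = { "1", "1.", ".1", "4E7", "3.4E-5", "+7.4", "-5" };
        for (String s : accepted) {
            System.out.println(s + " -> " + looksLikeDouble(s));
        }
        // Degenerate inputs that the negative lookahead filters out.
        String[] rejected = { "", "+", "-", ".", "E5", "-E5" };
        for (String s : rejected) {
            System.out.println("\"" + s + "\" -> " + looksLikeDouble(s));
        }
        // Accepted by the pattern, but too large for a finite double.
        System.out.println(Double.parseDouble("1E9999"));
    }
}
```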
You could cover all the bases with this possibly:
edit - ensure a digit somewhere, thanks @yshavit
^(?=\D*\d)(?=[\d.-]+$)-?\d*\.?\d*$
Could be extended to cover more as well:
modified - Simplified version, some assertions are not needed.
# (?i)^(?=[^e]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?$
# "(?i)^(?=[^e]*\\d)[+-]?\\d*\\.?\\d*(?:e[+-]?\\d+)?$"
(?i) # Case insensitive modifier
^ # Beginning of string
(?= [^e]* \d ) # Lookahead must be a digit (and before exponent)
[+-]? \d* \.? \d* # Consume correct numeric form
(?: e [+-]? \d+ )? # Consume correct exponent form
$ # End of string

Only capture digits instead of the other text in input for a regex like below

Currently this regular expression:
^(?:\\S+\\s+)*?(\\S+)\\s+(?:No\\.\\s+)?(\\S+)(?:\\s+\\(.*?\\))?$
captures 418—FINAL in group number 2 for an input like:
String text="H.B. 418—FINAL VERSION";
How do I change this regular expression to only capture the number (digits) of "418" in group2 ?
EDIT:
I'd still like to capture "H.B." in a preceding group.
Just change the boundaries of the second group to only include the digits. To also save the "H.B." part, add parentheses around that part too:
^(?:(\\S+)\\s+)*?(\\d+)\\S+\\s+(?:No\\.\\s+)?(\\S+)(?:\\s+\\(.*?\\))?$
I'm not entirely sure what your exact requirements are (your regex is looking for an optional "No." but you haven't given any examples). But this will work on the example you give:
^(?:\\S+\\s+)*?(\\S+)\\s+(?:No\\.\\s+)?(\\d+).*(?:\\s+\\(.*?\\))?$
assuming you don't need the text following the digits. That is, just change the second \S to \d. I also added .* after this to match any remaining characters following the digits up to an optional parenthesized part (without capturing them, but you can capture them if you want to).
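Here is a small sketch of that second regex in use. The class and method names are hypothetical, the dash in the sample is written as \u2014 to avoid source-encoding issues, and the "No." input is an invented example, since the question doesn't show one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class for illustration only.
class BillNumber {
    static final Pattern BILL = Pattern.compile(
            "^(?:\\S+\\s+)*?(\\S+)\\s+(?:No\\.\\s+)?(\\d+).*(?:\\s+\\(.*?\\))?$");

    // Returns {prefix, digits}, or null when the text does not match.
    static String[] prefixAndNumber(String text) {
        Matcher m = BILL.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] r = prefixAndNumber("H.B. 418\u2014FINAL VERSION");
        System.out.println(r[0] + " | " + r[1]); // H.B. | 418

        // Invented input with the optional "No." part.
        String[] r2 = prefixAndNumber("H.B. No. 418 (as amended)");
        System.out.println(r2[0] + " | " + r2[1]); // H.B. | 418
    }
}
```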

Regular Expression to match one or more digits 1-9, one '|', one or more '*' and zero or more ','

I'm new to regular expressions and I need to find a regular expression that matches one or more digits [1-9] only ONE '|' sign, one or more '*' sign and zero or more ',' sign.
The string should not contain any other characters.
This is what I have:
if(this.ruleString.matches("^[1-9|*,]*$"))
{
return true;
}
Is it correct?
Thanks,
Vinay
I think you should test for each type of symbol separately rather than write one complex expression.
First, test that there are no invalid symbols: "^[1-9|*,]+$"
Then test for digits with "[1-9]"; it should match at least once.
Then test for "\\|", "\\*" and "," and check the number of matches.
If all tests pass, your string is valid.
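The separate checks described above could be sketched roughly as follows. The class name is made up, and counting characters with String.chars() stands in for counting regex matches, which is an assumption about how you would implement the counting step. This version does not enforce any order of the symbols.

```java
// Hypothetical class for illustration only.
class RuleStringCheck {
    static boolean isValid(String s) {
        // Step 1: only digits 1-9, '|', '*' and ',' are allowed.
        if (s == null || !s.matches("[1-9|*,]+")) {
            return false;
        }
        // Step 2: count each kind of symbol.
        long digits = s.chars().filter(c -> c >= '1' && c <= '9').count();
        long bars = s.chars().filter(c -> c == '|').count();
        long stars = s.chars().filter(c -> c == '*').count();
        // One or more digits, exactly one bar, one or more stars;
        // commas may occur any number of times, including zero.
        return digits >= 1 && bars == 1 && stars >= 1;
    }

    public static void main(String[] args) {
        System.out.println(isValid("1|*"));  // true
        System.out.println(isValid("1||*")); // false
        System.out.println(isValid("12"));   // false
    }
}
```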
Nope, try this:
"^[1-9]+\\|\\*+,*$"
Please give us at least 10 strings that you want to accept and 10 that you want to reject, and tell us whether the symbols have to appear in a fixed sequence or whether their order doesn't matter, so that we can make a reliable regex.
For now, all I can offer is:
^[1-9]+\|{1}\*+,*$
This RegEx was tested against these sample strings, accepting them:
56421|*****,,,
2|*********,,,
1|*
7|*,
18|****
123456789|*
12|********,,
1516332|**,,,
111111|*
6|*****,,,,
And it was tested against these sample strings, rejecting them:
10|*,
2***525*|*****,,,
123456,15,22*66*****4|,,,*167
1|2*3,4,5,6*
,*|173,
|*,
||12211
12
1|,*
1233|54|***,,,,
I assume your given order is strict and all conditions apply at the same time.
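Under that strict-order assumption, the check in Java could be sketched like this. The class name is hypothetical, and the redundant {1} is dropped since a single \| already means exactly one bar.

```java
// Hypothetical class for illustration only.
class StrictRuleString {
    // Digits 1-9, then exactly one '|', then one or more '*', then any commas.
    static boolean isValid(String s) {
        return s != null && s.matches("[1-9]+\\|\\*+,*");
    }

    public static void main(String[] args) {
        String[] samples = { "56421|*****,,,", "1|*", "7|*,", "10|*,", "12", "1|,*" };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Of these samples, the first three should be accepted and the last three rejected.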
It looks like the pattern you need is
n-n, one or more times separated by commas
then a bar (|)
then n*n, one or more times separated by commas.
Here is a regular expression for that.
([1-9]{1}[0-9]*\-[0-9]+){1}
(,[1-9]{1}[0-9]*\-[0-9]+)*
\|
([1-9]{1}[0-9]*\*[0-9]+){1}
(,[1-9]{1}[0-9]*\*[0-9]+)*
But it is complex and still does not take the details into account. For example, in the case of n-m, you probably want n to be less than m (I guess), and you likely want the same number of n-m items before the bar as x*y items after it.
It depends on whether you want to check the syntax completely or not (I hope you do).
Since this is so complex, it should be done in code instead of with a single regular expression.
This regex should work:
"^[1-9\\|\\*,-]*$"
Assert position at the beginning of the string «^»
Match a single character present in the list below «[1-9\|*,-]»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character in the range between “1” and “9” «1-9»
A | character «\|»
A * character «*»
The character “,” «,»
The character “-” «-»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
