regex not matching with some numbers - java

I have this regex, ^(\d*.?\d*)$, which should match all numbers, but some numbers won't match it.
Some Examples:
54139 // work
24.711 // won't work, not a float but dot is the separator
0 // won't work
60 // won't work
I used this regex in a RegexpValidator. I'm validating a text field:
TextField textField = new TextField(caption);
textField.setValue(value);
textField.addValidator(new StringLengthValidator(value + " ...",10, 50, true));
textField.addValidator(new RegexpValidator("^(\\d*.?\\d*)$", value + " ..."));
I tried it with another regex: ^[0-9,.]+$

If I understood your problem correctly, you're validating the content of a multi-line TextField and want to accept content consisting of one or more lines, each a non-empty sequence of .-separated floats and integers.
This statement can be simplified as follows: you're looking to match a sequence of numbers separated by . or linefeeds, where a number can contain a decimal part.
I would then use the following RegexpValidator:
new RegexpValidator("\\d+(,\\d+)?([.\\n]\\d+(,\\d+)?)*", true, value + "...")
In this regular expression, a number is represented as \\d+(,\\d+)?, which represents a mandatory integer part followed by an optional decimal part with its comma separator.
The global regular expression accepts a number such as defined above, followed by a (possibly empty) sequence of other numbers preceded by one of the accepted delimiters, a . or a linefeed.
I verified my answer with the following expression which returns true :
"54139\n12.5\n1,2.5,3\n12,5".matches("\\d+(,\\d+)?([.\\n]\\d+(,\\d+)?)*");
Note that I removed the anchors you used and used the complete parameter of RegexpValidator instead.
The regex could arguably be reduced to \d+([.,\n]\d+)*, but would then stop representing the difference between . which is a separator between numbers and , which is part of the numbers. It doesn't matter at the runtime for a validator, but could bring confusion to people maintaining your code, and couldn't easily be reused if you later want to match the different numbers.
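As a rough check of the two patterns discussed in this answer, here is a plain-Java sketch with no Vaadin involved. The class and method names are made up for illustration. It uses Matcher.matches(), which requires the whole input to match, as the complete flag is meant to. One thing it suggests is that the reduced form is slightly more permissive: it also accepts a chain of commas such as 1,2,3, which the longer pattern rejects.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Vaadin or of the original answer.
final class NumberListRegexDemo {
    // Pattern from the answer: "," belongs to a number, "." or a linefeed separates numbers.
    private static final Pattern STRICT =
            Pattern.compile("\\d+(,\\d+)?([.\\n]\\d+(,\\d+)?)*");
    // Reduced pattern from the answer: ".", "," and linefeed are treated alike.
    private static final Pattern REDUCED =
            Pattern.compile("\\d+([.,\\n]\\d+)*");

    static boolean strict(String s) {
        return STRICT.matcher(s).matches();
    }

    static boolean reduced(String s) {
        return REDUCED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"54139", "24.711", "0", "60", "54139\n12.5\n1,2.5,3\n12,5", "1,2,3"};
        for (String input : inputs) {
            System.out.println(input.replace("\n", "\\n")
                    + " strict=" + strict(input)
                    + " reduced=" + reduced(input));
        }
    }
}
```

All four numbers from the question pass both patterns here; only 1,2,3 tells them apart.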

Related

How to capture a period within an optional block

I have the following Strings and wish to capture numbers only including decimal points.
So want to capture following type of numbers.
1
10
100
10.20
This is the regex which works for the Strings at the end.
Regex
(\$|£|$|£)(\d+(?:\.\d+)?)\b(?!\.)
See below where I have other Strings which all works less the following String.
$0 text $99.<sup>¤</sup>
This is because the $99 is followed by a . that is not a decimal point, so I don't want to capture it. The dot is also optional and will not always occur. How could I modify the regex so that I can still capture the value 99 in the above String as group 2?
You can use
(\$|£|$|£)(\d+(?:\.\d+)?)(?!\.?\d)
The (?!\.?\d) lookahead will only fail if there is an optional dot and then a digit immediately to the right of the current location.
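To see the lookahead at work, here is a small Java sketch that collects group 2 from every match. The class name and the amounts helper are invented for this example, and the currency alternation is simplified to just $ and £ (written as \u00A3), which is an assumption on my part rather than the exact alternation from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex tail (?!\.?\d) comes from the answer.
final class CurrencyCaptureDemo {
    // Group 1: currency sign, group 2: the number, then the lookahead from the answer.
    private static final Pattern AMOUNT =
            Pattern.compile("(\\$|\u00A3)(\\d+(?:\\.\\d+)?)(?!\\.?\\d)");

    // Returns the text of group 2 for every match found in the input.
    static List<String> amounts(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(amounts("$0 text $99."));
        System.out.println(amounts("\u00A310.20 and $100"));
    }
}
```

With the trailing dot in "$99." the number part stops at 99, because the optional decimal group needs a digit after the dot and the lookahead only rejects a position that is followed by a digit (with or without a dot in front).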

Regex single character user name validation

The regular expression below is used to validate user input in Java.
username.matches("^\\p{L}+[\\p{L}\\p{Z}.']+")
The regular expression works for input of more than one character, but fails for single-character input.
As '+' denotes one or more characters, I am confused about how to accept a single character as valid input.
That's because both parts in your regex are requiring at least one character each (see the + almost at the end of the regex). If you want that part to be optional, it should be * instead.
The regex you have will match 2 or more symbols. The reason is, this is symbol one (or more):
\\p{L}+
And this is symbol 2 (or more):
[\\p{L}\\p{Z}.']+
Most likely you want the last part to be "0 or more", like this:
"^\\p{L}+[\\p{L}\\p{Z}.']*"
Your regex requires a minimum of 2 characters.
"^\p{L}+" - minimum of 1
"[\p{L}\p{Z}.']+" - minimum of 1
The "+" does denote one or more characters.
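A quick way to confirm the difference between + and * here is to run both patterns side by side. This is only a sketch; the class and method names are made up, and the sample names are my own.

```java
// Hypothetical demo class comparing the question's pattern with the suggested fix.
final class UsernameRegexDemo {
    // Pattern from the question: both parts need at least one character each.
    static boolean original(String username) {
        return username.matches("^\\p{L}+[\\p{L}\\p{Z}.']+");
    }

    // Suggested fix: the second part becomes optional ("zero or more").
    static boolean fixed(String username) {
        return username.matches("^\\p{L}+[\\p{L}\\p{Z}.']*");
    }

    public static void main(String[] args) {
        String[] samples = {"A", "Al", "Mary O'Neil", "1abc", ""};
        for (String s : samples) {
            System.out.println("'" + s + "' original=" + original(s) + " fixed=" + fixed(s));
        }
    }
}
```

Note that a two-letter name such as "Al" already passes the original pattern, because the first + backtracks to give one letter to the second part.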

Regex expression to validate a formula

I am new to regex and am currently building a web application in Java. I have the following requirements to validate a formula:
Formula must start with “T”
A formula can contain the following set of characters:
Digit: 0 - 9
Alpha: A - Z
Operators: *, /, +, -
Separator: ;
An operator must always be followed by a digit
The character “T” must always be followed by a digit or an alpha.
The separator must always be followed by “T”.
The character “M” must always be followed by an operator.
I manage to build up the following expression as shown below:
^[T][A-Z0-9 -- \\+*;]*
But I don't know how to add the following validation rules to the regex above:
An operator must always be followed by a digit
The character “T” must always be followed by a digit or an alpha.
The separator must always be followed by “T”
The character “M” must always be followed by an operator.
Valid sample: TA123;T1*2/32M+
Invalid Sample: T+qMg;Y
^(?!.*[*+/-]\\D)(?!.*T\\W)(?!.*[;:][^T])(?!.*M[^*+/-])[T][A-Z0-9 +/*;:-]*$
You can use this. See the demo.
https://regex101.com/r/sS2dM8/7
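For what it's worth, the lookahead-based pattern above can be tried against the two samples from the question with a few lines of Java. The class and method names below are made up; each negative lookahead encodes one "must be followed by" rule by rejecting any string that contains a violation.

```java
import java.util.regex.Pattern;

// Hypothetical demo class wrapping the pattern from the answer above.
final class FormulaRegexDemo {
    private static final Pattern FORMULA = Pattern.compile(
            "^(?!.*[*+/-]\\D)"       // no operator followed by a non-digit
            + "(?!.*T\\W)"           // no T followed by a non-word character
            + "(?!.*[;:][^T])"       // no separator followed by anything but T
            + "(?!.*M[^*+/-])"       // no M followed by a non-operator
            + "[T][A-Z0-9 +/*;:-]*$" // starts with T, then only allowed characters
    );

    static boolean isValid(String formula) {
        return FORMULA.matcher(formula).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("TA123;T1*2/32M+")); // valid sample from the question
        System.out.println(isValid("T+qMg;Y"));         // invalid sample from the question
    }
}
```

An operator or M at the very end of the string is accepted, since the lookaheads only fire when some following character breaks the rule; that is what lets the valid sample end in M+.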
We lack a bit of information to fully understand what you want. A couple examples would help.
For now, a small regexp :
^(T[A-LN-Z0-9]*M[+-/*][0-9];?)*
EDIT :
From my understanding, this should be close to what you're looking for :
^(T([A-LN-Z0-9]*M?[+\-/*]?[0-9]?)*;?)+
https://regex101.com/r/hT7aP2/1
This regexp forces the line to begin with a T, then have 0 to many [A-LN-Z0-9] range, meaning all your alphas and digits except M.
Then it needs to have an M followed by an operator in the range of [+\-/*] (pretty much +, -, / and *, except that - and / are special characters, so we tell the regexp that we want these characters, and not the meaning they're supposed to have).
Then it continues by one to many digits, and ends by a ";" that might or might not be there.
And everything in the parenthesis can be repeated from 0 to several times
I would have liked examples of what you want to validate... For example, we don't know if the line has to end with a ";"
Depending on what you want, splitting the string you want to validate using the character ";" and validating each of the generated string with that regexp might work

Regex for excluding a pattern of more than 2 "==" in Java

I need to extract each heading delimited by a run of two or more equals signs (i.e. ==, ===, ==== etc.), together with the text that follows it up to the next run of two or more equals signs, and store the results in an array list.
Ex:
==Notes and references== {{Refli=st|35e=m}}=====Bibliography=====Text starts
Expected output:
[==Notes and references== {{Refli=st|35e=m}}, =====Bibliography=====Text starts]
I have got the regex syntax:
"==+([^==+]*)==+([^==+]*)";
The output I am getting is cut off as soon as a single = is encountered:
[==Notes and references== {{Refli, =====Bibliography=====Text starts]
[^==+]* matches all characters except = and +. This is not what you want.
Here, it might be easier to use something like:
"==+(.*?)==+(.*?)(?===|$)";
So that you can allow single = signs in between the multiple =.
(?===|$) is a positive lookahead ((?= ... )) and makes sure there's either two consecutive = signs ahead or there's the end of the string.
Or if you want to negate specifically the ==+ in the parts in between you can use negative lookaheads:
"==+((?:(?!==+).)*)==+((?:(?!==+).)*)";
This syntax ((?:(?!==+).)*) will check for every character and make sure it isn't a == (or more).
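Both suggested patterns can be checked against the sample input with a short Java sketch. The class and method names are invented for illustration; the loop just collects every full match into an ArrayList, as the question asks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the two regexes are the ones suggested above.
final class HeadingSplitDemo {
    // Lazy version with a lookahead for the next "==" or the end of input.
    static final String LAZY = "==+(.*?)==+(.*?)(?===|$)";
    // Version that forbids "==" inside the captured parts.
    static final String TEMPERED = "==+((?:(?!==+).)*)==+((?:(?!==+).)*)";

    // Returns every full match of the given regex in the text.
    static List<String> sections(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "==Notes and references== {{Refli=st|35e=m}}"
                + "=====Bibliography=====Text starts";
        System.out.println(sections(LAZY, input));
        System.out.println(sections(TEMPERED, input));
    }
}
```

Neither pattern lets . cross a line break by default, so for multi-line input you would probably want Pattern.DOTALL as well; that is an assumption about the real data, which the question does not show.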

Regular Expression to select first five CSVs from a string

I have a CSV string like apple404, orange pie, wind\,cool, sun\\mooon, earth, in Java. To be precise, each value of the CSV string could be anything, provided commas and backslashes are escaped using a backslash.
I need a regular expression to find the first five values. After some googling I came up with the following, but it won't allow escaped commas within the values.
Pattern pattern = Pattern.compile("([^,]+,){0,5}");
Matcher matcher = pattern.matcher("apple404, orange pie, wind\\,cool, sun\\\\mooon, earth,");
if (matcher.find()) {
System.out.println(matcher.group());
} else {
System.out.println("No match found.");
}
Does anybody know how to make it work for escaped commas within values?
Following negative look-behind based regex will work:
Pattern pattern = Pattern.compile("(?:.*?(?<!(?:(?<!\\\\)\\\\)),){0,5}");
However for full fledged CSV parsing better use a dedicated CSV parser like JavaCSV.
You can use String.split() here. By specifying the limit as 6, the first five elements (index 0 to 4) will always be the first five column values from your CSV string. If any extra column values are present, they all overflow into index 5.
The regex (?<!\\\\), makes sure the CSV string is only split at a , comma not preceded with a \.
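// Requires: import java.util.Arrays;
// The parentheses make split() apply to the whole concatenated string,
// not just to the second literal.
String[] cols = ("apple404, orange pie, wind\\,cool, sun\\\\mooon, earth, " +
"mars, venus, pluto").split("(?<!\\\\),", 6);
System.out.println(cols.length); // 6
System.out.println(Arrays.toString(cols));
// [apple404,  orange pie,  wind\,cool,  sun\\mooon,  earth,  mars, venus, pluto]
// (every value after the first keeps its leading space; call trim() if needed)
System.out.println(cols[4]); // 5th = " earth"
System.out.println(cols[5]); // 6th, to be discarded = " mars, venus, pluto"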
This regular expression works well. It also properly recognizes not only backslash-escaped commas, but also backslash-escaped backslashes. Also, the matches it produces do not contain the commas.
/(?:\\\\|\\,|[^,])*/g
(I am using standard regular expression notation with the understanding that you would replace the delimiters with quote marks and double all backslashes when representing this regular expression within a Java string literal.)
example input
"apple404, orange pie, wind\,cool, sun\\,mooon, earth"
produces this output
"apple404"
" orange pie"
" wind\,cool"
" sun\\"
"mooon"
Note that the double backslash after "sun" is escaped and therefore does not escape the following comma.
The way this regular expression works is by atomizing the input into longest sequences first, beginning with double backslashes (treating them as one possible multi-byte character value alternative), followed by escaped commas (a second possible multi-byte character alternative), followed by any non-comma value. Any number of these atoms are matched, followed by a literal comma.
In order to obtain the first N fields, one may simply splice the array of matches from the previous answer or surround the main expression in additional parentheses, include an optional comma in order to match the contents between fields, anchor it to the beginning of the string to prevent the engine from returning further groups of N fields, and quantify it (with N = 5 here):
/^((?:\\\\|\\,|[^,])*,?){0,5}/g
Once again, I am using standard regular expression notation, but here I will also do the trivial exercise of quoting this as a Java string:
"^((?:\\\\\\\\|\\\\,|[^,])*,?){0,5}"
This is the only solution on this page so far which actually answers both parts of the precise requirements specified by the OP, "...commas and backslash are escaped using a back slash." For the input fi\,eld1\\,field2\\,field3\\,field4\\,field5\\,field6\\,, it properly matches only the first five fields fi\,eld1\\,field2\\,field3\\,field4\\,field5\\,.
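Here is a small Java sketch of that last pattern, in case anyone wants to try it. The class and method names are made up, and the second sample is the string from the question with one extra field appended by me.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class around the quoted Java pattern above.
final class CsvFirstFieldsDemo {
    // Up to five fields; a field is any run of "\\" pairs, "\," pairs or non-comma characters.
    private static final Pattern FIRST_FIVE =
            Pattern.compile("^((?:\\\\\\\\|\\\\,|[^,])*,?){0,5}");

    // Returns the prefix of the input that covers the first five fields.
    static String firstFive(String csv) {
        Matcher m = FIRST_FIVE.matcher(csv);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // In Java source every backslash of the raw input is doubled.
        String input = "fi\\,eld1\\\\,field2\\\\,field3\\\\,field4\\\\,field5\\\\,field6\\\\,";
        System.out.println(firstFive(input));
        System.out.println(firstFive("apple404, orange pie, wind\\,cool, sun\\\\mooon, earth, mars"));
    }
}
```

The result is the matched prefix as one string, trailing commas included; splitting it into the individual fields would still need a second step.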
Note: my first answer made the same assumption that is implicitly part of the OP's original code and example data, which required a comma to follow every field. The problem was that if the input is exactly 5 fields or fewer, and the last field is not followed by a comma (equivalently, by an empty field), then the final field would not be matched. I did not like this, and so I updated both of my answers so that they do not require trailing commas.
The shortcoming with this answer is that it follows the OP's assumption that values between commas contain "anything" plus escaped commas or escaped backslashes (i.e., no distinction between strings in double quotes, etc., but only recognition of escaped commas and backslashes). My answer fulfills the criteria of that imaginary scenario. But in the real world, someone would expect to be able to use double quotes around a CSV field in order to include commas within a field without using backslashes.
So I echo the words of @anubhava and suggest that a "real" CSV parser should always be used when handling CSV data. Doing otherwise is just being a script kiddie and not in any way truly "handling" CSV data.
