How to match a string in this way?

How to match a string in this way? - java

I need to check if a String matches this specific pattern.
The pattern is:
(Numbers)(all characters allowed)(numbers)
and the numbers may have a comma ("." or ",")!
For instance the input could be 500+400 or 400,021+213.443.
I tried Pattern.matches("[0-9],?.?+[0-9],?.?+", theequation2), but it didn't work!
I know that I have to use the method Pattern.match(regex, String), but I am not being able to find the correct regex.

Dealing with numbers can be difficult. This approach will deal with your examples, but check carefully. I also didn't do "all characters" in the middle grouping, as "all" would include numbers, so instead I assumed that finding the next non-number would be appropriate.
This Java regex handles the requirements:
"((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)"
However, there is a potential issue in the above. Consider the following:
300 - -200. The foregoing won't match that case.
Now, based upon the examples, I think the point is that one should have a valid operator. The number of math operations is likely limited, so I would whitelist the operators in the middle. Thus, something like:
"((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)"
Would, I think, be more appropriate. The [*/+-] can be expanded for the power operator ^ or whatever. Now, if one is going to start adding words (such as mod) in the equation, then the expression will need to be modified.
You can see this regular expression here

In your regex you have to escape the dot \. to match it literally and escape the \+ or else it would make the ? a possessive quantifier. To match 1+ digits you have to use a quantifier [0-9]+
For your example data, you could match 1+ digits followed by an optional part which matches either a dot or a comma at the start and at the end. If you want to match 1 time any character you could use a dot.
Instead of using a dot, you could also use for example a character class [-+*] to list some operators or list what you would allow to match. If this should be the only match, you could use anchors to assert the start ^ and the end $ of the string.
\d+(?:[.,]\d+)?.\d+(?:[.,]\d+)?
In Java:
String regex = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
Regex demo
That would match:
\d+(?:[.,]\d+)? 1+ digits followed by an optional part that matches . or , followed by 1+ digits
. Match any character (Use .+) to repeat 1+ times
Same as the first pattern

Related

Regex return true if even a substring follows the pattern

I was just practicing regex and found something intriguing
for a string
"world9 a9$ b6$" my regular expression "^(?=.*[\\d])(?=\\S+\\$).{2,}$"
will return false as there is a space in between before the look ahead finds the $ sign with at least one digit and non space character.
As a whole the string doesn't matches the pattern.
What should be the regular expression if I want to return true even if a substring follows a pattern?
as in this one a9$ and b6$ both follow the regular expression.

You can use
^(?=\D*\d)(?=.*\S\$).{2,}$
See the regex demo. As The fourth bird mentions, since \S\$ matches two chars, you may simply move the pattern to the consuming part, and use ^(?=\D*\d).*\S\$.*$, see this regex demo.
Details
^ - start of string (implicit if used in .matches())
(?=\D*\d) - a positive lookahead that requires zero or more non-digit chars followed with a digit char immediately to the right of the current location
(?=.*\S\$) - a positive lookahead that requires zero or more chars other than line break chars, as many as possible, followed with a non-whitespace char and a $ char immediately to the right of the current location
.{2,} - any two or more chars other than line break chars, as many as possible
$ - end of string (implicit if used in .matches())

Mostly, knock out the ^ and $ bits, as those force this into a full string match, and you want substring matches. In general, look-ahead seems like a mistake here, what are you trying to accomplish by using that? (Look-ahead/look-behind is rarely needed in general). All you need is:
Pattern.compile("\\S+\\$");
possibly, if you want an element (such as a9$) to stand on its own, use \b which is regexpese for word break: Basically, whitespace (and a few other characters, such as underscores. Most non-letter, non-digits characters are considered a break. Think [^a-zA-Z0-9]) - but \b also matches start/end of input. Thus:
Pattern.compile("\\b\\S+\\$\\b")
still matches foo a9$ bar, or a9$ just fine.
If you MUST put this in terms of a full match, e.g. because matches() (which always does a full string match) is run and you can't change that, well, put ^.* in front and .*$ at the back of it, simple as that.
Absolutely nothing about this says "This can only be needed with lookahead".

How to make Java String split greedy with lookahead?

Code is basically:
String[] result = "T&&T&T".split("(?=\\w|&+)");
I was expecting the lookahead to be greedy but instead it is returning the array:
T, &, &, T, &, T
What I am aiming for is:
T, &&, T, &, T
Is this possible for split and lookahead?
I have tried the following split regex values but the result is still not greedy for the ampersand:
"(?=\\w|&&?)"
"(?=\\w|&{1,2})"

It is already greedy, but I think you are misunderstanding how your split is working. The problem is that you are thinking of the characters but not the space between them (this is one of the places where regexes can get away from you).
You are asking to split at the places in the string where the next character is either a word character or a series of ampersands. In your string, let's mark the places that satisfy that:
T|&|&|T|&|T
In the space between the first T and the first ampersand, the next character is an ampersand (matches (?=&) which is valid in your regex), the space between the two ampersands also matches for this same reason. The space between the ampersands and the second T also matches (matches (?=\w)), and so on.
The split function will test each space in the string to determine if it is a candidate for a split position. To do what you want, you have to be careful about using the lookahead, so that we don't allow allow splits in the middle of a string of ampersands.
There are multiple ways you may overcome this; Wiktor Stribiżew provides a suggestion that works in his comment.
Usually using a look-behind to check that you are not repeating an undesired character will work, or if possible you can use a look-behind to identify the matching places, and a look-ahead to avoid the undesired repetitions. For example, if we wish to split at all characters keeping repeated characters together, you could do (?<=(.))(?!\\1) which splits your example as T, &&, T, &, T.

Lookarounds cannot be greedy or reluctant, they just check if the adjoining text to the left (lookbehind) and to the right (lookahead) matches the lookaround subpattern. If there is a match, and the lookaround is positive, the empty location is matched. If the lookaround is not anchored, each location in string is tested against the pattern in the lookaround, even the beginning and end. See this screenshot showing that (with your (?=\w|&&?)):
Since the lookaround is a zero-width assertion and it does not consume characters, all locations (before each character and at the end) are tested. Thus, you get matches between each character.
The (?=\w|&&?) checks the first location before T: it gets matched with \w, so this location is matched (see the first |). Then comes the next location, after the first T before the &. It is matched as it is followed woth &&. Then the regex engine goes on to check the location after the first & and the second &. It is matched as there is a & after it. This way, we match up to the end. The end location is not matched as it is not followed with & or a word character.
You may restrict the pattern inside a lookaround with another lookaround to avoid matching specific locations inside the input string.
(?=\w|(?<!&)&)
^^^^^^
The (?<!&)& pattern will match a & that is not preceded with another &. See the regex demo.
IDEONE demo:
String[] result = "T&&T&T".split("(?=\\w|(?<!&)&)");
System.out.println(Arrays.toString(result));
// => [T, &&, T, &, T]
The lookaround solution is a generic one. If we are to consider the current case, you can surely "shorten" the pattern to \b (which will also find a match at the end of the string, though Java String#split will safely remove trailing empty elements from the resulting array) that matches all locations between a non-word and word characters and also at the start/end of the string if there is a word character at its start/end. This won't work if the alternatives (like \w and & in your regex) belong to the same type (say, both are word characters.

How about this:
"(?=\\w)|(?<=\\w)"
or allowing repeat of T:
"(?<!\\w)(?=\\w)|(?<=\\w)(?!\\w)"
or the best form here

It looks like you want to split between different chars, so generally:
String[] parts = input.split("(?<=T)(?=&)|(?<=&)(?=T)");
But in this case, you can split on word boundaries except at start/end:
String[] parts = input.split("(?<=.)\b(?=.)");

Regex Exceptions

I am trying to come up with a java regex that will match numbers with 2 too 3 decimals and not match any decimal number more than 3.
this is my regex
[0-9]{2}[.][0-9]{3}
It matches 41.51778000 and 18.740
but I only want it to match numbers that have exactly 3 decimal places and not numbers with more than three

You need to ask the regex to match the end and start as well.
^[0-9]{2}[.][0-9]{3}$

You must use word boundary on either side to stop unexpected matches:
\b[0-9]{2}[.][0-9]{2,3}\b
In Java it would be:
\\b\\d{2}\\.\\d{2,3}\\b

You can invoke Matcher#matches or String.matches instead of, say, Matcher#find to match the whole String.
Otherwise, you can prepend ^ and append $ to your pattern, to delimit start and end of input.
Finally, you can surround your pattern with something like \\D, or \\b or \\w to respectively match non-digits, word boundaries or whitespace around it, if you need to invoke find on an input containing more than 1 instance of the pattern.

Why does my regex containing \d{1,} together with a negative lookahead still match, where it shouldn't?

I'm trying to match a coordinate pair in a String using a Regex in Java. I explicitly want to exclude strings using negative lookahead.
to be matched:
558,228
558,228,
558,228,589
558,228,A,B,C
NOT to be matched:
558,228,<Text>
The Regex ^558,228(?!,<).* does the job, while ^\d{1,},\d{1,}(?!,<).* doesn't. It's the same regex with the metacharacter \d instead of values. Any ideas why?

The reason is the .* part at the end. It matches everything that wasn't matched earlier.
In combination with \d{1,}, which allows to match less than 3 digits, it will go like this:
^\d{1,},\d{1,}(?!,<) will match 558,22 and .* will match the remaining part 8,<Text>.

The problem is the \d{1,} part in combination with the .* at the end.
In your case
558,228,<Text>
The ^\d{1,},\d{1,}(?!,<) matches ">558,22" and the .* matches the rest "8,<Text>"
You can solve this using the possessive quanitifier ++
^\d+,\d++(?!,<)(.*)
See it here online on Regexr
\d++ is a seldom used possessive quantifier, which is here useful. ++ means match at least once as many as you can and do not backtrack. That means it will not give back the digits once it has found them.
Java Quantifier tutorial

regex for specific digit prefix

I am trying to have the following regx rule, but couldn't find solution.
I am sorry if I didn't make it clear. I want for each rule different regx. I am using Java.
rule should fail for all digit inputs start with prefix '1900' or '1901'.
(190011 - fail, 190111 - fail, 41900 - success...)
rule should success for all digit inputs with the prefix '*'
different regex for each rule (I am not looking for the combination of both of them together)

Is this RE fitting the purpose ? :
'\A(\*|(?!190[01])).*'
\A means 'the beginning of string' . I think it's the same in Java's regexes
.
EDIT
\A : "from the very beginning of the string ....". In Python (which is what I know, in fact) this can be omitted if we use the function match() that always analyzes from the very beginning, instead of search() that search everywhere in a string. If you want the regex able to analyze lines from the very beginning of each line, this must be replaced by ^
(...|...) : ".... there must be one of the two following options : ....."
\* : "...the first option is one character only, a star; ..." . As a star is special character meaning 'zero, one or more times what is before' in regex's strings, it must be escaped to strictly mean 'a star' only.
(?!190[01]) : "... the second option isn't a pattern that must be found and possibly catched but a pattern that must be absent (still after the very beginning). ...". The two characters ?! are what says 'there must not be the following characters'. The pattern not to be found is 4 integer characters long, '1900' or '1901' .
(?!.......) is a negative lookahead assertion. All kinds of assertion begins with (? : the parenthese invalidates the habitual meaning of ? , that's why all assertions are always written with parentheses.
If \* have matched, one character have been consumed. On the contrary, if the assertion is verified, the corresponding 4 first characters of the string haven't been consumed: the regex motor has gone through the analysed string until the 4th character to verify them, and then it has come back to its initial position, that is to say, presently, at the very beginning of the string.
If you want the bi-optional part (...|...) not to be a capturing group, you will write ?: just after the first paren, then '\A(?:\*|(?!190[01])).*'
.* : After the beginning pattern (one star catched/matched, or an assertion verified) the regex motor goes and catch all the characters until the end of the line. If the string has newlines and you want the regex to catch all the characters until the end of the string, and not only of a line, you will specify that . must match the newlines too (in Python it is with re.MULTILINE), or you will replace .* with (.|\r|\n)*
I finally understand that you apparently want to catch strings composed of digits characters. If so the RE must be changed to '\A(?:\*|(?!190[01]))\d*' . This RE matches with empty strings. If you want no-match with empty strings, put \d+ in place of \d* . If you want that only strings with at least one digit, even after the star when it begins with a star, match, then do '\A(?:\*|(?!190[01]))(?=\d)\d*'

For the first rule, you should use a combo regex with two captures, one to capture the 1900/1901-prefixed case, and one the capture the rest. Then you can decide whether the string should succeed or fail by examining the two captures:
(190[01]\d+)|(\d+)
Or just a simple 190[01]\d+ and negate your logic.
Regex's are not really very good at excluding something.
You may exclude a prefix using negative look-behind, but it won't work in this case because the prefix is itself a stream of digits.
You seem to be trying to exclude 1-900/901 phone numbers in the US. If the number of digits is definite, you can use a negative look-behind to exclude this prefix while matching the remaining exact number digits.
For the second rule, simply:
\*\d+

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to match a string in this way? - java

Related

Regex return true if even a substring follows the pattern

How to make Java String split greedy with lookahead?

Regex Exceptions

Why does my regex containing \d{1,} together with a negative lookahead still match, where it shouldn't?

regex for specific digit prefix

Categories

Resources