Negate a regex backreference, without using lookarounds - java

So another question led me to wondering if there is a way to negate a regex backreference without using lookarounds? The original post is here, but the approved answer is using lookarounds. In places those are not supported, is there a clean way to handle negative backreferences? I have found on a regex site [^\x] where x is the backref. number, does not work as intended.
For example:
Find a number followed directly by any other number (but dynamic in the number).
It would make sense to have (\d)[^\1], but inside a character class, everything is taken literally.

is a way to negate a regex backreference without using lookarounds?
No.
In theory, how are the lookarounds being generated by the regex engine?
Lookarounds are not "generated" with other simpler regex constructs, they are a distinct feature of their own.

Related

I want to Capture a alphanumeric group without underscore

I want to Capture an alphanumeric group in regex such that it does not capture starting underscore. For example _reverse(abc) should return reverse(. I am using (?<name>\w+) but it return _reverse(.
You can try this,
[^a-zA-Z0-9()\\s+]
The output will be reverse(abc)
You can specify characters explicitly, e.g.:
[a-zA-Z0-9]+
From what you are showing, I assume you want to strip underscores and content behind the opening parentheses.
Basically, that should work with a regex like this:
"_([a-zA-Z0-9]+\()"
this can be used in conjunction with a Matcher to extract all capturing groups (in this case, [a-zA-Z0-9]+\() and return them.
Note that you can find almost all the help you need with Regular Expressions on utility sites like RegEx 101 and RegEx Per, the latter being a nice visualizer but only working with javaScript-like expressions.
Also, RegEx 101 contains a Regex Debugger to help avoid dangerous regular expressions

Regex - Disallow certain characters to appear consecutively

I'm not sure if this is possible or not:
Writing a program to convert an infix notation to postfix notation. All is working well so far but trying to implement validation is proving difficult.
I'm trying to use a regex to validate an infix notation, conforming to the following rules:
String must only start with a number or ( (program does not allow negative numbers)
String must only end with number or )
String must only contain 0-9*/()-+
String must not allow following characters to appear together +*/-
I have a regex which conforms to the first 3 rules:
(^[0-9(])([0-9+()*]+)([0-9)]+$)
Is it possible to use regex to implement the last rule?
I will answer only to fourth rule as you have problem only with it.
Yes, there is a possibility, but I think regex is not appropriate tool to check that...
This pattern ^(?(?=.*\+)(?!.*[\*\/-])).+$ will match any string that contain + and not contain other characters: /,*,-. For one character is already lengthy and hard to read. See demo.
It uses conditional expression (?...) to check if lookahead checking for + was successfull, if it is, then negative lookahead assures that you won't have any of \*- characters.
For all characters, the regex will become very big and hard to maintain.
That's why I don't recommend it for this task.
I agree with MichaƂ Turczyn that regex is not the task for this, but not with the reason. It is easy to implement your restrictions. However, your restrictions also allow expressions like (0+3, 2(*4), ((((1, and other things you likely don't want - so regex validation is kind of pointless. If you were writing this with a regex engine with some significant power like PCRE (Perl, PHP) or Onigmo (Ruby), you can fake a parser in regex; but in Java, regex is quite restricted in what it can do. It is enough for the requirements in the question, though:
^[0-9(](?:(?![+*/-][+*/-])[0-9+*/()-])*[0-9)]$
starts with digit or paren
any number of repetitions of any allowed character, such that that character and the next character aren't both operators
ends with digit or thesis.

regular expressions: "at least one of the captures must match"

Consider the following regular expression:
/^(A....)?(B..)?(C...)?$/
Is there a way to limit the regex in the way that "at least one of the captures must match"?
Actually I need this for java regular expressions. I could imagine, that it's not possible. Maybe with the more powerful perl regex machine?
Of course, I could post-process it, but maybe there is another way..
Just make sure that something matches with a look-ahead
/^(?=.)(A....)?(B..)?(C...)?$/
You can use this pattern:
^(A....)?(B..)?(C...)?(?<=.)$
The zero-width assertion at the end ensures that there is at least one character in the overall match, which will not be the case if none of the groups matches anything.
RegEx match pattern:
/apples|bananas|carrots/
Source text:
This allows apples or bananas or carrots to appear anywhere in the source text, confirming that the source contains at least of of the choices.
See https://regex101.com/r/gZ1cW5/1
But, as I commented, it is not clear what you really want. It's not clear if you need to know which one was matched.
You should always post example source text to be evaluated.

How to bound +/* for a regex group?

Say I have the regex:
(CC|NP)*
As such it creates problems in look-before regexes in Java. How shall I write it to avoid those problem?
I thought of re-writing it as:
(CC|NP){1,9}
Testing on regexr it seems like the upperbound is ignored completely.
In Java those quantitiers {} seem to work only on non-group regex elements as in:
\w+\[\S{1,9}\]
Sorry, look behind patterns usually have restrictions on the sub pattern. See f.x. Why doesn't finite repetition in lookbehind work in some flavors?p. Or search for "lookbehind pattern restrictions" on the web.
You may try to write down all fixed length variants of the look behind pattern as alternating pattern. But this might be many...
You may also simulate lookbehind by normally matching the inner pattern and match and group your actual target: (?:CC|NP)*(.*)
I'm not sure of where you percieve the problem. Quantifiers act on groups just like any entity.
So, \w+\[\S{1,9}\] could have been written \w+\[(\S){1,9}\] with the same result.
As far as your example on regexr, nothing is broken there. It matches what it's supposed to.
(PUN|CC|NP){1,3} will greedily try to match any of the alternations (in left-to-right priority). There will be no breaks in what it will match. It matches 1-3 consecutive occurances of PUN or CC or NP.
The sample string you provided had a space between CC's, so since a space does not exist in the regex, it is not matched. The only thing that is matching is a single CC.
If you want to account for a space, it can be added to the grouping like this:
(?:(?:PUN|CC|NP)\s*){1,3}
If you want to only allow spaces between the alternation's, it can be done like this:
(?:PUN|CC|NP)(?:\s*(?:PUN|CC|NP)){0,2}

Simple regex required

I've never used regexes in my life and by jove it looks like a deep pool to dive into. Anyway,
I need a regex for this pattern (AN is alphanumeric (a-z or 0-9), N is numeric (0-9) and A is alphabetic (a-z)):
AN,AN,AN,AN,AN,N,N,N,N,N,N,AN,AN,AN,A,A
That's five AN's, followed by six N's, followed by three AN's, followed finally by two A's.
If it makes a difference, the language I'm using is Java.
[a-z0-9]{5}[0-9]{6}[a-z0-9]{3}[a-z]{2}
should work in most RE dialects for the tasks as you specified it -- most of them will also support abbreviations such as \d (digit) in lieu of [0-9] (but if alphabetics need to be lowercase, as you appear to be requesting, you'll probably need to spell out the a-z parts).
Replace each AN by [a-z0-9], each N by [0-9], and each A by [a-z].
30 seconds in Expresso:
[a-zA-Z0-9]{5}[0-9]{6}[a-zA-Z0-9]{3}[0-9]{2}
Case insensitive, but you can probably define that in Java instead of the regex.
For the example you posted, the following should work fine.
(([A-Za-z\d])*,){5}+(([\d])*,){6}+(([A-Za-z\d])*,){3}+([\d])*,[\d]*
In Java you should be able use it like this:
boolean foundMatch = subjectString.matches("(([A-Za-z\\d])*,){5}+(([\\d])*,){6}+(([A-Za-z\\d])*,){3}+([\\d])*,[\\d]*");
I used, this tool to help in learning RegEx, it also make this really easy.
http://www.regexbuddy.com/
Try looking at some simple java regex tutorials such as this
They'll tell you how you form regular expressions and also how to use it in java.
This should match the pattern you request.
[a-z0-9]{5}[0-9]{6}[a-z0-9]{3}[a-z]{2}
In addition, you could add Beginning of String / End of String matches, if your string match should fail if any other chars are in it:
^[a-z0-9]{5}[0-9]{6}[a-z0-9]{3}[a-z]{2}$

Categories