I am new to regex and am currently building a web application in Java. I have the following requirements to validate a formula:
Formula must start with “T”
A formula can contain the following set of characters:
Digit: 0 - 9
Alpha: A - Z
Operators: *, /, +, -
Separator: ;
An operator must always be followed by a digit
The character “T” must always be followed by a digit or an alpha.
The separator must always be followed by “T”.
The character “M” must always be followed by an operator.
I managed to build the following expression:
^[T][A-Z0-9 -- \\+*;]*
But I don't know how to add the following validation to the regex above:
An operator must always be followed by a digit
The character “T” must always be followed by a digit or an alpha.
The separator must always be followed by “T”
The character “M” must always be followed by an operator.
Valid sample: TA123;T1*2/32M+
Invalid Sample: T+qMg;Y
^(?!.*[*+/-]\\D)(?!.*T\\W)(?!.*[;:][^T])(?!.*M[^*+/-])[T][A-Z0-9 +/*;:-]*$
You can use this. See the demo:
https://regex101.com/r/sS2dM8/7
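For reference, here is a minimal Java sketch of how the expression above could be applied. The class name FormulaValidator and the method isValid are made up for illustration; only the pattern itself comes from the answer. Note that this pattern also admits a space and ":" in addition to the characters listed in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are not from the original post.
class FormulaValidator {
    // Same expression as in the answer, split up so each lookahead can be commented.
    private static final Pattern FORMULA = Pattern.compile(
        "^(?!.*[*+/-]\\D)"      // no operator followed by a non-digit
        + "(?!.*T\\W)"          // no T followed by a non-word character
        + "(?!.*[;:][^T])"      // no separator followed by anything but T
        + "(?!.*M[^*+/-])"      // no M followed by a non-operator
        + "[T][A-Z0-9 +/*;:-]*$");

    static boolean isValid(String formula) {
        return FORMULA.matcher(formula).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("TA123;T1*2/32M+")); // true  (valid sample from the question)
        System.out.println(isValid("T+qMg;Y"));         // false (invalid sample from the question)
    }
}
```

Each negative lookahead encodes one "must be followed by" rule by rejecting any position where the rule is broken.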
We lack a bit of information to fully understand what you want. A couple examples would help.
For now, a small regexp :
^(T[A-LN-Z0-9]*M[+-/*][0-9];?)*
EDIT :
From my understanding, this should be close to what you're looking for :
^(T([A-LN-Z0-9]*M?[+\-/*]?[0-9]?)*;?)+
https://regex101.com/r/hT7aP2/1
This regexp forces the line to begin with a T, followed by zero or more characters from the class [A-LN-Z0-9], meaning all your alphas and digits except M.
Then comes an M followed by an operator from the class [+\-/*], that is +, -, / or *. The - is escaped because an unescaped - between two characters in a character class denotes a range.
Then comes a digit (the enclosing group repeats, so more digits can follow), and the term ends with a ";" that might or might not be there.
And everything in the parentheses can be repeated from 0 to several times
I would have liked examples of what you want to validate... For example, we don't know if the line HAS to end with a ";"
Depending on what you want, splitting the string you want to validate using the character ";" and validating each of the generated string with that regexp might work
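A rough sketch of the split-and-validate idea in Java. The per-term pattern here is my own assumption (it reuses the negative-lookahead style for the "followed by" rules) and is not the regexp from this answer; the class and method names are also made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class name and the per-term pattern are assumptions.
class TermSplitValidator {
    // One term: starts with T, contains only allowed characters, and never has
    // an operator before a non-digit, a T before a non-word character,
    // or an M before a non-operator.
    private static final Pattern TERM = Pattern.compile(
        "^(?!.*[*+/-]\\D)(?!.*T\\W)(?!.*M[^*+/-])T[A-Z0-9*+/-]*$");

    static boolean isValid(String formula) {
        // limit -1 keeps trailing empty strings, so "T1;" produces an
        // empty last term and is rejected
        for (String term : formula.split(";", -1)) {
            if (!TERM.matcher(term).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

Because every piece has to start with T, the rule "the separator must always be followed by T" falls out of the split itself. Whether a trailing ";" should be rejected, as it is here, depends on the requirements.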
Related
The regular expression below is used to validate user input in Java.
username.matches("^\\p{L}+[\\p{L}\\p{Z}.']+")
The regular expression works for input longer than one character, but fails for single-character input.
Since '+' denotes one or more characters, I am confused about how to accept a single character as valid input.
That's because both parts of your regex require at least one character each (see the + almost at the end of the regex). If you want that part to be optional, it should be * instead.
The regex you have will match 2 or more symbols. The reason is, this is symbol one (or more):
\\p{L}+
And this is symbol 2 (or more):
[\\p{L}\\p{Z}.']+
Most likely you want the last part to be "0 or more", like this:
"^\\p{L}+[\\p{L}\\p{Z}.']*"
Your regex requires a minimum of 2 characters.
"^\p{L}+" - minimum of 1
"[\p{L}\p{Z}.']+" - minimum of 1
The "+" does denote one or more characters.
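A small Java check of the difference between the two quantifiers, using the pattern from the question and the suggested fix. The class and constant names are made up for the example.

```java
// Hypothetical demo class; only the two patterns come from the thread.
class UsernameCheck {
    // Original pattern: both parts use +, so at least two characters are required.
    static final String ORIGINAL = "^\\p{L}+[\\p{L}\\p{Z}.']+";
    // Suggested fix: the second part becomes optional with *.
    static final String RELAXED = "^\\p{L}+[\\p{L}\\p{Z}.']*";

    public static void main(String[] args) {
        System.out.println("A".matches(ORIGINAL));            // false
        System.out.println("A".matches(RELAXED));             // true
        System.out.println("John O'Neil".matches(ORIGINAL));  // true
        System.out.println("John O'Neil".matches(RELAXED));   // true
    }
}
```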
I want to be able to write a regular expression in java that will ensure the following pattern is matched.
<D-05-hello-87->
For the letter D, this can be either 'D' or 'E' in capital letters, and only one of these letters, once.
The two numbers you see must always be two-digit decimal numbers, not one or three digits.
The string must start and end with '<' and '>' and contain '-' to separate parts within.
The message in the middle 'hello' can be any character but must not be more than 99 characters in length. It can contain white spaces.
Also this pattern will be repeated, so the expression needs to recognise the different individual patterns within a long string of these patterns and ensure they follow this pattern structure. E.g
So far I have tried this:
([<](D|E)[-]([0-9]{2})[-](.*)[-]([0-9]{2})[>]\z)+
But the problem is (.*), which greedily matches everything after it and ignores the rest of the pattern.
How might this be done? (Using Java reg ex syntax)
Try making it non-greedy, or use a negated character class:
(<([DE])-([0-9]{2})-(.*?)-([0-9]{2})>)
Live Demo: http://ideone.com/nOi9V3
Update: tested and working
<([DE])-(\d{2})-(.{1,99}?)-(\d{2})>
See it working: http://rubular.com/r/6Ozf0SR8Cd
You should not wrap -, < and > in [ ]
Assuming that you want to stop at the first dash, you could use [^-]* instead of .*. This will match all non-dash characters.
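A possible Java sketch for pulling the individual records out of a longer string with the non-greedy pattern and Matcher.find(). The class name, method name and sample input are made up. The sample omits the dash before '>' that appears in the question's example, since neither the asker's attempt nor this pattern allows one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; only the pattern comes from the answer above.
class RecordScanner {
    private static final Pattern RECORD =
        Pattern.compile("<([DE])-(\\d{2})-(.{1,99}?)-(\\d{2})>");

    // Returns the message part (group 3) of every record found in the input.
    static List<String> messages(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            result.add(m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [hello, hi there]
        System.out.println(messages("<D-05-hello-87><E-10-hi there-02>"));
    }
}
```

find() skips over anything that is not a record, so this extracts records rather than validating the whole string.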
I am trying to isolate the words, brackets and => and <=> from the following input:
(<=>A B) OR (C AND D) AND(A AND C)
So far I've managed to isolate just the words (see Scanner#useDelimiter()):
sc.useDelimiter("[^a-zA-Z]");
Upon using:
sc.useDelimiter("[\\s+a-zA-Z]");
I get just the brackets as output,
which is not what I want; I want tokens such as AND and ) as well.
How do I do that? Doing \\s+ gives the same result.
Also, how is a delimiter different from regex? I'm familiar with regex in PHP. Is the notation used the same?
Output I want:
(
<=>
A
(and so on)
You need a delimiting regex that can be zero width (because you have adjacent terms), so look-arounds are the only option. Try this:
sc.useDelimiter("((?<=[()>])\\s*)|(\\s*\\b\\s*)");
This regex says "after a bracket or greater-than or at a word boundary, discarding spaces"
Also note that the character class [\\s+a-zA-Z] includes the + character - most characters lose any special regex meaning when inside a character class. It seems you were trying to say "one or more spaces", but that's not how you do that.
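As an alternative to a zero-width delimiter (so this is a different technique from the Scanner approach above, not a rewrite of it), the tokens could be matched directly with Matcher.find(). A minimal sketch, where the class name and the token pattern are my own assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer; describes the tokens instead of the delimiters.
class ExpressionTokenizer {
    // <=> is listed before => so the longer operator is tried first.
    private static final Pattern TOKEN = Pattern.compile("<=>|=>|[()]|[A-Za-z]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [(, <=>, A, B, ), OR, (, C, AND, D, ), AND, (, A, AND, C, )]
        System.out.println(tokenize("(<=>A B) OR (C AND D) AND(A AND C)"));
    }
}
```

Whitespace is simply never matched, so it drops out without needing a delimiter at all.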
Inside [] the ^ means 'not', so the first regex, [^a-zA-Z], says 'give me everything that's not a-z or A-Z'
The second regex, [\\s+a-zA-Z], says 'give me everything that is space, +, a-z or A-Z'. Note that "+" is a literal plus sign when in a character class.
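A quick way to see both points in Java. The class and constant names are made up; the two character classes are the ones from the question.

```java
// Hypothetical demo class for the two character classes discussed above.
class CharClassDemo {
    static final String NEGATED = "[^a-zA-Z]";      // anything that is not a letter
    static final String WITH_PLUS = "[\\s+a-zA-Z]"; // whitespace, a literal +, or a letter

    public static void main(String[] args) {
        System.out.println("(".matches(NEGATED));     // true
        System.out.println("a".matches(NEGATED));     // false
        System.out.println("+".matches(WITH_PLUS));   // true: + is literal inside [ ]
        System.out.println("   ".matches("\\s+"));    // true: one or more whitespace characters
        System.out.println("   ".matches("[\\s+]"));  // false: the class matches exactly one character
    }
}
```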
I'm making an application that asks the user to enter a postcode and outputs the postcode if it is valid.
I found the following pattern, which works correctly:
String pattern = "^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})";
I don't know much about regex and it would be great if someone could talk me through this statement. I mainly don't understand the ? and use of ().
Your regex has the following:
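^ and $ - anchors indicating the start and end of the matching input. (The pattern as posted contains only ^; if it is used with String.matches, the whole input has to match anyway.)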
[A-PR-UWYZ] - Any character among A to P or R to U or W,Y,Z. Characters enclosed in square brackets form a character class, which allows any of the enclosed characters and - is for indicating a sequence of characters like [A-D] allowing A,B,C or D.
([0-9]|[A-HJKSTUW])? - An optional character any of 0-9 or characters indicated by [A-HJKSTUW]. ? makes the preceding part optional. | is for an OR. The () combines the two parts to be ORed. Here you may use [0-9A-HJKSTUW] instead of this.
[ABD-HJLNP-UW-Z]{2} - Sequence of length 2 formed by characters allowed by the character class. {2} indicates the length 2. So [ABD-HJLNP-UW-Z]{2} is equivalent to [ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]
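To try the pieces described above, the pattern from the question can be run against a few inputs in Java. This is only a sketch: the class name, method name and sample inputs are made up, and String.matches is assumed as the way the pattern is applied.

```java
// Hypothetical demo class; the pattern is copied unchanged from the question.
class PostcodeCheck {
    static final String POSTCODE =
        "^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})";

    static boolean isValid(String postcode) {
        // String.matches requires the whole input to match the pattern
        return postcode.matches(POSTCODE);
    }

    public static void main(String[] args) {
        System.out.println(isValid("SW1A 1AA")); // true
        System.out.println(isValid("M1 1AE"));   // true:  the optional parts after the first digit are absent
        System.out.println(isValid("QA1 1AA"));  // false: Q is not in [A-PR-UWYZ]
        System.out.println(isValid("M1 1AC"));   // false: C is not in [ABD-HJLNP-UW-Z]
    }
}
```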
The ? means "occurs 0 or 1 times", and the brackets do grouping as you might expect; modifiers work on groups. A regex tutorial is probably the best thing here:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
I had a brief look and it seems reasonable. For practice/play, see this applet:
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
simple example (ab)?
means 'ab' once or not at all
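The same example can be checked in Java. The class and constant names are made up for the demo.

```java
// Hypothetical demo of ? applied to a group.
class OptionalGroupDemo {
    static final String OPTIONAL_AB = "(ab)?";

    public static void main(String[] args) {
        System.out.println("".matches(OPTIONAL_AB));     // true:  the group occurs zero times
        System.out.println("ab".matches(OPTIONAL_AB));   // true:  the group occurs once
        System.out.println("abab".matches(OPTIONAL_AB)); // false: twice is too many
        System.out.println("a".matches(OPTIONAL_AB));    // false: ? applies to the whole group, not just b
    }
}
```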
I am trying to write regexes for the following rules, but couldn't find a solution.
I am sorry if I didn't make it clear. I want a different regex for each rule. I am using Java.
The rule should fail for all-digit inputs that start with the prefix '1900' or '1901'.
(190011 - fail, 190111 - fail, 41900 - success...)
The rule should succeed for all-digit inputs with the prefix '*'.
A different regex for each rule (I am not looking for a combination of both of them together)
Does this RE fit the purpose?
'\A(\*|(?!190[01])).*'
\A means 'the beginning of string'. I think it's the same in Java's regexes.
EDIT
\A : "from the very beginning of the string ....". In Python (which is what I know, in fact) this can be omitted if we use the function match(), which always matches from the very beginning, instead of search(), which searches anywhere in the string. If you want the regex to match from the beginning of each line instead, \A must be replaced by ^ (with multiline mode enabled)
(...|...) : ".... there must be one of the two following options : ....."
\* : "...the first option is one character only, a star; ..." . As a star is a special character meaning 'zero, one or more times what is before' in regex strings, it must be escaped to strictly mean 'a star' only.
(?!190[01]) : "... the second option isn't a pattern that must be found and possibly captured but a pattern that must be absent (still right after the very beginning). ...". The two characters ?! are what says 'there must not be the following characters'. The pattern not to be found is four digits long, '1900' or '1901' .
(?!.......) is a negative lookahead assertion. All kinds of assertions begin with (? : the parenthesis overrides the usual meaning of ? , which is why assertions are always written with parentheses.
If \* has matched, one character has been consumed. On the contrary, if the assertion is verified, the corresponding first 4 characters of the string haven't been consumed: the regex engine has looked ahead through the analysed string up to the 4th character to verify them, and then it has come back to its initial position, that is to say, here, the very beginning of the string.
If you want the two-option part (...|...) not to be a capturing group, write ?: just after the first paren, giving '\A(?:\*|(?!190[01])).*'
.* : After the beginning pattern (one star matched, or an assertion verified) the regex engine goes on and matches all the characters until the end of the line. If the string has newlines and you want the regex to match all the characters until the end of the string, and not only of a line, you will specify that . must match the newlines too (in Python this is done with re.DOTALL, in Java with Pattern.DOTALL), or you will replace .* with (.|\r|\n)*
I finally understand that you apparently want to match strings composed of digit characters. If so the RE must be changed to '\A(?:\*|(?!190[01]))\d*' . This RE matches empty strings. If you want no match for empty strings, put \d+ in place of \d* . If you want only strings with at least one digit to match, even after the star when the string begins with a star, then use '\A(?:\*|(?!190[01]))(?=\d)\d*'
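Since the question is about Java, here is a minimal sketch of the digits-only version of the RE in Java syntax. The class and method names are made up; \A and the (?!...) lookahead are both supported by java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper wrapping the RE from the answer above.
class PrefixRule {
    // Either a leading star, or a position not followed by 1900/1901, then digits only.
    private static final Pattern RULE =
        Pattern.compile("\\A(?:\\*|(?!190[01]))\\d*");

    static boolean accepts(String input) {
        return RULE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("190011")); // false
        System.out.println(accepts("190111")); // false
        System.out.println(accepts("41900"));  // true
        System.out.println(accepts("*123"));   // true
        System.out.println(accepts(""));       // true, because \d* also matches nothing
    }
}
```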
For the first rule, you should use a combo regex with two captures, one to capture the 1900/1901-prefixed case, and one to capture the rest. Then you can decide whether the string should succeed or fail by examining the two captures:
(190[01]\d+)|(\d+)
Or just a simple 190[01]\d+ and negate your logic.
Regexes are not really very good at excluding something.
You may exclude a prefix using negative look-behind, but it won't work in this case because the prefix is itself a stream of digits.
You seem to be trying to exclude 1-900/901 phone numbers in the US. If the number of digits is definite, you can use a negative look-behind to exclude this prefix while matching the remaining exact number digits.
For the second rule, simply:
\*\d+
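A short Java sketch of the "negate your logic" variant for the first rule, plus the second rule as written. The class and method names are made up, and the first method assumes the caller already knows the input consists of digits only.

```java
// Hypothetical helper with one method per rule.
class SeparateRules {
    // Rule 1: match the forbidden prefix and negate the result.
    // Assumes the input is already known to be all digits.
    static boolean passesRule1(String digits) {
        return !digits.matches("190[01]\\d+");
    }

    // Rule 2: a literal star followed by one or more digits.
    static boolean passesRule2(String input) {
        return input.matches("\\*\\d+");
    }

    public static void main(String[] args) {
        System.out.println(passesRule1("190011")); // false
        System.out.println(passesRule1("41900"));  // true
        System.out.println(passesRule2("*123"));   // true
        System.out.println(passesRule2("123"));    // false
    }
}
```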