Regular expression for UK postal codes

I'm making an application that asks the user to enter a postcode and outputs the postcode if it is valid.
I found the following pattern, which works correctly:
String pattern = "^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})";
I don't know much about regex and it would be great if someone could talk me through this statement. I mainly don't understand the ? and use of ().

Your regex has the following:
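^ and $ - anchors marking the start and end of the matching input. (The pattern as posted contains only ^; if it is used with String.matches(), the whole input has to match anyway.)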
[A-PR-UWYZ] - Any character among A to P or R to U or W,Y,Z. Characters enclosed in square brackets form a character class, which allows any of the enclosed characters and - is for indicating a sequence of characters like [A-D] allowing A,B,C or D.
([0-9]|[A-HJKSTUW])? - An optional character: either a digit 0-9 or one of the letters allowed by [A-HJKSTUW]. ? makes the preceding part optional. | is an OR. The () groups the two alternatives so that the ? applies to both. You could write [0-9A-HJKSTUW]? instead.
[ABD-HJLNP-UW-Z]{2} - Sequence of length 2 formed by characters allowed by the character class. {2} indicates the length 2. So [ABD-HJLNP-UW-Z]{2} is equivalent to [ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]
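To see the pieces above working together, here is a minimal sketch that runs the pattern from the question against a few inputs. The class name PostcodeCheck and the sample postcodes are my own choices for illustration, not something from the question.

```java
import java.util.regex.Pattern;

final class PostcodeCheck {
    // Pattern copied from the question, split over two lines for readability.
    private static final Pattern POSTCODE = Pattern.compile(
        "^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?))"
        + " [0-9][ABD-HJLNP-UW-Z]{2})");

    // matches() requires the whole input to match, so a trailing $ is not needed here.
    static boolean isValid(String postcode) {
        return POSTCODE.matcher(postcode).matches();
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration only.
        for (String s : new String[] {"SW1A 1AA", "M1 1AE", "QX1 1AA", "M1 1CC"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

"QX1 1AA" is rejected because Q is not in [A-PR-UWYZ], and "M1 1CC" is rejected because C is not in [ABD-HJLNP-UW-Z].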

The ? means "occurs 0 or 1 times", and the parentheses do grouping as you might expect; quantifiers then apply to the whole group. A regex tutorial is probably the best thing here:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
I had a brief look and it seems reasonable. For practice/play, see this applet:
http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html
Simple example: (ab)?
means 'ab' once or not at all.
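A small sketch of the difference the parentheses make, assuming you test with String.matches() (the class name OptionalGroupDemo is just a placeholder): with the group, ? applies to "ab" as a unit; without it, ? applies only to the "b".

```java
final class OptionalGroupDemo {
    // Thin wrapper so the regex comes first when reading the calls below.
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // (ab)? : the whole group "ab" is optional.
        System.out.println(matches("(ab)?", ""));     // empty input is accepted
        System.out.println(matches("(ab)?", "ab"));   // one "ab" is accepted
        System.out.println(matches("(ab)?", "abab")); // two are too many

        // ab? : only the "b" is optional, the "a" is still required.
        System.out.println(matches("ab?", "a"));
        System.out.println(matches("ab?", ""));
    }
}
```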

Related

Java Regex with "Joker" characters

I'm trying to write a regex that validates an input field.
What I call "joker" chars are '?' and '*'.
Here is my java regex :
"^$|[^\\*\\s]{2,}|[^\\*\\s]{2,}[\\*\\?]|[^\\*\\s]{2,}[\\?]{1,}[^\\s\\*]*[\\*]{0,1}"
What I'm trying to match is:
Minimum 2 alpha-numeric characters (other than '?' and '*')
The '*' can appear only once, and only at the end of the string
The '?' can appear multiple times
No whitespace at all
So for example :
abcd = OK
?bcd = OK
ab?? = OK
ab*= OK
ab?* = OK
??cd = OK
*ab = NOT OK
??? = NOT OK
ab cd = NOT OK
" abcd" = NOT OK (space at the beginning)
I've made the regex a bit complicated and I'm lost. Can you help me?

^(?:\?*[a-zA-Z\d]\?*){2,}\*?$
Explanation:
The regex asserts that this pattern must appear two or more times:
\?*[a-zA-Z\d]\?*
which requires one character from the class [a-zA-Z\d], with any number of question marks (zero or more) on either side of it.
Then the regex matches \*?, which means zero or one asterisk, at the end of the string.
Here is an alternative regex that is faster, as revo suggested in the comments:
^(?:\?*[a-zA-Z\d]){2}[a-zA-Z\d?]*\*?$
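For anyone who wants to check this from Java, here is a rough sketch that runs the first regex above against the examples from the question. The class name JokerCheck is made up for the example; note that the backslashes have to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

final class JokerCheck {
    // First regex from the answer, with backslashes doubled for Java.
    private static final Pattern JOKER =
        Pattern.compile("^(?:\\?*[a-zA-Z\\d]\\?*){2,}\\*?$");

    static boolean isValid(String input) {
        return JOKER.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Examples taken from the question.
        String[] samples = {"abcd", "?bcd", "ab??", "ab*", "ab?*", "??cd",
                            "*ab", "???", "ab cd", " abcd"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + (isValid(s) ? "OK" : "NOT OK"));
        }
    }
}
```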

Here you go:
^\?*\w{2,}\?*\*?(?<!\s)$
Both described and demonstrated at Regex101.
^ is the start of the String
\?* matches any number of leading ? characters (the ? must be escaped)
\w{2,} matches at least 2 word characters (letters, digits or underscore)
\?* continues with any number of ? characters
\*? optionally matches one final * character
(?<!\s) uses a negative look-behind to assert that the character just before the end is not whitespace; whitespace elsewhere is already excluded because none of the preceding tokens can match it
$ is the end of the String

Another way to solve this problem is the look-ahead mechanism (?=subregex). It is zero-length (after evaluating subregex, the regex cursor is reset to the position it had before), so it lets the regex engine run multiple tests on the same text via the construct
(?=condition1)
(?=condition2)
(?=...)
conditionN
Note: the last condition (conditionN) is not placed in (?=...), so that the regex engine moves the cursor past the tested part (it "consumes" it) and can go on to test whatever follows. For that to work, conditionN must match precisely the section we want to consume (the earlier conditions don't have that limitation; they can match substrings of any length, for example just the first few characters).
So now we need to work out what our conditions are.
We want to match only alphanumeric characters, ? and *, where * can appear (optionally) only at the end. We can write that as ^[a-zA-Z0-9?]*[*]?$. This also rules out whitespace, because whitespace is not among the accepted characters.
The second requirement is "Minimum 2 alpha-numeric characters". It can be written as .*?[a-zA-Z0-9].*?[a-zA-Z0-9] or (?:.*?[a-zA-Z0-9]){2,} (if we like shorter regexes). Since that condition doesn't test the whole text but only some part of it, we can place it in a look-ahead.
These conditions seem to cover everything we wanted, so we can combine them into a regex like:
^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$
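Here is a small sketch of that combined regex in Java, mainly to show that the look-ahead and the consuming part are checked against the same text. JokerLookahead is a hypothetical class name, and the inputs "a?b" and "a?" are extra cases I added on top of the question's examples.

```java
import java.util.regex.Pattern;

final class JokerLookahead {
    // Condition 1 (look-ahead, consumes nothing): at least two alphanumerics somewhere.
    // Condition 2 (consuming): only alphanumerics and ?, with an optional * at the end.
    private static final Pattern JOKER =
        Pattern.compile("^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$");

    static boolean isValid(String input) {
        return JOKER.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a?b", "ab*", "a?", "*ab", "???", "ab cd"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

"a?" fails on the look-ahead (only one alphanumeric), while "*ab" and "ab cd" pass the look-ahead but fail on the consuming part.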

Regex not matching when the start or end are empty

Here is my regex as I have inputted it into my java file.
String myRegex = "(?<=[^a-zA-Z0-9])(target)(?=[^a-zA-Z0-9])";
If I have a string as follows:
.target. - it works.
However, if I have a string that JUST says target it does not work. How can I modify the regex so that if there is nothing at the start or the end of the string, it still matches?
EDIT - Examples.
_target - Should succeed!
target_ - Should succeed!
target - Should succeed!
Currently these examples fail with the current regex.

Add "start of input" to your look behind and add "end of input" to your look ahead using a regex alternation (ie | which is a logical "or"):
String myRegex = "(?<=^|[^a-zA-Z0-9])target(?=[^a-zA-Z0-9]|$)";
The problem with your regex is that your look behind required there to be a preceding character that was not a letter/digit.
These look arounds also match start/end of input.

The problem is that there are two places where a negative can go here: the lookarounds can be negative, and the character classes can be negated. In my original regex the lookarounds were positive and the character classes were negated. So it was saying: "Look behind and make sure you find something that is not within these classes". When there is nothing there, it can't find anything and fails. The solution was to make the lookarounds negative and the character classes positive. Now it's saying: "Look behind and make sure there ISN'T any of these characters". If there is nothing there, it doesn't fail, because the condition is still met.
This is the final regex:
String myRegex = "(?<![a-zA-Z0-9])target(?![a-zA-Z0-9])";
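To compare the variants side by side, here is a rough sketch using Matcher.find(): the original regex from the question, the alternation version from the answer above, and the negative-lookaround version. The class name LookaroundDemo and the extra inputs "targets" and "mytarget" are assumptions added for the example.

```java
import java.util.regex.Pattern;

final class LookaroundDemo {
    // Original: positive lookarounds that REQUIRE a non-alphanumeric neighbour.
    static final Pattern ORIGINAL =
        Pattern.compile("(?<=[^a-zA-Z0-9])(target)(?=[^a-zA-Z0-9])");
    // Alternation: neighbour is non-alphanumeric OR we are at the start/end of input.
    static final Pattern ALTERNATION =
        Pattern.compile("(?<=^|[^a-zA-Z0-9])target(?=[^a-zA-Z0-9]|$)");
    // Negative lookarounds: there must NOT be an alphanumeric neighbour.
    static final Pattern NEGATIVE =
        Pattern.compile("(?<![a-zA-Z0-9])target(?![a-zA-Z0-9])");

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {".target.", "target", "_target", "target_", "targets", "mytarget"}) {
            System.out.println(s + ": original=" + found(ORIGINAL, s)
                + " alternation=" + found(ALTERNATION, s)
                + " negative=" + found(NEGATIVE, s));
        }
    }
}
```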

If I'm understanding your question correctly, instead of using the look ahead and look behind, you can just use the ? to indicate that there should be 0 or 1 non-alphabetical or numerical character before and after "target".
([^a-zA-Z0-9])?(target)([^a-zA-Z0-9])?

You should be able to match target using the * (0 or more) quantifier, which accepts target preceded and followed by 0 or more occurrences of the characters you want. So:
[_]*(target)[_]*
should match:
_target
target
target_
_target_
Add any element you want to be matched before or after the word to the brackets. Example to match .target. too:
[\._]*(target)[\._]*
This will match the target substring no matter where in the string it appears. If you want the rule to match only at the start of the string, add the ^ anchor to it:
^[\._]*(target)[\._]*
and it will match the ones mentioned above only if they start the string.

Regex expression to validate a formula

I am new to regex and currently building a web application in Java. I have the following requirements to validate a formula:
Formula must start with “T”
A formula can contain the following set of characters:
Digit: 0 - 9
Alpha: A - Z
Operators: *, /, +, -
Separator: ;
An operator must always be followed by a digit
The character “T” must always be followed by a digit or an alpha.
The separator must always be followed by “T”.
The character “M” must always be followed by an operator.
I managed to build the following expression:
^[T][A-Z0-9 -- \\+*;]*
But I don't know how to add the following validations to the regex above:
An operator must always be followed by a digit
The character “T” must always be followed by a digit or an alpha.
The separator must always be followed by “T”
The character “M” must always be followed by an operator.
Valid sample: TA123;T1*2/32M+
Invalid Sample: T+qMg;Y

^(?!.*[*+/-]\\D)(?!.*T\\W)(?!.*[;:][^T])(?!.*M[^*+/-])[T][A-Z0-9 +/*;:-]*$
You can use this. See the demo:
https://regex101.com/r/sS2dM8/7
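A quick way to try this from Java is sketched below, using the valid and invalid samples from the question plus two inputs of my own ("TM1" and "T1;A"). FormulaCheck is a made-up class name. Each negative look-ahead rejects one forbidden pair of characters anywhere in the input, and the final character class restricts the allowed symbols.

```java
import java.util.regex.Pattern;

final class FormulaCheck {
    private static final Pattern FORMULA = Pattern.compile(
        "^(?!.*[*+/-]\\D)"      // no operator followed by a non-digit
        + "(?!.*T\\W)"          // no T followed by a non-word character
        + "(?!.*[;:][^T])"      // no separator followed by anything but T
        + "(?!.*M[^*+/-])"      // no M followed by a non-operator
        + "[T][A-Z0-9 +/*;:-]*$");

    static boolean isValid(String formula) {
        return FORMULA.matcher(formula).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"TA123;T1*2/32M+", "T+qMg;Y", "TM1", "T1;A"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Note that the look-aheads only object when a forbidden character actually follows, so a trailing operator (as in the valid sample) is accepted.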

We lack a bit of information to fully understand what you want. A couple examples would help.
For now, a small regexp :
^(T[A-LN-Z0-9]*M[+-/*][0-9];?)*
EDIT :
From my understanding, this should be close to what you're looking for :
^(T([A-LN-Z0-9]*M?[+\-/*]?[0-9]?)*;?)+
https://regex101.com/r/hT7aP2/1
This regexp forces the line to begin with a T, followed by zero or more characters from the range [A-LN-Z0-9], meaning all your alphas and digits except M.
Then it needs an M followed by an operator from the class [+\-/*] (that is, +, -, / and *; the - is escaped so that the regexp treats it as a literal character rather than giving it its special meaning).
Then it continues with one or more digits, and ends with a ";" that might or might not be there.
And everything in the parentheses can be repeated from 0 to several times.
I would have liked examples of what you want to validate... For example, we don't know if the line HAS to end with a ";".
Depending on what you want, splitting the string on the ";" character and validating each of the resulting strings with that regexp might work.

Regular Expression to match one or more digits 1-9, one '|', one or more '*" and zero or more ','

I'm new to regular expressions and I need to find a regular expression that matches one or more digits [1-9] only ONE '|' sign, one or more '*' sign and zero or more ',' sign.
The string should not contain any other characters.
This is what I have:
if(this.ruleString.matches("^[1-9|*,]*$"))
{
return true;
}
Is it correct?
Thanks,
Vinay

I think you should test separately for each type of symbol rather than write one complex expression.
First, test that you don't have invalid symbols: "^[1-9|*,]+$"
Then test for digits with "[1-9]"; it should match at least once.
Then test for "\\|", "\\*" and "\\," and check the number of matches.
If all tests pass, your string is valid.
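The step-by-step idea could look roughly like this. It is only a sketch under the assumption that the order of the symbols does not matter; RuleStringCheck is a name I picked, and the counting is done on characters rather than with separate regex matches to keep it short.

```java
final class RuleStringCheck {
    static boolean isValid(String s) {
        // Step 1: nothing but the allowed symbols, and at least one character.
        if (!s.matches("[1-9|*,]+")) {
            return false;
        }
        // Step 2: at least one digit 1-9.
        boolean hasDigit = s.chars().anyMatch(c -> c >= '1' && c <= '9');
        // Step 3: count the '|' and '*' signs (',' may appear any number of times).
        long bars = s.chars().filter(c -> c == '|').count();
        long stars = s.chars().filter(c -> c == '*').count();
        return hasDigit && bars == 1 && stars >= 1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1|*", "*,1|", "1||*", "10|*", "1|,"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Because no order is enforced, "*,1|" passes here; if the order matters, a single anchored regex is the simpler tool.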

Nope, try this:
"^[1-9]+\\|\\*+,*$"

Please give us at least 10 strings that you want to accept and 10 that you want to reject, and tell us whether the symbols have to appear in a fixed sequence or whether their order doesn't matter, so that we can make a reliable regex.
For now, all I can offer is:
^[1-9]+\|{1}\*+,*$
This RegEx was tested against these sample strings, accepting them:
56421|*****,,,
2|*********,,,
1|*
7|*,
18|****
123456789|*
12|********,,
1516332|**,,,
111111|*
6|*****,,,,
And it was tested against these sample strings, rejecting them:
10|*,
2***525*|*****,,,
123456,15,22*66*****4|,,,*167
1|2*3,4,5,6*
,*|173,
|*,
||12211
12
1|,*
1233|54|***,,,,
I assume your given order is strict and all conditions apply at the same time.
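If you want to re-run a few of these checks in Java, a minimal sketch follows (StrictOrderCheck is a hypothetical class name; the {1} after \| is dropped because it is redundant, and the backslashes are doubled for the Java string literal).

```java
import java.util.regex.Pattern;

final class StrictOrderCheck {
    // digits 1-9, then exactly one '|', then one or more '*', then optional commas
    private static final Pattern RULE = Pattern.compile("^[1-9]+\\|\\*+,*$");

    static boolean isValid(String s) {
        return RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // A few strings from the accept and reject lists above.
        for (String s : new String[] {"56421|*****,,,", "1|*", "7|*,", "10|*,", "1|,*", "12", "|*,"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```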

It looks like the pattern you need is
n-n, one or more times, separated by commas
then a bar (|)
then n*n, one or more times, separated by commas.
Here is a regular expression for that.
([1-9]{1}[0-9]*\-[0-9]+){1}
(,[1-9]{1}[0-9]*\-[0-9]+)*
\|
([1-9]{1}[0-9]*\*[0-9]+){1}
(,[1-9]{1}[0-9]*\*[0-9]+)*
But it is so complex, and does not take into account the details, such as
for the case of n-m, you want
n less than m
(I guess).
And you likely want the same number of n-m before the bar, and x*y after the bar.
Depends whether you want to check the syntax completely or not.
(I hope you do want to.)
Since this is so complex, it should be done with a set of code instead of a single regular expression.

this regex should work
"^[1-9\\|\\*,-]*$"
Assert position at the beginning of the string «^»
Match a single character present in the list below «[1-9\|*,-]»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character in the range between “1” and “9” «1-9»
A | character «\|»
A * character «*»
The character “,” «,»
The character “-” «-»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

regex for specific digit prefix

I am trying to write the following regex rules, but couldn't find a solution.
I am sorry if I didn't make it clear: I want a different regex for each rule. I am using Java.
The rule should fail for all digit inputs that start with the prefix '1900' or '1901'.
(190011 - fail, 190111 - fail, 41900 - success...)
The rule should succeed for all digit inputs with the prefix '*'.
A different regex for each rule (I am not looking for a combination of both of them together).

Does this RE fit the purpose?
'\A(\*|(?!190[01])).*'
\A means 'the beginning of the string'. I think it's the same in Java's regexes.
EDIT
\A : "from the very beginning of the string ....". In Python (which is what I know, in fact) this can be omitted if we use the function match() that always analyzes from the very beginning, instead of search() that search everywhere in a string. If you want the regex able to analyze lines from the very beginning of each line, this must be replaced by ^
(...|...) : ".... there must be one of the two following options : ....."
\* : "...the first option is one character only, a star; ..." . As a star is special character meaning 'zero, one or more times what is before' in regex's strings, it must be escaped to strictly mean 'a star' only.
(?!190[01]) : "... the second option isn't a pattern that must be found and possibly captured, but a pattern that must be absent (still right after the very beginning). ...". The two characters ?! are what says 'the following characters must not be there'. The pattern that must not be found is 4 digit characters long, '1900' or '1901'.
(?!.......) is a negative lookahead assertion. All kinds of assertions begin with (? : the parenthesis cancels the usual meaning of ? , which is why assertions are always written with parentheses.
If \* has matched, one character has been consumed. By contrast, if the assertion is verified, the first 4 characters of the string have not been consumed: the regex engine has looked through the analysed string up to the 4th character to verify them, and has then come back to its initial position, which in this case is the very beginning of the string.
If you want the bi-optional part (...|...) not to be a capturing group, you will write ?: just after the first paren, then '\A(?:\*|(?!190[01])).*'
.* : After the beginning pattern (one star matched, or an assertion verified) the regex engine goes on and matches all the characters until the end of the line. If the string has newlines and you want the regex to match all the characters until the end of the string, and not only of a line, you will specify that . must match the newlines too (in Python it is with re.DOTALL), or you will replace .* with (.|\r|\n)*
I finally understand that you apparently want to catch strings composed of digits characters. If so the RE must be changed to '\A(?:\*|(?!190[01]))\d*' . This RE matches with empty strings. If you want no-match with empty strings, put \d+ in place of \d* . If you want that only strings with at least one digit, even after the star when it begins with a star, match, then do '\A(?:\*|(?!190[01]))(?=\d)\d*'
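Since the question is about Java, here is a rough Java translation of the last regex, using \d+ so that empty input and a lone star are rejected. PrefixRule is a placeholder name. Note that this combines both rules in one pattern, which is not quite what the question asked for.

```java
import java.util.regex.Pattern;

final class PrefixRule {
    // Either a leading '*', or an input that does not start with 1900/1901; then digits only.
    private static final Pattern RULE =
        Pattern.compile("\\A(?:\\*|(?!190[01]))\\d+");

    static boolean accepts(String input) {
        return RULE.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"190011", "190111", "41900", "*1900", "*"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```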

For the first rule, you should use a combo regex with two captures, one to capture the 1900/1901-prefixed case, and one to capture the rest. Then you can decide whether the string should succeed or fail by examining the two captures:
(190[01]\d+)|(\d+)
Or just a simple 190[01]\d+ and negate your logic.
Regexes are not really very good at excluding something.
You may exclude a prefix using negative look-behind, but it won't work in this case because the prefix is itself a stream of digits.
You seem to be trying to exclude 1-900/901 phone numbers in the US. If the number of digits is definite, you can use a negative look-behind to exclude this prefix while matching the remaining exact number digits.
For the second rule, simply:
\*\d+
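A minimal sketch of the "negate your logic" variant, with one method per rule as the question asks. The class and method names (TwoRules, ruleOne, ruleTwo) are invented for the example, and the extra all-digits check in ruleOne is my assumption about what "digit inputs" means.

```java
final class TwoRules {
    // Rule 1: all digits, and NOT of the form 1900.../1901... (match the bad case, then negate).
    static boolean ruleOne(String input) {
        return input.matches("\\d+") && !input.matches("190[01]\\d+");
    }

    // Rule 2: a '*' prefix followed by digits.
    static boolean ruleTwo(String input) {
        return input.matches("\\*\\d+");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"190011", "190111", "41900"}) {
            System.out.println("rule 1, " + s + " -> " + ruleOne(s));
        }
        for (String s : new String[] {"*123", "123"}) {
            System.out.println("rule 2, " + s + " -> " + ruleTwo(s));
        }
    }
}
```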
