Regular expression for UK postcode also matches UUID

Regular expression for UK postcode also matches UUID - java

I am having problems with the following UK Postcode regex
([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})
It works for UK postcodes as intended e.g.
AB11AB
However, it also seems to match UUIDs as well e.g.
c25d4f64-2336-4a5d-b94c-14dc12xxxa58
Is there anyway to ignore UUIDs from the regular expression ?
Please find example here
https://regex101.com/r/dI6gD9/19

Option 1
Maybe, we would just add start and end anchors and fail the UUIDs, and change the capturing groups to non, if that'd be OK:
^(?:[Gg][Ii][Rr]\s+0[Aa]{2})|(?:(?:([A-Za-z][0-9]{1,2})|(?:(?:[A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(?:(?:[A-Za-z][0-9][A-Za-z])|(?:[A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s*[0-9][A-Za-z]{2})$
The expression can be most likely simplified (e.g., non-capturing groups), I have also added extra spaces, just in case.
DEMO 1
Option 2
Another option would be to add word boundaries, then it would become almost improbable that it would match a UUID in our data, that I'm guessing, and we can also add an i flag:
(?i)(?:\bgir\b\s+\b0a{2}\b)|\b(?:[a-z][0-9]{1,2}|[a-z][a-hj-y][0-9]{1,2}|[a-z][0-9][a-z]|[a-z][a-hj-y][0-9][a-z]?)\s*[0-9][a-z]{2}\b
DEMO 2
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "^(?:[Gg][Ii][Rr]\\s+0[Aa]{2})|(?:(?:([A-Za-z][0-9]{1,2})|(?:(?:[A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(?:(?:[A-Za-z][0-9][A-Za-z])|(?:[A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\\s*[0-9][A-Za-z]{2})$";
final String string = "c25d4f64-2336-4a5d-b94c-14dc12xxxa58\n"
+ "AB11AB";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
The expression is explained on the top right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs, if you like.
RegEx Circuit
jex.im visualizes regular expressions:

Your regex is fine, you just need to match it with the start and end of the string. Just append a ^ to the start and a $ to the end of the pattern.
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})$
https://regex101.com/r/jwLqLx/1

You are using the correct regex, that is issued by the UK government.
Below i added examples of how to use it:
Match full string:
When matching to a full string don't use the global flag, because then it will find the occurrences within a string, rather than testing a string to fully match the regex.
So don't use the global and multi-line flags
Notice the gm part in
/your_regex/gm
Try it in this example on regex101.com, where I have already disabled the global and multi-line flag for you.
Match in log file:
For log files, add the word identifier around your regex
Notice the \b parts in
/\byour_regex\b/gm
Try it in this example which shows this behaviour in an example log file.

Related

can deal with the first line space when i use regex for polynomials

here is my code
String a = "X^5+2X^2+3X^3+4X^4";
String exp[]=a.split("(|\\+\\d)[xX]\\^");
for(int i=0;i<exp.length;i++) {
System.out.println("exp: "+exp[i]+" ");
}
im try to find the output which is 5,2,3,4
but instead i got this answer
exp:
exp:5
exp:2
exp:3
exp:4
i dont know where is the first line space come from, and i cannot find a will to get rid of that, i try to use others regex for this and also use compile,still can get rid of the first line, i try to use new string "X+X^5+2X^2+3X^3+4X^4";the first line shows exp:X.
and i also use online regex compiler to try my problem, but their answer is 5,2,3,4, buy eclipse give a space ,and then 5,2,3,4 ,need a help to figure this out

Try to use regex, e.g:
String input = "X^5+2X^2+3X^3+4X^4";
Pattern pattern = Pattern.compile("\\^([0-9]+)");
Matcher matcher = pattern.matcher(input);
for (int i = 1; matcher.find(); i++) {
System.out.println("exp: " + matcher.group(1));
}
It gives output:
exp: 5
exp: 2
exp: 3
exp: 4
How does it work:
Pattern used: \^([0-9]+)
Which matches any strings starting with ^ followed by 1 or more digits (note the + sign). Dash (^) is prefixed with backslash (\) because it has a special meaning in regular expressions - beginning of a string - but in Your case You just want an exact match of a ^ character.
We want to wrap our matches in a groups to refer to them late during matching process. It means we need to mark them using parenthesis ( and ).
Then we want to pu our pattern into Java String. In String literal, \character has a special meaning - it is used as a control character, eg "\n" represents a new line. It means that if we put our pattern into String literal, we need to escape a \ so our pattern becomes: "\\^([0-9]+)". Note double \.
Next we iterate through all matches getting group 1 which is our number match. Note that a ^.character is not covered in our match even if it is a part of our pattern. It is so because wr used parenthesis to mark our searched group, which in our case are only digits

Because you are using the split method which looks for the occurrence of the regex and, well.. splits the string at this position. Your string starts with X^ so it very much matches your regex.

Regex in java to extract specific pattern

I want to match the pattern (including the square brackets, equals, quotes)
[fixedtext="sometext"]
What would be a correct regex expression?
Anything can occur inside quotes. 'fixedtext' is fixed.

Your basic solution (although I'd be skeptical of this, per the comments) is essentially:
"\\[fixedtext=\\\"(.*)\\\"\\]"
which resolves to:
"\[fixedtext=\"(.*)\"\]"
Simple escaping of [] and quotes. The (.*) says capture everything in quotes as a capture group (matcher.group(1)).
But if you had a string of, for example '[fixedtext="abc\"]def"]' you'd get the an answer of abc\ instead of abc\"]def.
If you know the ending bracket ends the line, then use:
"\\[fixedtext=\\\"(.*)\\\"\\]$"
(add the $ at the end to mark end of line) and that should be fairly reliable.

My suggestion is using named-capturing groups.
You can find more details here:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Here's an example for your input:
String input = "[fixedtext=\"sometext\"]";
Pattern pattern = Pattern.compile("\\[(?<field>.*)=\"(?<value>.*)\"]");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
System.out.println(matcher.group("field"));
System.out.println(matcher.group("value"));
} else {
System.err.println(input + " doesn't match " + pattern);
}

Java Regex : How to return the whole word if the words ends with a specific string

Using Pattern/Matcher, I'm trying to find a regex in Java for searching in a text for table names that end with _DBF or _REP or _TABLE or _TBL and return the whole table names.
These tables names may contain one or more underscores _ in between the table name.
For example I'd like to retrieve table names like :
abc_def_DBF
fff_aaa_aaa_dbf
AAA_REP
123_frfg_244_gegw_TABLE
etc
Could someone please propose a regex for this ?
Or would it be easier to read text line by line and use String's method endsWith() instead ?
Many thanks in advance,
GK

Regex pattern
You could use a simple regex like this:
\b(\w+(?:_DBF|_REP|_TABLE|_TBL))\b
Working demo
Java code
For java you could use a code like below:
String text = "HERE THE TEXT YOU WANT TO PARSE";
String patternStr = "\\b(\\w+(?:_DBF|_REP|_TABLE|_TBL))\\b";
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
System.out.println("found: " + matcher.group(1));
}
This is the match information:
MATCH 1
1. [0-11] `abc_def_DBF`
MATCH 2
1. [28-43] `fff_aaa_aaa_dbf`
MATCH 3
1. [45-52] `AAA_REP`
MATCH 4
1. [54-77] `123_frfg_244_gegw_TABLE`
Regex pattern explanation
If you aren't familiar with regex to understand how this pattern works the idea of this regex is:
\b --> use word boundaries to avoid having anything like $%&abc
(\w+ --> table name can contain alphanumeric and underscore characters (\w is a shortcut for [A-Za-z_])
(?:_DBF|_REP|_TABLE|_TBL)) --> must finish with any of these combinations
\b --> word boundaries again

Try this:
System.out.println("blah".matches(".*[_DBF|_REP|_TABLE|_TBL]$"));
System.out.println("blah_TBL".matches(".*[_DBF|_REP|_TABLE|_TBL]$"));
System.out.println("blah_TBL1".matches(".*[_DBF|_REP|_TABLE|_TBL]$"));

This regexp should work to match the whole word:
\w+_([Dd][Bb][Ff]|REP|TABLE)
Here is is:
This regexp should work to match the keywords:
_(DBF)|(REP)|(TABLE)
The _ is matched, followed by either DBF or REP or TABLE.
It is unclear to me if you wish to match _dbf (lower case). If so simply change DBF to [Dd][Bb][Ff]:
_([Dd][Bb][Ff])|(REP)|(TABLE)
If you wish to match any more keywords just add another |(abc) group.
Of course this method works only if you know that these "keywords" will appear only once, and only at the end of the string. If you have 123_frfg_TABLE_244_gegw_TABLE for example you will match both.
Below is a screenshot of regexpal in action:

A simple alternative might be this regex ".*(_DBF|_REP|_TABLE|_TBL)$" which means any string that ends in _DBF or _REP or _TABLE or _TBL.
PS: Specify the regex to be caseless

Java Regex Capturing Groups

I am trying to understand this code block. In the first one, what is it we are looking for in the expression?
My understanding is that it is any character (0 or more times *) followed by any number between 0 and 9 (one or more times +) followed by any character (0 or more times *).
When this is executed the result is:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
Could someone please go through this with me?
What is the advantage of using Capturing groups?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTut3 {
public static void main(String args[]) {
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d+)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
} else {
System.out.println("NO MATCH");
}
}
}

The issue you're having is with the type of quantifier. You're using a greedy quantifier in your first group (index 1 - index 0 represents the whole Pattern), which means it'll match as much as it can (and since it's any character, it'll match as many characters as there are in order to fulfill the condition for the next groups).
In short, your 1st group .* matches anything as long as the next group \\d+ can match something (in this case, the last digit).
As per the 3rd group, it will match anything after the last digit.
If you change it to a reluctant quantifier in your 1st group, you'll get the result I suppose you are expecting, that is, the 3000 part.
Note the question mark in the 1st group.
String line = "This order was placed for QT3000! OK?";
Pattern pattern = Pattern.compile("(.*?)(\\d+)(.*)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println("group 1: " + matcher.group(1));
System.out.println("group 2: " + matcher.group(2));
System.out.println("group 3: " + matcher.group(3));
}
Output:
group 1: This order was placed for QT
group 2: 3000
group 3: ! OK?
More info on Java Pattern here.
Finally, the capturing groups are delimited by round brackets, and provide a very useful way to use back-references (amongst other things), once your Pattern is matched to the input.
In Java 6 groups can only be referenced by their order (beware of nested groups and the subtlety of ordering).
In Java 7 it's much easier, as you can use named groups.

This is totally OK.
The first group (m.group(0)) always captures the whole area that is covered by your regular expression. In this case, it's the whole string.
Regular expressions are greedy by default, meaning that the first group captures as much as possible without violating the regex. The (.*)(\\d+) (the first part of your regex) covers the ...QT300 int the first group and the 0 in the second.
You can quickly fix this by making the first group non-greedy: change (.*) to (.*?).
For more info on greedy vs. lazy, check this site.

Your understanding is correct. However, if we walk through:
(.*) will swallow the whole string;
it will need to give back characters so that (\\d+) is satistifed (which is why 0 is captured, and not 3000);
the last (.*) will then capture the rest.
I am not sure what the original intent of the author was, however.

From the doc :
Capturing groups</a> are indexed from left
* to right, starting at one. Group zero denotes the entire pattern, so
* the expression m.group(0) is equivalent to m.group().
So capture group 0 send the whole line.

Java repetitive pattern matching (3)

I am trying to solve a simple Java regex matching problem but still getting conflicting results (following up on this and that question).
More specifically, I am trying to match a repetitive text input, consisting of groups that are delimited by '|' (vertical bar) that may be directly preceded by underscore ('_'), especially if the groups are not empty (i.e., if no two consecutive | delimiters appear in the input).
An example such input is:
Text group 1_|Text group 2_|||Text group 5_|||Text group 8
In addition, I need a way to verify that a match has occurred, in order to avoid applying the processing related to that input to other, totally different inputs that my application also processes, using different regular expressions.
To confirm that a regex works, I am using RegexPal.
After several tests, the closest to what I want are the following two Regular Expressions, suggested in the questions I quoted above:
1. (?:\||^)([^\\|]*)
2. \G([^\|]+?)_?\||\G()\||\G([^\|]*)$
Using either of these, if I run a matcher.find() loop I get:
All the text groups, with the underscore included in the end, from Regex 1
All the text groups apart from the last, with no underscore but 2 empty groups in the end, from Regex 2.
So, apparently Regex 2 is not correct (and RegexPal also does not show it as matching).
I could use Regex 1 and do some post-processing to remove the trailing underscore, although ideally I would like the regex to do that for me.
However, none of the two aforementioned regular expressions returns true for matcher.matches(), whereas matcher.find() is always true even for totally irrelevant input (reasonable, since there will often be at least 1 matching group, even in other text).
I thus have two questions:
Is there a correct (fully working) regex that excludes the trailing underscore?
Is there any way of checking that only the correct regex has matched?
The code used to test Regex 1, is something like
String input = "Text group 1_|Text group 2_|||Text group 5_|||Text group 8";
Matcher matcher = Pattern.compile("(?:\\||^)([^\\\\|]*)").matcher(input);
if (matcher.matches())
{
System.out.println("Input MATCHED: " + input);
while (matcher.find())
{
System.out.println("\t\t" + matcher.group(1));
}
}
else
{
System.out.println("\tInput NOT MATCHED: " + input);
}
Using the above code always results in "NOT MATCHED". Removing the if/else and only using matcher.find() does retrieve all text groups.

Matcher#matches method attempts to match the entire input sequence against the pattern, that is why you are getting the result Input NOT MATCHED. See the documentation here http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html#matches
If you want to exclude the trailing underscore you can use this regex (slight modification of what you already have)
(?:\\||^)([^\\\\|_]*)
This would work if you are sure that _ comes just before |.

RegexPal is a JavaScript regex tool. The Java and JavaScript regular expression languages differ. Consider using a Java Regex tool; perhaps this one
This may be close to what you want: (?:([^_\|]+)_{0,1}+\|*)+
Edit: Code added.
In java 6 this prints each group (the find() loop).
public static void main(String[] args)
{
String input = "Text group 1_|Text group 2_|||Text group 5_|||Text group 8";
Matcher matcher;
Pattern pattern = Pattern.compile("(?:([^_\\|]+)_{0,1}+\\|*)+");
Pattern groupPattern = Pattern.compile("(?:([^_\\|]+)_{0,1}+\\|*)");
matcher = pattern.matcher(input);
if (matcher.matches())
{
Matcher groupMatcher;
System.out.println("matcher.matches() is true");
int groupCount = matcher.groupCount();
for (int index = 1; index <= groupCount; ++index)
{
System.out.print("group (pattern)[");
System.out.print(index);
System.out.print("]: ");
System.out.println(matcher.group(index));
}
groupMatcher = groupPattern.matcher(input);
while (groupMatcher.find())
{
System.out.print("group (groupPattern):");
System.out.println(groupMatcher.group());
System.out.println(groupMatcher.group(1));
}
}
else
{
System.out.println("No match");
}
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Regular expression for UK postcode also matches UUID - java

Related

can deal with the first line space when i use regex for polynomials

Regex in java to extract specific pattern

Java Regex : How to return the whole word if the words ends with a specific string

Java Regex Capturing Groups

Java repetitive pattern matching (3)

Categories

Resources