Dangling Meta Character and Regular expression Pattern for the String - java

I want to create a pattern for the following format of string. I have come up with the following pattern but I am stuck, as I am not able to scan it properly. Below are the details.
Example String: JAS 5F W 123 or BWER34 23 C 23
Above String has the following rules to be followed.
The last digits can be a 2- or 3-digit number only (123, 023 or 23)
Before that only a single character is allowed, case insensitive (W or c)
Before that only 2 digits, or one digit followed by the character "f" or "F", are allowed.
The start of the string can be any alphanumeric string of any length.
All the parts are separated by a space
I came up with the following String pattern, but when I run my Java program it reports a dangling meta character.
"*\\s([0-9][fF]|[1-9][0-9])\\s([a-zA-Z])\\s(\\d\\d|\\d\\d\\d)$"
Please help me in creating the correct pattern for the above String

First of all you use a quantifier but don't quantify anything: remove the first * or add something before it. This causes the "dangling metacharacter" message.
Second \\d\\d|\\d\\d\\d could be rewritten to \\d{2,3} (two or three digits).
Finally, you can make the expression case insensitive by adding a (?i) prefix thus allowing you to write it as follows:
"(?i).*\\s(\\df|[1-9]\\d)\\s([a-z])\\s(\\d{2,3})$"
Note that I assume you want to match anything before the query and thus I added a dot before the asterisk: .*. If you use a Matcher with find() (rather than String#matches(), which has to match the whole input) you don't even need that.
Before that only 2 digits, or one digit followed by the character "f" or "F", are allowed.
Would that allow 05 as well (those are two digits)? If so, you could rewrite that part as \\df|\\d{2}
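A minimal Java sketch of the points above, using the two sample strings from the question. The class and method names are made up for illustration. It checks that the original pattern fails to compile and that the corrected one accepts the samples.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names are illustrative only.
final class DanglingMetaDemo {
    // Original pattern from the question: the leading * has nothing to repeat.
    static final String BROKEN =
            "*\\s([0-9][fF]|[1-9][0-9])\\s([a-zA-Z])\\s(\\d\\d|\\d\\d\\d)$";

    // Corrected pattern from the answer above.
    static final Pattern FIXED =
            Pattern.compile("(?i).*\\s(\\df|[1-9]\\d)\\s([a-z])\\s(\\d{2,3})$");

    // Returns true if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // matches() requires the whole input to fit the pattern.
    static boolean isValid(String input) {
        return FIXED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(compiles(BROKEN));           // false: dangling meta character
        System.out.println(isValid("JAS 5F W 123"));    // true
        System.out.println(isValid("BWER34 23 C 23"));  // true
        System.out.println(isValid("JAS 05 W 123"));    // false: 05 is rejected by [1-9]\d
    }
}
```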

Related

Java 8 regex: a capturing group in a pattern doesn't match, yet the whole pattern does match

This is my first question. Nice to e-meet everyone.
I have created the following regex pattern in Java 8 (this is just a simplified example of what I actually have in my code - for the sake of clarity):
(?<!a)([0-9])\,([0-9])(?!a)|(?<!b)([0-9]) ([0-9])(?!b)|(?<!c)([0-9])([0-9])(?!c)
so in general it consists of three alternatives:
1st one matches two single digits separated with a comma, for example:
1,1
2,0
4,5
2nd one matches two single digits separated with a space, for example:
1 1
2 0
4 5
3rd one matches two single digits in a row, for example:
11
20
45
Each alternative uses lookarounds and their content has to be slightly different for each one of them - that's why I couldn't just put everything together like that:
([0-9])[, ]?([0-9])
Each of the matched digits is enclosed in a capturing group and now I have a second line to 'call out' these captured numbers like this:
(?<!n)($1 $2|$3 $4|$5 $6)(?!n)
So at the end I need to match a text that would have the same digits separated with single space and not surrounded by 'n'. So if any of the examples shown above would be matched by the pattern from the 1st line, the 2nd line pattern should match these:
1 1
2 0
4 5
11 11
22 00
44 55
And not any of these:
n1 1
2,0
45
asd asd asd
The problem is the following: it returns a match even if I do not have these captured digits in the tested text, but I do have space in it... So here I do not get match and that is correct:
aaaaaaaaa
bbbbbbbbb
aasdfasdf
but here I get a match on the following things (most apparently because there is a space/spaces):
abc abc
q w r t y
as df
Does anyone know if this is normal that despite the fact that the characters in capturing groups are not captured by the 1st line, the 'non capturing group' part (so a single space) will be matched and therefore the whole pattern returns match, as if a capturing group could be a zero-length match in the second line if nothing is captured by the first line? Thanks in advance for any comment on this.
Your regex matches whitespace because the resulting pattern for the 1,1 string is (?<!n)(1 1| | )(?!n), and it can match a space that is neither preceded nor followed by n.
When a replacement backreference does not match any string in a .replaceAll/.replaceFirst it is assigned an empty string (it is assigned null when using .find() / .matches()), and thus you still get the blank alternatives in the resulting pattern.
You may leverage this functionality AND the fact that each alternative has exactly two capturing groups by concatenating replacement backreferences in the string replacement pattern, getting rid of the alternations altogether:
SEARCH: (?<!a)([0-9]),([0-9])(?!a)|(?<!b)([0-9]) ([0-9])(?!b)|(?<!c)([0-9])([0-9])(?!c)
REPLACE: (?<!n)$1$3$5 $2$4$6(?!n)
Note how the backreferences are concatenated: all backreferences to odd groups come first, then all backreferences to even groups are placed in a no-alternative pattern.
See the regex demo.
Note that even if the number of groups is different across the alternatives you may just add "fake" empty groups to each of them, and this approach will still work.
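A rough Java sketch of this behaviour, assuming the second pattern is built with String.replaceAll (the question does not say which tool builds it). The class and method names are hypothetical. It compares a replacement string that keeps the alternation with one that concatenates the backreferences.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class BackrefConcatDemo {
    static final String SEARCH =
            "(?<!a)([0-9]),([0-9])(?!a)|(?<!b)([0-9]) ([0-9])(?!b)|(?<!c)([0-9])([0-9])(?!c)";

    // Keeps the alternation: unmatched groups expand to "", leaving blank alternatives.
    static final String WITH_ALTERNATION = "(?<!n)($1 $2|$3 $4|$5 $6)(?!n)";

    // Concatenated backreferences: only the pair that actually matched contributes text.
    static final String CONCATENATED = "(?<!n)$1$3$5 $2$4$6(?!n)";

    // Builds the second-stage pattern from a source string such as "1,1".
    static String buildPattern(String source, String replacement) {
        return source.replaceAll(SEARCH, replacement);
    }

    // True if the regex matches anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String loose = buildPattern("1,1", WITH_ALTERNATION);
        String strict = buildPattern("1,1", CONCATENATED);
        System.out.println(loose);                      // (?<!n)(1 1| | )(?!n)
        System.out.println(strict);                     // (?<!n)1 1(?!n)
        System.out.println(found(loose, "abc abc"));    // true: the blank alternative hits the space
        System.out.println(found(strict, "abc abc"));   // false
        System.out.println(found(strict, "1 1"));       // true
        System.out.println(found(strict, "n1 1"));      // false
    }
}
```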

Java Regex with "Joker" characters

I try to have a regex validating an input field.
What i call "joker" chars are '?' and '*'.
Here is my java regex :
"^$|[^\\*\\s]{2,}|[^\\*\\s]{2,}[\\*\\?]|[^\\*\\s]{2,}[\\?]{1,}[^\\s\\*]*[\\*]{0,1}"
What I'm trying to match is:
Minimum 2 alpha-numeric characters (other than '?' and '*')
The '*' can only appear once, and only at the end of the string
The '?' can appear multiple times
No whitespace at all
So for example :
abcd = OK
?bcd = OK
ab?? = OK
ab*= OK
ab?* = OK
??cd = OK
*ab = NOT OK
??? = NOT OK
ab cd = NOT OK
 abcd = NOT OK (space at the beginning)
I've made the regex a bit complicated and I'm lost can you help me?
^(?:\?*[a-zA-Z\d]\?*){2,}\*?$
Explanation:
The regex asserts that this pattern must appear twice or more:
\?*[a-zA-Z\d]\?*
which asserts that there must be one character in the class [a-zA-Z\d], with zero or more question marks on the left or right of it.
Then, the regex matches \*?, which means zero or one asterisk character, at the end of the string.
Here is an alternative regex that is faster, as revo suggested in the comments:
^(?:\?*[a-zA-Z\d]){2}[a-zA-Z\d?]*\*?$
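As a quick check, the two patterns above can be run against the examples from the question. This is only a sketch: the class and method names are made up, and String-level matching with Matcher.matches() is an assumption about how the field is validated.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the inputs are the examples listed in the question.
final class JokerPatternDemo {
    static final Pattern FIRST =
            Pattern.compile("^(?:\\?*[a-zA-Z\\d]\\?*){2,}\\*?$");
    static final Pattern SECOND =
            Pattern.compile("^(?:\\?*[a-zA-Z\\d]){2}[a-zA-Z\\d?]*\\*?$");

    // matches() requires the whole input to fit the pattern.
    static boolean accepts(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abcd", "?bcd", "ab??", "ab*", "ab?*", "??cd",   // expected OK
            "*ab", "???", "ab cd", " abcd"                   // expected NOT OK
        };
        for (String input : inputs) {
            System.out.printf("'%s' first=%b second=%b%n",
                    input, accepts(FIRST, input), accepts(SECOND, input));
        }
    }
}
```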
Here you go:
^\?*\w{2,}\?*\*?(?<!\s)$
Both described and demonstrated at Regex101.
^ is a start of the String
\?* indicates any number of initial ? characters (must be escaped)
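\w{2,} at least 2 word characters (letters, digits or underscore)
\?* continues with any number of ? characters
\*? and optionally one last * character
(?<!\s) asserts that the character right before the end is not a \s whitespace character (negative look-behind); whitespace elsewhere is already rejected because none of the other parts can match it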
$ is an end of the String
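A small Java sketch for trying this variant; the class and method names are hypothetical. Besides the question's examples, it includes two extra inputs of my own (a?b and a_b) that show where this pattern differs from a strict reading of the rules: a ? between two alphanumerics is rejected, and an underscore is accepted because \w includes it.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class JokerWordCharDemo {
    static final Pattern WORD_BASED =
            Pattern.compile("^\\?*\\w{2,}\\?*\\*?(?<!\\s)$");

    // matches() requires the whole input to fit the pattern.
    static boolean accepts(String input) {
        return WORD_BASED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("ab?*"));   // true
        System.out.println(accepts("??cd"));   // true
        System.out.println(accepts("ab cd"));  // false
        System.out.println(accepts("a?b"));    // false: ? between alphanumerics is not covered
        System.out.println(accepts("a_b"));    // true: \w also matches underscore
    }
}
```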
Another way to solve this problem is the look-ahead mechanism (?=subregex). It is zero-length (it resets the regex cursor to the position it had before executing subregex), so it lets the regex engine run multiple tests on the same text via the construct
(?=condition1)
(?=condition2)
(?=...)
conditionN
Note: the last condition (conditionN) is not placed in (?=...), so that the regex engine can move the cursor past the tested part (to "consume" it) and go on to test whatever follows. For that to work, conditionN must match precisely the section we want to "consume" (the earlier conditions don't have that limitation; they can match substrings of any length, for example just the first few characters).
So now we need to think about what are our conditions.
We want to match only alphanumeric characters, ?, * but * can appear (optionally) only at end. We can write it as ^[a-zA-Z0-9?]*[*]?$. This also rules out whitespace characters because we didn't include them as potentially accepted characters.
Second requirement is to have "Minimum 2 alpha-numeric characters". It can be written as .*?[a-zA-Z0-9].*?[a-zA-Z0-9] or (?:.*?[a-zA-Z0-9]){2,} (if we like shorter regexes). Since that condition doesn't actually test whole text but only some part of it, we can place it in look-ahead mechanism.
Above conditions seem to cover all we wanted so we can combine them into regex which can look like:
^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$
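The combined look-ahead pattern can be checked against the question's examples with a short Java sketch. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the inputs are the examples listed in the question.
final class JokerLookaheadDemo {
    // Look-ahead: at least two alphanumerics somewhere.
    // Main part: only alphanumerics and ?, then an optional trailing *.
    static final Pattern LOOKAHEAD =
            Pattern.compile("^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$");

    // matches() requires the whole input to fit the pattern.
    static boolean accepts(String input) {
        return LOOKAHEAD.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "abcd", "?bcd", "ab??", "ab*", "ab?*", "??cd",   // expected OK
            "*ab", "???", "ab cd", " abcd"                   // expected NOT OK
        };
        for (String input : inputs) {
            System.out.printf("'%s' -> %b%n", input, accepts(input));
        }
    }
}
```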

What is the regex for finding if a piece of text is in a particular format

What is the regex for finding if a piece of text is in a particular format?
Format should follow:
AAAA-123 or AAAA123 (with or without the dash)
Where the first 4 characters are letters in the range A-M and the following 3 characters are numbers with a max of 299.
Example:
ABCD-299 would match
and
ABZR-301 would not match
[A-M]{4}-?[0-2][0-9]{2}
Basically:
[A-M]{4} = 4 of any letters A-M
-? = an optional dash
[0-2] = a single 0,1, or 2
[0-9]{2} = two of any number
Limiting the first digit to 0-2 effectively limits your number to 299, and allows for 000-299.
I'm not sure if you are searching for this in a string or checking that a string equals exactly this, and that context might change how you use the above. For example, if you are testing a string you'll want to wrap it with ^ and $:
^[A-M]{4}-?[0-2][0-9]{2}$
^ means beginning of string
[] define a character class of potential matches. In this case uppercase A all the way to uppercase M (the hyphen is special within [] and denotes a range). Note that the range follows ASCII order (http://www.asciitable.com/), so A-z would also include the non-alphanumeric characters between Z and a.
{} define a count, in this case exactly 4. You can also define a range like {1,3}, which means 1 to 3, {0,7} for at most 7 (some flavors also accept {,7}), or {5,} for at least 5.
and the ? means that the char before it may or may not be there, in this case the hyphen.
$ means end of string
The ^ and $ are necessary, I think. Otherwise that regex will also match inside longer input such as AAAAAAAA-2342347474.
anyways, read up on regex. they can be powerful and fun. http://regexr.com/
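To illustrate the difference the anchors make, here is a small sketch in Java (the question does not name a language, so Java is an assumption, as are the class and method names). Both patterns are used with find() so that only the anchors decide whether a longer string is accepted.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class CodeFormatDemo {
    static final Pattern UNANCHORED = Pattern.compile("[A-M]{4}-?[0-2][0-9]{2}");
    static final Pattern ANCHORED = Pattern.compile("^[A-M]{4}-?[0-2][0-9]{2}$");

    // True if the code appears anywhere inside the input.
    static boolean containsCode(String input) {
        return UNANCHORED.matcher(input).find();
    }

    // True if the input as a whole is a code.
    static boolean isCode(String input) {
        return ANCHORED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isCode("ABCD-299"));                    // true
        System.out.println(isCode("ABCD299"));                     // true
        System.out.println(isCode("ABZR-301"));                    // false
        System.out.println(containsCode("AAAAAAAA-2342347474"));   // true: AAAA-234 is inside
        System.out.println(isCode("AAAAAAAA-2342347474"));         // false
    }
}
```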

How to make a regular expression that matches tokens with delimiters and separators?

I want to be able to write a regular expression in java that will ensure the following pattern is matched.
<D-05-hello-87->
For the letter D, this can be either 'D' or 'E' in capital letters, and only one of these letters, once.
The two numbers you see must always be a 2 digit decimal number, not 1 or 3 numbers.
The string must start and end with '<' and '>' and contain '-' to separate parts within.
The message in the middle 'hello' can be any character but must not be more than 99 characters in length. It can contain white spaces.
Also this pattern will be repeated, so the expression needs to recognise the different individual patterns within a long string of these patterns and ensure they follow this pattern structure. E.g
So far I have tried this:
([<](D|E)[-]([0-9]{2})[-](.*)[-]([0-9]{2})[>]\z)+
But the problem is (.*) which sees anything after it as part of any character match and ignores the rest of the pattern.
How might this be done? (Using Java reg ex syntax)
Try making it non-greedy or negation:
(<([DE])-([0-9]{2})-(.*?)-([0-9]{2})>)
Live Demo: http://ideone.com/nOi9V3
Update: tested and working
<([DE])-(\d{2})-(.{1,99}?)-(\d{2})>
See it working: http://rubular.com/r/6Ozf0SR8Cd
You should not wrap -, < and > in [ ]
Assuming that you want to stop at the first dash, you could use [^-]* instead of .*. This will match all non-dash characters.
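A minimal Java sketch of scanning a long string for repeated tokens with the non-greedy pattern. The input strings, class name and method name are made up for illustration, and the sketch assumes each token ends with the two digits directly before '>', as the patterns above expect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample input are illustrative only.
final class TokenScanDemo {
    // Group 1: D or E, group 2: first number, group 3: message, group 4: second number.
    static final Pattern TOKEN =
            Pattern.compile("<([DE])-(\\d{2})-(.{1,99}?)-(\\d{2})>");

    // Collects the message part of every token found in the input.
    static List<String> messages(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(messages("<D-05-hello-87><E-10-hi there-42>")); // [hello, hi there]
        System.out.println(messages("<D-05-a-b-87>"));                     // [a-b]
    }
}
```

With [^-]* in place of the lazy quantifier, a message that itself contains a dash (the second call) would not match.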

Set minimum and maximum characters in a regular expression

I've written a regular expression that matches any number of letters with any number of single spaces between the letters. I would like that regular expression to also enforce a minimum and maximum number of characters, but I'm not sure how to do that (or if it's possible).
My regular expression is:
[A-Za-z](\s?[A-Za-z])+
I realized it was only matching two sets of letters surrounding a single space, so I modified it slightly to fix that. The original question is still the same though.
Is there a way to enforce a minimum of three characters and a maximum of 30?
Yes
Just like + means one or more you can use {3,30} to match between 3 and 30
For example [a-z]{3,30} matches between 3 and 30 lowercase alphabet letters
From the documentation of the Pattern class
X{n,m} X, at least n but not more than m times
In your case, matching 3-30 letters followed by spaces could be accomplished with:
([a-zA-Z]\s){3,30}
That requires trailing whitespace. If you don't want it, you can use (2-29 times letter+space, then letter):
([a-zA-Z]\s){2,29}[a-zA-Z]
If you'd like whitespaces to count as characters you need to divide that number by 2 to get
([a-zA-Z]\s){1,14}[a-zA-Z]
You can add \s? to that last one if the trailing whitespace is optional. These were all tested on RegexPlanet
If you'd like the entire string altogether to be between 3 and 30 characters you can use lookaheads adding (?=^.{3,30}$) at the beginning of the RegExp and removing the other size limitations
All that said, in all honesty I'd probably just test the String's length. It's more readable.
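A sketch of that last suggestion, assuming Java (where the length is read with length()). The class and method names are hypothetical. The regex checks only the shape, letters with optional single whitespace characters between them, and uses * instead of the question's + because the plain length comparison already enforces the minimum.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class LengthCheckDemo {
    // Shape only: a letter, then any number of (optional whitespace + letter).
    static final Pattern SHAPE = Pattern.compile("[A-Za-z](?:\\s?[A-Za-z])*");

    // Length is checked separately from the shape.
    static boolean isValid(String input) {
        int length = input.length();
        return length >= 3 && length <= 30 && SHAPE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("AAA"));    // true
        System.out.println(isValid("A A"));    // true
        System.out.println(isValid("AA"));     // false: too short
        System.out.println(isValid("A  A"));   // false: two consecutive spaces
    }
}
```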
This is what you are looking for
^[a-zA-Z](\s?[a-zA-Z]){2,29}$
^ is the start of string
$ is the end of string
(\s?[a-zA-Z]){2,29} would match (\s?[a-zA-Z]) 2 to 29 times..
Actually Benjamin's answer will lead to the complete solution to the OP's question.
Using lookaheads it is possible to restrict the total number of characters AND restrict the match to a set combination of letters and (optional) single spaces.
The regex that solves the entire problem would become
(?=^.{3,30}$)^([A-Za-z][\s]?)+$
This will match AAA and A A, and will fail to match AA  A since there are two consecutive spaces.
I tested this at http://regexpal.com/ and it does the trick.
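The same check can be reproduced in Java with a short sketch; the class and method names are made up. Note that, as written, the pattern also accepts a single trailing whitespace character, since [\s]? follows every letter.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class LookaheadLengthDemo {
    // The look-ahead limits the total length; the rest restricts the shape.
    static final Pattern LIMITED =
            Pattern.compile("(?=^.{3,30}$)^([A-Za-z][\\s]?)+$");

    // matches() requires the whole input to fit the pattern.
    static boolean accepts(String input) {
        return LIMITED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("AAA"));     // true
        System.out.println(accepts("A A"));     // true
        System.out.println(accepts("AA  A"));   // false: two consecutive spaces
        System.out.println(accepts("AA"));      // false: shorter than 3
        System.out.println(accepts("AA "));     // true: trailing whitespace is allowed
    }
}
```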
You should use
[a-zA-Z ]{20}
[For allowed characters]{for limiting of the number of characters}
