regular expression to allow only 1 dash - java

I have a textbox where I get the last name of a user. How do I allow only one dash (-) in a regular expression? The dash is also not allowed at the beginning or at the end of the string.
I have this code:
Pattern p = Pattern.compile("[^a-z-']", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(name);

Try to rephrase the question in more regexy terms. Rather than "allow only one dash, and it can't be at the beginning" you could say, "the string's beginning, followed by at least one non-dash, followed by one dash, followed by at least one non-dash, followed by the string's end."
the string's beginning: ^
at least one non-dash: [^-]+
followed by one dash: -
followed by at least one non-dash: [^-]+
followed by the string's end: $
Put those all together and you get ^[^-]+-[^-]+$. If you're using this in a context that matches against the complete string (not just any substring within it), you don't need the anchors. It may still be good to put them in anyway, in case you later use that regex in a substring-matching context and forget to add them back in.
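Here is a minimal Java sketch of the combined pattern. It assumes the check is done with Matcher.matches(), and the class and method names are hypothetical. Note that ^[^-]+-[^-]+$ requires exactly one dash. If a name with no dash at all should also pass, one option is to make the second half optional: ^[^-]+(?:-[^-]+)?$.

```java
import java.util.regex.Pattern;

class SingleDashCheck {
    // ^ start, [^-]+ at least one non-dash, - exactly one dash,
    // [^-]+ at least one non-dash, $ end
    private static final Pattern ONE_INNER_DASH = Pattern.compile("^[^-]+-[^-]+$");

    static boolean hasOneInnerDash(String name) {
        // matches() already requires the whole string to match,
        // so the anchors are redundant here but harmless
        return ONE_INNER_DASH.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Smith-Jones", "-Smith", "Smith-", "Ann-Marie-Lee", "Smith"};
        for (String s : samples) {
            System.out.println(s + " -> " + hasOneInnerDash(s));
        }
    }
}
```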

Why not just use indexOf() in String?
String s = "last-name";
int first = s.indexOf('-');
int last = s.lastIndexOf('-');
if (first == 0 || last == s.length() - 1) // Checks if a dash is at the beginning or end
    System.out.println("BAD");
if (first != last) // Checks if there is more than one dash
    System.out.println("BAD");
Last names are usually short, so any speed difference compared with a regex should not be noticeable at all. This approach will also make debugging and future maintenance much easier.
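The same checks can be wrapped in a small method. This sketch uses a hypothetical name, dashOk. Like the snippet above, it accepts a name with no dash at all, because indexOf() and lastIndexOf() both return -1 in that case.

```java
class DashIndexCheck {
    // True when the string has at most one dash and that dash is
    // neither the first nor the last character.
    static boolean dashOk(String s) {
        int first = s.indexOf('-');     // -1 if there is no dash
        int last = s.lastIndexOf('-');  // -1 if there is no dash
        if (first == -1) {
            return true;                // no dash at all
        }
        if (first == 0 || last == s.length() - 1) {
            return false;               // dash at the beginning or the end
        }
        return first == last;           // differing indexes mean more than one dash
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Smith-Jones", "Smith", "-Smith", "Smith-", "Ann-Marie-Lee"}) {
            System.out.println(s + (dashOk(s) ? " OK" : " BAD"));
        }
    }
}
```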

It looks like your regex represents a fragment of an invalid value, and you're presumably using Matcher.find() to find if any part of your value matches that regex. Is that correct? If so, you can change your pattern to:
Pattern p = Pattern.compile("[^a-zA-Z'-]|-.*-|^-|-$");
which will match a non-letter-non-hyphen-non-apostrophe character, or a sequence of characters that both starts and ends with hyphens (thereby detecting a value that contains two hyphens), or a leading hyphen, or a trailing hyphen.
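A sketch of how that pattern might be used, assuming find() is the call being made. The class and method names are hypothetical. A value counts as valid when no invalid fragment can be found anywhere in it.

```java
import java.util.regex.Pattern;

class InvalidFragmentCheck {
    // Any one of these is an invalid fragment: a character that is not a letter,
    // apostrophe or hyphen; two hyphens anywhere; a leading or trailing hyphen
    private static final Pattern INVALID = Pattern.compile("[^a-zA-Z'-]|-.*-|^-|-$");

    static boolean isValid(String name) {
        // find() searches for the invalid fragment anywhere in the value
        return !INVALID.matcher(name).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Smith-Jones", "O'Brien", "-Smith", "Smith-", "Ann-Marie-Lee", "Smith Jones"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```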

This regex represents one or more non-hyphens, followed by a single hyphen, followed by one or more non-hyphens.
^[^\-]+\-[^\-]+$
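The hyphen doesn't actually need to be escaped in Java. Outside a character class a hyphen is always a literal, and inside a class a hyphen placed first (as in [^-]) or last is a literal too. Escaping it anyway is harmless. Remember that in a Java string literal each backslash must be doubled.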
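The escaped and unescaped forms can be compared directly in Java. This sketch compiles both and checks that they agree. In the Java string literal every regex backslash is doubled, and the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class HyphenEscaping {
    // Escaped form; each regex backslash is doubled inside the Java string literal
    static final Pattern ESCAPED = Pattern.compile("^[^\\-]+\\-[^\\-]+$");
    // Unescaped form: a hyphen right after [^ or outside a class is already literal
    static final Pattern UNESCAPED = Pattern.compile("^[^-]+-[^-]+$");

    static boolean sameResult(String s) {
        return ESCAPED.matcher(s).matches() == UNESCAPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Smith-Jones", "-Smith", "Smith-", "a--b", "Smith"}) {
            System.out.println(s + ": escaped=" + ESCAPED.matcher(s).matches()
                    + ", unescaped=" + UNESCAPED.matcher(s).matches());
        }
    }
}
```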

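Try a pattern like [a-z]-[a-z]. Compile it case-insensitively, as in your own code, so that names like Smith-Jones also match:
Pattern p = Pattern.compile("[a-z]-[a-z]", Pattern.CASE_INSENSITIVE);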
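Used with find(), [a-z]-[a-z] only checks that a letter, a dash and another letter appear in sequence somewhere in the string. The sketch below (hypothetical names) compares a case-sensitive and a case-insensitive version. Without Pattern.CASE_INSENSITIVE, Smith-Jones is not found, because J is uppercase. Ann-Marie-Lee is still found, so rejecting a second dash needs an additional check.

```java
import java.util.regex.Pattern;

class LetterDashLetter {
    static final Pattern CASE_SENSITIVE = Pattern.compile("[a-z]-[a-z]");
    static final Pattern ANY_CASE = Pattern.compile("[a-z]-[a-z]", Pattern.CASE_INSENSITIVE);

    // find() succeeds if a letter-dash-letter sequence occurs anywhere in s
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Smith-Jones", "-Smith", "Smith-", "Ann-Marie-Lee"}) {
            System.out.println(s + ": case-sensitive=" + found(CASE_SENSITIVE, s)
                    + ", any case=" + found(ANY_CASE, s));
        }
    }
}
```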

Related

Regex return true if even a substring follows the pattern

I was just practicing regex and found something intriguing.
For the string "world9 a9$ b6$", my regular expression "^(?=.*[\\d])(?=\\S+\\$).{2,}$" returns false. The look-ahead (?=\\S+\\$) is tested at the start of the string, and it runs into a space before it reaches a $ sign preceded by non-space characters.
As a whole, the string doesn't match the pattern.
What should the regular expression be if I want it to return true even when only a substring follows the pattern?
In this string, both a9$ and b6$ follow the pattern.
You can use
^(?=\D*\d)(?=.*\S\$).{2,}$
As The fourth bird mentions, since \S\$ matches two chars, you may simply move that pattern into the consuming part and use ^(?=\D*\d).*\S\$.*$.
Details
^ - start of string (implicit if used in .matches())
(?=\D*\d) - a positive lookahead that requires zero or more non-digit chars followed with a digit char immediately to the right of the current location
(?=.*\S\$) - a positive lookahead that requires zero or more chars other than line break chars, as many as possible, followed with a non-whitespace char and a $ char immediately to the right of the current location
.{2,} - any two or more chars other than line break chars, as many as possible
$ - end of string (implicit if used in .matches())
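The three patterns from this thread can be compared directly with full-string matching, which is what Matcher.matches() does. This is a minimal sketch, and the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class SubstringLookahead {
    // The question's pattern: (?=\S+\$) has to succeed at position 0
    static final Pattern ORIGINAL = Pattern.compile("^(?=.*[\\d])(?=\\S+\\$).{2,}$");
    // The look-ahead may skip over anything, spaces included, before \S\$
    static final Pattern FIXED = Pattern.compile("^(?=\\D*\\d)(?=.*\\S\\$).{2,}$");
    // Same idea with \S\$ moved into the consuming part
    static final Pattern SIMPLER = Pattern.compile("^(?=\\D*\\d).*\\S\\$.*$");

    static boolean fullMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String s = "world9 a9$ b6$";
        System.out.println("original: " + fullMatch(ORIGINAL, s));
        System.out.println("fixed:    " + fullMatch(FIXED, s));
        System.out.println("simpler:  " + fullMatch(SIMPLER, s));
    }
}
```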
Mostly, knock out the ^ and $ bits, as those force this into a full-string match, and you want substring matches. In general, look-ahead seems like a mistake here; what are you trying to accomplish by using it? (Look-ahead/look-behind is rarely needed in general.) All you need is:
Pattern.compile("\\S+\\$");
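Possibly, if you want an element (such as a9$) to stand on its own, use \b, which is regexpese for word boundary. It is the zero-width position between a word character (a letter, digit or underscore, i.e. [a-zA-Z0-9_]) and a non-word character such as whitespace or most punctuation. \b also matches at the start/end of input when the adjacent character is a word character. Since $ is itself a non-word character, a \b right after \$ can never match before a space or at the end of input. So keep \b at the front and end with a negative look-ahead for non-whitespace instead:
Pattern.compile("\\b\\S+\\$(?!\\S)")
This matches a9$ in foo a9$ bar, or a9$ on its own, just fine.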
If you MUST put this in terms of a full match, e.g. because matches() (which always does a full string match) is run and you can't change that, well, put ^.* in front and .*$ at the back of it, simple as that.
Absolutely nothing about this says "This can only be needed with lookahead".
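Here is a sketch of both suggestions (hypothetical names). find() with \S+\$ collects each whitespace-free run that ends in $. Wrapping the same pattern in .* on both sides turns it into something matches() accepts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarTokens {
    private static final Pattern TOKEN = Pattern.compile("\\S+\\$");

    // Substring search: every non-whitespace run that ends with $
    static List<String> findAll(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Full-string version for code that insists on matches()
    static boolean fullMatch(String s) {
        return s.matches(".*\\S+\\$.*");
    }

    public static void main(String[] args) {
        String s = "world9 a9$ b6$";
        System.out.println(findAll(s));   // [a9$, b6$]
        System.out.println(fullMatch(s)); // true
    }
}
```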

Java Regex with "Joker" characters

I'm trying to write a regex that validates an input field.
What I call "joker" chars are '?' and '*'.
Here is my Java regex:
"^$|[^\\*\\s]{2,}|[^\\*\\s]{2,}[\\*\\?]|[^\\*\\s]{2,}[\\?]{1,}[^\\s\\*]*[\\*]{0,1}"
What I'm trying to match is:
Minimum 2 alphanumeric characters (other than '?' and '*')
The '*' can appear only once, and only at the end of the string
The '?' can appear multiple times
No whitespace at all
For example:
abcd = OK
?bcd = OK
ab?? = OK
ab*= OK
ab?* = OK
??cd = OK
*ab = NOT OK
??? = NOT OK
ab cd = NOT OK
" abcd" = NOT OK (space at the beginning)
I've made the regex a bit complicated and I'm lost; can you help me?
^(?:\?*[a-zA-Z\d]\?*){2,}\*?$
Explanation:
The regex asserts that this pattern must appear twice or more:
\?*[a-zA-Z\d]\?*
which asserts that there must be one character from the class [a-zA-Z\d], with zero or more question marks on its left or right.
Then the regex matches \*?, which means zero or one asterisk character, at the end of the string.
Here is an alternative regex that is faster, as revo suggested in the comments:
^(?:\?*[a-zA-Z\d]){2}[a-zA-Z\d?]*\*?$
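A small harness (hypothetical names) that runs both of the patterns above against the examples from the question. In Java source every regex backslash is doubled.

```java
import java.util.regex.Pattern;

class JokerPatterns {
    // Two or more alphanumerics, each with optional ? on either side, then an optional *
    static final Pattern REPEATED = Pattern.compile("^(?:\\?*[a-zA-Z\\d]\\?*){2,}\\*?$");
    // revo's variant: two alphanumerics (with leading ?s), then alphanumerics or ?, then an optional *
    static final Pattern FASTER = Pattern.compile("^(?:\\?*[a-zA-Z\\d]){2}[a-zA-Z\\d?]*\\*?$");

    static final String[] OK = {"abcd", "?bcd", "ab??", "ab*", "ab?*", "??cd"};
    static final String[] NOT_OK = {"*ab", "???", "ab cd", " abcd"};

    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (Pattern p : new Pattern[] {REPEATED, FASTER}) {
            for (String s : OK) {
                System.out.println(p.pattern() + " accepts \"" + s + "\": " + accepts(p, s));
            }
            for (String s : NOT_OK) {
                System.out.println(p.pattern() + " accepts \"" + s + "\": " + accepts(p, s));
            }
        }
    }
}
```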
Here you go:
^\?*\w{2,}\?*\*?(?<!\s)$
Both are described and demonstrated at Regex101.
^ is a start of the String
\?* indicates any number of initial ? characters (must be escaped)
\w{2,} at least 2 alphanumeric characters (\w also matches the underscore)
\?* continues with any number of ? characters
\*? and optionally one last * character
(?<!\s) a negative look-behind asserting that the last character is not whitespace (redundant here, since none of the preceding tokens can match whitespace)
$ is an end of the String
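This sketch (hypothetical names) checks the pattern against the same examples. Two details matter when you test it. \w{2,} needs two adjacent word characters, so an input such as a?b is rejected. And because \w includes the underscore, ab_ is accepted. Whether either matters depends on the exact requirement.

```java
import java.util.regex.Pattern;

class JokerWordPattern {
    static final Pattern P = Pattern.compile("^\\?*\\w{2,}\\?*\\*?(?<!\\s)$");

    static boolean accepts(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abcd", "?bcd", "ab??", "ab*", "ab?*", "??cd",
                            "*ab", "???", "ab cd", " abcd", "a?b", "ab_"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```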
Another way to solve this problem is with the look-ahead mechanism (?=subregex). It is zero-length (after the subregex runs, the regex cursor is reset to the position it had before), so it lets the regex engine perform multiple tests on the same text via the construct
(?=condition1)
(?=condition2)
(?=...)
conditionN
Note: the last condition (conditionN) is not placed in (?=...), so that the regex engine can move the cursor past the tested part (to "consume" it) and go on to test whatever comes after it. To make that possible, conditionN must match precisely the section we want to "consume". The earlier conditions don't have that limitation; they can match substrings of any length, say, just the first few characters.
So now we need to work out what our conditions are.
We want to match only alphanumeric characters, ? and *, but * can appear (optionally) only at the end. We can write that as ^[a-zA-Z0-9?]*[*]?$. This also rules out whitespace, because we didn't include it among the accepted characters.
The second requirement is to have "Minimum 2 alpha-numeric characters". It can be written as .*?[a-zA-Z0-9].*?[a-zA-Z0-9] or (?:.*?[a-zA-Z0-9]){2,} (if we like shorter regexes). Since that condition doesn't test the whole text, only part of it, we can place it in the look-ahead mechanism.
The above conditions seem to cover everything we wanted, so we can combine them into a regex like:
^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$
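The combined pattern can be tested the same way. In this sketch (hypothetical names) the look-ahead counts alphanumerics anywhere in the input, so it also accepts inputs such as a?b, where the two alphanumerics are not adjacent.

```java
import java.util.regex.Pattern;

class JokerLookahead {
    // Look-ahead (zero-width): at least two alphanumerics somewhere.
    // Consuming part: only alphanumerics and ?, then an optional final *.
    static final Pattern P = Pattern.compile("^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$");

    static boolean accepts(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abcd", "?bcd", "ab??", "ab*", "ab?*", "??cd",
                            "*ab", "???", "ab cd", " abcd", "a?b"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```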

Regex for multiple instances of character

In Java, using a regular expression, how would I check whether a string contains the correct number of occurrences of a character?
For example take the string hello.world.hello:world:. How could this string be checked to see if it contained two instances of a . or two instances of a :?
I have tried
Pattern p = Pattern.compile("[:]{2}");
Matcher m = p.matcher("hello.world.hello:world:");
m.find();
but that failed.
Edit
First I would like to say thank you for all the answers. I noticed a lot of the answers said something along the lines of "This means: zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice". So if you were checking for 3 : in a string such as Hello::World: how would you do it?
Well, using matches you could use:
"([^:]*:[^:]*){2}"
This means: "zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice".
Using find() is not as good, because there may be additional colons and it will just ignore them.
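This sketch (hypothetical names) shows the difference between matches() and find() for this pattern. For three colons, as asked in the edit, the repetition count simply becomes {3}. This form needs at least one repetition; with {0} it would only match the empty string.

```java
import java.util.regex.Pattern;

class ColonCount {
    // Whole-string check for exactly n colons (n >= 1)
    static boolean exactlyColons(String s, int n) {
        return s.matches("([^:]*:[^:]*){" + n + "}");
    }

    // find() only needs some substring to match, so extra colons are ignored;
    // this really answers "at least n colons"
    static boolean atLeastColonsViaFind(String s, int n) {
        return Pattern.compile("([^:]*:[^:]*){" + n + "}").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(exactlyColons("hello.world.hello:world:", 2)); // true
        System.out.println(exactlyColons("Hello::World:", 3));            // true
        System.out.println(exactlyColons("Hello::World:", 2));            // false
        System.out.println(atLeastColonsViaFind("Hello::World:", 2));     // true
    }
}
```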
You can use this regex based on two lookaheads assertions:
^(?=(?:[^.]*\.){2}[^.]*$)(?=(?:[^:]*:){2}[^:]*$)
(?=(?:[^.]*\.){2}[^.]*$) makes sure there are exactly 2 DOTS and (?=(?:[^:]*:){2}[^:]*$) asserts that there are exactly 2 colons in input string.
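This pattern consists only of look-aheads, so it matches an empty string at position 0. It should therefore be applied with find(); with matches() it fails on any non-empty input. A sketch with hypothetical names:

```java
import java.util.regex.Pattern;

class DotsAndColons {
    // Two zero-width look-aheads: exactly two dots, and exactly two colons
    static final Pattern TWO_EACH =
            Pattern.compile("^(?=(?:[^.]*\\.){2}[^.]*$)(?=(?:[^:]*:){2}[^:]*$)");

    static boolean check(String s) {
        // The match itself is empty, so find() is the right call here
        return TWO_EACH.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(check("hello.world.hello:world:"));                      // true
        System.out.println(check("a.b.c:d:e:"));                                    // false (three colons)
        System.out.println(TWO_EACH.matcher("hello.world.hello:world:").matches()); // false
    }
}
```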
You can determine whether the string has exactly the given number of a certain character, say ':', by attempting to match it against a pattern of this form:
^(?:[^:]*[:]){2}[^:]*$
That says: exactly two non-capturing groups, each consisting of any number (including zero) of characters other than ':' followed by one colon, with the second group followed by any number of additional characters other than ':'.
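The same shape works for any character and any count, including zero. This sketch uses a hypothetical helper, hasExactly. It escapes non-alphanumeric characters with a backslash, which Java regex syntax treats as a literal both inside and outside a character class.

```java
import java.util.regex.Pattern;

class ExactCount {
    // Builds ^(?:[^c]*c){n}[^c]*$ for any character c and count n >= 0.
    static boolean hasExactly(String s, char c, int n) {
        // A backslash before a non-alphanumeric character is always a literal
        // in Java regex; letters and digits are left unescaped because
        // sequences like \d, \w or \1 have special meanings.
        String e = Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
        String regex = "^(?:[^" + e + "]*" + e + "){" + n + "}[^" + e + "]*$";
        return Pattern.compile(regex).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasExactly("hello.world.hello:world:", '.', 2)); // true
        System.out.println(hasExactly("Hello::World:", ':', 3));            // true
        System.out.println(hasExactly("Hello::World:", ':', 2));            // false
    }
}
```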

Weird behaviour on regexp Matcher

My regexp below is supposed to filter out capital words with a length of 8-10, where 0-2 numbers may appear. It has been working for all of my tests, but for some reason it got stuck on the string below. And n.group(0) only contains an empty string instead of the matched "word".
static final Pattern PATTERN =
    Pattern.compile("\\b(?=[A-Z\\d]{9,10}\\b)(?:[A-Z]*\\d){0,2}[A-Z]*\\b");
Matcher n = PATTERN.matcher("foo ID:636152727 bar");
while (n.find()) {
    String s = n.group(0);
    resultArrayList.add(s);
}
Why does my pattern match ID:636152727?
Some examples that I want to filter out (which is working):
AAAAAAAAAA
1AAAAAAAAA
1AAAAAAAA1
etc...
I don't have a better solution to offer than the one in Ωmega's answer, but I think I can explain what's happening. What it boils down to is that the first \b and the last \b are matching the same spot: right after the colon.
That's the first place where the lookahead can match, since it's followed by nine digits and a word boundary. Then the next part of the regex tries to match two digits (interspersed with any number of uppercase letters) followed by a word boundary, and fails. So it tries to match just one digit (ditto), and fails again. Then it tries matching zero digits (interspersed with zero letters), and it succeeds, without advancing the match position. That position is still a word boundary, so the final \b succeeds as well.
A word boundary is just another zero-width assertion, like lookaheads and lookbehinds. There's no reason why two or more can't be applied at the same spot; you did that on purpose with the first word boundary and the lookahead. Some regex flavors treat it as an error if you apply a quantifier to an assertion (like \b+), but I don't think any of them would catch this problem. This is one of those rare instances where separate start-of-word and end-of-word assertions, like GNU's \< and \> or Tcl's \m and \M, would make a difference.
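The behaviour described above can be reproduced by recording where each match starts. This sketch (hypothetical names) prints the start index and the matched text. For the input from the question it reports a single empty match at index 7, right after the colon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyWordMatch {
    static final Pattern PATTERN =
            Pattern.compile("\\b(?=[A-Z\\d]{9,10}\\b)(?:[A-Z]*\\d){0,2}[A-Z]*\\b");

    // Returns "start:'text'" for every match so that empty matches are visible
    static List<String> describeMatches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            out.add(m.start() + ":'" + m.group() + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("foo ID:636152727 bar")); // [7:'']
        System.out.println(describeMatches("AAAAAAAAAA"));           // [0:'AAAAAAAAAA']
    }
}
```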
You need to use anchors ^ and $
Pattern.compile("^(?=[A-Z\\d]{9,10}$)(?:[A-Z]*\\d){0,2}[A-Z]*$");
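With ^ and $, this pattern has to be applied to each candidate word on its own. One way to do that is to split the input first and test each token with matches(). That assumes whitespace separates the words, which is an assumption and not part of the answer. A sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AnchoredPerToken {
    // ^ and $ now refer to the single token being tested
    static final Pattern WORD =
            Pattern.compile("^(?=[A-Z\\d]{9,10}$)(?:[A-Z]*\\d){0,2}[A-Z]*$");

    static List<String> qualifyingTokens(String input) {
        List<String> out = new ArrayList<>();
        for (String token : input.split("\\s+")) { // assumes whitespace separates words
            if (WORD.matcher(token).matches()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(qualifyingTokens(
                "foo ID:636152727 bar AAAAAAAAAA 1AAAAAAAA1 1A1A1AAAAA"));
    }
}
```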
Use this pattern:
"(?:^|(?<=\\s))(?=[A-Z\\d]{9,10}(?:\\s|$))(?:[A-Z]*\\d){0,2}[A-Z]*(?=\\s|$)"
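This sketch (hypothetical names) runs the pattern with find() over a longer input. The digits after ID: are skipped, because they are preceded by a colon rather than by whitespace or the start of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceDelimited {
    // Start of input or a preceding whitespace char, then a 9-10 char look-ahead
    // that must end at whitespace or end of input, then the letters/digits themselves
    static final Pattern PATTERN = Pattern.compile(
            "(?:^|(?<=\\s))(?=[A-Z\\d]{9,10}(?:\\s|$))(?:[A-Z]*\\d){0,2}[A-Z]*(?=\\s|$)");

    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("foo ID:636152727 bar"));
        System.out.println(findAll("foo ID:636152727 bar AAAAAAAAAA 1AAAAAAAA1"));
    }
}
```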

Is this Regex incorrect? No matches found

I'm trying to parse through a string formatted like this, except with more values:
Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value
The Regex
((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))
In the actual string there are about twice as many key/value pairs, but I'm keeping it short for brevity. I have them in parentheses so I can access them as groups. The keys are stored as constants, and they will always be the same. The problem is that it never finds a match, which doesn't make sense (unless the regex is wrong).
Judging by your comment above, it sounds like you're creating the Pattern and Matcher objects and associating the Matcher with the target string, but you aren't actually applying the regex. That's a very common mistake. Here's the full sequence:
String regex = "Key1=(.*),Key2=(.*)"; // etc.
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(targetString);
// Now you have to apply the regex:
if (m.find())
{
    String value1 = m.group(1);
    String value2 = m.group(2);
    // etc.
}
Not only do you have to call find() or matches() (or lookingAt(), but nobody ever uses that one), you should always call it in an if or while statement. That is, you should make sure the regex actually worked before you call any methods like group() that require the Matcher to be in a "matched" state.
Also notice the absence of most of your parentheses. They weren't necessary, and leaving them out makes it easier to (1) read the regex and (2) keep track of the group numbers.
Looks like you'd do better to do:
String[] pairs = data.split(",");
Then parse the key/value pairs one at a time
Your regex is working for me...
If you are always getting an IllegalStateException, I would say that you are trying to do something like:
matcher.group(1);
without having invoked the find() method.
You need to call that method before any attempt to fetch a group; otherwise the Matcher is not in a state in which group() can be called.
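Here is a minimal sketch of both cases. It uses hypothetical names and a simplified one-key pattern instead of the full regex from the question. Calling group() before any match attempt throws IllegalStateException, while checking the result of find() first is safe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBeforeFind {
    // Simplified, hypothetical one-key pattern (not the question's full regex)
    static final Pattern KEY1 = Pattern.compile("Key1=([^,]*)");

    // Safe: group() is only called after find() reported a match
    static String firstValue(String input) {
        Matcher m = KEY1.matcher(input);
        if (m.find()) {
            return m.group(1);
        }
        return null; // no match
    }

    // Unsafe: no match has been attempted yet, so group() throws
    static boolean groupWithoutFindThrows(String input) {
        Matcher m = KEY1.matcher(input);
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstValue("Key1=value,Key2=other"));
        System.out.println(groupWithoutFindThrows("Key1=value"));
    }
}
```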
Give this a try:
String test = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
Pattern pattern = Pattern.compile("((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))");
Matcher matcher = pattern.matcher(test);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
It's not wrong per se, but it requires a lot of backtracking, which can make matching slow. I would try a split as suggested elsewhere, but if you really need to use a regular expression, try making it non-greedy.
((Key1)=(.*?)),((Key2)=(.*?)),((Key3)=(.*?)),((Key4)=(.*?)),((Key5)=(.*?)),((Key6)=(.*?)),((Key7)=(.*?))
To understand why it requires so much backtracking, understand that for
Key1=(.*),Key2=(.*)
applied to
Key1=x,Key2=y
Java's regular expression engine matches the first (.*) to x,Key2=y and then tries stripping characters off the right until it can get a match for the rest of the regular expression: ,Key2=(.*). It effectively ends up asking,
Does "" match ,Key2=(.*), no so try
Does "y" match ,Key2=(.*), no so try
Does "=y" match ,Key2=(.*), no so try
Does "2=y" match ,Key2=(.*), no so try
Does "y2=y" match ,Key2=(.*), no so try
Does "ey2=y" match ,Key2=(.*), no so try
Does "Key2=y" match ,Key2=(.*), no so try
Does ",Key2=y" match ,Key2=(.*), yes so the first .* is "x" and the second is "y".
EDIT:
In Java, the non-greedy qualifier changes things so that it starts off trying to match nothing and then building from there.
Does "x,Key2=y" match ,Key2=(.*), no so try
Does ",Key2=y" match ,Key2=(.*), yes.
So when you've got 7 keys, it doesn't need to unmatch 6 of them, which involves unmatching 5, which involves unmatching 4, and so on. It can do its job in one forward pass over the input.
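The greedy and reluctant versions can be compared on the short example. This sketch uses hypothetical names. One caveat before switching to reluctant groups: with find(), a trailing (.*?) is satisfied by the empty string. The last value then comes back empty unless the match is anchored, for example with matches() or a trailing $.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    static final String GREEDY = "Key1=(.*),Key2=(.*)";
    static final String RELUCTANT = "Key1=(.*?),Key2=(.*?)";

    // Returns both captured values, or null if there is no match
    static String[] values(String regex, String input, boolean wholeString) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean found = wholeString ? m.matches() : m.find();
        return found ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String input = "Key1=x,Key2=y";
        System.out.println(Arrays.toString(values(GREEDY, input, false)));    // [x, y]
        System.out.println(Arrays.toString(values(RELUCTANT, input, true)));  // [x, y]
        System.out.println(Arrays.toString(values(RELUCTANT, input, false))); // [x, ]
    }
}
```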
I'm not going to say that there's no regex that will work for this, but it's most likely more complicated to write (and, more importantly, to read, for the next person who has to deal with the code) than it's worth. The closest I'm able to get with a regex is if you append a terminal comma to the string you're matching, i.e., instead of:
"Key1=value1,Key2=value2"
you would append a comma so it's:
"Key1=value1,Key2=value2,"
Then, the regex that got me the closest is: "(?:(\\w+?)=(\\S+?),)?+"...but this doesn't quite work if the values contain commas.
You can try to continue tweaking that regex from there, but the problem I found is that there's a conflict in behavior between greedy and reluctant quantifiers. You'd need a capturing group for the value that is greedy with respect to commas. It would have to run up to the last comma before a non-capturing group of word characters followed by the equals sign (the next key). That last non-capturing group would have to be optional, in case you're matching the last value in the sequence, and maybe itself reluctant. Complicated.
Instead, my advice is just to split the string on "=". You can get away with this because presumably the values aren't allowed to contain the equal sign character.
Now you'll have a bunch of substrings, each of which consists of a value, then the last comma in the substring, followed by a key. You can easily find the last comma in each substring using String.lastIndexOf(',').
Treat the first and last substrings specially (because the first one does not have a prepended value and the last one has no appended key) and you should be in business.
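Here is a sketch of that approach with hypothetical names. It assumes well-formed input, where values never contain '=' and keys never contain ','. Values that contain commas survive, because only the last comma before each key is treated as a separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SplitOnEquals {
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        String[] parts = data.split("=", -1); // -1 keeps a trailing empty value
        String key = parts[0];                // the first part is only a key
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i];
            if (i == parts.length - 1) {
                result.put(key, part);        // the last part is only a value
            } else {
                // value, then the last comma, then the next key
                int comma = part.lastIndexOf(',');
                result.put(key, part.substring(0, comma));
                key = part.substring(comma + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Key1=a,b,Key2=c,Key3=d"));
    }
}
```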
If you know you always have 7, the hack of least resistance is
^Key1=(.+),Key2=(.+),Key3=(.+),Key4=(.+),Key5=(.+),Key6=(.+),Key7=(.+)$
Try it out at http://www.fileformat.info/tool/regex.htm
I'm pretty sure there is a better way to parse this that goes through .find() rather than .matches(). I would recommend it, as it lets you move along the string one key=value pair at a time. It does move you into the whole "greedy" evaluation discussion, though.
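Here is a sketch of the find()-based approach with hypothetical names. It assumes keys are word characters and values contain no commas. Bounding each value with [^,]* instead of .* sidesteps the greedy/reluctant question entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairByPair {
    // One key=value pair per find(); [^,]* stops at the next comma
    static final Pattern PAIR = Pattern.compile("(\\w+)=([^,]*)");

    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(data);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
        parse(data).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```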
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
The simplest solution is the most robust.
final String data = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
final String[] pairs = data.split(",");
for (final String pair : pairs)
{
    // limit 2: split only on the first '=' and keep an empty value as ""
    final String[] keyValue = pair.split("=", 2);
    final String key = keyValue[0];
    final String value = keyValue.length > 1 ? keyValue[1] : "";
    // use key and value here
}
