I would like a Java regular expression string that finds all vowels in a string unless they:
are the first character or
the next character following an underscore
AREA_ID becomes
AR_ID
LONG_NAME becomes
LNG_NM
HOME_ALONE becomes
HM_ALN
I have played around with http://gskinner.com/RegExr and
I currently have the following regex that replaces all vowels except if it is the starting character
(?!^[AEIOU])[AEIOU]
I can't figure out how to get the second part (ignore vowel immediately following an underscore).
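I'm guessing you're using JavaScript, in which case you can capture the vowels you want to keep (the ones at the start or right after an underscore) and replace every match with $1:
((?:^|_)[AEIOU])|[AEIOU]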
However, if you're using a regex flavour that supports lookbehinds, try this:
(?<!^)(?<!_)[AEIOU]
Note that two lookbehinds are used because many flavours require a lookbehind to have a fixed length, which "either the start of the string or an underscore" does not. Java only requires a bounded length, so (?<!^|_) works there as well.
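As a minimal sketch of how the lookbehind version could be used from Java (the class and method names here are made up for illustration), the pattern can be handed to replaceAll with an empty replacement:

```java
import java.util.regex.Pattern;

// Hypothetical helper class; only the regex itself comes from the answer above.
class VowelStripper {
    // A vowel that is neither at the start of the input nor directly after an underscore.
    private static final Pattern INNER_VOWEL = Pattern.compile("(?<!^)(?<!_)[AEIOU]");

    static String strip(String name) {
        // Delete every vowel the pattern finds.
        return INNER_VOWEL.matcher(name).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("AREA_ID"));    // AR_ID
        System.out.println(strip("LONG_NAME"));  // LNG_NM
        System.out.println(strip("HOME_ALONE")); // HM_ALN
    }
}
```

The pattern only lists upper-case vowels, as in the question; lower-case input would need [AEIOUaeiou] or a case-insensitive flag.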
Related
I was just practicing regex and found something intriguing
For the string
"world9 a9$ b6$" my regular expression "^(?=.*[\\d])(?=\\S+\\$).{2,}$"
returns false, because the second lookahead runs into a space before it can find a $ sign preceded by non-space characters.
As a whole, the string doesn't match the pattern.
What should the regular expression be if I want it to return true whenever a substring follows the pattern?
In this example, a9$ and b6$ both follow the regular expression.
You can use
^(?=\D*\d)(?=.*\S\$).{2,}$
See the regex demo. As The fourth bird mentions, since \S\$ matches two chars, you may simply move the pattern to the consuming part, and use ^(?=\D*\d).*\S\$.*$, see this regex demo.
Details
^ - start of string (implicit if used in .matches())
(?=\D*\d) - a positive lookahead that requires zero or more non-digit chars followed with a digit char immediately to the right of the current location
(?=.*\S\$) - a positive lookahead that requires zero or more chars other than line break chars, as many as possible, followed with a non-whitespace char and a $ char immediately to the right of the current location
.{2,} - any two or more chars other than line break chars, as many as possible
$ - end of string (implicit if used in .matches())
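To see the difference between the two patterns, here is a small sketch (the class and method names are invented for this example) that runs both against the string from the question using String.matches:

```java
// Hypothetical comparison of the question's pattern and the suggested one.
class SubstringLookahead {
    // Pattern from the question: \S+\$ must match starting at position 0.
    static boolean originalMatches(String s) {
        return s.matches("^(?=.*[\\d])(?=\\S+\\$).{2,}$");
    }

    // Suggested pattern: .*\S\$ may find the "non-space then $" pair anywhere.
    static boolean fixedMatches(String s) {
        return s.matches("^(?=\\D*\\d)(?=.*\\S\\$).{2,}$");
    }

    public static void main(String[] args) {
        String input = "world9 a9$ b6$";
        System.out.println(originalMatches(input)); // false
        System.out.println(fixedMatches(input));    // true
    }
}
```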
Mostly, knock out the ^ and $ bits, as those force this into a full string match, and you want substring matches. In general, look-ahead seems like a mistake here, what are you trying to accomplish by using that? (Look-ahead/look-behind is rarely needed in general). All you need is:
Pattern.compile("\\S+\\$");
Possibly, if you want an element (such as a9$) to stand on its own, use \b, which is regexpese for a word boundary: a position with a word character (a letter, digit, or underscore) on one side and anything else, including the start or end of input, on the other. Because $ is not a word character, a \b right after \$ would only match if a word character followed, so use \B (not a word boundary) on that side. Thus:
Pattern.compile("\\b\\S+\\$\\B")
still matches the a9$ in foo a9$ bar, or a9$ on its own, just fine.
If you MUST put this in terms of a full match, e.g. because matches() (which always does a full string match) is run and you can't change that, well, put ^.* in front and .*$ at the back of it, simple as that.
Absolutely nothing about this says "This can only be needed with lookahead".
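A rough sketch of the substring approach, assuming you are free to call find() instead of matches() (the class and method names are placeholders): collect every \S+\$ hit, and wrap the pattern in .* only when a full-string match is forced on you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; only the regexes come from the answer above.
class DollarTokens {
    private static final Pattern TOKEN = Pattern.compile("\\S+\\$");

    // Substring search: find() scans the input for every occurrence.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Full-string variant for when matches() cannot be avoided.
    static boolean containsToken(String input) {
        return input.matches("^.*\\S+\\$.*$");
    }

    public static void main(String[] args) {
        System.out.println(findAll("world9 a9$ b6$"));       // [a9$, b6$]
        System.out.println(containsToken("world9 a9$ b6$")); // true
    }
}
```

Note that this pattern, unlike the one in the question, does not insist on a digit.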
I'm trying to create a regular expression to parse a 5 digit number out of a string no matter where it is but I can't seem to figure out how to get the beginning and end cases.
I've used the pattern as follows \\d{5} but this will grab a subset of a larger number...however when I try to do something like \\D\\d{5}\\D it doesn't work for the end cases. I would appreciate any help here! Thanks!
For a few examples (55555 is what should be extracted):
At the beginning of the string
"55555blahblahblah123456677788"
In the middle of the string
"2345blahblah:55555blahblah"
At the end of the string
"1234567890blahblahblah55555"
Since you are using a language that supports them, use negative lookarounds:
"(?<!\\d)\\d{5}(?!\\d)"
These will assert that your \\d{5} is neither preceded nor followed by a digit. Whether that is due to the edge of the string or a non-digit character does not matter.
Note that these assertions themselves are zero-width matches. So those characters will not actually be included in the match. That is why they are called lookbehind and lookahead. They just check what is there, without actually making it part of the match. This is another disadvantage of using \\D, which would include the non-digit character in your match (or require you to use capturing groups).
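For illustration, a short sketch of how that pattern might be applied to the three example strings (the class and method names are assumptions, not part of the question):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor built around the lookaround pattern above.
class FiveDigitExtractor {
    // Exactly five digits with no digit directly before or after them.
    private static final Pattern FIVE_DIGITS = Pattern.compile("(?<!\\d)\\d{5}(?!\\d)");

    static Optional<String> extract(String input) {
        Matcher m = FIVE_DIGITS.matcher(input);
        // group() holds only the digits; the lookarounds consume nothing.
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("55555blahblahblah123456677788")); // Optional[55555]
        System.out.println(extract("2345blahblah:55555blahblah"));    // Optional[55555]
        System.out.println(extract("1234567890blahblahblah55555"));   // Optional[55555]
        System.out.println(extract("123456"));                        // Optional.empty
    }
}
```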
Can anyone tell me what "?=" does when using regex?
Here is an example of the code fragment I am trying to decipher:
password.matches("(?=.*\\d.*\\d.*)^[\\w]{8}.*$");
Thanks.
It's a positive lookahead. In that particular expression, it is saying that your password must have at least two digits (\d).
Also note that a lookahead doesn't consume input, it is merely an assertion.
For example, in your regex, the lookahead part ((?=.*\\d.*\\d.*)) asserts that your password contains at least two digits, and the rest of the expression consumes the entire string, requiring exactly 8 word characters (i.e., [a-zA-Z_0-9]) at the beginning of the string, followed by anything.
It's a lookahead: A zero-width match that checks to see if the position is followed by the given expression.
http://www.regular-expressions.info/lookaround.html
In your scenario, you are looking for a string that:
contains at least two digits (enforced by the lookahead)
begins with 8 word characters (matched by the rest of the regex)
The lookahead is not actually part of the match. It behaves much like a word boundary (\b) or beginning of string (^).
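A quick way to see both conditions at work is to run the expression from the question against a few inputs. This is only a sketch; the wrapper class and the sample passwords are invented here:

```java
// Hypothetical wrapper around the expression from the question.
class PasswordCheck {
    static boolean isValid(String password) {
        // Lookahead: at least two digits somewhere in the string.
        // Remainder: the string starts with 8 word characters, then anything.
        return password.matches("(?=.*\\d.*\\d.*)^[\\w]{8}.*$");
    }

    public static void main(String[] args) {
        System.out.println(isValid("abcdefg1h2"));   // true: two digits, 8 word chars up front
        System.out.println(isValid("abcdefgh1"));    // false: only one digit
        System.out.println(isValid("abc def12345")); // false: a space within the first 8 chars
        System.out.println(isValid("ab12"));         // false: fewer than 8 characters
    }
}
```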
I have the following requirements for validating an input field:
It should only contain letters, with spaces allowed between them.
It cannot contain spaces at the beginning or end of the string.
It cannot contain any other special character.
I am using following regex for this:
^(?!\s*$)[-a-zA-Z ]*$
But this is allowing spaces at the beginning. Any help is appreciated.
For me the only logical way to do this is:
^\p{L}+(?: \p{L}+)*$
At the start of the string there must be at least one letter. (I replaced your [a-zA-Z] with the Unicode property for letters, \p{L}.) Then there can be a space followed by at least one letter; this part can be repeated.
\p{L}: any kind of letter from any language. See regular-expressions.info
The problem with the lookahead ^(?!\s*$) in your expression is that it only fails when there is nothing but whitespace up to the end of the string. If you want to disallow leading whitespace, just remove the end-of-string anchor inside the lookahead ==> ^(?!\s)[-a-zA-Z ]*$. But this still allows the string to end with whitespace. To avoid this, add a lookbehind at the end of the string: ^(?!\s)[-a-zA-Z ]*(?<!\s)$. But I think lookaround is not needed for this task.
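As a sketch of the \p{L} version in Java (the class name and the sample inputs are made up for this example), the pattern accepts accented letters and rejects leading, trailing, or doubled spaces:

```java
import java.util.regex.Pattern;

// Hypothetical validator using the Unicode-letter pattern from this answer.
class LetterNames {
    // One or more letters, then any number of " letters" groups (single space each).
    private static final Pattern WORDS = Pattern.compile("^\\p{L}+(?: \\p{L}+)*$");

    static boolean isValid(String s) {
        return WORDS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Jos\u00e9 \u00c1lvarez")); // true: accented letters are letters
        System.out.println(isValid(" Jose"));                  // false: leading space
        System.out.println(isValid("Jose "));                  // false: trailing space
        System.out.println(isValid("Jose  Alvarez"));          // false: two spaces in a row
    }
}
```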
This should work if you use it with String.matches method. I assume you want English alphabet.
"[a-zA-Z]+(\\s+[a-zA-Z]+)*"
Note that \s will allow all kinds of whitespace characters. In Java, it would be equivalent to
[ \t\n\x0B\f\r]
Which includes space (32), horizontal tab (09), line feed (10), vertical tab (11), form feed (12), carriage return (13).
If you want to specifically allow only space (32):
"[a-zA-Z]+( +[a-zA-Z]+)*"
You can further optimize the regex above by making the capturing group ( +[a-zA-Z]+) non-capturing (with String.matches you are not going to be able to get the words individually anyway). It is also possible to change the quantifiers to make them possessive, since there is no point in backtracking here.
"[a-zA-Z]++(?: ++[a-zA-Z]++)*+"
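Here is a small sketch showing the possessive version in use (the class name and test strings are assumptions for illustration). Since matches() already requires the whole input to match, no anchors are needed:

```java
import java.util.regex.Pattern;

// Hypothetical validator using the possessive pattern above.
class AsciiWords {
    // Letters, then zero or more groups of "one or more spaces + letters".
    private static final Pattern WORDS = Pattern.compile("[a-zA-Z]++(?: ++[a-zA-Z]++)*+");

    static boolean isValid(String s) {
        return WORDS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("John Smith"));   // true
        System.out.println(isValid("John   Smith")); // true: several spaces are allowed
        System.out.println(isValid(" John"));        // false: leading space
        System.out.println(isValid("John "));        // false: trailing space
        System.out.println(isValid("John\tSmith"));  // false: tab is not a space
    }
}
```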
Try this:
^(((?<!^)\s(?!$)|[-a-zA-Z])*)$
This expression uses negative lookahead and negative lookbehind to disallow spaces at the beginning or at the end of the string, and requiring the match of the entire string.
I think the problem is that (?!\s*$) is a negative lookahead that only rejects strings made up entirely of whitespace; it does nothing to forbid a leading space.
This should work:
[a-zA-Z]{1}([a-zA-Z\s]*[a-zA-Z]{1})?
That is: one letter, then an optional run of letters and whitespace that always ends with a letter.
I don't know if words in your accepted string can be separated by more than one space. If they can:
^[a-zA-Z]+(( )+[a-zA-Z]+)*$
If they can't:
^[a-zA-Z]+( [a-zA-Z]+)*$
The string must start with a letter (or several letters), not a space.
The string can contain several words, but every word besides the first must have a space before it.
Hope I helped.
With regular expressions, how can I test just the last characters of the given string for a match?
I want to check if something ends in any of the following:
vowel+consonant ("like 'ur' in devour")
vowel+'nt' ("paint")
vowel+'y' ("play")
and some others that are similar.
How can I do this with regular expressions (in Java)?
edit:
How would I use regular expressions to find out if a verb ends in the pattern
consonant-'e'
or various other combinations like
'ss' 'x' 'sh' 'ch' (sibilants)
in order to properly conjugate them in English as verbs.
I think this is the expression that you want. The first bit checks for the vowel and the second looks for any consonant or 'nt'
[aeiou]([^aeiou\W\d]|nt)$
I've checked it on http://regexpal.com/ which is my usual tester. The [^aeiou\W\d] means 'any character that isn't a lower-case vowel, is a word character, but isn't a digit' (so strictly speaking it also lets an underscore through). It could just be replaced by all the consonants, I suppose:
[aeiou]([bcdfghjklmnpqrstvwxyz]|nt)$
Note that this ignores any possibility of any characters other than those listed. It also tests lower case but I'm unsure how to do case insensitive regex in Java.
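On the case-insensitivity point: in Java, one option is to pass Pattern.CASE_INSENSITIVE to Pattern.compile and use find(), so only the end of the word has to match. The sketch below is an assumption about how the checks could be packaged (class and method names are invented), with a second pattern for the sibilant endings mentioned in the question:

```java
import java.util.regex.Pattern;

// Hypothetical helper for testing word endings.
class VerbEndings {
    // Vowel followed by a single consonant or by "nt", at the end of the word.
    private static final Pattern VOWEL_THEN_CONSONANT =
            Pattern.compile("[aeiou](?:[b-df-hj-np-tv-z]|nt)$", Pattern.CASE_INSENSITIVE);

    // Sibilant endings: ss, x, sh, ch.
    private static final Pattern SIBILANT =
            Pattern.compile("(?:ss|x|sh|ch)$", Pattern.CASE_INSENSITIVE);

    static boolean endsWithVowelConsonant(String word) {
        // find() looks for the pattern anywhere; the $ ties it to the end.
        return VOWEL_THEN_CONSONANT.matcher(word).find();
    }

    static boolean endsWithSibilant(String word) {
        return SIBILANT.matcher(word).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithVowelConsonant("devour")); // true
        System.out.println(endsWithVowelConsonant("PAINT"));  // true
        System.out.println(endsWithVowelConsonant("see"));    // false
        System.out.println(endsWithSibilant("watch"));        // true
    }
}
```

The consonant class here treats y as a consonant, which also covers the vowel+'y' case ("play").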
you need a regular expression like this
^.*[aeiou]([^aeiou]|nt)$
which matches zero or more chars, followed by one of a, e, i, o and u, followed by either one char that isn't a vowel or exactly nt
As was pointed out in the comments, the [^aeiou] doesn't perform as intended unless you assume only alpha chars are used