Regex for a Person Name - java

I want to validate a person's name in a TextField. A name should be valid if:
1. it contains only English letters (a-z, A-Z)
2. it may contain multiple space characters
3. it may contain any number of dot characters, for names like A.B. Devilliers
What I have tried?
NAME_REGEX="(\\w|\\s|(\\.))+"
Note: I am working in Java.
When I use NAME_REGEX="(\\w|\\s)+", rules 1 and 2 are satisfied, but I also want rule 3.

It is very hard to specify what should be allowed in a name, so trying to tweak too much is probably counter-productive. If anything, I would err on the side of allowing more things.
It looks to me like you want something like this:
(?i)^(?:[a-z]+(?: |\. ?)?)+[a-z]$
On the demo, see which names are allowed.
I am assuming you want to start and end with a letter
You have not specified that any letter must specifically be in upper case, so this will accept aLan parsoN
If you want to allow quotes or other chars, let me know.
How it works:
(?i) puts us in case-insensitive mode
^ asserts that we are at the beginning of the string
(?:[a-z]+(?: |\. ?)?)+ matches {any letters, optionally followed by a space or dot or dot-space}, once or more
[a-z]$ ensures that the last character is a letter
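To try the pattern from Java, a minimal sketch could look like the following. The class name NameRegexDemo and the method isValidName are made up for illustration; only the regex comes from the answer above, with backslashes doubled for a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer.
class NameRegexDemo {
    // Same pattern as above; "\\." in Java source is "\." in the regex.
    private static final Pattern NAME =
            Pattern.compile("(?i)^(?:[a-z]+(?: |\\. ?)?)+[a-z]$");

    static boolean isValidName(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"A.B. Devilliers", "aLan parsoN", "Alan.", "A..B", "Alan  Parson"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValidName(s));
        }
    }
}
```

Note that this pattern needs at least two letters, so a single-letter name is rejected. The nested quantifiers can also backtrack heavily on long inputs that fail to match, so it may be worth limiting the input length first.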

Try this regex for Java:
NameRegex = ^[a-z A-Z\\.\\s]+$


Regex format for a particular Match

I am trying to write a regex for the following format
PA-123456-067_TY
It's always PA, followed by a dash, 6 digits, another dash, then 3 digits, and ends with _TY
Apparently, when I write this regex to match the above format it shows the output correctly
^[^[PA]-]+-(([^-]+)-([^_]+))_([^.]+)
with all the Negation symbols ^
This does not work if I write the regex in the below format without negation symbols
[[PA]-]+-(([-]+)-([_]+))_([.]+)
Can someone explain to me why this is so?
The negation symbol means that the character cannot be anything within the specified class. Your regex is much more complicated than it needs to be and is therefore obfuscating what you really want.
You probably want something like this:
^PA-(\d+)-(\d+)_TY$
... which matches anything that starts with PA-, then includes two groups of numbers separated by a dash, then an underscore and the letters TY. If you want everything after the PA to be what you capture, but separated into the three groups, then it's a little more abstract:
^PA-(.+)-(.+)_(.+)$
This matches:
PA-
a capture group of any characters
a dash
another capture group of any characters
an underscore
all the remaining characters until end-of-line
Character classes [...] are saying match any single character in the list, so your first capture group (([^-]+)-([^_]+)) is looking for anything that isn't a dash any number of times followed by a dash (which is fine) followed by anything that isn't an underscore (again fine). Having the extra set of parentheses around that creates another capture group (probably group 1 as it's the first parentheses reached by the regex engine)... that part is OK but probably makes interpreting the answer less intuitive in this case.
In the re-write however, your first capture group (([-]+)-([_]+)) matches [-]+, which means "one or more dashes" followed by a dash, followed by any number of underscores followed by an underscore. Since your input does not have a dash immediately following PA-, the entire regex fails to find anything.
Putting the PA inside embedded character classes also complicates things. I'm not actually sure how [^[PA]-]+ is interpreted in practice, but I suspect the first part of your first regex means something like "one or more characters that are not P, A, or a dash". The second one looks for the opposite, I think. But you don't want any of that; you just want the pattern to start with the actual sequence of characters you care about, which is just PA-.
Update: As per the clarifications in the comments on the original question, knowing you want fixed-size groups of digits, it would look like this:
^PA-(\d{6})-(\d{3})_TY$
That captures PA-, then a 6-digit number, then a dash, then a 3-digit number, then _TY. The six digit number and 3 digit numbers will be in capture groups 1 and 2, respectively.
If the sizes of those numbers could ever change, then replace {x} with + to just capture numbers regardless of max length.
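As a rough sketch of how the fixed-width pattern could be used from Java (the class PartCodeParser and its parse method are hypothetical names chosen for this example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from the answer above.
class PartCodeParser {
    private static final Pattern CODE = Pattern.compile("^PA-(\\d{6})-(\\d{3})_TY$");

    // Returns {sixDigitPart, threeDigitPart}, or null if the input has another shape.
    static String[] parse(String input) {
        Matcher m = CODE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("PA-123456-067_TY");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```

The leading zero in 067 is preserved because the groups are returned as strings rather than parsed as numbers.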
According to your comment, this would be appropriate: PA-\d{6}-\d{3}_TY
EDIT: If you want to match a whole line, use it with anchors: ^PA-\d{6}-\d{3}_TY$

How to negate an inner character in a glob pattern

The question
java nio negate a glob pattern
asked how to make a glob pattern matching strings that do not start with a given character, say "a". The accepted answer
"[!a]*"
does work for starting characters and also for ending characters,
"*[!a]"
However, it does not work for positions in between. For example
"*[!.]*"
does not filter out file names with a dot somewhere inside the file name. (While, of course,
"*.*"
does filter out file names without a dot.) How can I do inner character negation?
It works perfectly fine in the middle of a pattern. The thing to realize is that foo.bar DOES match *[!.]*
To show that this is a match:
Let the first star match foo.b. This is allowed since it can match any string of any length.
The next character is not a period, so [!.] matches a
Let the second star match the remainder, r
This is the complete input, and therefore foo.bar matches *[!.]*.
The pattern matches "any string that contains a character that is not a period". You instead wanted "any string that does not contain any periods anywhere".
In regex, this is the difference between ^.*[^.].*$ and ^([^.])*$.
Unfortunately, globs are not powerful enough to express what you want.
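If the matching happens through java.nio, one possible workaround is the "regex:" syntax that FileSystem.getPathMatcher accepts alongside "glob:". The sketch below is only an illustration under that assumption; GlobNegationDemo and its matches helper are invented names.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical demo class comparing the glob with a regex-based matcher.
class GlobNegationDemo {
    // syntaxAndPattern is e.g. "glob:*[!.]*" or "regex:[^.]*"
    static boolean matches(String syntaxAndPattern, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // The glob accepts foo.bar because some character in it is not a dot.
        System.out.println(matches("glob:*[!.]*", "foo.bar"));
        // The regex form requires every character to be a non-dot.
        System.out.println(matches("regex:[^.]*", "foo.bar"));
        System.out.println(matches("regex:[^.]*", "foobar"));
    }
}
```

This only helps where a PathMatcher (or another regex-capable filter) can be used; APIs that accept nothing but a glob string still have the limitation described above.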

Name validation with special conditions using regex

I want to validate a name in Java, allowing each of the following special characters at most once: {,-.'}. I have an expression that restricts the input to letters and these special characters, but I cannot figure out how to prevent users from entering any of these characters more than once. I tried to achieve it using quantifiers but was unsuccessful. This is the code I have so far:
Pattern validator = Pattern.compile("^[a-zA-Z+\\.+\\-+\\'+\\,]+$");
You can use lookahead assertion in your regex:
Pattern validator = Pattern.compile(
"^(?!(?:.*?\\.){2})(?!(?:.*?'){2})(?!(?:.*?,){2})(?!(?:.*?-){2})[a-zA-Z .',-]+$");
Each (?!(?:.*?X){2}) is a negative lookahead that fails the match if the character X occurs two or more times, so each of the four special characters is allowed at most once.
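A small sketch to check the behaviour of the lookahead version (the class SingleUseSpecialChars and the method isValid are illustrative names, not part of the answer):

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the lookahead-based validator shown above.
class SingleUseSpecialChars {
    private static final Pattern VALIDATOR = Pattern.compile(
            "^(?!(?:.*?\\.){2})(?!(?:.*?'){2})(?!(?:.*?,){2})(?!(?:.*?-){2})[a-zA-Z .',-]+$");

    static boolean isValid(String name) {
        return VALIDATOR.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Jonathan's", "O'Neil-Smith", "Ann-Marie-Lou", "A.B. Devilliers"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Because each character gets its own lookahead, a name may contain one dot, one apostrophe, one comma and one hyphen at the same time, but not two of the same kind.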
I think that you can just take into account names where such characters would only happen once. Names like "Jonathan's", "Thoms-Damm", "Thoms,Jon", "jonathan.thoms". In practice for names, I don't think that such special characters would occur at the edges of the string. As such, you can probably get away with a regex like:
Pattern validator = Pattern.compile("^[a-zA-Z]+(?:[-',\\.][a-zA-Z]+)?$");
This regex should match a regular ASCII name followed optionally by a single "special" character with another name after it.

How to add wildcards in lookahead?

I'd like to identify certain values in the following string, specifically the values inside CVC and Number:
CreditCard Number="123" CVC="213" Date="2015-12"
(?<=CVC=\").*(?=") matches 213" Date="2015-12. How can I modify the regex to look for the first doublequote match after something was found, and not to look for the last doublequote as it does now?
Further: how can I define wildcards in lookaheads? Ideally I'd like to have an expression:
(?<=CreditCard.*CVC=\").*(?=") which means that a CVC statement must be preceded by the string "CreditCard", but between them there could be any values.
You can simply make the .* not greedy .*?
(?<=CVC=\").*?(?=")
In answer to your second question: Java regex (like most other engines) has traditionally not allowed lookbehinds of unbounded length, such as one containing .*. Usually, though, you can solve a problem that would seem to require such a lookbehind by using capture groups:
(?<=CreditCard.*CVC=\").*?(?=")
becomes:
CreditCard.*?CVC=\"(.*?)"
And then you can take the relevant information from capture group 1.
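In Java, reading capture group 1 might look roughly like this. CvcExtractor and extractCvc are hypothetical names; the regex is the capture-group version from above, with the quotes escaped for a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that pulls the CVC value out via capture group 1.
class CvcExtractor {
    private static final Pattern CVC = Pattern.compile("CreditCard.*?CVC=\"(.*?)\"");

    // Returns the CVC value, or null if no CreditCard ... CVC="..." sequence is found.
    static String extractCvc(String input) {
        Matcher m = CVC.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "CreditCard Number=\"123\" CVC=\"213\" Date=\"2015-12\"";
        System.out.println(extractCvc(input));
    }
}
```

find() is used instead of matches() because the pattern is meant to locate the value inside a longer string.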
RegExr (.* added on RegExr so that output replaces the entire input; it's not required for your case though.)
You could skip using lookbehinds, and instead use clustering to pull out just the portions of the string you want:
CreditCard Number="(\d*)".*\sCVC="(\d*)"
And then the "match groups" numbered 1 and 2 will correspond to your credit card number and CVC, respectively. (You can use Matcher.group(int) to retrieve the values of the various groups) Notice that by using \d to specifically match digits, you don't have to make the * non-greedy. In this case it works because you only want to match on digits. In the general case (let's say a credit card number could consist of any non-quote character), you can use a custom character class to match anything but your delimiter (quote in this case):
CreditCard Number="([^"]*)".*\sCVC="([^"]*)"
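Here is a sketch of reading both groups with Matcher.group(int), using the negated character class so that non-digit values also work. The class name CardFields and the method extract are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper returning {number, cvc} from the two capture groups.
class CardFields {
    private static final Pattern FIELDS =
            Pattern.compile("CreditCard Number=\"([^\"]*)\".*\\sCVC=\"([^\"]*)\"");

    static String[] extract(String input) {
        Matcher m = FIELDS.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] f = extract("CreditCard Number=\"123\" CVC=\"213\" Date=\"2015-12\"");
        System.out.println("number=" + f[0] + ", cvc=" + f[1]);
    }
}
```

Since [^"]* can never run past a closing quote, no lazy quantifier is needed inside the groups.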

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] As far as I can tell from the (very large) input, the [?]'s are either whitespace or nothing at all; maybe there's some sort of encoding issue, or perhaps it has something to do with #text nodes (the input is XML).
The * quantifier matches "zero or more", which means it will also match a string that does not contain any of the characters in your class. Try the + quantifier, which means "one or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.find() instead of Matcher.matches(), a full-string match is not required and you can use the plain regex [a-zA-Z0-9]+. (Matcher.lookingAt() also skips the full-match requirement, but it still anchors the match at the start of the input.)
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-Z0-9]*");
if (!p.matcher(gottenData).matches())
    System.out.println("doesn't match.");
This prints "doesn't match."
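To see how the three Matcher methods differ in what they anchor, here is a short sketch. FullMatchDemo and its method names are invented for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo of matches(), find() and lookingAt() on the same pattern.
class FullMatchDemo {
    private static final Pattern ALNUM = Pattern.compile("[a-zA-Z0-9]+");

    // True only if the whole string is alphanumeric.
    static boolean wholeString(String s) {
        return ALNUM.matcher(s).matches();
    }

    // True if an alphanumeric run occurs anywhere in the string.
    static boolean anywhere(String s) {
        return ALNUM.matcher(s).find();
    }

    // True if the string begins with an alphanumeric run.
    static boolean atStart(String s) {
        return ALNUM.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(wholeString("a ") + " " + anywhere("a ") + " " + atStart("a "));
    }
}
```

So the explicit ^ and $ only matter when the pattern is used with find() or a similar partial-match method.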
The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extras (specifically [, \, ], ^, _, and `).
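The extra characters in the A-z range can be listed with a few lines of Java. This is just an illustrative sketch; AzRangeDemo and nonLettersInRange are made-up names.

```java
// Hypothetical demo listing the non-letter characters between 'A' and 'z'.
class AzRangeDemo {
    static String nonLettersInRange() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'A'; c <= 'z'; c++) {
            if (!Character.isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(nonLettersInRange());
        // An underscore slips through [A-z] but not through [A-Za-z].
        System.out.println("_".matches("[A-z]") + " " + "_".matches("[A-Za-z]"));
    }
}
```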
A second potential problem, as Martin mentioned, is that you may need to add the start and end anchors if you want the string to consist only of letters and numbers.
Finally, you use the * quantifier, which means 0 or more, so the pattern can match 0 characters: an empty string passes matches(), and without anchors find() would succeed on any input. What you need is the + quantifier. So the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
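The difference between the two quantifiers shows up on the empty string, as this small sketch illustrates (QuantifierDemo and its method names are assumptions for the example):

```java
// Hypothetical demo comparing the * and + versions of the pattern.
class QuantifierDemo {
    static boolean star(String s) {
        return s.matches("^[a-zA-Z0-9]*$");
    }

    static boolean plus(String s) {
        return s.matches("^[a-zA-Z0-9]+$");
    }

    public static void main(String[] args) {
        System.out.println("empty: star=" + star("") + ", plus=" + plus(""));
        System.out.println("riz99: star=" + star("riz99") + ", plus=" + plus("riz99"));
    }
}
```

Neither version accepts whitespace, so if the unknown symbols are whitespace-like characters, both patterns should filter them out when the whole string is matched.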
You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding a space to the regex? [a-zA-Z0-9 ]* should match any normal text with letters, numbers and spaces. If you want quotes and other special characters, add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/
You can check whether the input consists only of letters and digits by using the regex ^[a-zA-Z0-9]*$
If the value contains only letters and digits, it matches, e.g. riz99, riz99z
Otherwise it does not match, e.g. 99z., riz99.z, riz99.9
Example code (JavaScript):
if (e.target.value.match('^[a-zA-Z0-9]*$')) {
    console.log('match');
} else {
    console.log('not match');
}
