Java Regex First Name Validation

Java Regex First Name Validation - java

I understand that validating the first name field is highly controversial due to the fact that there are so many different possibilities. However, I am just learning regex and in an effort to help grasp the concept, I have designed some simple validations to create just try to make sure I am able to make the code do exactly what I want it to, despite whether or not it conforms to best business logic practices.
I am trying to validate a few things.
The first name is between 1 and 25 characters.
The first name can only start with an a-z (ignore case) character.
After that the first name can contain a-z (ignore case) and [ '-,.].
The first name can only end with an a-z (ignore case) character.
public static boolean firstNameValidation(String name){
valid = name.matches("(?i)(^[a-z]+)[a-z .,-]((?! .,-)$){1,25}$");
System.out.println("Name: " + name + "\nValid: " + valid);
return valid;
}

Try this regex
^[^- '](?=(?![A-Z]?[A-Z]))(?=(?![a-z]+[A-Z]))(?=(?!.*[A-Z][A-Z]))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))[A-Za-z- '.]{2,}$
Demo

Your expression is almost correct. The following is a modification that satisfies all of the conditions:
valid = name.matches("(?i)(^[a-z])((?![ .,'-]$)[a-z .,'-]){0,24}$");

A regex for the same:
([a-zA-z]{1}[a-zA-z_'-,.]{0,23}[a-zA-Z]{0,1})

Lets change the order of the requirements:
ignore case: "(?i)"
can only start with an a-z character: "(?i)[a-z]"
can only end with an a-z: "(?i)[a-z](.*[a-z])?"
is between 1 and 25 characters: "(?i)[a-z](.{0,23}[a-z])?"
can contain a-z and [ '-,.]: "(?i)[a-z]([- ',.a-z]{0,23}[a-z])?"
the last one should do the job:
valid = name.matches("(?i)[a-z]([- ',.a-z]{0,23}[a-z])?")
Test on RegexPlanet (press java button).
Notes for above points
could have used "[a-zA-Z]"' instead of"(?i)"'
need ? since we want to allow one character names
23 is total length minus first and the last charracter (25-1-1)
the - must come first (or last) inside [] else it is interpreted as range sepparator (assuming you didn't mean the characters between ' and ,)

Try this simplest version:
^[a-zA-Z][a-zA-Z][-',.][a-zA-Z]{1,25}$
Thanks for sharing.

A unicode compatible version of the answer of #ProPhoto:
^[^- '](?=(?!\p{Lu}?\p{Lu}))(?=(?!\p{Ll}+\p{Lu}))(?=(?!.*\p{Lu}\p{Lu}))(?=(?!.*[- '][- '.]))(?=(?!.*[.][-'.]))(\p{L}|[- '.]){2,}$

Related

Regex to match a fixed sub string in a String

I am trying to write a regular expression to verify the presence of a specific number in a fixed position in a String.
String: 109300300330066611111111100000000017000656052086116020170111Name 1
Number to find: 111111111 (Staring from position 17)
I have written the following regular expression:
^.{16}(?<Ones>111111111)(.*)
My understanding is:
Let first 16 characters be whatever they are
Use the Named Capturing Group to grab the specific word
Let the rest of the characters be whatever they are
I am new to regex, is there any issue with the above approach?
Can it be done in other/better way?
I am using Java 8.

Without more details of why you're doing what you're doing, there's just one possible improvement I can see. You repeated any character 16 times at the beginning of the string rather than writing out 16 .s, which is nice and readable, but then, it would be nice to do the same for the repeated 1s:
^.{16}(?<Ones>1{9})(.*)
Otherwise, the string of 1s is hard to understand without the coder manually counting how many there are in the regex.

If you want to hard-code the ones and you know the starting position and you just wnat to know if it is there, using a regex seems unnecessary. you can use this:
String s = "109300300330066611111111100000000017000656052086116020170111Name 1";
if (s.indexOf("111111111").equals(16) doSomething();
Another possible solution without regex:
if(s.substring(16,25).equals("111111111") doSomething();
Otherwise your regex looks good.

Regex expression for comma and dash seperated text of items

I do have a Java Web Application, where I get some inputs from the user. Once I got this input I have to parse it and the parsing part depends on what kind of input I'll get. I decided to use the Pattern class of java for some of predefined user inputs.
So I need the last 2 regex patterns:
a)Enumaration:
input can be - A03,B24.1,A25.7
The simple way would be to check if there are a comma in there ([^,]+) but it will end up with a lot of updates in to parsing function, which I would like to avoid. So, in addition to comma it should check if it starts with
letter
minimum 3 letters (combined with numbers)
can have one dot in the word
minimum 1 comma (updated it)
b) Mixed
input can be A03,B24.1-B35.5,A25.7
So all of what Enumuration part got, but with addition that it can have a dash minimum one.
I've tried to use multiple online regex generators but didnt get it correct. Would be much appreciated if you can help.
Here is what I got if its B24.1-B35.5 if its just a simple range.
"='.{1}\\d{0,2}-.{1}\\d{0,2}'|='.{1}\\d{1,2}.\\d{1,2}-.{1}\\d{1,2}.\\d{1,2}'";
Edit1: Valid and Invalid inputs
for a)Enumaration
A03,B24.1,A25.7 Valid
A03,B24.1 Valid
A03,B24.1-B25.1 -Invalid because in this case (enumaration) it should not contain dash
A03 invalid because no comma
A03,B24.1 - Valid
A03 Invalid
for b)Mixed
everything that a enumeration has with addition that it can have dash too.

You can use this regex for (a) Enumeration part as per your rules:
[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?(?:,[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
Rules:
Verifies that each segment starts with a letter
Minimum of three letters or numbers [A-Za-z][A-Za-z0-9]{2,}
Optionally followed by decimal . and one or more alphabets and numbers i.e (?:\.[A-Za-z0-9]{1,})?
Same thing repeated, and seperated by a comma ,. Also must have atleast one comma so using + i.e (?:,[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
?: to indicate non-capturing group
Using [A-Za-z0-9] instead of \w to avoid underscores
Regex101 Demo
For (b) Mixed, you haven't shared too many valid and invalid cases, but based on my current understanding here's what I have:
[A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?(?:[,-][A-Za-z][A-Za-z0-9]{2,}(?:\.[A-Za-z0-9]{1,})?)+
Note that , from previous regex has been replaced with [,-] to allow - as well!
Regex101 Demo
// Will match
A03,B24.1-B35.5,A25.7
A03,B24.1,A25.7
A03,B24.1-B25.1
Hope this helps!
EDIT: Making sure each group starts with a letter (and not a number)
Thanks to #diginoise and #anubhava for pointing out! Changed [A-Za-z0-9]{3,} to [A-Za-z][A-Za-z0-9]{2,}

As I said in the comments, I would chop the input by commas and verify each segment separately. Your domain ICD 10 CM codes is very well defined and also I would be very wary of any input which could be non valid, yet pass the validation.
Here is my solution:
regex
([A-TV-Z][0-9][A-Z0-9](\.?[A-Z0-9]{0,4})?)
... however I would avoid that.
Since your domain is (moste likely) medical software, people's lives (or at least well being) is at stake. Not to mention astronomical damages and the lawyers ever-chasing ambulances. Therefore avoid the easy solution, and implement the bomb proof one.
You could use the regex to establish that given code is definitely not valid. However if a code passes your regex it does not mean that it is valid.
bomb proof method
See this example: O09.7, O09.70, O09.71, O09.72, O09.73 are valid entries, but O09.1 is not valid.
Therefore just get all possible codes. According to this gist there are 42784 different codes. Just load them to memory and any code which is not in the set, is not valid. You could compress said list and be clever about the encoding in memory, to occupy less space, but verbatim all codes are under 300kB on disk, so few MBs max in memory, therefore not a massive cost to pay for a price of people not having left instead of right kidney removed.

Regex to validate words do not contain numbers or special characters

I am developing a java app, running on android. I am trying to pick all words which do not contain any embedded digits or symbols.
The best I have come up with is:
\b[a-zA-Z]+[a-zA-Z]*+\b
Test Data:
this is a test , an0ther gr8 WW##ee one, w1n 1test test1 end
This results in picking the following: this, is, a, test, WW##ee, one, end
I need to eliminate the WW##ee from the results.

You shouldn't use a word boundary meta-character \b since it matches the position right after WW which sees a hash # character. This position is a word boundary itself. So you should pick up a different way:
(?<![\S&&[^,]])[a-zA-Z]+(?![\S&&[^,]])
Using character class intersection feature of Java's regex you are able to define punctuation characters that are allowed to follow or precede a word character. Here it is a comma ,.

You could use look behind and look ahead to check there is no #.
\b(?<!\#)[a-zA-Z]+(?!\#)\b

My solution has evolved a bit as I have gotten additional help with this. So, this is now my best solution but still a bit lacking. I have not been able to accept "as-is" while rejecting "-this-" and a similar case of accept "and/or" while rejecting "/slash/". Also for simplicity I have made the input data single word per line.
^(?:[\p{P}\p{S}])?((?:[\p{L}\p{Pd}'])+)(?:[\p{P}\p{S}])$
as-is is picked valid
-this- is valid but I wish it weren't
and/or is not valid but I wish it would be picked
/slash/ "slash" is picked valid
(test) "test" is picked valid
[test] "test" is picked valid
<test> "test" is picked valid

Regular expression, excluding .. in suffix of email addy [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Using a regular expression to validate an email address
This is homework, I've been working on it for a while, I've done lots of reading and feel I have gotten pretty familiar with regex for a beginner.
I am trying to find a regular expression for validating/invalidating a list of emails. There are two addresses which are giving me problems, I can't get them both to validate the correct way at the same time. I've gone through a dozen different expressions that work for all the other emails on the list but I can't get those two at the same time.
First, the addresses.
me#example..com - invalid
someone.nothere#1.0.0.127 - valid
The part of my expression which validates the suffix
I originally started with
#.+\\.[[a-z]0-9]+
And had a second pattern for checking some more invalid addresses and checked the email against both patterns, one checked for validity the other invalidity but my professor said he wanted it all in on expression.
#[[\\w]+\\.[\\w]+]+
or
#[\\w]+\\.[\\w]+
I've tried it written many, many different ways but I'm pretty sure I was just using different syntax to express these two expressions.
I know what I want it to do, I want it to match a character class of "character+"."character+"+
The plus sign being at least one. It works for the invalid class when I only allow the character class to repeat one time(and obviously the ip doesn't get matched), but when I allow the character class to repeat itself it matches the second period even thought it isn't preceded by a character. I don't understand why.
I've even tried grouping everything with () and putting {1} after the escaped . and changing the \w to a-z and replacing + with {1,}; nothing seems to require the period to surrounded by characters.

You need a negative look-ahead :
#\w+\.(?!\.)
See http://www.regular-expressions.info/lookaround.html
test in Perl :
Perl> $_ = 'someone.nothere#1.0.0.127'
someone.nothere#1.0.0.127
Perl> print "OK\n" if /\#\w+\.(?!\.)/
OK
1
Perl> $_ = 'me#example..com'
me#example..com
Perl> print "OK\n" if /\#\w+\.(?!\.)/
Perl>

#([\\w]+\\.)+[\\w]+
Matches at least one word character, followed by a '.'. This is repeated at least once, and is then followed by at least on more word character.

I think you want this:
#[\\w]+(\\.[\\w]+)+
This matches a "word" followed by one or more "." "word" sequences. (You can also do the grouping the other way around; e.g. see Dailin's answer.)
The problem with what you are doing before was that you were trying to embed a repeat inside a character class. That doesn't make sense, and there is no syntax that would support it. A character class defines a set of characters and matches against one character. Nothing more.

The official standard RFC 2822 describes the syntax that valid email addresses with this regular expression:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")#(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
More practical implementation of RFC 2822 (if we omit the syntax using double quotes and square brackets), which will still match 99.99% of all email addresses in actual use today, is:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

Regex for checking > 1 upper, lower, digit, and special char

^.(?=.{15,})(?=.\d)(?=.[a-z])(?=.[A-Z])(?=.[!##$%^&+=]).*$
This is the regex I am currently using which will evaluate on 1 of each: upper,lower,digit, and specials of my choosing. The question I have is how do I make it check for 2 of each of these? Also I ask because it is seemingly difficult to write a test case for this as I do not know if it is only evaluating the first set of criteria that it needs. This is for a password, however the requirement is that it needs to be in regex form based on the package we are utilizing.
EDIT
Well as it stands in my haste to validate the expression I forgot to validate my string length. Thanks to Ken and Gumbo on helping me with this.
This is the code I am executing:
I do apologize as regex is not my area.
The password I am using is the following string "$$QiouWER1245", the behavior I am experiencing at the moment is that it randomly chooses to pass or fail. Any thoughts on this?
Pattern pattern = Pattern.compile(regEx);
Matcher match = pattern.matcher(password);
while(match.find()){
System.out.println(match.group());
}
From what I see if it evaluates to true it will throw the value in password back to me else it is an empty string.

Personally, I think a password policy that forces use of all three character classes is not very helpful. You can get the same degree of randomness by letting people make longer passwords. Users will tend to get frustrated and write passwords down if they have to abide by too many password rules (which make the passwords too difficult to remember). I recommend counting bits of entropy and making sure they're greater than 60 (usually requires a 10-14 character password). Entropy per character would depend roughly on the number of characters, the range of character sets they use, and maybe how often they switch between character sets (I would guess that passwords like HEYthere are more predictable than heYThEre).
Another note: do you plan not to count the symbols to the right of the keyboard (period, comma, angle brackets, etc.)?
If you still have to find groups of two characters, why not just repeat each pattern? For example, make (?=.\d) into (?=.\d.*\d).
For your test cases, if you are worried that it would only check the first criteria, then write a test case that makes sure each of the following passwords fails (because one and only one of the criteria is not met in each case): Just for fun I reversed the order of expectation of each character set, though it probably won't make a difference unless someone removes/forgets the ?= at some future date.
!##TESTwithoutnumbers
TESTwithoutsymbols123
&*(testwithoutuppercase456
+_^TESTWITHOUTLOWERCASE3498
I should point out that technically none of these passwords should be acceptable because they use dictionary words, which have about 2 bits of entropy per character instead of something more like 6. However, I realize that it's difficult to write a (maintainable and efficient) regular expression to check for dictionary words.

Try this:
"^(?=(?:\\D*\\d){2})(?=(?:[^a-z]*[a-z]){2})(?=(?:[^A-Z]*[A-Z]){2})(?=(?:[^!##$%^&*+=]*[!##$%^&*+=]){2}).{15,}$"
Here non-capturing groups (?:…) are used to group the conditions and repeat them. I’ve also used the complements of each character class for optimization instead of the universal ..

If I understand your question correctly, you want at least 15 characters, and to require at least 2 uppercase characters, at least 2 lowercase characters, at least 2 digits, and at least 2 special characters. In that case you could it like this:
^.*(?=.{15,})(?=.*\d.*\d)(?=.*[a-z].*[a-z])(?=.*[A-Z].*[A-Z])(?=.*[!##$%^&*+=].*[!##$%^&*+=]).*$
BTW, your original regex had an extra backslash before the \d

I'm not sure that one big regex is the right way to go here. It already looks far too complicated and will be very difficult to change in the future.
My suggestion is to structure the code in the following way:
check that the string has 2 lower case characters
return failure if not found or continue
check that the string has 2 upper case characters
return failure if not found or continue
etc.
This will also allow you to pass out a return code or errors string specifying why the password was not accepted and the code will be much simpler.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.