Java String.matches() regular expressions

Java String.matches() regular expressions - java

I am trying to use the matches() function to check a user-entered password for certain conditions
"Contains six alphanumeric characters, at least one letter and one number"
Here is my current condition for checking for alphanumeric characters
pword.matches("([[a-zA-Z]&&0-9])*")
unfortunately in example using "rrrrZZ1" as the password this condition still returns false
What would be the correct regular expression? Thank you

Someone else here may prove me wrong, but this is going to be very difficult to do without an excessively complex regular expression. I'd just use a non-regular expression approach instead. Set 3 booleans for each of your conditions, loop through the characters and set each boolean as each condition is met, and if all 3 booleans don't equal true, then fail the verification.
You could use something as simple as this:
public boolean validatePassword(String password){
if(password.length() < 6){
return false;
}
boolean foundLetter = false;
boolean foundNumber = false;
for(int i=0; i < password.length(); i++){
char c = password.charAt(i);
if(Character.isLetter(c)){
foundLetter = true;
}else if(Character.isDigit(c)){
foundNumber = true;
}else{
// Isn't alpha-numeric.
return false;
}
}
return foundLetter && foundNumber;
}

I agree with ziesemer - use a validatePassword() method instead of cramming it into regex.
Much more readable for a developer to maintain.
If you do want to go down the regex path, it is achievable using zero width positive lookaheads.
Contains six characters, at least one letter and one number:
^.*(?=.{6,})(?=.*[a-zA-Z]).*$
I changed your six alphanumeric characters to just characters.
Supports more complex passwords :)
Great post on the topic:
http://www.zorched.net/2009/05/08/password-strength-validation-with-regular-expressions/
Bookmark this one too:
http://www.regular-expressions.info/

Related

finding the middle index of a substring when there are duplicates in the string

I was working on a Java coding problem and encountered the following issue.
Problem:
Given a string, does "xyz" appear in the middle of the string? To define middle, we'll say that the number of chars to the left and right of the "xyz" must differ by at most one
xyzMiddle("AAxyzBB") → true
xyzMiddle("AxyzBBB") → false
My Code:
public boolean xyzMiddle(String str) {
boolean result=false;
if(str.length()<3)result=false;
if(str.length()==3 && str.equals("xyz"))result=true;
for(int j=0;j<str.length()-3;j++){
if(str.substring(j,j+3).equals("xyz")){
String rightSide=str.substring(j+3,str.length());
int rightLength=rightSide.length();
String leftSide=str.substring(0,j);
int leftLength=leftSide.length();
int diff=Math.abs(rightLength-leftLength);
if(diff>=0 && diff<=1)result=true;
else result=false;
}
}
return result;
}
Output I am getting:
Running for most of the test cases but failing for certain edge cases involving more than once occurence of "xyz" in the string
Example:
xyzMiddle("xyzxyzAxyzBxyzxyz")
My present method is taking the "xyz" starting at the index 0. I understood the problem. I want a solution where the condition is using only string manipulation functions.
NOTE: I need to solve this using string manipulations like substrings. I am not considering using list, stringbuffer/builder etc. Would appreciate answers which can build up on my code.

There is no need to loop at all, because you only want to check if xyz is in the middle.
The string is of the form
prefix + "xyz" + suffix
The content of the prefix and suffix is irrelevant; the only thing that matters is they differ in length by at most 1.
Depending on the length of the string (and assuming it is at least 3):
Prefix and suffix must have the same length if the (string's length - the length of xyz) is even. In this case:
int prefixLen = (str.length()-3)/2;
result = str.substring(prefixLen, prefixLen+3).equals("xyz");
Otherwise, prefix and suffix differ in length by 1. In this case:
int minPrefixLen = (str.length()-3)/2;
int maxPrefixLen = minPrefixLen+1;
result = str.substring(minPrefixLen, minPrefixLen+3).equals("xyz") || str.substring(maxPrefixLen, maxPrefixLen+3).equals("xyz");
In fact, you don't even need the substring here. You can do it with str.regionMatches instead, and avoid creating the substrings, e.g. for the first case:
result = str.regionMatches(prefixLen, "xyz", 0, 3);

Super easy solution:
Use Apache StringUtils to split the string.
Specifically, splitByWholeSeparatorPreserveAllTokens.
Think about the problem.
Specifically, if the token is in the middle of the string then there must be an even number of tokens returned by the split call (see step 1 above).
Zero counts as an even number here.
If the number of tokens is even, add the lengths of the first group (first half of the tokens) and compare it to the lengths of the second group.
Pay attention to details,
an empty token indicates an occurrence of the token itself.
You can count this as zero length, count as the length of the token, or count it as literally any number as long as you always count it as the same number.
if (lengthFirstHalf == lengthSecondHalf) token is in middle.

Managing your code, I left unchanged the cases str.lengt<3 and str.lengt==3.
Taking inspiration from #Andy's answer, I considered the pattern
prefix+'xyz'+suffix
and, while looking for matches I controlled also if they respect the rule IsMiddle, as you defined it. If a match that respect the rule is found, the loop breaks and return a success, else the loop continue.
public boolean xyzMiddle(String str) {
boolean result=false;
if(str.length()<3)
result=false;
else if(str.length()==3 && str.equals("xyz"))
result=true;
else{
int preLen=-1;
int sufLen=-2;
int k=0;
while(k<str.lenght){
if(str.indexOf('xyz',k)!=-1){
count++;
k=str.indexOf('xyz',k);
//check if match is in the middle
preLen=str.substring(0,k).lenght;
sufLen=str.substring(k+3,str.lenght-1).lenght;
if(preLen==sufLen || preLen==sufLen-1 || preLen==sufLen+1){
result=true;
k=str.length; //breaks the while loop
}
else
result=false;
}
else
k++;
}
}
return result;
}

Boolean always returns false possibly due to values not increasing

This method, moreVowels, is intended to be able to count the amount of vowels and consonants in the String entered, and return true if the amount of vowels is greater than the amount of consonants. Sadly this code always returns false, and I cannot understand why. Here is the method stated:
public Boolean moreVowels()
{ vowelCount = 0;
consonantCount = 0;
for(int i = 0; i < word.length(); i++)
{
if ("AEOIUY".contains(word.substring(i,i++)) || "aeoiuy".contains(word.substring(i,i++)))
{
vowelCount++;
}
if ("BCDFGHJKLMNPQRSTVWXZ".contains(word.substring(i,i++)) || "abcdefghijklmnopqrstuvwxyz".contains(word.substring(i,i++)))
{
consonantCount++;
}
}
if (vowelCount > consonantCount)
{
return true;
}
else
{
return false;
}
}
I believe it is always returning false due to the loop not actually increasing the counts, but I'm not quite sure why not. Thank you for reading, I'm sure the answer is something silly that I failed to recognize.

First, you should not use substring(i,i++), but substring(i,i+1). Otherwise, you'll increase i, making your code skip letters.
"abcdefghijklmnopqrstuvwxyz".contains(word.substring(i,i+1)) looks like a mistake. It will cause consonantCount to increase in each loop for every lowercase letter.
If you're only dealing with words (no spaces etc.), then every word is either a consonant or a vowel, so you don't need the second if. You could get consonant count by subtracting vowelCount from length.
Furthermore, if you convert the i-th character to uppercase, you can omit the || "aeoiuy".contains(...) part.

The other answer and comment already show the problems with your code. I just want to add a possible stream based solution that can reduce the possibilty for errors by repacing the index based looping and local variables:
private static boolean moreVowels(String word)
{
return word.chars()
.mapToObj(c -> Character.toString((char) c).toUpperCase())
.mapToInt(c -> "AEIOUY".contains(c) ? 1 : "BCDFGHJKLMNPQRSTVWXZ".contains(c) ? -1 : 0)
.sum() > 0;
}
You can apply the use of toUpperCase() to your own implementation as well to make the if statements a bit shorter (again avoiding possible errors).

Java: Efficient way to determine if a String meets several criteria?

I would like to find an efficient way (not scanning the String 10,000 times, or creating lots of intermediary Strings for holding temporary results, or string bashing, etc.) to write a method that accepts a String and determine if it meets the following criteria:
It is at least 2 characters in length
The first character is uppercased
The remaining substring after the first character contains at least 1 lowercased character
Here's my attempt so far:
private boolean isInProperForm(final String token) {
if(token.length() < 2)
return false;
char firstChar = token.charAt(0);
String restOfToken = token.substring(1);
String firstCharAsString = firstChar + "";
String firstCharStrToUpper = firstCharAsString.toUpperCase();
// TODO: Giving up because this already seems way too complicated/inefficient.
// Ignore the '&& true' clause - left it there as a placeholder so it wouldn't give a compile error.
if(firstCharStrToUpper.equals(firstCharAsString) && true)
return true;
// Presume false if we get here.
return false;
}
But as you can see I already have 1 char and 3 temp strings, and something just doesn't feel right. There's got to be a better way to write this. It's important because this method is going to get called thousands and thousands of times (for each tokenized word in a text document). So it really really needs to be efficient.
Thanks in advance!

This function should cover it. Each char is examined only once and no objects are created.
public static boolean validate(String token) {
if (token == null || token.length() < 2) return false;
if (!Character.isUpperCase(token.charAt(0)) return false;
for (int i = 1; i < token.length(); i++)
if (Character.isLowerCase(token.charAt(i)) return true;
return false;

The first criteria is simply the length - this data is cached in the string object and is not requiring traversing the string.
You can use Character.isUpperCase() to determine if the first char is upper case. No need as well to traverse the string.
The last criteria requires a single traversal on the string- and stop when you first find a lower case character.
P.S. An alternative for the 2+3 criteria combined is to use a regex (not more efficient - but more elegant):
return token.matches("[A-Z].*[a-z].*");
The regex is checking if the string starts with an upper case letter, and then followed by any sequence which contains at least one lower case character.

It is at least 2 characters in length
The first character is
uppercased
The remaining substring after the first character contains
at least 1 lowercased character
Code:
private boolean isInProperForm(final String token) {
if(token.length() < 2) return false;
if(!Character.isUpperCase(token.charAt(0)) return false;
for(int i = 1; i < token.length(); i++) {
if(Character.isLowerCase(token.charAt(i)) {
return true; // our last criteria, so we are free
// to return on a met condition
}
}
return false; // didn't meet the last criteria, so we return false
}
If you added more criteria, you'd have to revise the last condition.

What about:
return token.matches("[A-Z].*[a-z].*");
This regular expression starts with an uppercase letter and has at least one following lowercase letter and therefore meets your requirements.

To find if the first character is uppercase:
Character.isUpperCase(token.charAt(0))
To check if there is at least one lowercase:
if(Pattern.compile("[a-z]").matcher(token).find()) {
//At least one lowercase
}

To check if first char is uppercase you can use:
Character.isUpperCase(s.charAt(0))

return token.matches("[A-Z].[a-z].");

Java: looking for the fastest way to check String for presence of Unicode chars in certain range

I need to implement a very crude language identification algorithm. In my world, there are only two languages: English and not-English. I have ArrayList and I need to determine if each String is likely in English or the other language which has its Unicode chars in a certain range. So what I want to do is to check each String against this range using some type of "presence" test. If it passes the test, I say the String is not English, otherwise it's English. I want to try two type of tests:
TEST-ANY: If any char in the string falls within the range, the string passes the test
TEST-ALL: If all chars in the string fall within the range, the string passes the test
Since the array might be very long, I need to implement this very efficiently. What would be the fastest way of doing this in Java?
Thx
UPDATE: I am specifically checking for non-English by looking at a specific range of Unicodes rather then checking for whether the characters are ASCII, in part to take care of the "resume" problem mentioned below. What I am trying to figure out is whether Java provides any classes/methods that essentially implement TEST-ANY or TEST-ALL (or another similar test) as efficiently as possible. In other words, I am trying to avoid reinventing the wheel especially if the wheel invented before me is better anyway.

Here's how I ended up implementing TEST-ANY:
// TEST-ANY
String str = "wordToTest";
int UrangeLow = 1234; // can get range from e.g. http://www.utf8-chartable.de/unicode-utf8-table.pl
int UrangeHigh = 2345;
for(int iLetter = 0; iLetter < str.length() ; iLetter++) {
int cp = str.codePointAt(iLetter);
if (cp >= UrangeLow && cp <= UrangeHigh) {
// word is NOT English
return;
}
}
// word is English
return;

I really don't think that this solution is ideal for determining language, but if you want to check to see if a string is all ascii, you could do something like this:
public static boolean isASCII(String s){
boolean ret = true;
for(int i = 0; i < s.length() ; i++) {
if(s.charAt(i)>=128){
ret = false;
break;
}
}
return ret;
}
So then if you try this:
boolean r = isASCII("Hello");
r would equal true. But if you try:
boolean r = isASCII("Grüß dich");
then r would equal false. I haven't tested performance, but this would work reasonably fast, because all it does is compare a character to the number 128.
But as #AlexanderPogrebnyak mentioned in the comments above, this will return false if you give it "résumé". Be aware of that.
Update:
I am specifically checking for non-English by looking at a specific range of Unicodes rather then checking for whether the characters are ASCII
But ASCII is a range in Unicode (well at least in UTF-8). Unicode is just an extension of ASCII. What the code #mP. and I provided does is it checks to see whether each character is in a certain range. I chose that range to be ASCII, which is any Unicode character that has a decimal value of less than 128. You can just as well choose any other range. But the reason I chose ASCII is because it's the one with the Latin alphabet, the Arabic numbers, and some other common characters that would normally be in an 'English' string.

public static boolean isAscii( String s ){
int length = s.length;
for( int i = 0; i < length; i++){
final char c = s.charAt( i );
if( c > 'z' ){
return false;
}
}
return true;
}
#Hassan thanks for picking the typo replaced test against big Z with little z.

Regular expression for validating an answer to a question

Hey everyone,
I'm having a minor difficulty setting up a regular expression that evaluates a sentence entered by a user in a textbox to keyword(s). Essentially, the keywords have to be entered consecutive from one to the other and can have any number of characters or spaces before, between, and after (ie. if the keywords are "crow" and "feet", crow must be somewhere in the sentence before feet. So with that in mind, this statement should be valid "blah blah sccui crow dsj feet "). The characters and to some extent, the spaces (i would like the keywords to have at least one space buffer in the beginning and end) are completely optional, the main concern is whether the keywords were entered in their proper order.
So far, I was able to have my regular expression work in a sentence but failed to work if the answer itself was entered only.
I have the regular expression used in the function below:
// Comparing an answer with the right solution
protected boolean checkAnswer(String a, String s) {
boolean result = false;
//Used to determine if the solution is more than one word
String temp[] = s.split(" ");
//If only one word or letter
if(temp.length == 1)
{
if (s.length() == 1) {
// check multiple choice questions
if (a.equalsIgnoreCase(s)) result = true;
else result = false;
}
else {
// check short answer questions
if ((a.toLowerCase()).matches(".*?\\s*?" + s.toLowerCase() + "\\s*?.*?")) result = true;
else result = false;
}
}
else
{
int count = temp.length;
//Regular expression used to
String regex=".*?\\s*?";
for(int i = 0; i<count;i++)
regex+=temp[i].toLowerCase()+"\\s*?.*?";
//regex+=".*?";
System.out.println(regex);
if ((a.toLowerCase()).matches(regex)) result = true;
else result = false;
}
return result;
Any help would greatly be appreciated.
Thanks.

I would go about this in a different way. Instead of trying to use one regular expression, why not use something similar to:
String answer = ... // get the user's answer
if( answer.indexOf("crow") < answer.indexOf("feet") ) {
// "correct" answer
}
You'll still need to tokenize the words in the correct answer, then check in a loop to see if the index of each word is less than the index of the following word.

I don't think you need to split the result on " ".
If I understand correctly, you should be able to do something like
String regex="^.*crow.*\\s+.*feet.*"
The problem with the above is that it will match "feetcrow feetcrow".
Maybe something like
String regex="^.*\\s+crow.*\\s+feet\\s+.*"
That will enforce that the word is there as opposed to just in a random block of characters.

Depending on the complexity Bill's answer might be the fastest solution. If you'd prefer a regular expression, I wouldn't look for any spaces, but word boundaries instead. That way you won't have to handle commas, dots, etc. as well:
String regex = "\\bcrow(?:\\b.*\\b)?feet\\b"
This should match "crow bla feet" as well as "crowfeet" and "crow, feet".
Having to match multiple words in a specific order you could just join them together using '(?:\b.*\b)?' without requiring any additional sorting or checking.

Following Bill answer, I'd try this:
String input = // get user input
String[] tokens = input.split(" ");
String key1 = "crow";
String key2 = "feet";
String[] tokens = input.split(" ");
List<String> list = Arrays.asList(tokens);
return list.indexOf(key1) < list.indexOf(key2)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java String.matches() regular expressions - java

Related

finding the middle index of a substring when there are duplicates in the string

Boolean always returns false possibly due to values not increasing

Java: Efficient way to determine if a String meets several criteria?

Java: looking for the fastest way to check String for presence of Unicode chars in certain range

Regular expression for validating an answer to a question

Categories

Resources