Java regex for US state validation

I wrote this Java method to validate a US state code with a regex, but I am missing something, because it fails for every input. I am new to regex and cannot figure out what is causing it to fail. Can an expert help me?
public static boolean isStateValid(String state){
String expression = "/^(?:A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])*$/";
CharSequence inputStr = state;
Pattern pattern = Pattern.compile(expression);
Matcher matcher = pattern.matcher(inputStr);
if (matcher.matches()) {
return true;
}else{
return false;
}
}
I changed it to this after reading the comments, and it still isn't working:
public static boolean isStateValid(String state) {
CharSequence inputStr = state;
Pattern pattern = Pattern
.compile("AL|AK|AR|AZ|CA|CO|CT|DC|DE|FL|GA|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VT|WA|WI|WV|WY|al|ak|ar|az|ca|co|ct|dc|de|fl|ga|hi|ia|id|il|in|ks|ky|la|ma|md|me|mi|mn|mo|ms|mt|nc|nd|ne|nh|nj|nm|nv|ny|oh|ok|or|pa|ri|sc|sd|tn|tx|ut|va|vt|wa|wi|wv|wy");
Matcher matcher = pattern.matcher(inputStr);
if (matcher.matches()) {
return true;
} else {
return false;
}
}

A lot of things.
First, this is not Perl. Remove the leading and trailing slashes.
Second, why a non-capturing group, i.e. (?: ... )? You do not need a group at all here.
Third, why so complicated? Just say something like
Pattern.compile("AL|AK|AR|AZ|CA");
etc., for all the states. Your optimization does not have any benefit; it just makes the regex more complicated.

It doesn't like the / characters at the beginning and end of the pattern. When I remove those, it works. Also, I don't think you want the * repetition character at the end.
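Putting the two answers above together, a minimal sketch of the corrected method could look like this. The class name StateRegexValidator and the use of Pattern.CASE_INSENSITIVE are my own choices for illustration, not something from the question:

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class StateRegexValidator {
    // No leading/trailing slashes and no trailing '*'. Compiled once and reused.
    // CASE_INSENSITIVE (an assumption here) avoids listing lowercase codes separately.
    private static final Pattern STATE = Pattern.compile(
        "A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]"
        + "|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY]",
        Pattern.CASE_INSENSITIVE);

    static boolean isStateValid(String state) {
        // matches() already requires the whole input to match,
        // so explicit ^ and $ anchors are not needed.
        return state != null && STATE.matcher(state).matches();
    }
}
```

Since matches() tests the entire input, something like "CAL" or "CACA" is rejected without any extra anchoring.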

This one is case-sensitive and includes the US territories:
^(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$
However, I think using regex is over the top. Use a list or array of some kind.

Is this an exercise in practicing regular expressions, or something that will be used in a production environment?
If it's the latter, then using a lookup list will be better. In this case a regex obfuscates and overcomplicates what you're trying to do.
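For what it's worth, the lookup-list approach might be sketched like this. The class name StateLookup is made up, the codes are the ones from the question's second attempt, and Set.of assumes Java 9 or later:

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class StateLookup {
    // Same 51 codes as in the question's second attempt (50 states plus DC).
    private static final Set<String> STATES = Set.of(
        "AL", "AK", "AR", "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
        "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME",
        "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM",
        "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX",
        "UT", "VA", "VT", "WA", "WI", "WV", "WY");

    static boolean isStateValid(String state) {
        // Normalize case with a fixed locale, then do a plain set lookup.
        return state != null && STATES.contains(state.toUpperCase(Locale.ROOT));
    }
}
```

Adding territories later is then just a matter of adding entries to the set, with no regex to re-verify.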

I haven't used this particular tool, but I have used a similar commercial one, and I think you would benefit from a tool like: http://sourceforge.net/projects/regexevaluator/

Related

regex in java string

While trying some Java coding on the codingbat.com site, I repeatedly came across a question about how regular expressions work in Java strings.
I know there are Java methods like matches() or find() as well as replace() and so on, but this isn't where I wanted to go.
Take a quick look at the example:
boolean doubleX(String str) {
if(str.equals("xx")){
return true;
} else {
return false;
}
}
I wonder whether I could use regular expressions in the string to add a quantifier, for example
<----- add regex here
if(str.equals("x\[x.*]")){
Would you be so kind as to explain how I could use regex in strings? From what I understood, I thought it would be possible even without the Java regex methods, because the escape character \ makes them usable in plain code. Did I get this wrong?
Use String#matches(String)
if (str.matches(regex)) {
// ...
}
This will only find out if there is a match for the regex though.
What I suggest is that you specify the quantifier in your regex instead of counting the number of matches, like so:
public boolean isX(String str, int count) {
return str.matches("^x{" + count + "}$");
}
Some methods accept a regex as input and some do not. In general you can't use a regex in a plain String comparison, because there it is treated as just a plain string. Some methods, your own or a framework's, do support regex internally via Pattern or other approaches.
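To illustrate the difference, here is a small sketch (class and method names are made up for the example). equals() compares the characters literally, while String#matches interprets its argument as a regex and tests the whole string against it:

```java
// Hypothetical class name, used only for this sketch.
class EqualsVsMatches {
    // equals() treats "x{2,}" as six literal characters.
    static boolean literalEquals(String str) {
        return str.equals("x{2,}");
    }

    // matches() treats "x{2,}" as a regex: two or more 'x', and nothing else.
    static boolean regexMatches(String str) {
        return str.matches("x{2,}");
    }
}
```

So literalEquals("xx") is false, while regexMatches("xx") is true.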
You can use the Pattern and the Matcher class
private final Pattern PATTERN = Pattern.compile("x\[x.*]");
and then
Matcher matcher = PATTERN.matcher(str);
if (matcher.find())
doSomething();

Java fix regex in code

I need to print #OPOK, but in the following code:
String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+)\".*");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
} else {
System.out.println("Match not found");
}
I get #OPOK","MSG3":"XXXXXX instead, how do I fix my pattern ?
You want to make your .+ part reluctant. By default it's greedy - it'll match as much as it can without preventing the pattern from matching. You want it to match as little as it can, like this:
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+?)\".*");
The ? is what makes it reluctant. See the Pattern documentation for more details.
Or of course you could just match against "any character other than a double quote" which is what Brian's approach will do. Both will work equally well as far as I'm aware; there may well be performance differences between them (I'd expect Brian's to perform better to be honest) but if performance is important to you you should test both approaches.
You probably want the following:
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]+)\"");
For the capture group you are interested in, this will match any character except a double quote. Since the group is surrounded by double quotes, this should prevent it from going "too far" in the match.
Edited to add: As @bmorris591 suggested in the comments, you can add an extra + (as shown below) to make the quantifier possessive. This may help improve performance in cases where the matcher fails to find a match.
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]++)\"");
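For comparison, here is a small sketch that runs the greedy, reluctant, negated-class and possessive variants from this thread against the same input. The class name, constant names and the firstGroup helper are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class Msg2Extractor {
    static final String GREEDY     = ".*\"MSG2\":\"(.+)\".*";
    static final String RELUCTANT  = ".*\"MSG2\":\"(.+?)\".*";
    static final String NEGATED    = "\"MSG2\":\"([^\"]+)\"";
    static final String POSSESSIVE = "\"MSG2\":\"([^\"]++)\"";

    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }
}
```

With the string from the question, only GREEDY runs past the closing quote of the MSG2 value; the other three stop at it.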

How can I provide an OR operator in regular expressions?

I want to match my string to one sequence or another, and it has to match at least one of them.
For AND, I learned it can be done with:
(?=one)(?=other)
Is there something like this for OR?
I am using Java, Matcher and Pattern classes.
Generally speaking about regexes, you definitely should begin your journey into Regex wonderland here: Regex tutorial
What you currently need is the | (pipe character)
To match the strings one OR other, use:
(one|other)
or if you don't want to store the matches, just simply
one|other
To be Java specific, this article is very good at explaining the subject
You will have to use your patterns this way:
//Pattern and Matcher
Pattern compiledPattern = Pattern.compile(myPatternString);
Matcher matcher = compiledPattern.matcher(myStringToMatch);
boolean isNextMatch = matcher.find(); //find the next match, if it exists
if(isNextMatch) {
String matchedString = myStringToMatch.substring(matcher.start(),matcher.end());
}
Please note, there are many more possibilities with Matcher than what I have shown here...
//String functions
boolean didItMatch = myString.matches(myPatternString); //same as Pattern.matches(myPatternString, myString)
String allReplacedString = myString.replaceAll(myPatternString, replacement);
String firstReplacedString = myString.replaceFirst(myPatternString, replacement);
String[] splitParts = myString.split(myPatternString, howManyPartsAtMost);
Also, I'd highly recommend using online regex checkers such as Regexplanet (Java) or refiddle (this doesn't have Java specific checker), they make your life a lot easier!
The "or" operator is spelled |, for example one|other.
All the operators are listed in the documentation.
You can separate with a pipe thus:
Pattern.compile("regexp1|regexp2");
See here for a couple of simple examples.
Use the | character for OR
Pattern pat = Pattern.compile("exp1|exp2");
Matcher mat = pat.matcher("Input_data");
The answers are already given, use the pipe '|' operator. In addition to that, it might be useful to test your regexp in a regexp tester without having to run your application, for example:
http://www.regexplanet.com/advanced/java/index.html
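A short sketch of the pipe operator in use (class and method names are hypothetical). Note that whether you call matches() or find() changes what "match one or the other" means:

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class AlternationDemo {
    private static final Pattern ONE_OR_OTHER = Pattern.compile("one|other");

    // True only if the whole string is exactly "one" or exactly "other".
    static boolean isExactly(String s) {
        return ONE_OR_OTHER.matcher(s).matches();
    }

    // True if "one" or "other" occurs anywhere inside the string.
    static boolean contains(String s) {
        return ONE_OR_OTHER.matcher(s).find();
    }
}
```

For example, "another" fails isExactly but passes contains, because it has "other" inside it.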

Java regex return after first match

How do I return after the first match of a regular expression? (Does the Matcher.find() method do that?)
Say I have a string "abcdefgeee". I want the regex engine to stop searching immediately after it finds the first match of "e", for example. I am writing a method that returns true/false depending on whether the pattern is found, and I don't want to scan the whole string for "e". (I am looking for a regex solution.)
Another question: sometimes when I use matches(), it doesn't return what I expect. For example, if I compile my pattern as "[a-z]" and then use matches(), it doesn't match. But when I compile the pattern as ".*[a-z].*", it matches. Is that the behaviour of the matches() method of the Matcher class?
Edit: here's what I actually want to do. For example, I want to search for a $ sign AND a # sign in a string. So I would define 2 compiled patterns (since, knowing only the basics, I can't find any logical AND for regex).
pattern1 = Pattern.compiled("$");
pattern2 = Pattern.compiled("#");
then i would just use
if ( match1.find() && match2.find() ){
return true;
}
in my method.
I only want the matchers to search the string for first occurrence and return.
thanks
For your second question, matches() does work correctly; your example uses two different regular expressions.
.*[a-z].* will match a String that contains at least one lowercase letter a-z. [a-z] will only match a one-character String that is a lowercase letter a-z. I think you might mean to use something like [a-z]+
Another question: sometimes when I use matches(), it doesn't return what I expect. For example, if I compile my pattern as "[a-z]" and then use matches(), it doesn't match. But when I compile the pattern as ".*[a-z].*", it matches. Is that the behaviour of the matches() method of the Matcher class?
Yes, matches(...) tests the entire target string.
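A quick sketch of how the three Matcher tests differ for the "[a-z]" pattern from the question (class and method names are made up for the example):

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class MatchesVsFind {
    private static final Pattern LOWER = Pattern.compile("[a-z]");

    // matches(): the entire input must be exactly one lowercase letter.
    static boolean wholeInput(String s) {
        return LOWER.matcher(s).matches();
    }

    // find(): a lowercase letter anywhere in the input is enough;
    // a single call stops at the first match it finds.
    static boolean anywhere(String s) {
        return LOWER.matcher(s).find();
    }

    // lookingAt(): the input must start with a lowercase letter.
    static boolean atStart(String s) {
        return LOWER.matcher(s).lookingAt();
    }
}
```

So for a "does the pattern occur at all" check, one find() call is what you want; it does not keep scanning after its first hit.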
... here's what I actually want to do. For example, I want to search for a $ sign AND a # sign in a string. So I would define 2 compiled patterns (since, knowing only the basics, I can't find any logical AND for regex).
I know you said you wanted to use regex, but all your examples seem to suggest you have no need for it: those are all single characters that can be handled with a couple of indexOf(...) calls.
Anyway, using regex, you could do it like this:
public static boolean containsAll(String text, String... patterns) {
for(String p : patterns) {
Matcher m = Pattern.compile(p).matcher(text);
if(!m.find()) return false;
}
return true;
}
But, again: indexOf(...) would do the trick as well:
public static boolean containsAll(String text, String... subStrings) {
for(String s : subStrings) {
if(text.indexOf(s) < 0) return false;
}
return true;
}
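One caveat if you go the regex route with characters like $: in a regex, an unescaped $ is the end-of-input anchor, not a literal dollar sign, so it "finds" a match in any string. A possible sketch that quotes each search term first (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class LiteralContains {
    static boolean containsAllLiterals(String text, String... literals) {
        for (String literal : literals) {
            // Pattern.quote makes metacharacters such as $ match literally.
            if (!Pattern.compile(Pattern.quote(literal)).matcher(text).find()) {
                return false;
            }
        }
        return true;
    }

    // Shows the pitfall: the unquoted pattern "$" matches at the end of any input.
    static boolean unquotedDollarFound(String text) {
        return Pattern.compile("$").matcher(text).find();
    }
}
```

The indexOf(...) version above has no such problem, since it never interprets its argument as a regex.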

How can I perform a partial match with java.util.regex.*?

I have been using the java.util.regex.* classes for Regular Expression in Java and all good so far. But today I have a different requirement. For example consider the pattern to be "aabb". Now if the input String is aa it will definitely not match, however there is still possibility that if I append bb it becomes aabb and it matches. However if I would have started with cc, no matter what I append it will never match.
I have explored the Pattern and Matcher class but didn't find any way of achieving this.
The input will come from user and system have to wait till pattern matches or it will never match irrespective of any input further.
Any clue?
Thanks.
You should have looked more closely at the Matcher API; the hitEnd() method works exactly as you described:
import java.util.regex.*;
public class Test
{
public static void main(String[] args) throws Exception
{
String[] ss = { "aabb", "aa", "cc", "aac" };
Pattern p = Pattern.compile("aabb");
Matcher m = p.matcher("");
for (String s : ss) {
m.reset(s);
if (m.matches()) {
System.out.printf("%-4s : match%n", s);
}
else if (m.hitEnd()) {
System.out.printf("%-4s : partial match%n", s);
}
else {
System.out.printf("%-4s : no match%n", s);
}
}
}
}
output:
aabb : match
aa : partial match
cc : no match
aac : no match
As far as I know, Java is the only language that exposes this functionality. There's also the requireEnd() method, which tells you if more input could turn a match into a non-match, but I don't think it's relevant in your case.
Both methods were added to support the Scanner class, so it can apply regexes to a stream without requiring the whole stream to be read into memory.
Pattern p = Pattern.compile(expr);
Matcher m = p.matcher(string);
m.find();
So you want to know not whether a String s matches the regex, but whether there might be a longer String starting with s that would match? Sorry, Regexes can't help you there because you get no access to the internal state of the matcher; you only get the boolean result and any groups you have defined, so you never know why a match failed.
If you're willing to hack the JDK libraries, you can extend (or probably fork) java.util.regex and give out more information about the matching process. If the match failed because the input was 'used up' the answer would be true; if it failed because of character discrimination or other checks it would be false. That seems like a lot of work though, because your problem is completely the opposite of what regexes are supposed to do.
Another option: maybe you can simply redefine the task so that you treat the input as the regexp and match aabb against aa.*? You have to be careful about regex metacharacters, though.
For the example you give, you could try to use an anti-pattern to disqualify invalid results. For example, "^[^a]" would tell you that your input "c..." can't match your example pattern of "aabb".
Depending on your pattern, you may be able to break it up into smaller patterns, check them with multiple matchers, and set their bounds as each match occurs and you move to the next. This approach may work, but if your pattern is complex and can have variable-length sub-parts, you may end up reimplementing part of the matcher in your own code to adjust the possible bounds of the match and make it more or less greedy. A pseudo-code sketch of the general idea:
boolean match(String input, Matcher[] subpatterns, int matchStart, int matchEnd){
matcher = next matcher in list;
int stop = matchend;
while(true){
if matcher.matches input from matchstart -> matchend{
if match(input, subpatterns, end of current match, end of string){
return true;
}else{
//make this match less greedy
stop--;
}
}else{
//no match
return false;
}
}
}
You could then merge this idea with the anti-patterns, and have anti-subpatterns and after each subpattern match you check the next anti-pattern, if it matches you know you have failed, otherwise continue the matching pattern. You would likely want to return something like an enum instead of a boolean (i.e. ALL_MATCHED, PARTIAL_MATCH, ANTI_PATTERN_MATCH, ...)
Again depending on the complexity of your actual pattern that you are trying to match writing the appropriate sub patterns / anti-pattern may be difficult if not impossible.
One way to do this is to parse your regex into a sequence of sub-regexes, and then reassemble them in a way that allows you to do partial matches; e.g. "abc" has 3 sub-regexes "a", "b" and "c" which you can then reassemble as "a(b(c)?)?".
Things get more complicated when the input regex contains alternation and groups, but the same general approach should work.
The problem with this approach is that the resulting regex is more complicated, and could potentially lead to excessive backtracking for complex input regexes.
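For the simplest case, where the pattern is a plain literal such as "aabb", the reassembly can be sketched as below. This only handles literals, not arbitrary regexes; the class and method names are hypothetical, and it works char by char, so it assumes no surrogate pairs:

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class PrefixRegexBuilder {
    // "abc" becomes the equivalent of a(?:b(?:c)?)?, with each character quoted.
    static String prefixRegex(String literal) {
        if (literal.isEmpty()) {
            return "";
        }
        StringBuilder tail = new StringBuilder();
        // Wrap from the last character inward, nesting optional groups.
        for (int i = literal.length() - 1; i >= 1; i--) {
            tail.insert(0, "(?:" + Pattern.quote(String.valueOf(literal.charAt(i))));
            tail.append(")?");
        }
        return Pattern.quote(String.valueOf(literal.charAt(0))) + tail;
    }

    // True if input is a non-empty prefix of literal (or the literal itself).
    static boolean couldStillMatch(String literal, String input) {
        return input.matches(prefixRegex(literal));
    }
}
```

With the literal "aabb", the inputs "aa" and "aabb" pass, while "cc" and "aac" are rejected.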
If you make each character of the regex optional and relax the multiplicity constraints, you kinda get what you want. Example if you have a matching pattern "aa(abc)+bbbb", you can have a 'possible match' pattern 'a?a?(a?b?c?)*b?b?b?b?'.
This mechanical way of producing possible-match pattern does not cover advanced constructs like forward and backward refs though.
You might be able to accomplish this with a state machine (http://en.wikipedia.org/wiki/State_machine). Have your states/transitions represent valid input and one error state. You can then feed the state machine one character (possibly substring depending on your data) at a time. At any point you can check if your state machine is in the error state. If it is not in the error state then you know that future input may still match. If it is in the error state then you know something previously failed and any future input will not make the string valid.
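As a rough sketch of that idea for a fixed literal target such as "aabb" (a real regex would need a proper automaton; the class and method names here are made up):

```java
// Hypothetical class name, used only for this sketch.
class LiteralPrefixMachine {
    private final String target;
    private int position = 0;      // number of target characters matched so far
    private boolean error = false; // once set, no further input can recover

    LiteralPrefixMachine(String target) {
        this.target = target;
    }

    // Feed one character; a mismatch or extra input moves to the error state.
    void feed(char c) {
        if (error) {
            return;
        }
        if (position < target.length() && target.charAt(position) == c) {
            position++;
        } else {
            error = true;
        }
    }

    // Convenience: feed a whole chunk of input one character at a time.
    void feedAll(String chunk) {
        for (char c : chunk.toCharArray()) {
            feed(c);
        }
    }

    boolean isError() {
        return error;
    }

    boolean isAccepted() {
        return !error && position == target.length();
    }
}
```

After each chunk of user input you check isAccepted() (done, it matches), isError() (it can never match), or neither (keep waiting for more input).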
