Java Regex match digits, characters, comma, and quote... - java

I'm new to regular expressions...
I have a problem with a regular expression that should match a string that only contains:
0-9, a-z, A-Z, space, comma, and single quote.
If the string contains any character that doesn't belong to the above set, it is invalid.
Is that something like:
Pattern p = Pattern.compile("\\s[a-zA-Z0-9,']");
Matcher m = p.matcher("to be or not");
boolean b = m.lookingAt();
Thank you!

Fix your expression by adding anchors:
Pattern p = Pattern.compile("^\\s[a-zA-Z0-9,']+$");
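Now you can call m.find() and be sure that it returns true only if your string consists of the enumerated symbols.
BTW, did you mean to put \\s at the beginning? Outside the character class it means that the string must start with exactly one whitespace character. If this is not the requirement, just remove it; and since spaces are allowed anywhere in the string, put a space inside the character class instead: ^[a-zA-Z0-9,' ]+$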
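To make the effect of the leading \\s concrete, here is a small sketch (the class name AnchoredFindDemo and the sample inputs are invented for illustration) that runs find() with the anchored pattern from this answer and with a variant that has a literal space inside the character class:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredFindDemo {
    // Pattern from the answer: exactly one leading whitespace char, then allowed chars.
    static final Pattern LEADING_WS = Pattern.compile("^\\s[a-zA-Z0-9,']+$");
    // Variant: a literal space inside the class, so spaces may appear anywhere.
    static final Pattern SPACE_IN_CLASS = Pattern.compile("^[a-zA-Z0-9,' ]+$");

    // With ^ and $ in the pattern, find() has to match from start to end.
    static boolean check(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        String[] inputs = {"to be or not", " tobe", "semi;colon", "tobe\n"};
        for (String s : inputs) {
            String shown = s.replace("\n", "\\n"); // make the newline visible
            System.out.println(shown + " -> leadingWs=" + check(LEADING_WS, s)
                    + ", spaceInClass=" + check(SPACE_IN_CLASS, s)
                    + ", matches=" + SPACE_IN_CLASS.matcher(s).matches());
        }
    }
}
```

The last input shows a caveat of the find() approach: without MULTILINE mode, $ may also match just before a final line terminator, so "tobe\n" is accepted by find() but rejected by matches(). Using \\z instead of $, or calling matches(), closes that gap.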

You need to include the space inside the character class and allow more than one character:
Pattern p = Pattern.compile("[\\sa-zA-Z0-9,']*");
Matcher m = p.matcher("to be or not");
boolean b = m.matches();
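The question's pattern fails for two reasons: the space is outside the class, and the class matches only one character. A minimal sketch (class and method names are hypothetical) contrasting lookingAt(), which only needs a match at the start of the input, with matches(), which needs the whole input to match:

```java
import java.util.regex.Pattern;

class MatchesVsLookingAt {
    // Pattern from the question: one whitespace char followed by ONE allowed char.
    static final Pattern ORIGINAL = Pattern.compile("\\s[a-zA-Z0-9,']");
    // Pattern from this answer: whitespace moved inside the class, repeated.
    static final Pattern FIXED = Pattern.compile("[\\sa-zA-Z0-9,']*");

    static boolean lookingAt(Pattern p, String s) {
        return p.matcher(s).lookingAt(); // must start at index 0, may stop early
    }

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();   // must cover the entire input
    }

    public static void main(String[] args) {
        String input = "to be or not";
        System.out.println(lookingAt(ORIGINAL, input)); // false: input starts with 't'
        System.out.println(matches(FIXED, input));      // true
        System.out.println(lookingAt(FIXED, "ok;bad")); // true: stops before ';'
        System.out.println(matches(FIXED, "ok;bad"));   // false
        System.out.println(matches(FIXED, ""));         // true: * accepts empty input
    }
}
```

Because the class is followed by *, matches() also accepts the empty string; use + if an empty input should be rejected.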

Note that \s will match any whitespace character (including newlines, tabs, carriage returns, etc.) and not only the space character.
You probably want something like this:
"^[a-zA-Z0-9,' ]+$"
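The difference between \\s and a literal space shows up as soon as the input contains a tab. A sketch with invented inputs and a hypothetical class name; it calls matches(), so the ^ and $ from the answer are redundant here but harmless:

```java
import java.util.regex.Pattern;

class SpaceVsWhitespace {
    // \s accepts space, tab, newline, carriage return, form feed and vertical tab.
    static final Pattern ANY_WHITESPACE = Pattern.compile("^[a-zA-Z0-9,'\\s]+$");
    // A literal ' ' inside the class accepts only the space character itself.
    static final Pattern SPACE_ONLY = Pattern.compile("^[a-zA-Z0-9,' ]+$");

    static boolean valid(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String withTab = "to\tbe";
        System.out.println(valid(ANY_WHITESPACE, withTab));         // true
        System.out.println(valid(SPACE_ONLY, withTab));             // false
        System.out.println(valid(SPACE_ONLY, "it's, to be or not")); // true
    }
}
```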

Related

Pattern in java regEx does not match [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
It is supposed to print
dkoe
but it prints nothing!!
Welcome to Java's misnamed .matches() method... It tries to match ALL of the input. Unfortunately, other languages have followed suit :(
If you want to see whether the regex matches anywhere in the input text, use a Pattern, a Matcher, and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find()) {
    // match: inputstring contains at least one lowercase letter
}
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
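Running the three approaches from this answer over the words array from the question makes the difference visible. A sketch; the class name is made up:

```java
import java.util.regex.Pattern;

class FindVsMatches {
    static final Pattern ONE_LOWER = Pattern.compile("[a-z]");    // a single lowercase letter
    static final Pattern ALL_LOWER = Pattern.compile("^[a-z]+$"); // anchored: lowercase only

    public static void main(String[] args) {
        String[] words = {"{apf", "hum_", "dkoe", "12f"};
        for (String s : words) {
            boolean whole = s.matches("[a-z]");         // whole string must be ONE letter
            boolean some = ONE_LOWER.matcher(s).find(); // a lowercase letter anywhere
            boolean only = ALL_LOWER.matcher(s).find(); // nothing but lowercase letters
            System.out.println(s + ": matches=" + whole + ", find=" + some
                    + ", anchoredFind=" + only);
        }
    }
}
```

Every word contains a lowercase letter, so plain find() is true for all four; only dkoe passes the anchored check, and none of them passes s.matches("[a-z]").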
[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.
String.matches returns whether the whole string matches the regex, not just any substring.
Java's String.matches() tries to match the whole string;
that's different from Perl's regex matching, which tries to find a matching part.
If you want to find a string with nothing but lowercase characters, use the pattern [a-z]+.
If you want to find a string containing at least one lowercase character, use the pattern .*[a-z].*
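Both ways of asking "does it contain a lowercase letter?" can be compared directly. A sketch with hypothetical helper names; note that . does not match line terminators by default, so the .*[a-z].* form and the find() form can disagree on multi-line input:

```java
import java.util.regex.Pattern;

class ContainsVsOnly {
    // Whole-string check: nothing but lowercase letters.
    static boolean onlyLower(String s) {
        return s.matches("[a-z]+");
    }

    // Containment check written for matches(): pad the class with .* on both sides.
    static boolean containsLowerViaMatches(String s) {
        return s.matches(".*[a-z].*");
    }

    // Same containment check with find(), which searches for a matching substring.
    static boolean containsLowerViaFind(String s) {
        return Pattern.compile("[a-z]").matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"{apf", "hum_", "dkoe", "12f", "ABC", "A\nb"}) {
            System.out.println(s.replace("\n", "\\n") + ": only=" + onlyLower(s)
                    + ", contains=" + containsLowerViaMatches(s)
                    + ", find=" + containsLowerViaFind(s));
        }
    }
}
```

For "A\nb" the matches() version returns false while find() returns true; compiling with Pattern.DOTALL (or writing (?s).*[a-z].*) would make . cross the newline.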
Use [a-z]+ instead:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}
I once faced the same problem:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked, with the pattern wrapped in ( and ).
Your regular expression [a-z] doesn't match dkoe since it only matches strings of length 1. Use something like [a-z]+.
You need to let the pattern match more than one character; the capture group () and the ^/$ anchors are optional, since matches() already has to match the whole string. Correct the pattern like this:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}
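Whether a capture group changes the result is easy to check by running the pattern variants from this thread side by side. A sketch; the helper hits and the sample names are invented for illustration:

```java
class GroupingDoesNotMatter {
    // Collects every input that String.matches() accepts for the given pattern.
    static String hits(String pattern, String[] inputs) {
        StringBuilder sb = new StringBuilder();
        for (String s : inputs) {
            if (s.matches(pattern)) {
                sb.append(s).append(" | ");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] words = {"{apf", "hum_", "dkoe", "12f"};
        for (String p : new String[] {"[a-z]+", "^[a-z]+$", "(^[a-z]+$)"}) {
            System.out.println(p + " -> " + hits(p, words));
        }
        String[] names = {"O'Brien Smith", "x", "d'", "9lives"};
        for (String p : new String[] {"^[a-zA-Z][\\']?[a-zA-Z\\s]+$",
                                      "(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)"}) {
            System.out.println(p + " -> " + hits(p, names));
        }
    }
}
```

With java.util.regex each grouped variant accepts the same inputs as its ungrouped twin: a group records what was matched, it does not change what can match.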
You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
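A sketch comparing the CASE_INSENSITIVE flag with the equivalent inline (?i) flag, which can be used directly with String.matches(); the class name is hypothetical:

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    static final Pattern FLAG = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);
    // The same behaviour written as an inline flag inside the pattern string.
    static final String INLINE = "(?i)[a-z]+";

    public static void main(String[] args) {
        for (String s : new String[] {"dkoe", "DKoe", "12f", "hum_"}) {
            System.out.println(s + ": flag=" + FLAG.matcher(s).matches()
                    + ", inline=" + s.matches(INLINE));
        }
    }
}
```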

“minus-sign” into this regular expression. How?

Consider:
String str = "XYhaku(ABH1235-123548)";
From the above string, I need only "ABH1235-123548" and so far I created a regular expression:
Pattern.compile("ABH\\d+")
But it returns false. So what is the correct regular expression for it?
I would just grab whatever is in the parenthesis:
Pattern p = Pattern.compile("\\((?<data>[A-Z\\d]+\\-\\d+)\\)");
Or, if you want to be even more open (anything inside the parentheses):
Pattern p = Pattern.compile("\\((?<data>.+?)\\)"); // reluctant .+? stops at the first ')'
Then just nab it:
String s = /* some input */;
Matcher m = p.matcher(s);
if (m.find()) { //just find first
String tag = m.group("data"); //ABH1235-123548
}
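Putting the pieces of this answer together into something runnable (the class name and the null-on-no-match convention are assumptions for the sketch); named groups such as (?<data>...) require Java 7 or later:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractInParens {
    // Letters/digits, a hyphen, then digits, all inside literal parentheses.
    static final Pattern TAG = Pattern.compile("\\((?<data>[A-Z\\d]+\\-\\d+)\\)");

    // Returns the text captured by the "data" group, or null if there is no match.
    static String extract(String s) {
        Matcher m = TAG.matcher(s);
        return m.find() ? m.group("data") : null; // first match only
    }

    public static void main(String[] args) {
        System.out.println(extract("XYhaku(ABH1235-123548)")); // ABH1235-123548
        System.out.println(extract("no parentheses here"));    // null
    }
}
```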
\d only matches digits. To include other characters, use a character class:
Pattern.compile("ABH[\\d-]+")
Note that the - must be placed first or last in the character class, because otherwise it will be treated as a range indicator ([A-Z] matching every letter between A and Z, for example). Another way to avoid that would be to escape it, but that adds two more backslashes to your string...
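The question does not say which call returned false; if it was matches(), that alone explains it, because matches() must consume the whole of "XYhaku(ABH1235-123548)". A sketch (names are hypothetical) using find() with the class from this answer, plus the escaped-hyphen alternative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenInClass {
    static final Pattern ID = Pattern.compile("ABH[\\d-]+");        // hyphen last: literal
    static final Pattern ESCAPED = Pattern.compile("ABH[\\d\\-]+"); // escaped hyphen: literal

    // Returns the first substring matched by p, or null if there is none.
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String str = "XYhaku(ABH1235-123548)";
        System.out.println(ID.matcher(str).matches()); // false: needs the whole string
        System.out.println(firstMatch(ID, str));       // ABH1235-123548
        System.out.println(firstMatch(ESCAPED, str));  // ABH1235-123548
    }
}
```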

Investigate whether a string in Java includes some special signs?

I have the following Java method and some conditions for the parameter searchPattern:
public boolean checkPatternMatching(String sourceToScan, String searchPattern) {
boolean patternFounded;
if (sourceToScan == null) {
patternFounded = false;
} else {
Pattern pattern = Pattern.compile(Pattern.quote(searchPattern),
Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(sourceToScan);
patternFounded = matcher.find();
}
return patternFounded;
}
I want to search for all letters (uppercase and lowercase must be considered) and only (!) the special signs "-", ":" and "=". Any other value must make this method return "false".
How can I implement this logic with the parameter "searchPattern"?
Try searchPattern = "[a-zA-Z:=-]"
Try this pattern: [a-zA-Z=,_!:]+
String pattern = "[a-zA-Z=,_!:]+";
String input="hello_:,!=";
if(input.matches(pattern)){
System.out.println("true");
}else{
System.out.println("false");
}
"[a-zA-Z!=:\\s-]+"
The square brackets define a character class, which matches any single character listed inside them. The + means one or more characters from the class, \\s matches whitespace, and the - is placed last so that it is taken literally instead of as a range (!-= would otherwise match everything from ! to =, including the digits).
So if you want just letters and spaces, as per your comment on the original post:
"[a-zA-Z\\s]+"
Use searchPattern as [a-zA-Z!:=-]+
searchPattern = "^[A-Za-z!=:-]+$"
^ means "begins with"
$ means "ends with"
[A-Za-z!=:-] is a character class that contains any letter or the symbols !, =, :, -
+ means "1 or more" of the preceding
This will work if the string contains solely those symbols, i.e. no spaces or anything else.
If you want a string that contains the given symbols and may also contain whitespace, use:
searchPattern = "^[A-Za-z!=:\\s-]+$"
\\s stands for any whitespace character; the - has to stay last in the class, otherwise :-\\s would be read as an (invalid) range.
Finally, if you want to simply see if a string contains any one of these symbols, you can use:
searchPattern = "[A-Za-z!=:-]"
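One detail worth checking against the question's method: checkPatternMatching wraps searchPattern in Pattern.quote, which turns it into literal text, so a regex such as [A-Za-z:=-] would only be found if the input literally contained those brackets. A minimal sketch of the method with the quote call removed (the class name SignChecker is hypothetical, and the anchored pattern uses only the three signs named in the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignChecker {
    // Same shape as the question's method, but searchPattern is compiled as a regex
    // instead of being wrapped in Pattern.quote(), which would make it literal text.
    static boolean checkPatternMatching(String sourceToScan, String searchPattern) {
        if (sourceToScan == null) {
            return false;
        }
        Pattern pattern = Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(sourceToScan);
        return matcher.find();
    }

    public static void main(String[] args) {
        String onlyAllowed = "^[A-Za-z:=-]+$"; // letters plus -, : and = from the question
        System.out.println(checkPatternMatching("key=Value:x-y", onlyAllowed)); // true
        System.out.println(checkPatternMatching("key=Value!", onlyAllowed));    // false
        System.out.println(checkPatternMatching(null, onlyAllowed));            // false
        // With Pattern.quote the brackets are searched for literally, so this is false:
        System.out.println(Pattern.compile(Pattern.quote(onlyAllowed)).matcher("abc").find());
    }
}
```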

Perl5Matcher.matches(input, pattern) is returning true for input containing semicolon even when semicolon is not in pattern

I have a string MyString = "AP;"; or any other number of strings containing ;
When I attempt to validate that MyString matches a pattern
e.g. MyPattern = "^[a-zA-Z0-9 ()+-_.]*$";
Which I believe should allow alphanumerics, space, and the characters ()+-_. but not ;
However the below statement is returning True!
Pattern sepMatchPattern = sepMatchCompiler.compile("^[a-zA-Z0-9 ()+-_.]*$");
Perl5Matcher matcher = new Perl5Matcher();
if (matcher.matches("AP;", sepMatchPattern)) {
return true;
} else {
return false;
}
Can anyone explain why the semicolon keeps getting allowed through?
The problem lies in the regular expression that you have defined: ^[a-zA-Z0-9 ()+-_.]*$. Within this regular expression is a character class of letters (upper and lower case), digits, space, parentheses, and some punctuation. The period in it is not the issue: inside a character class a period is a literal '.', not "any character", so escaping it is harmless but not required.
The culprit is the hyphen. In "+-_" it does not mean "plus, hyphen, or underscore"; rather, it means all the characters from 0x2B to 0x5F (inclusive). A quick test shows that ^[+-_]*$ also matches AP;, because A and P are 0x41 and 0x50 and the notorious semicolon is 0x3B, all within the range of 0x2B to 0x5F.
To fix this, escape the hyphen (or move it to the start or end of the class). The correct regular expression is:
"^[a-zA-Z0-9 ()+\\-_\\.]*$"
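The same range effect can be reproduced with java.util.regex, without the ORO classes; this sketch assumes both engines treat +-_ inside a class as a range, as the answer describes for Perl5Matcher. The class and helper names are invented:

```java
import java.util.regex.Pattern;

class HyphenRangeDemo {
    static final String BUGGY = "^[a-zA-Z0-9 ()+-_.]*$";     // +-_ is the range 0x2B..0x5F
    static final String FIXED = "^[a-zA-Z0-9 ()+\\-_\\.]*$"; // escaped hyphen is literal

    static boolean accepts(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(BUGGY, "AP;"));        // true: ';' (0x3B) is in the range
        System.out.println(accepts(FIXED, "AP;"));        // false
        System.out.println(accepts(FIXED, "AP-1_(x).y")); // true
        System.out.println(accepts("^[+-_]*$", "AP;"));   // true
    }
}
```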
