Java regex: check if word has non alphanumeric characters

This is my code to determine if a word contains any non-alphanumeric characters:
String term = "Hello-World";
boolean found = false;
Pattern p = Pattern.Compile("\\W*");
Matcher m = p.Matcher(term);
if(matcher.find())
found = true;
I am wondering whether the regex is wrong. I know "\W" matches any non-word character. Any idea what I am missing?

Change your regex to:
.*\\W+.*

This is the expression you are looking for:
"^[a-zA-Z0-9]+$"
When it evaluates to false, the string does not consist solely of alphanumeric characters, which means you have found what you were looking for (or the string is empty).

It's 2016 or later, and you should think about international strings written in alphabets other than Latin. The frequently cited [^a-zA-Z] treats every non-Latin letter as a non-letter. There are better ways in Java now:
[^\\p{IsAlphabetic}\\p{IsDigit}]
See the reference (section "Classes for Unicode scripts, blocks, categories and binary properties"). There's also this answer that I found helpful.
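A minimal sketch of the Unicode-aware check, assuming the goal is to flag any character that is neither alphabetic nor a digit in any script. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from any library.
final class UnicodeAlnumCheck {
    // Only the leading ^ negates the class; a ^ anywhere else in the
    // brackets would be treated as a literal caret.
    private static final Pattern NON_ALNUM =
            Pattern.compile("[^\\p{IsAlphabetic}\\p{IsDigit}]");

    // True if the input contains at least one non-alphanumeric character.
    static boolean hasNonAlphanumeric(String s) {
        return NON_ALNUM.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNonAlphanumeric("M\u00fcller"));  // u-umlaut counts as a letter
        System.out.println(hasNonAlphanumeric("Hello-World")); // the hyphen does not
    }
}
```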

Methods are in the wrong case.
The matcher was declared as m but used as matcher.
The quantifier should be "one or more" (+) instead of "zero or more" (*).
This works correctly:
String term = "Hello-World";
boolean found = false;
Pattern p = Pattern.compile("\\W+");//<-- compile( not Compile(
Matcher m = p.matcher(term); //<-- matcher( not Matcher
if(m.find()) { //<-- m not matcher
found = true;
}
By the way, it would be enough to write:
boolean found = m.find();
:)

The problem is the '*'. '*' matches ZERO or more characters. You want to match at least one non word character, so you must use '+' as the quantity modifier. Hence match \W+ (Capital W there for NON word)
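The difference between the two quantifiers can be seen in a small sketch. The class and method names here are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustrative only: the class and method names are not from the question.
final class StarVersusPlus {
    // Runs a substring search for the regex in the input.
    static boolean findWith(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "\\W*" can match the empty string, so find() succeeds on any input.
        System.out.println(findWith("\\W*", "HelloWorld"));
        // "\\W+" needs at least one non-word character.
        System.out.println(findWith("\\W+", "HelloWorld"));
        System.out.println(findWith("\\W+", "Hello-World"));
    }
}
```

Keep in mind that \W treats the underscore as a word character, so "Hello_World" contains no \W match.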

Your expression does not take account of possible non-English letters. It's also more complicated than it needs to be. Unless you are using regexes for some reason other than need (such as your professor having told you to), you are much better off with:
boolean found = false;
for (int i=0;i<mystring.length();++i) {
if (!Character.isLetterOrDigit(mystring.charAt(i))) {
found=true;
break;
}
}
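The same idea can be written with streams. This is a sketch that iterates over code points rather than chars, on the assumption that characters outside the Basic Multilingual Plane should be classified correctly too; the class and method names are illustrative.

```java
// Hypothetical helper; the names are illustrative.
final class LetterOrDigitCheck {
    // True if any code point is neither a letter nor a digit.
    static boolean hasNonAlphanumeric(String s) {
        return s.codePoints().anyMatch(cp -> !Character.isLetterOrDigit(cp));
    }

    public static void main(String[] args) {
        System.out.println(hasNonAlphanumeric("Hello-World"));
        System.out.println(hasNonAlphanumeric("HelloWorld42"));
    }
}
```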

When I had to do this same thing, the regex I used was "(\w)*". Not sure if capital W is the same, but I also used parentheses.

If you are okay with using Apache StringUtils, then it's as simple as the following:
StringUtils.isAlphanumeric(inp)

if (value.matches(".*[^a-zA-Z0-9].*")) { // tested, seems to work.
System.out.println("match");
} else {
System.out.println("no match");
}

Related

Check only string and only digits with regex in Java [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
Supposed to print
dkoe
but it prints nothing!!
Welcome to Java's misnamed .matches() method... It tries and matches ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find())
// match
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.
String.matches returns whether the whole string matches the regex, not just any substring.
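A short sketch of the whole-input versus substring behaviour described above, using the strings from the question. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustrative helper; the names are not from the original post.
final class MatchesVersusFind {
    // Whole-input match, the same semantics as String.matches.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Substring search: succeeds if any part of the input matches.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("[a-z]", "dkoe"));    // a single-char pattern against four chars
        System.out.println(partialMatch("[a-z]", "dkoe"));  // any lowercase letter anywhere
        System.out.println(wholeMatch("[a-z]+", "dkoe"));   // one or more lowercase letters, whole input
    }
}
```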
Java's String.matches tries to match the whole string.
That's different from Perl regexes, which try to find a matching part.
If you want to find a string with nothing but lowercase characters, use the pattern [a-z]+
If you want to find a string containing at least one lowercase character, use the pattern .*[a-z].*
Use:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}
I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with pattern within ( and ).
Your regular expression [a-z] doesn't match dkoe since it only matches strings of length 1. Use something like [a-z]+.
You can also anchor the pattern explicitly (the capture group () is optional, since matches() already tests the whole string), like this:
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}
You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

How to check if there is at least: two letters, one number and one special character?

How can I check whether a string contains at least two letters, one number and one special character in Java? Here is my code, but I don't know if I'm heading in the right direction.
public static boolean validarCodigo(String codigo){
//return Pattern.compile("[abc]").matcher("ba").find();
boolean ContainsAtLeastTwoLetters = Pattern.compile("[0-9]").matcher(codigo).find();
boolean tieneAlmenosDosLetras = Pattern.compile("('/[a-zA- Z]/')").matcher(codigo).find();
boolean ContainsAtLeastOneSpecialChar; = Pattern.compile ("'/[^a-zA-Z\\d]/'").matcher(codigo).find();
return ContainsAtLeastOneDigit && ContainsAtLeastTwoLetters && ContainsAtLeastOneSpecialChar;
Your regex seems a bit off, but you've done a good job. The main thing is that you're only checking for one letter where two are required, and the '/ ... /' delimiters don't belong in a Java regex. To solve this, try the following regexes:
boolean containsAtLeastOneDigit = Pattern.compile("[0-9]").matcher(codigo).find();
boolean containsAtLeastTwoLetters = Pattern.compile("[a-zA-Z].*[a-zA-Z]").matcher(codigo).find();
boolean containsAtLeastOneSpecialChar = Pattern.compile("[^a-zA-Z\\d]").matcher(codigo).find();
I took the liberty of renaming the variables to follow standard Java practice (likeThis instead of LikeThis).
public static boolean validarCodigo(String codigo){
    // two letters anywhere in the input
    Pattern twoLetters = Pattern.compile("[a-zA-Z].*[a-zA-Z]");
    Pattern digit = Pattern.compile("[0-9]");
    // add or remove whatever special characters are permissible in your case
    Pattern special = Pattern.compile("[!@#$%&*()_+=|<>?{}\\[\\]~-]");
    Matcher hasTwoLetters = twoLetters.matcher(codigo);
    Matcher hasDigit = digit.matcher(codigo);
    Matcher hasSpecial = special.matcher(codigo);
    return hasTwoLetters.find() && hasDigit.find() && hasSpecial.find();
}
You can use look aheads to do it in one line:
boolean hasAllThree = codigo.matches("^(?=.*[^a-zA-Z\\d])(?=.*\\d)(?=(.*[a-zA-Z]){2}).*");
From your comment, it has to contain:
at least one digit,
two letters
and a special character
So your entire String needs to match each of these regexes:
^.*[0-9].*$
^.*[a-zA-Z].*[a-zA-Z].*$
^.*[^a-zA-Z\\d].*$
Here is a fancier way :)
Using the look-ahead mechanism (?=...), you can create one regex that checks whether your string meets all of these conditions. It can look like:
^(?=.*[0-9])(?=.*[a-zA-Z].*[a-zA-Z])(?=.*[^a-zA-Z\\d]).*$
Now, to check whether your string fits this regex, use the matches method. By the way, matches implicitly anchors the regex at both ends, so you don't have to write ^ and $ yourself.
Your testing code can look like this:
codigo.matches("^(?=.*[0-9])(?=.*[a-zA-Z].*[a-zA-Z])(?=.*[^a-zA-Z\\d]).*$")
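As a sketch, the look-ahead version can be wrapped in a method and compiled once. This assumes that a "special character" means anything other than an ASCII letter or digit (so whitespace counts as well); the class name and the constant are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the look-ahead regex; names are illustrative.
final class CodigoValidator {
    // Each look-ahead checks one rule, starting from the beginning of the input:
    //   (?=.*[0-9])               at least one digit
    //   (?=.*[a-zA-Z].*[a-zA-Z])  at least two letters
    //   (?=.*[^a-zA-Z0-9])        at least one other ("special") character
    private static final Pattern RULES = Pattern.compile(
            "(?=.*[0-9])(?=.*[a-zA-Z].*[a-zA-Z])(?=.*[^a-zA-Z0-9]).*");

    static boolean validarCodigo(String codigo) {
        // matches() tests the whole input, so no ^ or $ is needed
        return RULES.matcher(codigo).matches();
    }

    public static void main(String[] args) {
        System.out.println(validarCodigo("ab1!")); // two letters, a digit, a special char
        System.out.println(validarCodigo("a1!"));  // only one letter
    }
}
```

If only a fixed set of special characters is allowed, the last look-ahead can list them explicitly instead of using a negated class.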

Java Regex to Validate Full Name allow only Spaces and Letters

I want a regex that validates only letters and spaces. Basically, this is to validate a full name, e.g. Mr Steve Collins or Steve Collins. I tried the regex "[a-zA-Z]+\.?" but it didn't work. Can someone assist me, please?
p.s. I use Java.
public static boolean validateLetters(String txt) {
String regx = "[a-zA-Z]+\\.?";
Pattern pattern = Pattern.compile(regx,Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(txt);
return matcher.find();
}
What about:
Peter Müller
François Hollande
Patrick O'Brian
Silvana Koch-Mehrin
Validating names is a difficult issue, because valid names do not consist only of the letters A-Z.
At the very least, you should use the Unicode property for letters and add more special characters. A first approach could be, e.g.:
String regx = "^[\\p{L} .'-]+$";
\\p{L} is a Unicode Character Property that matches any kind of letter from any language
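A minimal sketch that runs this pattern against the names listed above. The class and method names are made up for illustration, and non-ASCII letters are written as Unicode escapes so the source does not depend on the file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative.
final class NameCheck {
    // Letters from any script, plus space, dot, apostrophe and hyphen.
    private static final Pattern NAME = Pattern.compile("^[\\p{L} .'-]+$");

    static boolean looksLikeName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Peter M\u00fcller", "Fran\u00e7ois Hollande",
            "Patrick O'Brian", "Silvana Koch-Mehrin", "Steve123"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeName(s));
        }
    }
}
```

Note that this is a character whitelist only: it still accepts strings such as "..." or a lone space, so it is a first filter rather than a full name validator.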
try this regex (allowing Alphabets, Dots, Spaces):
"^[A-Za-z\s]{1,}[\.]{0,1}[A-Za-z\s]{0,}$" //regular
"^\pL+[\pL\pZ\pP]{0,}$" //unicode
This will also ensure DOT never comes at the start of the name.
For those who use java/android and struggle with this matter try:
"^\\p{L}+[\\p{L}\\p{Z}\\p{P}]{0,}"
This works with names like
José Brasão
You could even try this expression ^[a-zA-Z\\s]*$ for checking a string with only letters and spaces (nothing else).
For me it worked. Hope it works for you as well.
Or go through this piece of code once:
CharSequence inputStr = expression;
Pattern pattern = Pattern.compile("^[a-zA-Z\\s]*$");
Matcher matcher = pattern.matcher(inputStr);
if (matcher.matches()) {
    // the pattern matches
} else {
    // the pattern does not match
}
Please try this regex (allows only letters and spaces):
"[a-zA-Z][a-zA-Z ]*"
If you want it for iOS, then:
NSString *yourstring = @"hello";
NSString *Regex = @"[a-zA-Z][a-zA-Z ]*";
NSPredicate *TestResult = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", Regex];
if ([TestResult evaluateWithObject:yourstring] == true)
{
// validation passed
}
else
{
// invalid name
}
Regex pattern for matching only alphabets and white spaces:
String regexUserName = "^[A-Za-z\\s]+$";
Accept only letters, spaces, dots, apostrophes and hyphens:
if (!(Pattern.matches("^[\\p{L} .'-]+$", name.getText()))) {
JOptionPane.showMessageDialog(null, "Please enter a valid character", "Error", JOptionPane.ERROR_MESSAGE);
name.setFocusable(true);
}
My personal choice is:
^\p{L}+[\p{L}\p{Pd}\p{Zs}']*\p{L}+$|^\p{L}+$, Where:
^\p{L}+ - It should start with 1 or more letters.
[\p{Pd}\p{Zs}'\p{L}]* - It can have letters, space character (including invisible), dash or hyphen characters and ' in any order 0 or more times.
\p{L}+$ - It should finish with 1 or more letters.
|^\p{L}+$ - Or it just should contain 1 or more letters (It is done to support single letter names).
Support for dots (full stops) was dropped, as in British English it can be dropped in Mr or Mrs, for example.
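In a Java string literal, each backslash of this pattern has to be doubled. A small sketch, with made-up class and method names:

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative.
final class StrictNameCheck {
    // Starts and ends with a letter; letters, dashes (Pd), space separators (Zs)
    // and apostrophes may appear in between. The second alternative accepts
    // names made of a single letter.
    private static final Pattern NAME = Pattern.compile(
            "^\\p{L}+[\\p{L}\\p{Pd}\\p{Zs}']*\\p{L}+$|^\\p{L}+$");

    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Steve Collins"));
        System.out.println(isValid("Steve "));      // trailing space
        System.out.println(isValid("Mr. Collins")); // dots are not supported
    }
}
```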
To validate for only letters and spaces, try this:
String name1_exp = "^[a-zA-Z]+[-'\\s]?[a-zA-Z ]+$";
Validates such values as:
"", "FIR", "FIR ", "FIR LAST"
/^[A-Za-z]*$|^[A-Za-z]+\s[A-Za-z]*$/
check this out.
Name validation that accepts only letters and spaces:
public static boolean validateLetters(String txt) {
String regx = "^[a-zA-Z\\s]+$";
Pattern pattern = Pattern.compile(regx,Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(txt);
return matcher.find();
}
To support languages like Hindi, which can contain combining marks (\p{M}) between letters,
my solution is ^[\p{L}\p{M}]+([\p{L}\p{Pd}\p{Zs}'.]*[\p{L}\p{M}])+$|^[\p{L}\p{M}]+$
You can find all the test cases for this here:
https://regex101.com/r/3XPOea/1/tests
@amal, this code will match your requirement. Only letters and spaces are allowed; digits are rejected. "^" denotes the beginning of the line and "$" denotes the end of the line.
public static boolean validateLetters(String txt) {
String regx = "^[a-zA-Z ]+$";
Pattern pattern = Pattern.compile(regx,Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(txt);
return matcher.find();
}
Try with this:
public static boolean userNameValidation(String name){
return name.matches("(?i)(^[a-z])((?![? .,'-]$)[ .]?[a-z]){3,24}$");
}
For Java, you can use the pattern below for name validation. It matches any character that is not a letter (Alpha) or a blank (space or tab), so a name is valid when find() reports no match:
"[^\\p{Alpha}\\p{Blank}]"
You can also refer to Wikipedia for the ASCII values.
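A sketch of how such a negated class might be used. By default the POSIX classes \p{Alpha} and \p{Blank} cover ASCII only, so accented letters are rejected here; the class and method names, and the decision to reject empty input, are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative.
final class AsciiNameCheck {
    // Matches one character that is neither an ASCII letter nor a space/tab.
    private static final Pattern INVALID =
            Pattern.compile("[^\\p{Alpha}\\p{Blank}]");

    static boolean isValidName(String name) {
        // valid when non-empty and no forbidden character is found
        return !name.isEmpty() && !INVALID.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("Steve Collins"));
        System.out.println(isValidName("Steve1"));
    }
}
```

Compiling the pattern with Pattern.UNICODE_CHARACTER_CLASS makes the POSIX classes Unicode-aware, if non-ASCII letters should be accepted.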

How to check if only chosen characters are in a string?

What's the best and easiest way to check if a string only contains the following characters:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_
I would like an example along the lines of this pseudo-code:
//If String contains other characters
else
//if string contains only those letters
Please and thanks :)
if (string.matches("^[a-zA-Z0-9_]+$")) {
// contains only listed chars
} else {
// contains other chars
}
For that particular class of String use the regular expression "\w+".
Pattern p = Pattern.compile("\\w+");
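Matcher m = p.matcher(str); // matcher() is an instance method of the compiled Pattern
if (m.matches()) {
    // contains only word characters
} else {
    // contains other characters
}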
Note that I use the Pattern object to compile the regex once so that it never has to be compiled again, which may be nice if you are doing this check a lot or in a loop. As per the Java docs:
"If a pattern is to be used multiple times, compiling it once and reusing it will be more efficient than invoking this method each time."
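A sketch of the compile-once idea: the pattern lives in a static final field and only a new Matcher is created per call. The class, field and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative.
final class WordCharFilter {
    // Compiled once when the class is loaded, then reused for every call.
    private static final Pattern WORD_CHARS = Pattern.compile("\\w+");

    // True if the input is non-empty and consists only of [a-zA-Z_0-9].
    static boolean onlyWordChars(String s) {
        return WORD_CHARS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abc_123", "abc-123", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + onlyWordChars(s));
        }
    }
}
```

Pattern instances are immutable and safe to share between threads; Matcher instances are not, which is why a new one is created on each call.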
My turn:
static final Pattern bad = Pattern.compile("\\W|^$");
//...
if (bad.matcher(suspect).find()) {
// String contains other characters
} else {
// string contains only those letters
}
The above searches for a single non-word character or an empty string.
And according to JavaDoc for Pattern:
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]

How do I know if a regexp has more than one possible match?

I am writing Java code that has to distinguish regular expressions with more than one possible match from regular expressions that have only one possible match.
For example:
"abc." can have several matches ("abc1", "abcf", ...),
while "abcd" can only match "abcd".
Right now my best idea was to look for all unescaped regexp special characters.
I am convinced that there is a better way to do it in Java. Ideas?
(Late addition):
To make things clearer - there is NO specific input to test against. A good solution for this problem will have to test the regex itself.
In other words, I need a method whose signature may look something like this:
boolean isSingleResult(String regex)
This method should return true if and only if there is exactly one possible String s1 for which s1.matches(regex) returns true. (See examples above.)
This sounds dirty, but it might be worth having a look at the Pattern class in the Java source code.
Taking a quick peek, it seems like it 'normalize()'s the given regex (Line 1441), which could turn the expression into something a little more predictable. I think reflection can be used to tap into some private resources of the class (use caution!). It is possible that, while tokenizing the regex pattern, there are specific indications that it has reached some kind of "multi-matching" element in the pattern.
Update
After having a closer look, there is some data within package scope that you can use to leverage the work of the Pattern tokenizer to walk through the nodes of the regex and check for multiple-character nodes.
After compiling the regular expression, iterate through the compiled "Node"s starting at Pattern.root. Starting at line 3034 of the class, there are the generalized types of nodes. For example class Pattern.All is multi-matching, while Pattern.SingleI or Pattern.SliceI are single-matching, and so on.
All these token classes appear to be in package scope, so it should be possible to do this without using reflection, but instead creating a java.util.regex.PatternHelper class to do the work.
Hope this helps.
If it can only have one possible match it isn't reeeeeally an expression, now, is it? I suspect your best option is to use a different tool altogether, because this does not at all sound like a job for regular expressions, but if you insist, well, no, I'd say your best option is to look for unescaped special characters.
The only regular expression that can ONLY match one input string is one that specifies the string exactly. So you need to match expressions with no wildcard characters or character groups AND that specify a start "^" and end "$" anchor.
"the quick" matches:
"the quick brownfox"
"the quick brown dog"
"catch the quick brown fox"
"^the quick brown fox$" matches ONLY:
"the quick brown fox"
Now I understand what you mean. I live in Belgium...
So this is something that works on most expressions. I wrote it myself, so maybe I forgot some rules.
public static final boolean isSingleResult(String regexp) {
// Check the exceptions on the exceptions.
String[] exconexc = "\\d \\D \\w \\W \\s \\S".split(" ");
for (String s : exconexc) {
int index = regexp.indexOf(s);
if (index != -1) // Forbidden char found
{
return false;
}
}
// Then remove all exceptions:
String regex = regexp.replaceAll("\\\\.", "");
// Now, all the strings that can mean more than one match
String[] mtom = "+ . ? | * { [:alnum:] [:word:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]".split(" ");
// iterate all mtom-Strings
for (String s : mtom) {
int index = regex.indexOf(s);
if (index != -1) // Forbidden char found
{
return false;
}
}
return true;
}
Martijn
As I see it, the only way is to check whether the regexp matches multiple times for a particular input.
package com;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class AAA {
public static void main(String[] args) throws Exception {
String input = "123 321 443 52134 432";
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher(input);
int i = 0;
while (matcher.find()) {
++i;
}
System.out.printf("Matched %d times%n", i);
}
}
