Name validation with special conditions using regex - java

I want to validate a name in Java, allowing each of the special characters {,-.'} to appear at most once. I have an expression that restricts the input to letters and these special characters, but I cannot figure out how to prevent users from entering any of them more than once. I tried quantifiers without success. This is the code I have so far:
Pattern validator = Pattern.compile("^[a-zA-Z+\\.+\\-+\\'+\\,]+$");

You can use lookahead assertion in your regex:
Pattern validator = Pattern.compile(
"^(?!(?:.*?\\.){2})(?!(?:.*?'){2})(?!(?:.*?,){2})(?!(?:.*?-){2})[a-zA-Z .',-]+$");
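Each (?!(?:.*?X){2}) is a negative lookahead that rejects the input if the character X occurs two or more times, so each of . ' , - is allowed at most once. A single combined lookahead, (?!(?:.*?[.',-]){2}), would instead allow at most one special character in total.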
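A minimal sketch of how the lookahead pattern could be wrapped and tried out. The class name NameValidator, the method isValid, and the sample names are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Pattern;

class NameValidator {
    // One negative lookahead per special character: each one rejects the
    // input as soon as that character can be found a second time.
    private static final Pattern NAME = Pattern.compile(
        "^(?!(?:.*?\\.){2})(?!(?:.*?'){2})(?!(?:.*?,){2})(?!(?:.*?-){2})[a-zA-Z .',-]+$");

    // Hypothetical helper: true if the whole input is an acceptable name.
    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Anne-Marie O'Neil")); // one '-' and one apostrophe: true
        System.out.println(isValid("Anne--Marie"));       // two '-' characters: false
    }
}
```

Note that the character class also admits a space, and the lookaheads place no limit on how many spaces appear.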

I think that you can just take into account names where such characters would only happen once. Names like "Jonathan's", "Thoms-Damm", "Thoms,Jon", "jonathan.thoms". In practice for names, I don't think that such special characters would occur at the edges of the string. As such, you can probably get away with a regex like:
Pattern validator = Pattern.compile("^[a-zA-Z]+(?:[-',.][a-zA-Z]+)?$");
This regex should match a regular ASCII name followed optionally by a single "special" character with another name after it.
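As a rough illustration of what this simpler pattern accepts and rejects, here is a small sketch. The class and method names are made up for the example. Unlike the lookahead approach, this pattern allows only one special character in total, and never at the start or end.

```java
import java.util.regex.Pattern;

class SimpleNameCheck {
    // Letters, then optionally one special character followed by more letters.
    // Inside a character class the dot is literal, so it needs no backslash.
    private static final Pattern NAME =
        Pattern.compile("^[a-zA-Z]+(?:[-',.][a-zA-Z]+)?$");

    // Hypothetical helper for the example.
    static boolean isValid(String name) {
        return NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Jonathan's", "Thoms-Damm", "Thoms,Jon", "-Thoms", "a-b-c"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```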


java 8 regular expression for meta characters [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 3 years ago.
I am trying to write a regular expression to check whether a sentence has metacharacters, for example "I need to make payment of $50 for the purchase, should i use CASH|CC". In this sentence I need to identify whether metacharacters are present.
I tried \\\\$ and ^(\\\\$)\\$. What is the right syntax for Pattern.matches("^([\\\\$]$)", text); to identify the special characters? I don't need to replace them, just identify whether the sentence contains them.
If you want to know whether a string contains meta characters, you can use something like this:
boolean hasIt = sentence.chars().anyMatch(c -> "\\.[]{}()*+?^$|".indexOf(c) >= 0);
By not using the Regex engine, you don’t need to quote the characters which have a special meaning to it.
Using Pattern.matches creates three unnecessary obstacles to the task. First, you have to quote all characters correctly. Second, you need a regex construct to turn the characters into alternatives, e.g. [abc] or a|b|c. Third, matches checks whether the entire string matches the pattern rather than whether it contains an occurrence, so you’d need something like .*pattern.* to make matches behave like find, if you insist on it.
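The difference between matches and find can be seen in a short sketch. The class and method names are assumptions chosen for the example, and the character class [$|] is just a two-character stand-in for whichever set you care about.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Inside a character class, '$' and '|' are literals and need no quoting.
    private static final Pattern DOLLAR_OR_PIPE = Pattern.compile("[$|]");

    // matches(): the pattern has to cover the entire input.
    static boolean wholeInputMatches(String s) {
        return DOLLAR_OR_PIPE.matcher(s).matches();
    }

    // find(): the pattern may occur anywhere in the input.
    static boolean containsMatch(String s) {
        return DOLLAR_OR_PIPE.matcher(s).find();
    }

    // matches() made to behave like find() by wrapping the pattern in .*
    static boolean wrappedMatches(String s) {
        return Pattern.matches(".*[$|].*", s);
    }

    public static void main(String[] args) {
        String sentence = "payment of $50, should i use CASH|CC";
        System.out.println(wholeInputMatches(sentence)); // false
        System.out.println(containsMatch(sentence));     // true
        System.out.println(wrappedMatches(sentence));    // true
    }
}
```

One caveat with the .* wrapper: by default the dot does not match line terminators, so it behaves like find only for single-line input.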
Which leads to the xy-problem of this task. It’s not clear which metacharacters you actually want to check and why you need this information in the first place.
If you want to search for this sentence within another text, just use Pattern.compile(sentence, Pattern.LITERAL) to disable interpretation of meta characters. Or Pattern.quote(sentence) when you want to assemble a pattern containing the sentence.
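Both options can be sketched as follows. Pattern.LITERAL and Pattern.quote are standard java.util.regex API; the class name, method names, and sample strings are illustrative assumptions.

```java
import java.util.regex.Pattern;

class LiteralSearch {
    // Pattern.LITERAL: the whole needle is taken as plain text.
    static boolean containsLiteral(String text, String needle) {
        return Pattern.compile(needle, Pattern.LITERAL).matcher(text).find();
    }

    // Pattern.quote: the quoted needle is embedded in a larger regex,
    // here one that additionally requires the needle to end the text.
    static boolean endsWithLiteral(String text, String needle) {
        return Pattern.compile(Pattern.quote(needle) + "$").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("use CASH|CC for $50", "CASH|CC")); // true
        System.out.println(containsLiteral("use CASH or CC", "CASH|CC"));      // false: '|' is not an alternation here
        System.out.println(endsWithLiteral("total: $50", "$50"));              // true
    }
}
```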
But if you don’t want to search for it, this information has no relevance. Note that “Is this a meta character?” may lead to a different answer than “Does it need quoting?”. Even this tutorial combines these questions in a misleading way. At two close places it names the metacharacters and describes the quoting syntax, leading to the wrong impression that all of these characters need quoting.
For example, - only has a special meaning within a character class, so if there is no character class, which you detect by the presence of [, the - does not imply the presence of metacharacters. But while - truly needs quoting within the character class, the characters = and ! are metacharacters only in a certain context, which requires a metacharacter, so they never require quoting.
But if you are trying to check for a metacharacter to decide whether to use the Regex engine or to perform a plain text search, e.g. via String.indexOf, you are performing premature optimization. This is not only a waste of development effort; optimizing before you even have actual code you could measure often leads to the opposite result. Performing a pattern matching using the Regex engine with a string containing no metacharacters can lead to a more efficient search than a plain indexOf on the String. In the reference implementation, the Regex engine uses the Boyer Moore algorithm while the plaintext search methods on String use a naive search.
Edit: As commenters Andreas and Holger point out, some of the meta characters used by regular expressions depend on a syntactic context, such as character classes or specific sequences (lookahead, lookbehind, ...), and are therefore not intrinsically metacharacters. Some are meta characters only in a specific context. The answer provided here includes all possible meta characters, except the operators that become meta characters only when prefixed by \. This means that characters will sometimes be matched in locations where they are not actually meta characters.
This question has half the answer: List of all special characters that need to be escaped in a regex
You can look at the javadoc of the Pattern class: http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
The Java regular expression system exposes no character class for its own special characters (regrettably).
Special constructs (named-capturing and non-capturing)
(?<name>X) X, as a named-capturing group
(?:X) X, as a non-capturing group
(?idmsuxU-idmsuxU) Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
This block alone contains a lot (though not all) of the meta characters. I had to leave out the last two rows of the citation, because the character sequences confused the parser of this page.
I would suggest the following:
public static final Pattern META_CHARS = Pattern.compile("[\\\\\\]\\[(){}\\-!$?*+<>\\:\\.\\=\\,\\|^]");
But be aware that this list might very well be incomplete, and that it contains typical characters such as , and . which are part of the regex syntax. So you probably have a lot of escaping to do...
From there you can:
Matcher metaDetector = META_CHARS.matcher(stringToTest);
if (metaDetector.find()) {
    // this is the found meta character...
    String metaCharacter = metaDetector.group(0);
    System.out.print(metaCharacter);
}
And if you want to find all meta characters, change the if to a while in the above code snippet. If you do, for the line "I need to make \\payment{[ of $50 for !!the purc\"hase, sh###ould i use CASH|CC." you receive \{[$!!,|., which is correct, as # and " are not meta characters in regex.
As Andreas correctly mentions, the exact pattern can be reduced to "[\\\\\\]\\[(){}^$?*+.|]", because this will tell you, whether or not at least one meta character is present. However this might miss some meta characters, if multiple are present. If this is not important, then the shorter chain is sufficient.
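The while-loop variant described above could look like the following sketch. It reuses the META_CHARS pattern from the answer; the class name and the collect method are assumptions added for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharScanner {
    // Same pattern as suggested in the answer.
    static final Pattern META_CHARS =
        Pattern.compile("[\\\\\\]\\[(){}\\-!$?*+<>\\:\\.\\=\\,\\|^]");

    // Hypothetical helper: concatenates every meta character found, in order.
    static String collect(String input) {
        StringBuilder found = new StringBuilder();
        Matcher metaDetector = META_CHARS.matcher(input);
        while (metaDetector.find()) {
            found.append(metaDetector.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String line = "I need to make \\payment{[ of $50 for !!the purc\"hase, sh###ould i use CASH|CC.";
        System.out.println(collect(line)); // prints \{[$!!,|.
    }
}
```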

Regular expression to return results that do not match selection

I work on a product that provides a Java API to extend it.
The API provides a function which
takes a Perl regular expression and
returns a list of matching files.
I want to filter the list to remove all files that end in .xml, .xsl and .cfg; basically the opposite of .*(\.xml|\.xsl|\.cfg).
I have been searching but I haven't been able to get anything to work yet.
I tried .*(?!\.cfg) and ^((?!cfg).)*$ and \.(?!cfg$|?!xml$|?!xsl$).
I don't know if I am on the right track or not.
Note
I know the regex systems are similar, but I can't get a Java regex working either.
You may use
^(?!.*\.(x[ms]l|cfg)$).+
Details:
^ - start of a string
(?!.*\.(x[ms]l|cfg)$) - a negative lookahead that fails the match if any 0+ chars other than line break chars (.*) are followed with xml, xsl or cfg ((x[ms]l|cfg)) at the end of the string ($)
.+ - any 1 or more chars other than linebreak chars. Might be omitted if the entire string match is not required (in some tools it is required though).
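For the Java side of the question, the lookahead pattern can be applied as in this sketch. The class name, the keep method, and the sample file names are assumptions for illustration; the product's own API is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtensionFilter {
    // Matches any name that does NOT end in .xml, .xsl or .cfg.
    static final Pattern NOT_EXCLUDED = Pattern.compile("^(?!.*\\.(x[ms]l|cfg)$).+");

    // Hypothetical helper: keeps only the names the pattern accepts.
    static List<String> keep(List<String> names) {
        return names.stream()
                .filter(n -> NOT_EXCLUDED.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.xml", "b.txt", "c.cfg", "d.xsl", "e.xml.bak");
        System.out.println(keep(files)); // [b.txt, e.xml.bak]
    }
}
```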
You need something like this, which matches only if the end of the string isn't preceded by a dot and one of the three unwanted types
/(?<!\.(?:xml|xsl|cfg))\z/

Validating a mathematical expression in java

I am trying to validate if the string "expression" as in the code below is a formula.
String expression = request.getParameter(FORMULA);
if(!Pattern.matches("[a-zA-Z0-9+-*/()]", expression)){return new AjaxMessage(AjaxMessage.ResponseStatusEnum.FAILURE, getJsonString(, "Manager.invalid.formula" , null));
}
Examples of values for expression are {a+b/2, (a+b)*2, (john-Max), etc.}, just for context (the variable names in the formula may vary, and the arithmetic expression contains only the special characters [+-/()*]). As you can see, I tried to validate using a regex (I am new to regex), but I think it's not possible as I don't know the length of the variable names.
Is there a way to achieve a validation using regex or any other library in java?
Thanks in advance.
The reason is that you are using characters with special meaning in regex. You need to escape those characters. I have just modified your regex to make it work.
Code:
List<String> expressions = new ArrayList<String>();
expressions.add("a+b/2");
expressions.add("(a+b)*2");
expressions.add("john-Max");
expressions.add("etc[");
for (String expression : expressions) {
    if (!Pattern.matches("[a-zA-Z0-9\\+\\-\\*/\\(\\)]*", expression)) {
        System.out.println("NOT match");
    } else {
        System.out.println("MATCH");
    }
}
OUTPUT:
MATCH
MATCH
MATCH
NOT match
You're using special characters in your regex; you need to escape them using \.
It should look like [a-zA-Z0-9+\\-*/()] . This only tests one character; you need to add a * at the end to test multiple characters.
Edit (thanks Toto): because [] tests a single character, it's called a character class (not like a Java class actually), so only the - is considered special here. For a regex without the brackets, you would need to escape the other special characters.
Special characters have special meaning in regex and won't be interpreted as the character they are (for example, parentheses are used to make groups, * means 0 or more of the previous character, etc.).
About character class: https://docs.oracle.com/javase/tutorial/essential/regex/char_classes.html
More info:
http://www.regular-expressions.info/characters.html and
http://www.regular-expressions.info/refcharacters.html
I use this site to test my regexes (note that regex engines may vary!):
https://regex101.com/
As said in the comments, a mathematical expression is more than just a set of allowed characters, so if you want to validate it, you'll have to do more manual checking.
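As one example of such manual checking, the sketch below combines the character-class test with a balanced-parentheses count. This is only an assumed pre-check, not a full expression validator; the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class FormulaPrecheck {
    // Only the '-' needs escaping inside the character class.
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9+\\-*/()]+");

    // Hypothetical helper: allowed characters only, and parentheses balanced.
    static boolean looksPlausible(String expression) {
        if (!ALLOWED.matcher(expression).matches()) {
            return false;
        }
        int depth = 0;
        for (char c : expression.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                return false; // a ')' appeared before its '('
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(looksPlausible("(a+b)*2")); // true
        System.out.println(looksPlausible("(a+b"));    // false
    }
}
```

Input such as a++b still passes this check, so operator placement would need a real parser or further rules.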

Add Dash to Java Regex

I am trying to modify an existing Regex expression being pulled in from a properties file from a Java program that someone else built.
The current Regex expression used to match an email address is -
RR.emailRegex=^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
That matches email addresses such as abc.xyz#example.com, but now some email addresses have dashes in them such as abc-def.xyz#example.com and those are failing the Regex pattern match.
What would my new Regex expression be to add the dash to that regular expression match or is there a better way to represent that?
Based on the regex you are using, you can add the dash to each character class. The original:
RR.emailRegex=^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
becomes:
RR.emailRegex=^[a-zA-Z0-9_\\.-]+#[a-zA-Z0-9_-]+\\.[a-zA-Z0-9_-]+$
Btw, you can shorten your regex like this:
RR.emailRegex=^[\\w.-]+#[\\w-]+\\.[\\w-]+$
Anyway, I would use Apache EmailValidator instead like this:
if (EmailValidator.getInstance().isValid(email)) ....
The meaning of - inside a character class is different from its meaning elsewhere. Inside a character class, - denotes a range, e.g. 0-9. If you want to include a literal -, write it at the beginning or end of the character class, like [-0-9] or [0-9-].
You also don't need to escape . inside a character class, because it is treated as a literal . there.
Your regex can be simplified further. \w denotes [A-Za-z0-9_]. So you can use
^[-\w.]+#[\w]+\.[\w]+$
In Java, this can be written as
^[-\\w.]+#[\\w]+\\.[\\w]+$
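A quick way to check the effect of the dash placement is a sketch like the one below. It uses the original pattern and the shortened dash-enabled pattern from this thread, with the same # separator as the examples above; the class and method names are assumptions.

```java
import java.util.regex.Pattern;

class DashInCharacterClass {
    // Original pattern: no dash allowed anywhere.
    static final Pattern OLD =
        Pattern.compile("^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$");
    // Dash placed last in each class, so it is a literal rather than a range operator.
    static final Pattern NEW = Pattern.compile("^[\\w.-]+#[\\w-]+\\.[\\w-]+$");

    static boolean matchesOld(String s) { return OLD.matcher(s).matches(); }
    static boolean matchesNew(String s) { return NEW.matcher(s).matches(); }

    public static void main(String[] args) {
        String address = "abc-def.xyz#example.com";
        System.out.println(matchesOld(address)); // false
        System.out.println(matchesNew(address)); // true
        // A leading or trailing dash in a class is literal.
        System.out.println(Pattern.matches("[-0-9]+", "12-34")); // true
        System.out.println(Pattern.matches("[0-9-]+", "12-34")); // true
    }
}
```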
^[a-zA-Z0-9_\\.\\-]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
Should solve your problem. In regex you need to escape anything that has meaning in the Regex engine (eg. -, ?, *, etc.).
The correct Regex fix is below.
OLD Regex Expression
^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
NEW Regex Expression
^[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
I read this post, which covers all the special cases; the pattern that works correctly with Java is:
String pattern ="(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")#(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

How to write regex pattern in lucene?

I want to match a string from regexp query in lucene.
Test String:
program-id. acinstal.
Regex pattern in java:
^[a-z0-9 ]{6}[^*]\s*(program-id)\.
How would I write this regex specifically for a Lucene regexp query so that it matches the string?
There are two problems with your regex (assuming, based on previous questions, that your test string is indexed without any tokenization, as a StringField for instance):
The regex must match a whole term. Without any analysis, as we're assuming, that means it must match the whole field. In this case, you need to add a .* to match the rest of the field.
Since the regex has to match the whole field anyway, anchors are unnecessary and are not supported, so get rid of the ^ at the beginning.
So the regex that should work is:
[a-z0-9 ]{6}[^*]\s*(program-id)\..*
