Why does this Regular Expression not match anything? - java

I'm trying to use the following regular expression to find all e-mail addresses in an HTML string:
RegExp
[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}
HTML
ddawson@gcitravel.net</span>. </p>
I'm using matcher.find(), which is supposed to find substrings, is it not? When I perform the search it comes up empty. Any ideas why?

Regex matching is case-sensitive by default, so the lowercase parts of the address, such as ddawson or the final .net, can't be matched by [A-Z0-9._%+-]+ or \\.[A-Z]{2,4}.
To make your regex case-insensitive, add the (?i) flag:
"(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}"
or compile it with the Pattern.CASE_INSENSITIVE flag:
Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);
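A minimal, self-contained sketch of both fixes applied to the HTML snippet from the question. The class name EmailFinder and the helper firstMatch are hypothetical, introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    // Pattern from the question: only uppercase letters are accepted.
    static final Pattern CASE_SENSITIVE =
            Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}");

    // Same pattern, made case-insensitive with the inline (?i) flag.
    static final Pattern INLINE_FLAG =
            Pattern.compile("(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}");

    // Same pattern, made case-insensitive with Pattern.CASE_INSENSITIVE.
    static final Pattern COMPILE_FLAG =
            Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);

    // Returns the first substring found by find(), or null if there is none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "ddawson@gcitravel.net</span>. </p>";
        System.out.println(firstMatch(CASE_SENSITIVE, html)); // null
        System.out.println(firstMatch(INLINE_FLAG, html));    // ddawson@gcitravel.net
        System.out.println(firstMatch(COMPILE_FLAG, html));   // ddawson@gcitravel.net
    }
}
```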

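A-Z only matches uppercase letters, and there is an extra \: the doubled \\. is only needed inside a Java string literal, whereas the raw regex should use \. (a raw \\. matches a literal backslash followed by any character). Try this:
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-zA-Z]{2,4}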
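A sketch of the explicit-range version written as a Java string literal, next to a variant that keeps a doubled backslash in the regex itself. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExplicitRanges {
    // Raw regex [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-zA-Z]{2,4}
    // written as a Java literal, where the single \ must be spelled \\.
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[a-zA-Z]{2,4}");

    // Keeping \\. in the raw regex needs four backslashes in the literal;
    // the pattern then requires a real backslash followed by any character.
    static final Pattern DOUBLED =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\\\.[a-zA-Z]{2,4}");

    // Returns the first substring found by find(), or null if there is none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "ddawson@gcitravel.net</span>. </p>";
        System.out.println(firstMatch(EMAIL, html));   // ddawson@gcitravel.net
        System.out.println(firstMatch(DOUBLED, html)); // null
    }
}
```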

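This way of searching for e-mail addresses no longer works with the newer top-level domains. For an address in a domain such as site.berlin, the {2,4} limit makes find() return a truncated match (user@site.berl), and matches() fails outright. Either raise the upper bound of {2,4}, remove it, or look for something like:
[A-Za-z0-9+/.-]+@[A-Za-z0-9/.-]+\\.[A-Za-z]+$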
I don't have enough reputation to comment on a post, but as far as I remember there are TLDs as long as .international (13 letters), so {2,4} won't find them. Also remember domains with a dot inside the root name, such as .co.uk or .de.com. The domain must also end with a letter; it cannot end with a digit or a special character. The address itself might contain delimiters such as + or -.
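A sketch comparing the 2-4 letter limit with an open-ended {2,} bound on the TLD. The class and method names are hypothetical, and the sample addresses are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TldLength {
    // Top-level domain limited to 2-4 letters, as in the question.
    static final Pattern SHORT_TLD =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}");

    // Upper bound removed: two or more letters, so .berlin or .international fit.
    static final Pattern OPEN_TLD =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Returns the first substring found by find(), or null if there is none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "write to user@site.berlin today";
        System.out.println(firstMatch(SHORT_TLD, text)); // user@site.berl (truncated)
        System.out.println(firstMatch(OPEN_TLD, text));  // user@site.berlin
        // Multi-label suffixes such as .co.uk are covered by [A-Za-z0-9.-]+.
        System.out.println(firstMatch(OPEN_TLD, "a@example.co.uk")); // a@example.co.uk
        // matches() needs the whole string, so the 2-4 limit rejects it.
        System.out.println(SHORT_TLD.matcher("user@site.berlin").matches()); // false
    }
}
```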

Related

Regex to match valid domain if it comes in beginning of a string

I am trying to match the following criteria with a regex:
The string starts with http or https
The host ends with one of two particular domains (abc.com or xyz.com)
There can be anything after abc.com or xyz.com
I wrote the following regex for this:
http[s]?:\/\/(.*).(abc|xyz).com\/(.*)
but it fails in the scenario below: this URL matches even though it shouldn't.
http://www.attacker.com/https://events.abc.com/#/en/navigator/init/meetings
Any help is much appreciated.
You may use
^https?:\/\/([^\/]*\.)?(abc|xyz)\.com\/.*
You have to allow only characters other than / before abc.com or xyz.com, so that the domain can only be matched within the host part of the URL.
Details:
^https?:\/\/: http:// or https:// at the beginning of the string
([^\/]*\.)?: any characters except / (for example a subdomain such as www or events), ending with a dot; it's optional (?) to allow URLs like //abc.com
(abc|xyz)\.com: match the domain name
\/.*: match / and anything that comes after
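A quick Java check of the anchored pattern against the URLs from the question plus a few made-up ones. The class DomainCheck and helper isAllowed are hypothetical names. The sketch uses matches(), so the whole URL must fit; note that, as written, the pattern also requires a / after .com.

```java
import java.util.regex.Pattern;

class DomainCheck {
    // Java form of ^https?:\/\/([^\/]*\.)?(abc|xyz)\.com\/.*
    // Forward slashes need no escaping in Java regexes, so \/ becomes /.
    static final Pattern ALLOWED =
            Pattern.compile("^https?://([^/]*\\.)?(abc|xyz)\\.com/.*");

    // [^/]* keeps the domain check inside the host part of the URL.
    static boolean isAllowed(String url) {
        return ALLOWED.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://events.abc.com/#/en/navigator/init/meetings",                        // true
            "http://xyz.com/anything",                                                     // true
            "http://www.attacker.com/https://events.abc.com/#/en/navigator/init/meetings", // false
            "https://notabc.com/page",                                                     // false
            "https://abc.com"                                                              // false: no trailing /
        };
        for (String url : urls) {
            System.out.println(isAllowed(url) + "  " + url);
        }
    }
}
```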
You should escape the literal dots (.):
https?:\/\/.*\.(abc|xyz)\.com\/.*
Of course, if you want to put that in a Java string literal, it's got to be
https?:\\/\\/.*\\.(abc|xyz)\\.com\\/.*
I also removed some unnecessary things from your regex:
You don't need to put a single letter in a character class ([s]); s? does the same.
(.*) doesn't need to be in parentheses, but it might be worth appending a ? to it, so it becomes .*?, to avoid the greediness of *.
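Escaping the dots is necessary, but on its own it does not reject the attacker URL from the question, and neither does switching to a lazy .*?, because laziness only changes the order in which matches are tried. A sketch (names hypothetical) that checks this with matches():

```java
import java.util.regex.Pattern;

class DotOnlyFix {
    // Literal dots escaped, but .* can still run across "/" characters.
    static final Pattern GREEDY =
            Pattern.compile("https?:\\/\\/.*\\.(abc|xyz)\\.com\\/.*");

    // Lazy .*? changes which match is tried first, not whether one exists.
    static final Pattern LAZY =
            Pattern.compile("https?:\\/\\/.*?\\.(abc|xyz)\\.com\\/.*");

    // Restricting the host part to non-slash characters rejects the URL.
    static final Pattern HOST_ONLY =
            Pattern.compile("https?://([^/]*\\.)?(abc|xyz)\\.com/.*");

    static final String ATTACK =
            "http://www.attacker.com/https://events.abc.com/#/en/navigator/init/meetings";

    static boolean accepts(Pattern p, String url) {
        return p.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(GREEDY, ATTACK));    // true
        System.out.println(accepts(LAZY, ATTACK));      // true
        System.out.println(accepts(HOST_ONLY, ATTACK)); // false
    }
}
```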

Freemarker regex is not matching on all lowercase substrings

So I am following the user guide, which seems straightforward, so I'm not sure what I am doing wrong. I want to use the matches built-in to find all lowercase words in a string. I took the example straight from the docs into my code (with some obvious changes), but I always get the Does not match output. Any help is much appreciated:
<#assign res = "<UPPERCASE_WORD<lowercase_word>>"?matches("[a-z]+")>
<#if res>
Matches
<#else>
Does not match
</#if>
One difference I've noticed between my code and the docs is that the example has spaces and mine does not, but I doubt that's the issue, as a quick test with < > replaced by spaces shows no difference. I was thinking the regex is incorrect or not supported by FreeMarker, but the docs link directly to Oracle's Pattern documentation, so I think that's OK.
Don't use matches if you don't expect an exact match. From the docs:
"This built-in determines if the string exactly matches the pattern."
If you know the exact regex that the whole string should match, use it.
For example, for lowercase letters followed by uppercase letters, use:
?matches("[a-z]+[A-Z]+")>
If you want to check if the string contains [a-z] somewhere, then the regular expression should be ".*[a-z]+.*", because ?matches checks if the pattern matches the whole string.
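Since the FreeMarker docs point to Java's regex syntax (as the question notes), the exact-versus-partial distinction can be illustrated in plain Java: Matcher.matches() behaves like the exact check, while find() collects substrings. This is a Java analogue, not FreeMarker code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactVsFind {
    static final String INPUT = "<UPPERCASE_WORD<lowercase_word>>";

    // Exact match: the whole input must match the regex.
    static boolean exact(String regex) {
        return Pattern.matches(regex, INPUT);
    }

    // Partial matches: every substring matching [a-z]+, collected with find().
    static List<String> allLowercaseWords() {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile("[a-z]+").matcher(INPUT);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(exact("[a-z]+"));     // false
        System.out.println(exact(".*[a-z]+.*")); // true
        System.out.println(allLowercaseWords()); // [lowercase, word]
    }
}
```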

Regular expression to return results that do not match selection

I work on a product that provides a Java API to extend it.
The API provides a function which takes a Perl regular expression and returns a list of matching files.
I want to filter the list to remove all files that end in .xml, .xsl and .cfg; basically the opposite of .*(\.xml|\.xsl|\.cfg).
I have been searching but I haven't been able to get anything to work yet.
I tried .*(?!\.cfg) and ^((?!cfg).)*$ and \.(?!cfg$|?!xml$|?!xsl$).
I don't know if I am on the right track or not.
Note
I know the regex systems are similar, but I can't get a Java regex working either.
You may use
^(?!.*\.(x[ms]l|cfg)$).+
Details:
^ - start of a string
(?!.*\.(x[ms]l|cfg)$) - a negative lookahead that fails the match if any 0+ chars other than line break chars (.*) are followed by a dot and xml, xsl or cfg (\.(x[ms]l|cfg)) at the end of the string ($)
.+ - any 1 or more chars other than line break chars. It can be omitted if matching the entire string is not required (some tools do require it, though).
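The product's API takes the pattern and returns the matching files, so the regex alone has to do the exclusion. The sketch below simulates that by filtering a list of made-up file names with Java's Matcher.matches(); the names are hypothetical, and whether the real API uses whole-string or partial matching is an assumption here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ExcludeExtensions {
    // Matches any name that does NOT end in .xml, .xsl or .cfg.
    static final Pattern KEEP = Pattern.compile("^(?!.*\\.(x[ms]l|cfg)$).+");

    // Keeps only the names the pattern matches in full.
    static List<String> filter(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (KEEP.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("build.xml", "style.xsl", "app.cfg", "Main.java", "notes.xmlx");
        System.out.println(filter(names)); // [Main.java, notes.xmlx]
    }
}
```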
You need something like this, which matches only if the end of the string isn't preceded by a dot and one of the three unwanted extensions:
/(?<!\.(?:xml|xsl|cfg))\z/
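The same lookbehind in Java syntax: Java has no /.../ delimiters and needs doubled backslashes in the string literal. Because the pattern only matches an empty position at the end of the input, this sketch uses find() rather than matches(). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LookbehindFilter {
    // Java form of /(?<!\.(?:xml|xsl|cfg))\z/ : succeeds only if the end of
    // the input is not directly preceded by .xml, .xsl or .cfg.
    static final Pattern NOT_EXCLUDED = Pattern.compile("(?<!\\.(?:xml|xsl|cfg))\\z");

    static boolean keep(String name) {
        return NOT_EXCLUDED.matcher(name).find();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"build.xml", "app.cfg", "Main.java", "notes.xmlx"}) {
            System.out.println(name + " -> " + keep(name));
        }
    }
}
```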

Add Dash to Java Regex

I am trying to modify an existing regex that a Java program, built by someone else, reads from a properties file.
The current regex used to match an email address is:
RR.emailRegex=^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
That matches email addresses such as abc.xyz@example.com, but now some email addresses contain dashes, such as abc-def.xyz@example.com, and those fail the regex match.
What would the new regex be to add the dash to that match, or is there a better way to represent it?
Based on the regex you are using, you can add the dash to your character classes, changing
RR.emailRegex=^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
to
RR.emailRegex=^[a-zA-Z0-9_\\.-]+@[a-zA-Z0-9_-]+\\.[a-zA-Z0-9_-]+$
By the way, you can shorten your regex like this:
RR.emailRegex=^[\\w.-]+@[\\w-]+\\.[\\w-]+$
Anyway, I would use Apache Commons Validator's EmailValidator instead, like this:
if (EmailValidator.getInstance().isValid(email)) ....
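A sketch, assuming the program reads the file with java.util.Properties, that loads the shortened pattern and checks a dashed address. The class name and the inline file content are hypothetical; the file text uses doubled backslashes exactly as the properties entry does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

class EmailRegexFromProperties {
    // Hypothetical file content: RR.emailRegex=^[\\w.-]+@[\\w-]+\\.[\\w-]+$
    static final String PROPS = "RR.emailRegex=^[\\\\w.-]+@[\\\\w-]+\\\\.[\\\\w-]+$\n";

    // Loads the property; Properties.load turns each \\ in the file into \.
    static String loadRegex() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(PROPS));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("RR.emailRegex");
    }

    static boolean isValid(String email) {
        return Pattern.compile(loadRegex()).matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(loadRegex());                        // ^[\w.-]+@[\w-]+\.[\w-]+$
        System.out.println(isValid("abc.xyz@example.com"));     // true
        System.out.println(isValid("abc-def.xyz@example.com")); // true
        System.out.println(isValid("abc def@example.com"));     // false
    }
}
```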
The meaning of - inside a character class is different from its meaning elsewhere: inside a character class, - between two characters denotes a range, e.g. 0-9. If you want to include a literal -, put it at the beginning or end of the character class, like [-0-9] or [0-9-].
You also don't need to escape . inside a character class, because it is treated as a literal . there.
Your regex can be simplified further. \w denotes [A-Za-z0-9_], so you can use
^[-\w.]+@[\w]+\.[\w]+$
In a Java string literal (or a properties file), this has to be written as
^[-\\w.]+@[\\w]+\\.[\\w]+$
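A few one-line checks illustrating the character-class rules described above, ending with the simplified pattern as a Java literal. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class CharClassRules {
    // Whole-string match helper.
    static boolean m(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Between two characters, - forms a range: [a-c] covers a, b and c.
        System.out.println(m("[a-c]", "b"));  // true
        // At the start or end of the class, - is a literal dash.
        System.out.println(m("[ac-]", "b"));  // false
        System.out.println(m("[ac-]", "-"));  // true
        System.out.println(m("[-0-9]", "-")); // true
        // Inside a class, . is a literal dot, not "any character".
        System.out.println(m("[.]", "x"));    // false
        System.out.println(m("[.]", "."));    // true
        // The simplified email pattern, as a Java string literal.
        System.out.println(m("^[-\\w.]+@[\\w]+\\.[\\w]+$", "abc-def.xyz@example.com")); // true
    }
}
```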
^[a-zA-Z0-9_\\.\\-]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
This should solve your problem. In a regex you need to escape characters that have a special meaning to the engine (e.g. ? or * outside a character class, or - inside one where it could otherwise form a range).
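A small sketch of the escaped-dash version (class and method names hypothetical). It also shows that this pattern only allows the dash in the part before the @, and that ? and * are literal inside a character class without escaping.

```java
import java.util.regex.Pattern;

class EscapedDash {
    // The answer's pattern as a Java literal: \\- is an escaped, literal dash.
    static final Pattern EMAIL =
            Pattern.compile("^[a-zA-Z0-9_\\.\\-]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$");

    static boolean valid(String email) {
        return EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(valid("abc-def.xyz@example.com")); // true
        // The domain classes were not changed, so a dashed host still fails.
        System.out.println(valid("abc.xyz@my-host.com"));     // false
        // Inside a class, ? and * need no escaping; they are literal there.
        System.out.println(Pattern.matches("[?*]+", "?*"));  // true
    }
}
```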
The correct regex fix is below.
OLD regex:
^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
NEW regex:
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+$
In the properties file, the backslash before the dot has to stay doubled, as in the OLD regex.
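When a regex is stored in a properties file, java.util.Properties.load drops a backslash that precedes a character with no escape meaning, so a single \. reaches the regex engine as a bare . that matches any character. A small sketch of that behavior (the class name and the key k are hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

class PropertiesBackslash {
    // Returns the value of key "k" after Properties.load processes escapes.
    static String load(String fileContent) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("k");
    }

    public static void main(String[] args) {
        String single = load("k=a\\.b");    // file text k=a\.b  -> value a.b
        String doubled = load("k=a\\\\.b"); // file text k=a\\.b -> value a\.b
        System.out.println(single);  // a.b
        System.out.println(doubled); // a\.b
        // As regexes, a.b also matches "axb", while a\.b requires a real dot.
        System.out.println(Pattern.matches(single, "axb"));  // true
        System.out.println(Pattern.matches(doubled, "axb")); // false
    }
}
```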
I read a post that covers the special cases, and the version that works correctly with Java is:
String pattern = "(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

Name validation with special conditions using regex

I want to validate a name in Java, allowing each of the special characters , - . ' at most once. I have an expression that lets the user enter these special characters in a string, but I can't figure out how to add the restriction that each of them may appear no more than once. I tried to achieve it using quantifiers without success. Here is my code so far:
Pattern validator = Pattern.compile("^[a-zA-Z+\\.+\\-+\\'+\\,]+$");
You can use lookahead assertions in your regex:
Pattern validator = Pattern.compile(
"^(?!(?:.*?\\.){2})(?!(?:.*?'){2})(?!(?:.*?,){2})(?!(?:.*?-){2})[a-zA-Z .',-]+$");
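Each (?!(?:.*?X){2}) is a negative lookahead that fails the match if the character X occurs two or more times, so each of the characters . ' , - is allowed at most once. (A single (?!(?:.*?[.',-]){2}) would instead allow at most one of these characters in total.)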
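A sketch wrapping the lookahead pattern in a hypothetical isValid helper, with a few made-up sample names:

```java
import java.util.regex.Pattern;

class NameValidator {
    // One negative lookahead per special character: each of . ' , - may
    // appear at most once. Letters and spaces are unrestricted.
    static final Pattern VALIDATOR = Pattern.compile(
            "^(?!(?:.*?\\.){2})(?!(?:.*?'){2})(?!(?:.*?,){2})(?!(?:.*?-){2})[a-zA-Z .',-]+$");

    static boolean isValid(String name) {
        return VALIDATOR.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"Jonathan's", "O'Neil-Smith", "Mary Ann", "Anne-Marie-Claire", "Jean.Luc.Picard"};
        for (String n : names) {
            System.out.println(n + " -> " + isValid(n)); // true, true, true, false, false
        }
    }
}
```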
I think you can just consider names in which such characters occur only once, such as "Jonathan's", "Thoms-Damm", "Thoms,Jon" or "jonathan.thoms". In practice, I don't think such special characters would occur at the edges of a name. As such, you can probably get away with a regex like:
Pattern validator = Pattern.compile("^[a-zA-Z]+(?:[-',.][a-zA-Z]+)?$");
This regex should match a regular ASCII name followed optionally by a single "special" character with another name after it.
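A sketch exercising the simpler pattern (names hypothetical). Inside the character class the dot needs no escaping, so the Java literal contains no backslash at all.

```java
import java.util.regex.Pattern;

class SimpleNameValidator {
    // Letters, then optionally one special character followed by more letters.
    static final Pattern VALIDATOR = Pattern.compile("^[a-zA-Z]+(?:[-',.][a-zA-Z]+)?$");

    static boolean isValid(String name) {
        return VALIDATOR.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] names = {"Jonathan's", "Thoms-Damm", "Thoms,Jon", "jonathan.thoms",
                          "-Thoms", "Anne-Marie-Claire", "Mary Ann"};
        for (String n : names) {
            System.out.println(n + " -> " + isValid(n)); // true x4, then false x3
        }
    }
}
```

Unlike the lookahead version, this rejects names containing spaces or two different special characters (for example O'Neil-Smith), which is the price of its simplicity.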
