Regex to match pattern with subdomain in Java gives issues

I am trying to match the subdomain of a URL using http://([a-z0-9]*.)?example.com/.* which works perfectly for these cases:
http://example.com/index.html
http://test.example.com/index.html
http://test1.example.com/index.html
http://www.example.com/122/index.html
But the problem is that it also matches this URL:
http://www.test.com/?q=http://example.com/index.html
If a URL on another domain contains the URL in its path, it matches. Can anyone tell me how to match the current domain only? Getting the host would work, but I need to match the full URL.

Are you aware that . matches any character?
If you use the regex
http://([a-z0-9]*\.)?example\.com/.*
(or, as a Java String)
"http://([a-z0-9]*\\.)?example\\.com/.*"
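it should work: escaped dots only match a literal ., so a host like examplexcom no longer slips through. What actually rejects the ?q= URL, though, is requiring the match to start at the beginning of the string.
This assumes that you're using the .matches() method, which forces the entire string to match. Otherwise (for example with find()), add a ^ at the start of the regex.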
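A minimal Java sketch of the escaped pattern, using the URLs from the question; the class SubdomainMatch and method isExampleUrl are hypothetical names. Matcher.matches() behaves like String.matches(): the whole input must match, so the example.com URL buried in the query string is rejected, and because the dots are escaped, a host like examplexcom is rejected too.

```java
import java.util.List;
import java.util.regex.Pattern;

class SubdomainMatch {
    // "\\." in the Java literal is "\." in the regex, i.e. a literal dot.
    private static final Pattern EXAMPLE_URL =
            Pattern.compile("http://([a-z0-9]*\\.)?example\\.com/.*");

    // Hypothetical helper: true only if the WHOLE string matches (like String.matches()).
    static boolean isExampleUrl(String url) {
        return EXAMPLE_URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://example.com/index.html",
                "http://test.example.com/index.html",
                "http://www.example.com/122/index.html",
                "http://www.test.com/?q=http://example.com/index.html",
                "http://examplexcom/index.html");
        for (String url : urls) {
            System.out.println(isExampleUrl(url) + "  " + url);
        }
    }
}
```

Swapping matches() for find() without a leading ^ would accept the ?q= URL again, because find() is free to start at the embedded http://example.com/.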

The simplest way would be:
^http://([a-z0-9]*?\.)?example\.com/.*
The ^ matches the starting position of the string, so the match can't begin in the middle of it. Don't confuse it with [^ ] though, where ^ inside a character class means "not".
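To see what the ^ buys you with Matcher.find(), here is a small comparison (AnchorDemo and found are hypothetical names). find() searches for a match anywhere in the input, so only the anchored version refuses to start at the URL embedded in the query string.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Without ^, find() may start matching at any index of the input.
    static final Pattern UNANCHORED =
            Pattern.compile("http://([a-z0-9]*?\\.)?example\\.com/.*");
    // With ^, find() can only succeed if the match begins at index 0.
    static final Pattern ANCHORED =
            Pattern.compile("^http://([a-z0-9]*?\\.)?example\\.com/.*");

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String attack = "http://www.test.com/?q=http://example.com/index.html";
        System.out.println(found(UNANCHORED, attack)); // true: matches the embedded URL
        System.out.println(found(ANCHORED, attack));   // false
        System.out.println(found(ANCHORED, "http://test.example.com/index.html")); // true
    }
}
```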

Related

Regex to match valid domain if it comes in beginning of a string

I am trying to match the below criteria with Regex:
String starts with http or https
The host ends with one of two particular domains (abc.com or xyz.com)
There can be anything after abc.com or xyz.com
I wrote the regex below for this:
http[s]?:\/\/(.*).(abc|xyz).com\/(.*)
but it fails in the scenario below, which shouldn't be matched by the regex:
http://www.attacker.com/https://events.abc.com/#/en/navigator/init/meetings
Any help is much appreciated.
You may use
^https?:\/\/([^\/]*\.)?(abc|xyz)\.com\/.*
Before abc.com or xyz.com you have to allow only characters other than /, so the domain can't be picked up from the path.
Details
^https?:\/\/: http:// or https:// at the beginning of the string
([^\/]*\.)?: any characters except / followed by a dot (the subdomain part); it's optional (?) for URLs like http://abc.com/
(abc|xyz)\.com: match the domain abc.com or xyz.com
\/.*: match / and anything that comes after
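A sketch of this answer's pattern as a Java string, assuming you call Matcher.find() (the leading ^ then pins the match to the start). Java regexes don't need / escaped, so the \/ escapes are dropped; DomainCheck and isAllowed are hypothetical names.

```java
import java.util.regex.Pattern;

class DomainCheck {
    // Java literal for ^https?:\/\/([^\/]*\.)?(abc|xyz)\.com\/.*
    private static final Pattern ALLOWED =
            Pattern.compile("^https?://([^/]*\\.)?(abc|xyz)\\.com/.*");

    static boolean isAllowed(String url) {
        // [^/]* keeps the optional subdomain inside the host, before the first "/".
        return ALLOWED.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("https://events.abc.com/#/en/navigator")); // true
        System.out.println(isAllowed("http://xyz.com/page"));                   // true
        System.out.println(isAllowed(
                "http://www.attacker.com/https://events.abc.com/#/en/navigator/init/meetings")); // false
    }
}
```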
You should escape the literal dots (.):
https?:\/\/.*\.(abc|xyz)\.com\/.*
Of course, if you want to put that in a Java string literal, it's got to be
https?:\\/\\/.*\\.(abc|xyz)\\.com\\/.*
I also removed some unnecessary things from your regex:
You don't need to put single letters in a character class ([s])
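(.*) doesn't have to be in brackets. You could append a ? to make it .*? and avoid the greediness of *, but neither .* nor .*? stops the match from running across /, so this pattern on its own still matches the attacker URL from the question. Use [^/]* before the domain for that.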
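The following check, with hypothetical names GreedyVsLazy and captured, applies Matcher.matches() to the attacker URL from the question. It suggests that escaping the dots or making .* lazy changes at most what the group captures, not whether the URL matches; only keeping the host part free of / (here with [^/]*) rejects it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    static final String ATTACK =
            "http://www.attacker.com/https://events.abc.com/#/en/navigator/init/meetings";

    // Returns group 1 on a full match, or null if the whole string does not match.
    static String captured(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Greedy and lazy both cross "/" into the path and capture the same text:
        // www.attacker.com/https://events
        System.out.println(captured("https?://(.*)\\.(abc|xyz)\\.com/.*", ATTACK));
        System.out.println(captured("https?://(.*?)\\.(abc|xyz)\\.com/.*", ATTACK));
        // [^/]* cannot leave the host, so there is no match: prints null
        System.out.println(captured("https?://([^/]*)\\.(abc|xyz)\\.com/.*", ATTACK));
    }
}
```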

Match path with or without url parameters

I am trying to write a simple regex to match the following:
/path/foo.html
/path/foo.html?a=b
/path/foo.html?a=b&b=c
but not /path/foo.htmlx or anything else that isn't foo.html followed by URL parameters.
I tried
/path/foo.html(?:\?|$)
but it does not seem to work in my java project.
String.matches(String regex) does a full match. From its Javadoc:
Tells whether or not this string matches the given regular expression.
An invocation of this method of the form str.matches(regex) yields exactly the same result as the expression
So for example "/path/foo.html?a=b".matches("/path/foo.html(?:\\?|$)") returns false, because the String doesn't end after the ?.
You can use "/path/foo\\.html(\\?.*)?", with the dot escaped so it only matches a literal .
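A minimal sketch of the full-match approach, with the dot escaped as \\. so that a string like /path/fooxhtml is not accepted; FooPath and isFoo are hypothetical names.

```java
import java.util.regex.Pattern;

class FooPath {
    // Whole-string match: the literal path, then optionally "?" and anything after it.
    private static final Pattern FOO = Pattern.compile("/path/foo\\.html(\\?.*)?");

    static boolean isFoo(String s) {
        return FOO.matcher(s).matches(); // same semantics as s.matches(...)
    }

    public static void main(String[] args) {
        for (String s : new String[] {"/path/foo.html", "/path/foo.html?a=b",
                "/path/foo.html?a=b&b=c", "/path/foo.htmlx"}) {
            System.out.println(isFoo(s) + "  " + s);
        }
    }
}
```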
For the URL part, use the regexp provided by @reconnect:
^[^?]*
It matches from the start of the string up to the first ? character.
To search for exactly /path/foo.html, use:
^/path/foo\.html
If you check with matches(), the whole line has to match, as if the pattern had ^ (start of line) and $ (end of line) around it. In that case you have to account for any characters after your condition yourself, typically with .*
I'm not sure exactly what you're trying to match, but if you want to match only the URL part without the query string:
/^[^?#]*/
# - the hash (fragment) part, which can also carry arguments
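A sketch that compares only the part before any query string or fragment, assuming that is the goal; PathOnly, pathPart and isFooHtml are hypothetical names, and the /.../ delimiters are dropped because Java patterns are plain strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathOnly {
    // Everything from the start of the string up to (not including) the first '?' or '#'.
    private static final Pattern PATH = Pattern.compile("^[^?#]*");

    static String pathPart(String url) {
        Matcher m = PATH.matcher(url);
        // [^?#]* may match the empty string, so find() always succeeds at index 0.
        return m.find() ? m.group() : "";
    }

    // Hypothetical check for the question: the path part must be exactly /path/foo.html.
    static boolean isFooHtml(String url) {
        return pathPart(url).equals("/path/foo.html");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"/path/foo.html", "/path/foo.html?a=b&b=c",
                "/path/foo.html#top", "/path/foo.htmlx"}) {
            System.out.println(isFooHtml(s) + "  " + pathPart(s));
        }
    }
}
```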

Regular expression to return results that do not match selection

I work on a product that provides a Java API to extend it.
The API provides a function which takes a Perl regular expression and returns a list of matching files.
I want to filter the list to remove all files that end in .xml, .xsl and .cfg; basically the opposite of .*(\.xml|\.xsl|\.cfg).
I have been searching but I haven't been able to get anything to work yet.
I tried .*(?!\.cfg) and ^((?!cfg).)*$ and \.(?!cfg$|?!xml$|?!xsl$).
I don't know if I am on the right track or not.
Note
I know the regex systems are similar, but I can't get a Java regex working either.
You may use
^(?!.*\.(x[ms]l|cfg)$).+
Details:
^ - start of a string
(?!.*\.(x[ms]l|cfg)$) - a negative lookahead that fails the match if any 0+ chars other than line break chars (.*) are followed by a dot and xml, xsl or cfg ((x[ms]l|cfg)) at the end of the string ($)
.+ - any 1 or more chars other than line break chars. It might be omitted if matching the entire string is not required (in some tools it is required, though).
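The product's API isn't shown, so this sketch just applies the lookahead pattern to an in-memory list with java.util.regex; ExcludeExtensions, keep and the file names are made up for illustration. Using matches() means the .+ must cover the whole file name.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExcludeExtensions {
    // Reject names ending in .xml, .xsl or .cfg; accept everything else that is non-empty.
    private static final Pattern KEEP = Pattern.compile("^(?!.*\\.(x[ms]l|cfg)$).+");

    static List<String> keep(List<String> files) {
        return files.stream()
                .filter(f -> KEEP.matcher(f).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keep(List.of(
                "build.xml", "style.xsl", "app.cfg", "Main.java", "notes.xml.bak")));
        // prints [Main.java, notes.xml.bak]
    }
}
```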
You need something like this, which matches only if the end of the string isn't preceded by a dot and one of the three unwanted extensions (the slashes are Perl-style delimiters):
/(?<!\.(?:xml|xsl|cfg))\z/
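The same idea in Java, assuming Matcher.find() is used: the Perl-style slashes are dropped and backslashes are doubled for the string literal. LookbehindFilter and keep are hypothetical names.

```java
import java.util.regex.Pattern;

class LookbehindFilter {
    // Negative lookbehind at the end of the input: fails if the last four characters
    // are ".xml", ".xsl" or ".cfg". Java allows it because its length is bounded.
    private static final Pattern NOT_UNWANTED =
            Pattern.compile("(?<!\\.(?:xml|xsl|cfg))\\z");

    static boolean keep(String fileName) {
        // \z can only match at the very end, so find() effectively tests one position.
        return NOT_UNWANTED.matcher(fileName).find();
    }

    public static void main(String[] args) {
        for (String f : new String[] {"build.xml", "style.xsl", "app.cfg", "Main.java"}) {
            System.out.println(keep(f) + "  " + f);
        }
    }
}
```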

Why does this Regular Expression not match anything?

I'm trying to use the following regular expression to find all e-mails in an HTML string:
RegExp
[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}
HTML
ddawson@gcitravel.net</span>. </p>
I'm using matcher.find(), which is supposed to find substrings, isn't it? When I perform the search it comes up empty. Any ideas why?
Regex matching is case-sensitive by default, so, for instance, the last part .net can't be matched by \.[A-Z]{2,4}.
To make your regex case-insensitive, add the (?i) flag:
"(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}"
or compile it with the Pattern.CASE_INSENSITIVE flag:
Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);
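A minimal sketch of the Pattern.CASE_INSENSITIVE version that collects every match with a find() loop, written with @ as the e-mail separator; EmailFinder and findEmails are hypothetical names, and the HTML is the snippet from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);

    static List<String> findEmails(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(html);
        while (m.find()) { // each call looks for the next matching substring
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findEmails("ddawson@gcitravel.net</span>. </p>"));
        // prints [ddawson@gcitravel.net]
    }
}
```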
A-Z only matches upper case, and there is an extra \ (the doubled \\. is only needed inside a Java string literal). Try this...
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[a-zA-Z]{2,4}
This way of searching for e-mails no longer works now that there are longer top-level domains: {2,4} can't cover the whole of site.berlin, so find() returns a truncated address ending in .berl instead of the real one. Extend the {2,4} range, remove it, or use something like:
[A-Za-z0-9-+/.]*@[A-Za-z0-9/.-]*\\.*[A-Za-z]$
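To see what {2,4} does with a longer TLD, this sketch (LongTld and firstMatch are hypothetical names) runs the case-insensitive pattern with find() against an address in site.berlin, once with {2,4} and once with the open-ended {2,}.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongTld {
    // Returns the first case-insensitive match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "mail info@site.berlin today";
        System.out.println(firstMatch("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", text)); // info@site.berl
        System.out.println(firstMatch("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}", text));  // info@site.berlin
    }
}
```

The {2,4} version doesn't fail outright; it silently returns a truncated address, which is arguably harder to notice.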
I don't have enough reputation to comment on a post, but as far as I remember, one of the longest TLDs is .international, so {2,4} won't find it. Also remember domains with a dot inside the root name, like .co.uk or .de.com. The domain must also end with a letter; it cannot end with a number or special character. An e-mail address might contain delimiters like + or -.

Regular expression one or more groups - regex to match a url with and without port number

I have a regular expression to match a URL
(^http.?://\b)(.*):(\d*)(.*)
http://udara.com:8907/phpmyadmin/index.php
The URL above matches the expression. However, there may be cases where the port is not specified in the URL, as below:
http://udara.com/phpmyadmin/index.php?token=48bdb70fd4f1e6abe5ecb84192c1835e
In this case the expression does not match.
How do I make the third group (the port) optional, i.e. match it zero or one times?
Note that there may be IPs instead of the domain udara.com.
Try the following: ^http.?://([a-zA-Z\-]+)(?::(\d*))?(.*)
EDIT:
^http.?://([a-zA-Z0-9\\-\\.]+)(?::(\\d*))?(.*) - Java
^http.?:\/\/([a-zA-Z0-9\-\.]+)(?::(\d*))?(.*) - Perl (and regex101.com)
Including 0-9 in the host class lets IP addresses such as 192.168.0.1 match as well.
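A sketch of the Java pattern pulling out host, port and path with Matcher.matches(). It assumes 0-9 is in the host class so that IP addresses such as 192.168.0.1 match (a class of letters only would reject them); UrlPort and parts are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPort {
    // Group 1: host (letters, digits, '-' and '.'), group 2: optional port, group 3: the rest.
    private static final Pattern URL =
            Pattern.compile("^http.?://([a-zA-Z0-9\\-\\.]+)(?::(\\d*))?(.*)");

    // Returns {host, port, rest}; port is null when the URL has no ":port" part.
    static String[] parts(String url) {
        Matcher m = URL.matcher(url);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] urls = {
                "http://udara.com:8907/phpmyadmin/index.php",
                "http://udara.com/phpmyadmin/index.php?token=48bdb70fd4f1e6abe5ecb84192c1835e",
                "http://192.168.0.1:8080/phpmyadmin/index.php"};
        for (String u : urls) {
            System.out.println(Arrays.toString(parts(u)));
        }
    }
}
```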
This might help:
http://[a-z0-9.]+(:[0-9]+)?/.+
