Regex exclude specific subdomain (Java)

I want to exclude a specific subdomain from a regex.
I have searched and tried out different regexes, but none worked for me.
The normal regex looks like this:
https?:\/\/((localhost(\:\d+)?)|([a-z\-\.]*\.)?(gaga.ch|gugus.ch))
To exclude the subdomain named admin in gugus.ch, I added: (^(?!.*admin).*)
So the whole regex looks like:
https?:\/\/((localhost(\:\d+)?)|([a-z\-\.]*\.)?(gaga.ch|(^(?!.*admin).*)gugus.ch))
So it should let through http://www.gugus.ch
But NOT http://admin.gugus.ch
This does not work. What am I doing wrong?
thx Mike

Try this regex:
https?://((localhost(:\d+)?)|([a-z.-]*\.)?(gaga\.ch|(?<!\badmin\.)gugus\.ch))
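(?<!\badmin\.) is a negative lookbehind that fails the match if gugus.ch is immediately preceded by "admin.". Your attempt could not work because ^ only matches at the start of the input, and it sits at a point where https?:// has already consumed characters, so that branch never matches.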
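A quick way to check the lookbehind is to run the pattern (with its parentheses balanced) against the URLs from the question. The sketch below assumes the pattern is applied with Matcher.matches(), so the whole URL has to match; the class name SubdomainFilterDemo and the helper isAllowed are made up for illustration.

```java
import java.util.regex.Pattern;

public class SubdomainFilterDemo {
    // The answer's pattern as a Java string literal (backslashes doubled).
    private static final Pattern ALLOWED = Pattern.compile(
            "https?://((localhost(:\\d+)?)|([a-z.-]*\\.)?(gaga\\.ch|(?<!\\badmin\\.)gugus\\.ch))");

    // Hypothetical helper: true if the whole URL matches the pattern.
    public static boolean isAllowed(String url) {
        return ALLOWED.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.gugus.ch",
            "http://admin.gugus.ch",
            "http://admin.gaga.ch",
            "http://localhost:8080"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isAllowed(url));
        }
    }
}
```

Only http://admin.gugus.ch should be rejected here; admin.gaga.ch still passes because the lookbehind is attached to the gugus.ch branch only.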

Related

Finding multiple groups in Java regex for simple option parser

I need to modify this regex to find multiple group matches:
(?:--)(?<key>[^\s=]+)(?:(?<assign> *[ =] *)(?! --)(?<value>"[^"]*"|\S+))?
In Java:
"(?:--)(?<key>[^\\s=]+)(?:(?<assign> *[ =] *)(?! --)(?<value>\"[^\"]*\"|\\S+))?"
This matches the following correctly:
--key=value
--key=--value
--key value
--flag
--key="--value"
--key "--value"
--key=value --foo=bar
--key=value --foo=bar --flag
But it fails if --flag comes before any other options:
--key=value --flag --foo=bar
I've been trying to modify the negative lookahead between the assign and value capture groups without success so far. The value captured for flag ends up being --foo=bar instead of null.
Any expert recommendations on how to solve this?
I managed to fix the regex. The website https://regexr.com/ was invaluable.
The fixed regex is:
(?<prefix>--)(?<key>[^\s=]+)(?:(?! --)(?<assign> *[ =] *)(?! --)(?<value>"[^"]*"|\S+))?
Here's the Java class and unit test:
https://gist.github.com/kirklund/845baf340a1999a57db9e59e6ba40ce0
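The linked class is not reproduced here. As a rough sketch of how the fixed regex could be used, the example below walks over a command line with Matcher.find() and collects the named groups; the class name OptionParserDemo and the parse helper are assumptions for illustration, not the code from the gist.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionParserDemo {
    // The fixed regex from above as a Java string literal.
    private static final Pattern OPTION = Pattern.compile(
            "(?<prefix>--)(?<key>[^\\s=]+)"
            + "(?:(?! --)(?<assign> *[ =] *)(?! --)(?<value>\"[^\"]*\"|\\S+))?");

    // Hypothetical helper: collects options in order; a bare flag maps to null.
    public static Map<String, String> parse(String commandLine) {
        Map<String, String> options = new LinkedHashMap<>();
        Matcher m = OPTION.matcher(commandLine);
        while (m.find()) {
            // group("value") is null when the optional value part did not match.
            options.put(m.group("key"), m.group("value"));
        }
        return options;
    }

    public static void main(String[] args) {
        System.out.println(parse("--key=value --flag --foo=bar"));
    }
}
```

The extra (?! --) before the assign group is what keeps --flag from swallowing the following option: when the text after the key starts with " --", the optional value part is skipped entirely.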

Removing Hashtag using Java WebFilter

I have the following configuration in the urlrewrite.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN" "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">
<urlrewrite use-query-string="true">
<rule>
<from>^(/event/showEventList)(\.{1})(\bhtm\b|\bhtml\b)(\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$</from>
<to type="redirect" last="true">/events$4$5</to>
</rule>
</urlrewrite>
The regex ^(/event/showEventList)(\.{1})(\bhtm\b|\bhtml\b)(\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$ has 7 groups, which are:
(/event/showEventList): matches /event/showEventList
(\.{1}): matches a single dot (.)
(\bhtm\b|\bhtml\b): matches only htm or html
(\?{0,1}): matches a question mark (?), zero or one time
([a-zA-Z0-9-_=&]{0,}+): matches the query string, zero or more characters
(#{0,1}): matches a hash sign (#), zero or one time
([a-zA-Z0-9-_=&]{0,}+): matches the fragment, zero or more characters
When I test this configuration with the URL /event/showEventList.html?pageNumber=1#key=val, I expect the redirected URL to be /events?pageNumber=1, but I get /events?pageNumber=1#key=val
I have a code snippet to test it, which is:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlRewriterRegexTest {
    public static void main(String[] args) {
        String input = "/event/showEventList.html?pageNumber=1#key=val";
        String regex = "^(/event/showEventList)(\\.{1})(\\bhtm\\b|\\bhtml\\b)(\\?{0,1})([a-zA-Z0-9-_=&]{0,}+)(#{0,1})([a-zA-Z0-9-_=&]{0,}+)$";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        System.out.println(matcher.replaceFirst("/events$4$5"));
    }
}
It outputs: /events?pageNumber=1
Any pointer would be very helpful.
I'd simplify the expression a bit.
Escape slashes, as they are typically used as delimiters for the regex (\/event\/showEventList)
Remove superfluous quantifier (\.)
Shorten the html string test (htm(l)?) - careful, this messes with your capturing group numbers
Remove word boundary checks around html
Use ? instead of {0,1}
Use * instead of {0,}
Remove possessive quantifier (I don't see why you'd need it)
Ignore everything after #, you don't seem to need it in your replacement
This gives us ^(\/event\/showEventList)(\.)(htm(l)?)(\??)([a-zA-Z0-9-_=&]+)*#(.+)$ which substitutes your example to /events?pageNumber=1
To play around, see https://regexr.com/4otp7
I've simplified the expression and here is the working solution
<from>^(\/event\/showEventList\.html?)(\?[a-zA-Z0-9-_=&]*)\#.*$</from>
<to type="redirect" last="true">/events$2</to>
This matches the URL and keeps everything from the beginning of the query string up to the first occurrence of #.
Explanation:
Group 1: matches the URL /event/showEventList.html or /event/showEventList.htm
Group 2: matches the query string, zero or more characters up to the first occurrence of #
Group 2 is the string you want to use for the redirect; everything from the # onward is ignored.
Example:
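A minimal Java sketch, assuming the same pattern as the <from> element above and a replacement equivalent to /events$2; the class name RewriteRuleDemo and the rewrite helper are hypothetical and only mimic what the rule would do to a URL string.

```java
import java.util.regex.Pattern;

public class RewriteRuleDemo {
    // Same pattern as the <from> element, written as a Java string literal.
    private static final Pattern FROM = Pattern.compile(
            "^(\\/event\\/showEventList\\.html?)(\\?[a-zA-Z0-9-_=&]*)\\#.*$");

    // Hypothetical helper mimicking the <to> substitution "/events$2".
    public static String rewrite(String url) {
        // replaceFirst returns the input unchanged when the pattern does not match.
        return FROM.matcher(url).replaceFirst("/events$2");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/event/showEventList.html?pageNumber=1#key=val"));
        System.out.println(rewrite("/event/showEventList.htm?a=1#x"));
    }
}
```

Note that this pattern only matches when both a ? and a # are present; any other URL is returned unchanged.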
I am answering my own question so that anyone who stumbles upon the same problem in the future can benefit from it.
This has nothing to do with the UrlRewriteFilter framework. After enabling the framework's debug log, I saw that the URL it receives before applying the defined rules does not contain the hash (#). From other SO answers and from analyzing the browser's network traffic, I learned that the browser does not send the URL fragment to the server, so it is not available in the HttpServletRequest. This is why the regular expressions are not working.
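To make the point concrete, here is a small sketch that splits a URL with java.net.URI and shows which part would end up in the HTTP request. It only models the behavior described above and is not what a browser actually runs; the host example.com and the helper requestTarget are assumptions for illustration.

```java
import java.net.URI;

public class FragmentDemo {
    // Hypothetical helper: the path plus query that a client sends to the server.
    // The fragment is deliberately left out because it stays on the client.
    public static String requestTarget(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        return uri.getRawPath() + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        String url = "http://example.com/event/showEventList.html?pageNumber=1#key=val";
        System.out.println("sent to server:  " + requestTarget(url));
        System.out.println("kept in browser: #" + URI.create(url).getFragment());
    }
}
```

Since the server-side filter only ever sees the first of these two values, no regex in urlrewrite.xml can match or strip the fragment.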
Since the hash is available in the client browser, and thanks to the HTML5 History API, I was able to solve the problem using JavaScript:
<script type="text/javascript">
    window.addEventListener('DOMContentLoaded', (event) => {
        const url = new URL(window.location);
        url.hash = '';
        history.replaceState(null, document.title, url);
    });
</script>

combine regex for RegularExpressionValidator

In ASP.NET, I need a regex for a RegularExpressionValidator that rejects HTML tags in a textbox, i.e. input containing <, > or &#.
I can validate < and > on their own with the regex ([^<>])*
and &# on its own with the regex ^((?!&#).)*$
But I am not able to validate both together. Please suggest how to fix this.
Thanks.
Here is a possible solution:
^((?!(&#|>|<)).)*$
This is the ^((?!&#).)*$ pattern you provided, with &# changed to (&#|>|<).
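The combined pattern uses only constructs that behave the same way in most regex engines, so it can be checked outside ASP.NET as well. The sketch below does that in Java, purely as an illustration; the class name NoMarkupValidatorDemo and the isValid helper are made up, and a RegularExpressionValidator would take the raw pattern instead.

```java
import java.util.regex.Pattern;

public class NoMarkupValidatorDemo {
    // At every position, the negative lookahead rejects "&#", ">" and "<"
    // before the dot is allowed to consume one character.
    private static final Pattern NO_MARKUP = Pattern.compile("^((?!(&#|>|<)).)*$");

    // Hypothetical helper: true if the input contains none of the forbidden sequences.
    public static boolean isValid(String input) {
        return NO_MARKUP.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { "hello world", "Tom & Jerry #1", "a < b", "<b>bold</b>", "&#60;" };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValid(input));
        }
    }
}
```

A lone & or # is still accepted; only the two-character sequence &# is rejected.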

RegEx to match ends with

I need to write a regex in Java that matches a domain and its subdomains (.domain.com).
The regex should return true for
domain.com
m.domain.com
abc.domain.com
www.domain.com
but return false for
abcdomain.com
1domain.com
I am trying to match domain.com, and if a preceding character is present, it must be a dot (.).
I tried various options, but each one fails one or another of the test cases.
(^|.*?\.)domain\.com
Try this. See demo.
http://regex101.com/r/lB2sH2/1
Try this:
(\.|^)domain\.com$
The first part means that there should be a . or the start of the string,
and the $ means "ends with".
You can try:
(^|\.)domain\.com$
but String.matches() and Matcher.matches() in Java require the whole input to match, so with those use:
(.+\.)?domain\.com
or you can use the endsWith() method in Java code:
if (domain.equals("domain.com") || domain.endsWith(".domain.com")) {
    // do something...
}
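Both variants can be compared side by side on the hosts from the question. This is a minimal sketch, assuming whole-input matching via Matcher.matches(); the class name DomainMatchDemo and both helper methods are hypothetical.

```java
import java.util.regex.Pattern;

public class DomainMatchDemo {
    // Optional "anything plus a dot" prefix, then the literal domain.
    private static final Pattern DOMAIN = Pattern.compile("(.+\\.)?domain\\.com");

    // Regex variant: matches() requires the whole host to match.
    public static boolean matchesWithRegex(String host) {
        return DOMAIN.matcher(host).matches();
    }

    // Plain string variant from the answer above.
    public static boolean matchesWithEndsWith(String host) {
        return host.equals("domain.com") || host.endsWith(".domain.com");
    }

    public static void main(String[] args) {
        String[] hosts = { "domain.com", "m.domain.com", "abc.domain.com",
                           "www.domain.com", "abcdomain.com", "1domain.com" };
        for (String host : hosts) {
            System.out.println(host + " regex=" + matchesWithRegex(host)
                    + " endsWith=" + matchesWithEndsWith(host));
        }
    }
}
```

The string variant avoids regex escaping altogether, which may be the simpler choice when the domain is known in advance.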
I think you want something like this,
(?:\\w+\\.)?domain\\.com
Try this regex:
\bdomain\.com$
http://rubular.com/r/QG0FtVWtm6
If you don't know what "domain.com" is going to be, the regex below should give you just the subdomain of whatever domain you are looking for. It matches your specifications, including domains that look like abc.net
([a-z]+)(?=\.[a-z]+\.)

Extract certain words from predefined sentence using regular expression

I have a seemingly simple task, but I have no experience with regular expressions.
I have to parse SMS body with predefined message text, to get out certain information.
Here is one example:
Täname! {FirstName} {LastName} isikukoodiga {PersonCode} on sõlminud EMT Reisikindlustuse lepingu numbriga {PolicyNumber}, mis kehtib alates {CoverStartDate} kell {CoverStartTime} kuni {CoverEndDate} kell {CoverEndTime} (Eesti aja järgi). Hind: {PremiumEur} eurot. Tutvu tingimustega ({Terms}) http://emt.ee/kindlustus. Kahjukäsitluse number +3727330700.
I have to parse out everything that is in curly braces.
I came up with something like this in Java:
public static final String REGEX_CONFIRMATION = "Täname! (.*) (.*) isikukoodiga (.*) on sõlminud EMT Reisikindlustuse lepingu numbriga (.*), mis kehtib alates (.*) kell (.*) kuni (.*) kell (.*) \\(Eesti aja järgi\\). Hind: (.*) eurot. Tutvu tingimustega \\((.*)\\) http://emt.ee/kindlustus. Kahjukäsitluse number \\+3727330700.";
But it parses out only following groups:
{MARIS}, {PLOTS}, {17204046521}, {22414152}, {01.10.2002}, {13:07},
{02.10.2002}, {23:59}.
As you can see, {Terms} is missing, and I can't figure out where the problem is.
How about using this pattern?
\{.*?\}
Wouldn't it make more sense to simply use
\{[^{}]*\}
as your regex? In a string, you would need to write that as
"\\{[^{}]*\\}"
Explanation:
\{ # Match an opening brace
[^{}]* # Match any number of characters except braces
\} # Match a closing brace
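As a rough illustration of this approach, the sketch below adds a capturing group around the brace contents and collects every match with Matcher.find(). It assumes the text being parsed really contains the braces, as the template in the question does; the class name PlaceholderDemo and the extract helper are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // Same pattern as above, with a capturing group around the contents.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]*)\\}");

    // Hypothetical helper: returns the text between each pair of braces, in order.
    public static List<String> extract(String text) {
        List<String> values = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "Hind: {PremiumEur} eurot. Tutvu tingimustega ({Terms}) http://emt.ee/kindlustus.";
        System.out.println(extract(text));
    }
}
```

Because the character class excludes braces, each match stops at the nearest closing brace, so the surrounding parentheses around {Terms} do not interfere.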
See http://www.java2s.com/Code/Java/Regular-Expressions/Findallmatches.htm along with the following regex:
\{(.*?)\}
Seems correct to me. Use the DOTALL (and in other cases maybe MULTILINE) option. DOTALL can be added as "(?s)Täname!...". Then ".*" also matches newline characters.
Since the earlier groups were found, this might be it.
Does it work when you include the brackets in your {Terms} part?
Instead of:
String regex = "...Tutvu tingimustega \\((.*)\\) http://emt.ee/kindlustus. ...";
You could try:
String regex = "...Tutvu tingimustega (.*) http://emt.ee/kindlustus. ...";
Or, depending on what you have in the {Terms} string, you could change .* to [^)]*
This way you would match zero or more characters that are not a closing parenthesis.
