Regular expression for either or matching - java

I have a string which could end with either
"/"
or
"/?secure=true"
I need to validate it. I tried this
\b(/(/?secure=true)?)\b
but I get no matches found with the regex.
However, with
^(/(\$secure=true)?)$
match is found. But this wont help in my case as the string that I want to validate will be prefixed with chars something like "thisismystring/" or "thisismystring/?secure=true".
I want to know how to go about this.

You need to use the pipe symbol (|) to specify an or:
(/|/\?secure=true)$
You also need to escape the question mark, as here, and should specify the "$" for end of string but not the "^" for start of string.

Both cases would contain the
/
So that should not be in the optional part.
You put the optional part in ()?
and escape the question mark with backslash
/(\?secure=true)?
And to indicate that it should end with that, you put $ at the end
/(\?secure=true)?$

Can be done like this:
/(\?secure=true)?$

Related

Java Regex Match Pattern Groups unexpectedly matched [duplicate]

I am writing a regex that will be used for recognizing commands in a string. I have three possible words the commands could start with and they always end with a semi-colon.
I believe the regex pattern should look something like this:
(command1|command2|command3).+;
The problem, I have found, is that since . matches any character and + tells it to match one or more, it skips right over the first instance of a semi-colon and continues going.
Is there a way to get it to stop at the first instance of a semi-colon it comes across? Is there something other than . that I should be using instead?
The issue you are facing with this: (command1|command2|command3).+; is that the + is greedy, meaning that it will match everything till the last value.
To fix this, you will need to make it non-greedy, and to do that you need to add the ? operator, like so: (command1|command2|command3).+?;
Just as an FYI, the same applies for the * operator. Adding a ? will make it non greedy.
Tell it to find only non-semicolons.
[^;]+
What you are looking for is a non-greedy match.
.+?
The "?" after your greedy + quantifier will make it match as less as possible, instead of as much as possible, which it does by default.
Your regex would be
'(command1|command2|command3).+?;'
See Python RE documentation

Add Dash to Java Regex

I am trying to modify an existing Regex expression being pulled in from a properties file from a Java program that someone else built.
The current Regex expression used to match an email address is -
RR.emailRegex=^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
That matches email addresses such as abc.xyz#example.com, but now some email addresses have dashes in them such as abc-def.xyz#example.com and those are failing the Regex pattern match.
What would my new Regex expression be to add the dash to that regular expression match or is there a better way to represent that?
Basing on the regex you are using, you can add the dash into your character class:
RR.emailRegex=^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
add
RR.emailRegex=^[a-zA-Z0-9_\\.-]+#[a-zA-Z0-9_-]+\\.[a-zA-Z0-9_-]+$
Btw, you can shorten your regex like this:
RR.emailRegex=^[\\w.-]+#[\\w-]+\\.[\\w-]+$
Anyway, I would use Apache EmailValidator instead like this:
if (EmailValidator.getInstance().isValid(email)) ....
Meaning of - inside a character class is different than used elsewhere. Inside character class - denotes range. e.g. 0-9. If you want to include -, write it in beginning or ending of character class like [-0-9] or [0-9-].
You also don't need to escape . inside character class because it is treated as . literally inside character class.
Your regex can be simplified further. \w denotes [A-Za-z0-9_]. So you can use
^[-\w.]+#[\w]+\.[\w]+$
In Java, this can be written as
^[-\\w.]+#[\\w]+\\.[\\w]+$
^[a-zA-Z0-9_\\.\\-]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
Should solve your problem. In regex you need to escape anything that has meaning in the Regex engine (eg. -, ?, *, etc.).
The correct Regex fix is below.
OLD Regex Expression
^[a-zA-Z0-9_\\.]+#[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
NEW Regex Expression
^[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
Actually I read this post it covers all special cases, so the best one that's work correctly with java is
String pattern ="(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")#(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

Any suggestions to match and extract the pattern?

I want to match something like this
$(string).not(string).not(string)
The not(string) can repeat zero or more times, after $(string).
Note that the string can be whatever things, except nested not(string).
I used the regular expression (\\$\\((.*)\\))((\\.not\\((.*?)\\))*?)(?!(\\.not)), I think the *? is to non-greedily match any number of sequence of not(string), and use the lookahead to stop the match that is not not(string), so that I can extract only the part that I want.
However, when I tested on the input like
$(string).not(string).not(string).append(string)
the group(0) returns the whole string, which I only need $(string).not(string).not(string).
Obviously I still miss something or misuse of anything, any suggestions?
Try this one (escaped for java):
(\\$\\(string\\)(?:(?:\\.not\(.*?\\))+))
It should capture just the part that you are after. You can test it out (unescaped for java though)
If we assume that parenthesis are not nested, you can write something like this:
string p = "\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*";
Not need to add a lookahead since the non-capturing group has a greedy quantifier (so the group is repeated as possible).
if what you called string in your question may be a quoted string with parenthesis inside like in Pshemo example: $(string).not(".not(foo)").not(string), you can replace each [^)]* with (?:\\s*\"[^\"]*\"\\s*|[^)]*) to ignore characters inside quoted parts.
From here, "group zero denotes the entire pattern". Use group(1).
(\$\([\w ]+\))(\.not\([\w ]+\))*
This will also work, it would give you two groups, One consisting of the word with $ sign, another would give you the set of all ".not" strings.
Please note: You might have to add escape characters for java.

Remove anything between two character

i want to remove anything between "?" and "/"
my text is "hi?0/hello/hi"
i need to see this out put
"hi?/hello/hi"
My Code Is
key.replaceAll("\\?.*/","?/");
but my Output Is
"hi?/hi"
whats wrong?
You are using greedy matching, so it matches up to the next slash too. Try:
key.replaceAll("\\?.*?/","?/");
An alternative still using greedy matching is to match any character except /:
key.replaceAll("\\?[^/]*/","?/");
Use this:
key.replaceAll("\\?.*?/","?/")
You can read more about greedyand non greedy matching here

How to search for a special character in java string?

I am having some problem with searching for a special character "(".
I got a java.util.regex.PatternSyntaxException exception has occurred.
It might have something to do with "(" being treated as special character.
I am not very good with pattern expression. Can someone help me properly search for the escape character?
// I need to split the string at the "("
String myString = "Room Temperature (C)";
String splitList[] = myString.split ("("); // i got an exception
// I tried this but got compile error
String splitList[] = myString.split ("\(");
Try one of these:
string.split("\\(");
string.split(Pattern.quote("("));
Since a string split takes a regular expression as an argument, you need to escape characters properly. See Jon Skeet's answer on this here:
The reason you got an exception the first time is because split() takes a regular expression as argument, and ( has a special meaning there, as you suggest. To avoid this, you need to escape it using a \, like you tried.
What you missed, is that you also need to escape your backslashes with an extra \ in Java, meaning you need a total of two:
String splitList[] = myString.split ("\\(");
You need to escape the character via backslashes: string.split("\\(");
( is one of regex special characters. To escape it you can use e.g.
split(Pattern.quote("(")),
split("\\Q(\\E"),
split("\\("),
split("[(]").

Categories