Java Regex Everything Before and Including Match - java

I need a regular expression that removes all text before a match, including the match itself.
E.g. I want to remove "123S" and everything before it. I know I can do this with
string.replaceAll("^.*?(?=[123S])","");
string.replaceAll("123S","");
But I really want to do it in a single expression (I can't find another example anywhere!)

You can do it with:
string.replaceAll("^.*123S","");
Removing the non-greedy ? makes .* greedy, so the pattern matches the last occurrence of 123S and everything before it.

You don't need the look ahead:
"abc123Sdef123Sxyz".replaceAll("^.*?123S","");
This removes everything up to and including the first occurrence only, if that is what you need (output is def123Sxyz).
In case you want to replace up to the last 123S, just remove the ? modifier:
"abc123Sdef123Sxyz".replaceAll("^.*123S","");
Output is xyz.
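To see the two variants side by side, here is a minimal sketch. The class and method names (StripThroughMatch, throughFirst, throughLast) are made up for illustration; only the two patterns come from the answer above.

```java
final class StripThroughMatch {
    // Lazy .*? stops at the FIRST "123S", so only the text through it is removed.
    static String throughFirst(String s) {
        return s.replaceAll("^.*?123S", "");
    }

    // Greedy .* runs to the LAST "123S" on the line before giving characters back.
    static String throughLast(String s) {
        return s.replaceAll("^.*123S", "");
    }

    public static void main(String[] args) {
        String input = "abc123Sdef123Sxyz";
        System.out.println(throughFirst(input)); // def123Sxyz
        System.out.println(throughLast(input));  // xyz
    }
}
```

Note that . does not match line terminators by default, so both variants only work within the first line of the input unless the DOTALL flag is used.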

string.replaceAll("^.*?123S", "");
(?= is the "if followed by" (lookahead) pattern, which you don't want, and [123S] isn't even correct: it is a character class, so it will match just a single '2', for instance.
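A small sketch of why the character class goes wrong. The input "ab2c123Sdef" and the class and method names are invented for the demonstration; the patterns are the ones discussed above.

```java
final class CharClassVsLiteral {
    // [123S] means "one of the characters 1, 2, 3 or S", so the lookahead
    // is satisfied by the first such character, and nothing after it is removed.
    static String withCharClass(String s) {
        return s.replaceAll("^.*?(?=[123S])", "");
    }

    // The literal 123S must appear as a whole, and it is consumed by the match.
    static String withLiteral(String s) {
        return s.replaceAll("^.*?123S", "");
    }

    public static void main(String[] args) {
        String input = "ab2c123Sdef";
        System.out.println(withCharClass(input)); // 2c123Sdef
        System.out.println(withLiteral(input));   // def
    }
}
```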

string.replaceAll("^.*?123S","");
This is more efficient and clearer, so someone else knows what you're doing.

Related

Java Regex Match Pattern Groups unexpectedly matched [duplicate]

I am writing a regex that will be used for recognizing commands in a string. I have three possible words the commands could start with and they always end with a semi-colon.
I believe the regex pattern should look something like this:
(command1|command2|command3).+;
The problem, I have found, is that since . matches any character and + tells it to match one or more, it skips right over the first instance of a semi-colon and continues going.
Is there a way to get it to stop at the first instance of a semi-colon it comes across? Is there something other than . that I should be using instead?
The issue you are facing with this: (command1|command2|command3).+; is that the + is greedy, meaning that it will match everything up to the last semicolon.
To fix this, you will need to make it non-greedy, and to do that you need to add the ? operator, like so: (command1|command2|command3).+?;
Just as an FYI, the same applies to the * operator. Adding a ? will make it non-greedy.
Tell it to find only non-semicolons.
[^;]+
What you are looking for is a non-greedy match.
.+?
The "?" after your greedy + quantifier will make it match as little as possible, instead of as much as possible, which it does by default.
Your regex would be
'(command1|command2|command3).+?;'
See Python RE documentation
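Since the question title is about Java, here is a rough Java sketch comparing the three patterns suggested above. The class name, the findAll helper and the sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommandFinder {
    // Original pattern: .+ is greedy and runs to the last semicolon.
    static final Pattern GREEDY = Pattern.compile("(command1|command2|command3).+;");
    // Lazy quantifier: stops at the first semicolon it can.
    static final Pattern LAZY = Pattern.compile("(command1|command2|command3).+?;");
    // Negated class: can never step over a semicolon in the first place.
    static final Pattern NEGATED = Pattern.compile("(command1|command2|command3)[^;]+;");

    // Collects every non-overlapping match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "command1 foo; command2 bar;";
        System.out.println(findAll(GREEDY, input).size());  // 1 (one match spanning both commands)
        System.out.println(findAll(LAZY, input).size());    // 2
        System.out.println(findAll(NEGATED, input).size()); // 2
    }
}
```

One difference between the last two: [^;] also matches line breaks, while . does not unless DOTALL is set, so the negated class is the one to pick if a command may span several lines.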

Any suggestions to match and extract the pattern?

I want to match something like this
$(string).not(string).not(string)
The not(string) can repeat zero or more times, after $(string).
Note that the string can be whatever things, except nested not(string).
I used the regular expression (\\$\\((.*)\\))((\\.not\\((.*?)\\))*?)(?!(\\.not)). I thought the *? would non-greedily match any number of not(string) groups, and that the lookahead would stop the match at anything that is not not(string), so that I could extract only the part that I want.
However, when I tested on the input like
$(string).not(string).not(string).append(string)
group(0) returns the whole string, whereas I only need $(string).not(string).not(string).
Obviously I am still missing something or misusing something. Any suggestions?
Try this one (escaped for java):
(\\$\\(string\\)(?:(?:\\.not\\(.*?\\))+))
It should capture just the part that you are after. You can test it out (unescaped for java though)
If we assume that parentheses are not nested, you can write something like this:
String p = "\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*";
There is no need to add a lookahead, since the non-capturing group has a greedy quantifier (so the group is repeated as many times as possible).
If what you called string in your question may be a quoted string with parentheses inside, as in Pshemo's example $(string).not(".not(foo)").not(string), you can replace each [^)]* with (?:\\s*\"[^\"]*\"\\s*|[^)]*) to ignore characters inside quoted parts.
From here, "group zero denotes the entire pattern". Use group(1).
(\$\([\w ]+\))(\.not\([\w ]+\))*
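This will also work. It gives you two groups: one consisting of the word with the $ sign, and another for the ".not" parts. Note that a repeated capturing group only keeps its last iteration, so use group(0) if you need all of the ".not" strings.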
Please note: You might have to add escape characters for java.
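Here is a short sketch of the non-nested variant applied to the input from the question, assuming that parentheses are never nested. The class and method names are hypothetical. The original attempt returns the whole string because the greedy (.*) inside $( ... ) runs to the last closing parenthesis; [^)]* cannot do that.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NotChainExtractor {
    // $(...) followed by zero or more .not(...) parts; [^)]* never crosses a ')'.
    static final Pattern CHAIN =
            Pattern.compile("\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*");

    // Returns the first $(...).not(...)... chain found, or null if there is none.
    static String extract(String input) {
        Matcher m = CHAIN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "$(string).not(string).not(string).append(string)";
        System.out.println(extract(input)); // $(string).not(string).not(string)
    }
}
```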

Remove anything between two character

I want to remove anything between "?" and "/".
My text is "hi?0/hello/hi"
and I need to see this output:
"hi?/hello/hi"
My code is
key.replaceAll("\\?.*/","?/");
but my output is
"hi?/hi"
What's wrong?
You are using greedy matching, so it matches all the way up to the last slash. Try:
key.replaceAll("\\?.*?/","?/");
An alternative still using greedy matching is to match any character except /:
key.replaceAll("\\?[^/]*/","?/");
Use this:
key.replaceAll("\\?.*?/","?/")
You can read more about greedy and non-greedy matching here
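For comparison, a minimal sketch that runs all three patterns on the text from the question (the class and method names are made up for the example):

```java
final class BetweenMarkers {
    // Greedy: .* runs to the LAST slash, which swallows "0/hello".
    static String greedy(String key) {
        return key.replaceAll("\\?.*/", "?/");
    }

    // Lazy: .*? stops at the first slash after the question mark.
    static String lazy(String key) {
        return key.replaceAll("\\?.*?/", "?/");
    }

    // Negated class: [^/]* cannot cross a slash at all.
    static String negated(String key) {
        return key.replaceAll("\\?[^/]*/", "?/");
    }

    public static void main(String[] args) {
        String key = "hi?0/hello/hi";
        System.out.println(greedy(key));  // hi?/hi
        System.out.println(lazy(key));    // hi?/hello/hi
        System.out.println(negated(key)); // hi?/hello/hi
    }
}
```

Remember that replaceAll returns a new string; key itself is not modified, so the result has to be assigned or used.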

Java Regex - Finding specific string within a String

I am trying to match a string that start with the set word "hotel", then a hyphen, then a word of any length, then another hyphen and finally a number of any length.
Edit: Dima gave the solution I needed in the comments of this question! Thanks Dima.
Further edit: elaborating on Dima's answer, adding capturing groups making it easier to retrieve the information entered, and correcting the last bit to only accept digits:
^hotel-(.+)-(\d+)
^hotel-(.)*$
(But hotel-something WILL work, according to your initial statement).
So, if you actually want something like:
hotel-XXXXXX-YYYYYYY
Then the regex is :
^hotel-(.)*-(.)*$
Try a regex online tester like http://www.regextester.com/.
If you want to match the start of the input, you use ^.
So if you have ^hotel-\b, that will force hotel to be at the start of the string.
As a note, you can use $ for the end of the string in a similar way.
\bhotel-[^\s-]+-[^\s-]+\b
\b means that it should be a word boundary
[^\s-] means anything but - or whitespace
https://regex101.com/r/mH3vY8/1
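A possible way to pull the two parts out in Java, based on the ^hotel-(.+)-(\d+) pattern from the edit above. The trailing $ and the names HotelCodeParser and parse are additions for this sketch, not part of the original answers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HotelCodeParser {
    // Assumption: a trailing $ is added so that the number must end the input.
    static final Pattern HOTEL = Pattern.compile("^hotel-(.+)-(\\d+)$");

    // Returns {word, number}, or null when the input does not have that shape.
    static String[] parse(String input) {
        Matcher m = HOTEL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("hotel-grand-budapest-42");
        System.out.println(parts[0] + " / " + parts[1]);      // grand-budapest / 42
        System.out.println(parse("hotel-something") == null); // true
    }
}
```

Because (.+) is greedy, a word that itself contains hyphens ends up entirely in group 1, and only the final run of digits lands in group 2.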

Regular Expression Issue in Java

I have searched everywhere and I cannot find what I am doing wrong.
I have this regular expression: ^(\[\[).+(\]\]) that I want to match for this data that starts just at the beginning of the line as shown below (I do not want to match anything but the things starting at the beginning of a line):
[[match this]] [[don't match this]]
{{Link GA|es}}
{{Link FA|ca}}
And for some reason it is not matching anything in Java (or other regex "testers" such as regexpal.com). By "in Java" I mean with the String.replaceAll(String regex, String replacement) method in the Java String API.
But, if I omit the ^ and just have (\[\[).+(\]\]) it matches fine at the beginning of the line, but also matches inline instances which I do not want.
Can anyone point out what the error is here? Thank you
^ means "start of string", not "start of line", unless you use the Pattern.MULTILINE (or (?m)) option when building the regex. Also, you should be using a lazy quantifier (as pointed out by Dave Newton in his comment).
Finally, don't forget to double the backslashes:
String result = subject.replaceAll("(?m)^\\[\\[.+?\\]\\]", "");
.+ is greedy, in that it will match everything it can (here, matching everything up to the last \]\]).
To stop this behaviour, just add a ? to make it non-greedy:
^\[\[.+?\]\]
This will match [[ and then look for any characters until it finds the first occurrence of ]].
(\[\[).+(\]\]){1}+ where {1}+ means exactly one time.
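To illustrate the effect of (?m) together with the lazy quantifier, here is a small sketch. The sample text and the class and method names are invented for the example; the pattern is the one from the first answer.

```java
final class LineStartBrackets {
    // (?m) makes ^ match at the start of every line; .+? stops at the first ]].
    static String stripLeading(String text) {
        return text.replaceAll("(?m)^\\[\\[.+?\\]\\]", "");
    }

    // Without (?m), ^ only matches at the very start of the whole string.
    static String stripWithoutMultiline(String text) {
        return text.replaceAll("^\\[\\[.+?\\]\\]", "");
    }

    public static void main(String[] args) {
        String text = "[[a]] [[b]]\n[[c]] tail";
        // Removes [[a]] and [[c]], keeps the inline [[b]].
        System.out.println(stripLeading(text));
        // Removes only [[a]]; [[c]] on the second line is left alone.
        System.out.println(stripWithoutMultiline(text));
    }
}
```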
