Java non-greedy (?) regex to match string - java

String poolId = "something/something-else/pools[name='test'][scope='lan1']";
String statId = "something/something-else/pools[name='test'][scope='lan1']/stats[base-string='10.10.10.10']";
Pattern pattern = Pattern.compile(".+pools\\[name='.+'\\]\\[scope='.+'\\]$");
What regular expression should be used such that
pattern.matcher(poolId).matches()
returns true whereas
pattern.matcher(statId).matches()
returns false?
Note that
something/something-else is irrelevant and can be of any length
Both name and scope can have ANY character including any of \, /, [, ] etc
stats[base-string='10.10.10.10'] is an example and there can be anything else after /
I tried to use the non-greedy ? like so .+pools\\[name='.+'\\]\\[scope='.+?'\\]$ but still both matches return true

You can use
.+pools\[name='[^']*'\]\[scope='[^']*'\]$
See the regex demo. Details:
.+ - any one or more chars other than line break chars as many as possible
pools\[name=' - a pools[name='string
[^']* - zero or more chars other than a '
'\]\[scope=' - a '][scope=' string
[^']* - zero or more chars other than a '
'\] - a '] substring
$ - end of string.
In Java:
Pattern pattern = Pattern.compile(".+pools\\[name='[^']*']\\[scope='[^']*']$");
See the Java demo:
//String s = "something/something-else/pools[name='test'][scope='lan1']"; // => Matched!
String s = "something/something-else/pools[name='test'][scope='lan1']/stats[base-string='10.10.10.10']";
Pattern pattern = Pattern.compile(".+pools\\[name='[^']*']\\[scope='[^']*']$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println("Matched!");
} else {
System.out.println("Not Matched!");
}
// => Not Matched!
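To compare the pattern from the question with the one suggested above, here is a minimal sketch. The class name PoolIdCheck and the helper isPoolId are made up for illustration; the two patterns and the sample ids are taken from the question and the answer.

```java
import java.util.regex.Pattern;

class PoolIdCheck {
    // Pattern from the question: '.+' inside the quotes may run across "']"
    // and swallow the trailing /stats[...] part.
    static final Pattern GREEDY =
            Pattern.compile(".+pools\\[name='.+'\\]\\[scope='.+'\\]$");

    // Pattern from the answer: [^']* cannot cross a single quote.
    static final Pattern STRICT =
            Pattern.compile(".+pools\\[name='[^']*'\\]\\[scope='[^']*'\\]$");

    // Hypothetical helper: true when the whole id matches the pattern.
    static boolean isPoolId(Pattern pattern, String id) {
        return pattern.matcher(id).matches();
    }

    public static void main(String[] args) {
        String poolId = "something/something-else/pools[name='test'][scope='lan1']";
        String statId = poolId + "/stats[base-string='10.10.10.10']";

        System.out.println("greedy, poolId: " + isPoolId(GREEDY, poolId));
        System.out.println("greedy, statId: " + isPoolId(GREEDY, statId));
        System.out.println("strict, poolId: " + isPoolId(STRICT, poolId));
        System.out.println("strict, statId: " + isPoolId(STRICT, statId));
    }
}
```

Only the last call should report false, since the negated character class stops at the quote that closes the scope value.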

Wiktor assumed that your values for name and scope cannot have single quotes in them. Thus the following:
.../pools[name='tes't']
would not match. This is really the only valid assumption to make, as if you can include unescaped single quotes, then what's to stop the value of scope from being (for example) the literal value lan1']/stats[base-string='10.10.10.10? The regex you included in your question has this issue. If you simply must have these values in your code, you need to escape them somehow. Try the following (edit of Wiktor's regex):
.+pools\[name='([^']|\\')*'\]\[scope='([^']|\\')*'\]$
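A rough sketch of how the escaped-quote variant behaves, assuming quotes inside a value are written as \' (the class name EscapedQuoteCheck and the sample ids are invented for illustration). Note that the regex token \\' has to be written as \\\\' inside a Java string literal.

```java
import java.util.regex.Pattern;

class EscapedQuoteCheck {
    // Regex from the answer above, written as a Java string literal.
    static final Pattern ESCAPED = Pattern.compile(
            ".+pools\\[name='([^']|\\\\')*'\\]\\[scope='([^']|\\\\')*'\\]$");

    // Hypothetical helper: true when the whole id matches.
    static boolean isPoolId(String id) {
        return ESCAPED.matcher(id).matches();
    }

    public static void main(String[] args) {
        // Quote inside the name written as \' (backslash + quote).
        String escaped = "a/pools[name='tes\\'t'][scope='lan1']";
        // Bare quote inside the name.
        String unescaped = "a/pools[name='tes't'][scope='lan1']";
        // Pool id followed by a stats segment.
        String statId = "a/pools[name='test'][scope='lan1']/stats[base-string='10.10.10.10']";

        System.out.println("escaped quote:   " + isPoolId(escaped));
        System.out.println("unescaped quote: " + isPoolId(unescaped));
        System.out.println("stats id:        " + isPoolId(statId));
    }
}
```

Under these assumptions only the first id is accepted; the bare quote and the stats id are both rejected.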

Related

java regex minimum character not working

^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$
this is the regex that should match the following conditions
should start only with alphabets and numbers ,
contains alphabets numbers ,dot and hyphen
should not end with hyphen
It works for all conditions, but it fails when I try three-character inputs like
vu6
111
aaa
With four or more characters the validation works properly. Did I miss anything?
Reason why your Regex doesn't work:
Hope breaking it into smaller pieces will help:
^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$
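[a-zA-Z1-9]: matches exactly one letter or digit (note that 1-9 excludes 0, and _ is not allowed in this position)
[a-zA-Z1-9_\\.-]{2,64}: matches 2 to 64 characters, each a letter, digit, "_", "." or "-"
[^\\.-]: expects exactly 1 more character, which must not be "." or "-"
Together the three parts require at least 1 + 2 + 1 = 4 characters, which is why three-character inputs are rejected.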
Solution:
You can use 2 simple regex:
This answer assumes that the length of the string you want to match lies between [3-65] (both inclusive)
First, that will actually validate the string
[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}
Second, that will check that the string doesn't end with "." or "-"
[^\\.-]$
In Java
Pattern pattern1 = Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}$");
Pattern pattern2 = Pattern.compile("[^\\.-]$");
Matcher m1 = pattern1.matcher(input);
Matcher m2 = pattern2.matcher(input);
if(m1.find() && m2.find()) {
System.out.println("found");
}
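As a quick check of the two-pattern idea, here is a small sketch. The class name NameValidator and its method names are hypothetical; the patterns themselves are the ones from the question and the answer.

```java
import java.util.regex.Pattern;

class NameValidator {
    // Question's pattern: 1 + {2,64} + 1 characters, so at least 4 in total.
    static final Pattern ORIGINAL =
            Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}[^\\.-]$");

    // Answer's first pattern: overall shape, 3 to 65 characters.
    static final Pattern SHAPE =
            Pattern.compile("^[a-zA-Z1-9][a-zA-Z1-9_\\.-]{2,64}$");

    // Answer's second pattern: the last character is neither "." nor "-".
    static final Pattern LAST_CHAR = Pattern.compile("[^\\.-]$");

    static boolean matchesOriginal(String input) {
        return ORIGINAL.matcher(input).find();
    }

    static boolean isValid(String input) {
        return SHAPE.matcher(input).find() && LAST_CHAR.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"vu6", "111", "aaa", "ab-", "ab"}) {
            System.out.println(input + " original=" + matchesOriginal(input)
                    + " twoPatterns=" + isValid(input));
        }
    }
}
```

With the two patterns, the three-character inputs from the question pass, while "ab-" (ends with a hyphen) and "ab" (too short) are still rejected.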

Java regex matches but String.replaceAll() doesn't replace matching substrings

public class test {
public static void main(String[]args) {
String test1 = "N&oslash;rrebro, Denmark";
String test2 = "&oslash;";
String regex = new String("^&\\S*;$");
String value = test1.replaceAll(regex,"");
System.out.println(test2.matches(regex));
System.out.println(value);
}
}
This gives me following Output:
true
N&oslash;rrebro, Denmark
How is that possible ? Why does replaceAll() not register a match?
Your regex includes ^. Which makes the regex match from the very start.
If you try
test1.matches(regex)
you will get false.
You need to understand what ^ and $ means.
You probably put them in there because you want to say:
At the start of each match, I want a &, then 0 or more non-whitespace characters, then a ; at the end of the match.
However, ^ and $ don't mean the start and end of each match. They mean the start and end of the string.
So you should remove the ^ and $ from your regex:
String regex = "&\\S*;";
Now it outputs:
true
Nrrebro, Denmark
"What character specifies the start and end of the match then?" you might ask. Well, since your regex basically the pattern you are matching, the start of the regex is the start of the match (unless you have lookbehinds)!
It is possible because the ^&\S*;$ pattern matches the entire &oslash; string but does not match the entire N&oslash;rrebro, Denmark string. The ^ requires the start of the string to be right before & and $ requires the ; to appear right at the end of the string.
Just removing the ^ and $ anchors may not work, because \S* is a greedy pattern, and it may overmatch, e.g. in N&oslash;rrebro;.
You may use &\w+; or &\S+?; pattern, e.g.:
String test1 = "N&oslash;rrebro, Denmark";
String regex = "&\\w+;";
String value = test1.replaceAll(regex,"");
System.out.println(value); // => Nrrebro, Denmark
See the Java demo.
The &\w+; pattern matches a &, then any 1+ word chars, and then ;, anywhere inside the string. In &\S+?;, the lazy \S+? matches 1+ chars other than whitespace, as few as possible.
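To see the effect of the anchors and of greediness side by side, here is a small sketch. It assumes the input contains the HTML entity &oslash; and uses a hypothetical EntityStripper class; the second sample string, with a ; after "rrebro", is invented to trigger the overmatch.

```java
class EntityStripper {
    static final String ANCHORED = "^&\\S*;$"; // must span the whole string
    static final String GREEDY = "&\\S*;";     // may run on to a later ';'
    static final String WORD = "&\\w+;";       // word chars only between & and ;

    // Hypothetical helper: remove every match of the regex from s.
    static String strip(String regex, String s) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String plain = "N&oslash;rrebro, Denmark";
        String tricky = "N&oslash;rrebro; Denmark";

        System.out.println(strip(ANCHORED, plain)); // nothing is removed
        System.out.println(strip(GREEDY, plain));   // only the entity is removed
        System.out.println(strip(GREEDY, tricky));  // removes up to the second ';'
        System.out.println(strip(WORD, tricky));    // only the entity is removed
    }
}
```

The third call is the overmatch case: the greedy \S* first consumes everything up to the space and then backs off only as far as the last ; it can find.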
You can use this regex : &(.*?);
String test1 = "N&oslash;rrebro, Denmark";
String test2 = "&oslash;";
String regex = new String("&(.*?);");
String value = test1.replaceAll(regex,"");
System.out.println(test2.matches(regex));
System.out.println(value);
output :
true
Nrrebro, Denmark

How to find and skip special characters at the start and end of the word

New to regex and using following code to find if a word contains special characters at the end/start.
String s = "K-factor:";
String regExp = "^[^<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]*$";
Matcher matcher = Pattern.compile(regExp).matcher(s);
while (matcher.find()) {
System.out.println("Start: "+ matcher.start());
System.out.println("End: "+ matcher.end());
System.out.println("Group: "+ matcher.group());
s = s.substring(0, matcher.start());
}
I would like to find out whether there is any special character (: in this sample code) at the start or end of the string, and skip it.
There is neither a compile-time error nor any output.
Note that your regex matches a whole string that does not contain the chars you defined in the character class. The string in question does not match that pattern since it contains :.
You might consider splitting the pattern into two parts to check for the unwanted chars at the start or end using an alternation group:
String regExp = "^[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]|[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]$";
Here, the pattern has a ^<special_char_class>|<special_char_class>$ structure, ^ anchors the match at start, $ anchors the match at the string end, and | is the alternation operator. Note I removed the ^ from the start of the character class to make them positive rather than negated, so that they could match those chars/ranges defined in the class.
Alternatively, since you seem to just match a string if it contains a non-letter at the start/end, you may use
String regExp = "^\\P{L}|\\P{L}$";
which is Unicode-letter aware, or an ASCII-only version:
String regExp = "^\\P{Alpha}|\\P{Alpha}$";
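Since the goal was to skip the unwanted characters, a sketch along these lines might help. The class EdgeTrimmer and its methods are hypothetical, and the variant with + is my own assumption (it removes a whole run of non-letters at either edge in one pass); it is not part of the answer above.

```java
import java.util.regex.Pattern;

class EdgeTrimmer {
    // Pattern from the answer: one non-letter at the very start or very end.
    static final Pattern EDGE_NON_LETTER = Pattern.compile("^\\P{L}|\\P{L}$");

    // Assumed variant: '+' lets a run of non-letters be removed at once.
    static final Pattern EDGE_RUNS = Pattern.compile("^\\P{L}+|\\P{L}+$");

    // True when the string starts or ends with a non-letter.
    static boolean hasEdgeSpecial(String s) {
        return EDGE_NON_LETTER.matcher(s).find();
    }

    // Removes non-letters from both edges, keeping the inside untouched.
    static String trimEdges(String s) {
        return EDGE_RUNS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "K-factor:";
        System.out.println(hasEdgeSpecial(s));
        System.out.println(trimEdges(s));
    }
}
```

Keep in mind that \P{L} treats digits as non-letters too, so a leading or trailing digit would also be trimmed.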

Regular Expression to extract text containing pipe characters

I have a string and need a regular expression to extract substrings from it.
Example: this is a|b|c|d whatever e|f|g|h
Result: a|b|c|d, e|f|g|h
However based on the Java code that I wrote, it is producing the results as follows:
Pattern ptyy = Pattern.compile("\\|*.+? ");
Matcher matcher_values = ptyy.matcher("this is a|b|c|d whatever e|f|g|h");
while (matcher_values.find()) {
String line = matcher_values.group(0);
System.out.println(line);
}
Result
this
is
a|b|c|d
whatever
The result is not what I have hoped for. Any advice?
I think this regex is enough: (.\|)+.
see the example
(.\|)+ matches each character followed by a pipe (a|b|...|), and the final . matches the last character of the substring.
Your \|*.+? pattern matches 0 or more pipes, then 1 or more any chars other than a newline up to the first space. Thus, it matches almost all non-whitespace chunks in a string.
If a, b and c are just placeholders and there can be any non-whitespace chars, I'd suggest:
[^\s|]+(?:\|[^\s|]+)+
See the regex demo
Details:
[^\s|]+ - 1 or more chars other than whitespace and |
(?:\|[^\s|]+)+ - 1 or more sequences of:
\| - a literal |
[^\s|]+ - 1 or more chars other than whitespace and |
Java demo:
Pattern ptyy = Pattern.compile("[^\\s|]+(?:\\|[^\\s|]+)+");
Matcher matcher_values = ptyy.matcher("this is a|b|c|d whatever e|f|g|h");
while (matcher_values.find()) {
String line = matcher_values.group(0);
System.out.println(line);
}
Based on your advice, I managed to come up with my own regular expression that can handle different combinations of the pipe expression.
Pattern ptyy = Pattern.compile("[^\\s|]+(?:\\|[^\\s|]+)+");
Matcher matcher_values = ptyy.matcher("this is a|b|c|d whater e|f|g|h and Az|09|23|A3 and 22|1212|12121|55555");
while (matcher_values.find()) {
String line = matcher_values.group(0);
System.out.println(line);
}
This will enable me to get the result
a|b|c|d
e|f|g|h
Az|09|23|A3
22|1212|12121|55555
Thanks everyone!
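For reuse, the extraction could be wrapped in a small helper that returns a list. This is only a sketch: the class PipeExtractor and the method extract are made-up names, and it assumes each pipe-separated item may be longer than one character, hence the [^\s|]+ inside the group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeExtractor {
    // One item, then one or more "|item" sequences; items contain no
    // whitespace and no pipe.
    static final Pattern PIPED = Pattern.compile("[^\\s|]+(?:\\|[^\\s|]+)+");

    // Collects every pipe-separated chunk found in the text, in order.
    static List<String> extract(String text) {
        List<String> chunks = new ArrayList<>();
        Matcher matcher = PIPED.matcher(text);
        while (matcher.find()) {
            chunks.add(matcher.group());
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(extract("this is a|b|c|d whatever e|f|g|h"));
        System.out.println(extract("Az|09|23|A3 and 22|1212|12121|55555"));
    }
}
```

Words without a pipe, such as "this" or "whatever", are skipped because the group requires at least one |item sequence.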

Java Regex Word Extract exclude with special char

below are the String values
"method" <in> abs
("method") <in> abs
method <in> abs
I want to extract only the word method. I tried the regex below,
"(^[^\\<]*)", but it also includes the special characters.
Output for the above regex:
"method"
("method")
method
my expected output
method
method
method
^\\W*(\\w+)
You can use this and grab group 1 (the first capture). See demo.
https://regex101.com/r/sS2dM8/20
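In Java, grabbing group 1 could look roughly like this. The class FirstWord and the method firstWord are hypothetical names, and returning an empty string when nothing matches is my own choice for the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWord {
    // Skip leading non-word chars, then capture the first run of word chars.
    static final Pattern FIRST_WORD = Pattern.compile("^\\W*(\\w+)");

    // Returns the first word, or an empty string if there is none.
    static String firstWord(String s) {
        Matcher matcher = FIRST_WORD.matcher(s);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        String[] inputs = {
            "\"method\" <in> abs",
            "(\"method\") <in> abs",
            "method <in> abs"
        };
        for (String input : inputs) {
            System.out.println(firstWord(input));
        }
    }
}
```

Because \w only covers word characters, this stops at the first space, so a multi-word name would be cut after its first word.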
A couple of words on your "(^[^<]*)" regex: it does not match because it has the beginning-of-string anchor ^ after ", which is never the case. However, even if you remove it, "([^<]*)" will not match the last case, where " and ( are missing. You need to make them optional. And note the brackets must be escaped, and the order of quotes and brackets is different than in your input.
So, your regex could be fixed as
^\(?"?(\b[^<]*)\b"?\)?(?=\s+<)
See demo
However, I'd suggest using a replaceAll approach:
String rx = "(?s)\\(?\"?(.*?)\"?\\)?\\s+<.*";
System.out.println("\"My method\" <in> abs".replaceAll(rx, "$1"));
See IDEONE demo
If the strings start with ("My method, you can also add ^ to the beginning of the pattern: String rx = "(?s)^\\(?\"?(.*?)\"?\\)?\\s+<.*";.
The regex (?s)^\\(?\"?(.*?)\"?\\)?\\s+<.* matches:
(?s) makes . match a newline symbol (may not be necessary)
^ - matches the beginning of a string
\\(? - matches an optional (
\"? - matches an optional "
(.*?) - matches and captures into Group 1 any characters as few as possible
\"? - matches an optional "
\\)? - matches an optional )
\\s+ - matches 1 or more whitespace
< - matches a <
.* - matches 0 or more characters to the end of string.
With $1, we restore the group 1 text in the resulting string.
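Putting the replaceAll approach into a helper and running it over the inputs from the question might look like this. NameExtractor and extractName are hypothetical names; the regex is the anchored version described above.

```java
class NameExtractor {
    // Optional ( and ", lazily captured name, optional " and ),
    // then whitespace, a <, and the rest of the string.
    static final String RX = "(?s)^\\(?\"?(.*?)\"?\\)?\\s+<.*";

    // Replaces the whole string with the text captured in group 1.
    static String extractName(String s) {
        return s.replaceAll(RX, "$1");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "\"method\" <in> abs",
            "(\"method\") <in> abs",
            "method <in> abs",
            "\"My method\" <in> abs"
        };
        for (String input : inputs) {
            System.out.println(extractName(input));
        }
    }
}
```

Unlike a plain word match, the lazy group keeps multi-word names such as My method intact, because it only stops where optional quote or bracket, whitespace and < follow.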
In fact it is not too complicated.
Here is my answer:
Pattern pattern = Pattern.compile("([a-zA-Z]+)");
String[] myStrs = {
"\"method\"",
"(\"method\")",
"method"
};
for(String s:myStrs) {
Matcher matcher = pattern.matcher(s);
if(matcher.find()) {
System.out.println( matcher.group(0) );
}
}
The output is:
method
method
method
You just need to use:
[a-zA-Z]+
