Java regex matches but String.replaceAll() doesn't replace matching substrings - java

public class test {
    public static void main(String[] args) {
        String test1 = "N&oslash;rrebro, Denmark";
        String test2 = "&oslash;";
        String regex = new String("^&\\S*;$");
        String value = test1.replaceAll(regex, "");
        System.out.println(test2.matches(regex));
        System.out.println(value);
    }
}
This gives me the following output:
true
N&oslash;rrebro, Denmark
How is that possible? Why does replaceAll() not register a match?

Your regex includes ^. Which makes the regex match from the very start.
If you try
test1.matches(regex)
you will get false.
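The effect of the anchors can be checked with a short sketch. It assumes the input contains the literal HTML entity &oslash;, as the question's output implies; the class, constant, and method names are made up for illustration.

```java
class AnchorDemo {
    // Pattern from the question: ^ and $ tie the match to the whole input.
    static final String ANCHORED = "^&\\S*;$";
    // Same pattern without anchors: it may match anywhere inside the input.
    static final String UNANCHORED = "&\\S*;";

    // Removes every match of regex from input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String text = "N&oslash;rrebro, Denmark";
        System.out.println("&oslash;".matches(ANCHORED)); // true
        System.out.println(text.matches(ANCHORED));       // false
        System.out.println(strip(text, ANCHORED));        // N&oslash;rrebro, Denmark
        System.out.println(strip(text, UNANCHORED));      // Nrrebro, Denmark
    }
}
```

The anchored pattern only succeeds when the entity is the entire string, which is why matches() on the short string returns true while replaceAll() on the longer one changes nothing.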

You need to understand what ^ and $ mean.
You probably put them in there because you want to say:
At the start of each match, I want a &, then 0 or more non-whitespace characters, then a ; at the end of the match.
However, ^ and $ don't mark the start and end of each match. They mark the start and end of the string.
So you should remove the ^ and $ from your regex:
String regex = "&\\S*;";
Now it outputs:
true
Nrrebro, Denmark
"What character specifies the start and end of the match then?" you might ask. Well, since your regex is basically the pattern you are matching, the start of the regex is the start of the match (unless you have lookbehinds)!

It is possible because the ^&\S*;$ pattern matches the entire &oslash; string, but it does not match the entire N&oslash;rrebro, Denmark string. Here, ^ requires the start of the string to be right before &, and $ requires the ; to appear right at the end of the string.
Just removing the ^ and $ anchors may not work, because \S* is a greedy pattern and may overmatch, e.g. in N&oslash;rrebro;.
You may use &\w+; or &\S+?; pattern, e.g.:
String test1 = "N&oslash;rrebro, Denmark";
String regex = "&\\w+;";
String value = test1.replaceAll(regex,"");
System.out.println(value); // => Nrrebro, Denmark
See the Java demo.
The &\w+; pattern matches a &, then 1 or more word chars, and then a ;, anywhere inside the string. In the alternative, \S+? matches 1 or more chars other than whitespace, as few as possible.
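To see the overmatching risk concretely, here is a small sketch that runs the three patterns on an input with a second semicolon. The input string and the helper name remove are illustrative assumptions, not part of the original answer.

```java
class GreedyDemo {
    // Removes every match of regex from input.
    static String remove(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String tricky = "N&oslash;rrebro;";
        // Greedy \S* runs on to the LAST semicolon in the non-whitespace run.
        System.out.println(remove(tricky, "&\\S*;"));  // N
        // \w+ cannot cross the first semicolon, so only the entity is removed.
        System.out.println(remove(tricky, "&\\w+;"));  // Nrrebro;
        // Lazy \S+? stops at the FIRST semicolon.
        System.out.println(remove(tricky, "&\\S+?;")); // Nrrebro;
    }
}
```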

You can use this regex: &(.*?);
String test1 = "N&oslash;rrebro, Denmark";
String test2 = "&oslash;";
String regex = new String("&(.*?);");
String value = test1.replaceAll(regex,"");
System.out.println(test2.matches(regex));
System.out.println(value);
output :
true
Nrrebro, Denmark

Related

Java non-greedy (?) regex to match string

String poolId = "something/something-else/pools[name='test'][scope='lan1']";
String statId = "something/something-else/pools[name='test'][scope='lan1']/stats[base-string='10.10.10.10']";
Pattern pattern = Pattern.compile(".+pools\\[name='.+'\\]\\[scope='.+'\\]$");
What regular expression should be used such that
pattern.matcher(poolId).matches()
returns true whereas
pattern.matcher(statId).matches()
returns false?
Note that
something/something-else is irrelevant and can be of any length
Both name and scope can have ANY character including any of \, /, [, ] etc
stats[base-string='10.10.10.10'] is an example and there can be anything else after /
I tried to use the non-greedy ? like so .+pools\\[name='.+'\\]\\[scope='.+?'\\]$ but still both matches return true
You can use
.+pools\[name='[^']*'\]\[scope='[^']*'\]$
See the regex demo. Details:
.+ - any one or more chars other than line break chars as many as possible
pools\[name=' - a pools[name='string
[^']* - zero or more chars other than a '
'\]\[scope=' - a '][scope=' string
[^']* - zero or more chars other than a '
'\] - a '] substring
$ - end of string.
In Java:
Pattern pattern = Pattern.compile(".+pools\\[name='[^']*']\\[scope='[^']*']$");
See the Java demo:
//String s = "something/something-else/pools[name='test'][scope='lan1']"; // => Matched!
String s = "something/something-else/pools[name='test'][scope='lan1']/stats[base-string='10.10.10.10']";
Pattern pattern = Pattern.compile(".+pools\\[name='[^']*']\\[scope='[^']*']$");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println("Matched!");
} else {
    System.out.println("Not Matched!");
}
// => Not Matched!
Wiktor assumed that your values for name and scope cannot have single quotes in them. Thus the following:
.../pools[name='tes't']
would not match. This is really the only valid assumption to make, as if you can include unescaped single quotes, then what's to stop the value of scope from being (for example) the literal value lan1']/stats[base-string='10.10.10.10? The regex you included in your question has this issue. If you simply must have these values in your code, you need to escape them somehow. Try the following (edit of Wiktor's regex):
.+pools\[name='([^']|\\')*'\]\[scope='([^']|\\')*'\]$
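For comparison, the sketch below runs the question's greedy pattern and the negated character class version side by side with matches(). It assumes the values never contain a single quote, as discussed above; the class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

class PoolPatternDemo {
    // Pattern from the question: '.+' may swallow quotes and brackets.
    static final Pattern GREEDY =
            Pattern.compile(".+pools\\[name='.+'\\]\\[scope='.+'\\]$");
    // Suggested fix: [^']* cannot run past the closing quote of a value.
    static final Pattern NEGATED =
            Pattern.compile(".+pools\\[name='[^']*'\\]\\[scope='[^']*'\\]$");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String poolId = "something/something-else/pools[name='test'][scope='lan1']";
        String statId = poolId + "/stats[base-string='10.10.10.10']";
        System.out.println(matches(GREEDY, poolId));  // true
        System.out.println(matches(GREEDY, statId));  // true (unwanted)
        System.out.println(matches(NEGATED, poolId)); // true
        System.out.println(matches(NEGATED, statId)); // false
    }
}
```

With the greedy version, the scope value is allowed to extend all the way to the last '] in the input, which is why making only one quantifier lazy did not help.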

When using Java's pattern.matcher(), the source does not match the regex, but I expect it to match

String source = "ONE.TWO";
String regex = "^ONE\\.TWO\\..*";
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        test();
    }
    public static void test() {
        Test stringDemo = new Test();
        stringDemo.testMatcher();
    }
    public void testMatcher() {
        String source = "ONE.TWO";
        String regex = "^ONE\\.TWo\\..*";
        // Prints "not match", but the expected result is "match".
        matcher(source, regex);
    }
    public void matcher(String source, String regex) {
        Pattern pattern = Pattern.compile(regex);
        boolean match = pattern.matcher(source).matches();
        if (match) {
            System.out.println("match");
        } else {
            System.out.println("not match");
        }
    }
}
In your code, your regular expression expects the o in TWO to be lower case and expects it to be followed by a literal dot (.).
Try:
String source = "ONE.TWo.";
This will match your regular expression as coded in your question.
The expression \. means match a literal dot (rather than any character). When you code this into a Java String, you have to escape the backslash with another backslash, so it becomes "\\.".
The .* on the end of the expression means "match zero or more of any character (except line-break)".
So this would also match:
String source = "ONE.TWo.blah blah";
Well, it doesn't match for two reasons:
Your regex "^ONE\\.TWo\\..*" is case sensitive, so its TWo cannot match TWO.
Your regex expects a . character after TWO, while your string "ONE.TWO" doesn't have one.
Use the following Regex, to match your source string:
String regex = "^ONE\\.TWO\\.*.*";
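Pattern matching is case sensitive by default. In your case, the source has an uppercase O and the regex a lowercase o.
So you have to add Pattern.CASE_INSENSITIVE or change the case of the o. Note that the regex also requires a literal dot after TWO, so the source needs one as well (for example "ONE.TWO.").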
Pattern pattern = Pattern.compile(regex,Pattern.CASE_INSENSITIVE );
or
String regex = "^ONE\\.TWO\\..*";
Your regex is a bit incorrect. You have an extra dot here:
String regex = "^ONE\\.TWO\\.(extra dot).*"
Try this one, without the dot:
String regex = "^ONE\\.TWO.*";
String regex = "^ONE\\.TWO\\..*"
In a Java string literal, the double backslash \\ produces a single backslash in the regex, so \\. becomes the regex \. and matches a literal dot in the source string.
The .* at the end matches any character except line breaks, zero or more times.
To match the regex, your source should look like
String source = "ONE.TWO.three blah ##$ etc" OR
String source = "ONE.TWO.123##$ etc"
Basically, it is any string that starts with ONE.TWO. and contains no line breaks.
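The points made in the answers above can be checked in one place with a small sketch. The helper name and the last pattern, which makes the dot-and-suffix part optional, are assumptions added for illustration and do not come from the question.

```java
import java.util.regex.Pattern;

class CaseDemo {
    // Compiles regex with the given flags and tests it against the whole source.
    static boolean matches(String regex, int flags, String source) {
        return Pattern.compile(regex, flags).matcher(source).matches();
    }

    public static void main(String[] args) {
        String asCoded = "^ONE\\.TWo\\..*";
        System.out.println(matches(asCoded, 0, "ONE.TWO"));   // false: wrong case, no dot
        System.out.println(matches(asCoded, 0, "ONE.TWo."));  // true
        // Ignoring case alone is not enough: the trailing dot is still required.
        System.out.println(matches(asCoded, Pattern.CASE_INSENSITIVE, "ONE.TWO"));  // false
        System.out.println(matches(asCoded, Pattern.CASE_INSENSITIVE, "ONE.TWO.")); // true
        // Hypothetical variant: the ".rest" part is optional.
        String optionalSuffix = "^ONE\\.TWO(\\..*)?$";
        System.out.println(matches(optionalSuffix, 0, "ONE.TWO"));      // true
        System.out.println(matches(optionalSuffix, 0, "ONE.TWO.rest")); // true
    }
}
```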

How to find and skip special characters at the start and end of the word

I'm new to regex and am using the following code to find out whether a word contains special characters at the start or end.
String s = "K-factor:";
String regExp = "^[^<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]*$";
Matcher matcher = Pattern.compile(regExp).matcher(s);
while (matcher.find()) {
    System.out.println("Start: " + matcher.start());
    System.out.println("End: " + matcher.end());
    System.out.println("Group: " + matcher.group());
    s = s.substring(0, matcher.start());
}
I would like to find out whether there is a special character (: in this sample code) at the start or end of the string, and skip it.
There is neither a compile-time error nor any output.
Note that your regex matches a whole string that does not contain the chars you defined in the character class. The string in question does not match that pattern since it contains :.
You might consider splitting the pattern into two parts to check for the unwanted chars at the start or end using an alternation group:
String regExp = "^[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]|[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]$";
Here, the pattern has a ^<special_char_class>|<special_char_class>$ structure, ^ anchors the match at start, $ anchors the match at the string end, and | is the alternation operator. Note I removed the ^ from the start of the character class to make them positive rather than negated, so that they could match those chars/ranges defined in the class.
Alternatively, since you seem to just match a string if it contains a non-letter at the start/end, you may use
String regExp = "^\\P{L}|\\P{L}$";
which is Unicode letter aware, or the ASCII-only version:
String regExp = "^\\P{Alpha}|\\P{Alpha}$";
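As a rough sketch of how the \P{L} pattern could be used both to detect and to skip such characters, consider the following. The helper names are hypothetical, and the + quantifier in the second pattern is an addition so that runs of several non-letters are removed in one go.

```java
import java.util.regex.Pattern;

class EdgeCharDemo {
    // One non-letter at the very start or the very end of the input.
    private static final Pattern EDGE_NON_LETTER =
            Pattern.compile("^\\P{L}|\\P{L}$");
    // Same idea with '+', so whole runs of non-letters are covered.
    private static final Pattern EDGE_NON_LETTER_RUNS =
            Pattern.compile("^\\P{L}+|\\P{L}+$");

    // True if the input starts or ends with a non-letter.
    static boolean hasEdgeSpecial(String s) {
        return EDGE_NON_LETTER.matcher(s).find();
    }

    // Returns the input without leading and trailing non-letters.
    static String stripEdges(String s) {
        return EDGE_NON_LETTER_RUNS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(hasEdgeSpecial("K-factor:")); // true
        System.out.println(stripEdges("K-factor:"));     // K-factor
        System.out.println(stripEdges("123abc!!"));      // abc
    }
}
```

The hyphen inside K-factor is left alone because neither alternative can match in the middle of the string.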

Regex for numeric portion of Java string

I'm trying to write a Java method that will take a string as a parameter and return another string if it matches a pattern, and null otherwise. The pattern:
Starts with a number (1+ digits); then followed by
A colon (":"); then followed by
A single whitespace (" "); then followed by
Any Java string of 1+ characters
Hence, some valid strings that match this pattern:
50: hello
1: d
10938484: 394958558
And some strings that do not match this pattern:
korfed49
: e4949
6
6:
6:sdjjd4
The general skeleton of the method is this:
public String extractNumber(String toMatch) {
    // If toMatch matches the pattern, extract the first number
    // (everything prior to the colon).
    // Else, return null.
}
Here's my best attempt so far, but I know I'm wrong:
public String extractNumber(String toMatch) {
    // If toMatch matches the pattern, extract the first number
    // (everything prior to the colon).
    String regex = "???";
    if (toMatch.matches(regex))
        return toMatch.substring(0, toMatch.indexOf(":"));
    // Else, return null.
    return null;
}
Thanks in advance.
Your description is spot on, now it just needs to be translated to a regex:
^ # Starts
\d+ # with a number (1+ digits); then followed by
: # A colon (":"); then followed by
# A single whitespace (" "); then followed by
\w+ # Any word character, one or more times
$ # (followed by the end of input)
Giving, in a Java string:
"^\\d+: \\w+$"
You also want to capture the numbers: put parentheses around \d+, use a Matcher, and capture group 1 if there is a match:
private static final Pattern PATTERN = Pattern.compile("^(\\d+): \\w+$");
// ...
public String extractNumber(String toMatch) {
Matcher m = PATTERN.matcher(toMatch);
return m.find() ? m.group(1) : null;
}
Note: in Java, \w by default only matches ASCII letters and digits (this is not the case for .NET languages, for instance), and it also matches an underscore. If you don't want the underscore, you can use (Java specific syntax):
[\w&&[^_]]
instead of \w for the last part of the regex, giving:
"^(\\d+): [\\w&&[^_]]+$"
Try using the following: \d+: \w+
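Putting the pieces together, here is one possible self-contained version checked against the sample strings from the question. It uses .+ for the trailing part instead of \w+, on the assumption that "any Java string of 1+ characters" should also allow spaces and punctuation; note that . does not match line terminators by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberPrefix {
    // 1+ digits (captured), a colon, one space, then at least one more character.
    private static final Pattern PATTERN = Pattern.compile("(\\d+): .+");

    // Returns the leading number if the whole input fits the pattern, else null.
    static String extractNumber(String toMatch) {
        Matcher m = PATTERN.matcher(toMatch);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "50: hello", "1: d", "10938484: 394958558",
            "korfed49", ": e4949", "6", "6:", "6:sdjjd4"
        };
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + extractNumber(sample));
        }
    }
}
```

Because matches() already requires the whole input to fit, the ^ and $ anchors are not needed in this variant.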

How to create a java regular expression pattern that would match a string only at certain positon?

I would like to create a regular expression pattern that matches only if the pattern string is not followed by any other text in the input string. Here is what I tried:
Pattern p = Pattern.compile("google.com");//I want to know the right format
String input1 = "mail.google.com";
String input2 = "mail.google.com.co.uk";
Matcher m1 = p.matcher(input1);
Matcher m2 = p.matcher(input2);
boolean found1 = m1.find();
boolean found2 = m2.find();//This should be false because "google.com" is followed by ".co.uk" in input2 string
Any help would be appreciated!
Your pattern should be google\.com$. The $ character matches at the end of the input (by default also just before a final line terminator; with Pattern.MULTILINE, at the end of every line). Read about regex boundary matchers for details.
Here is how to match and get the non-matching part as well.
Here is the raw regex pattern:
^(.*)google\.com$
^ - match beginning of string
(.*) - capture everything in a group up to the next match
google - matches google literal
\. - matches a literal . (the dot has to be escaped with \)
com - matches com literal
$ - matches end of string
Note: In Java the \ in the String literal has to be escaped as well! ^(.*)google\\.com$
You should use google\.com$. The $ character matches the end of the input.
Pattern p = Pattern.compile("google\\.com$");//I want to know the right format
String input2 = "mail.google.com.co.uk";
Matcher m2 = p.matcher(input2);
boolean found2 = m2.find();
System.out.println(found2);
Output = false
Pattern p = Pattern.compile("google\\.com$");
The dollar sign means the pattern has to occur at the end of the line/string being tested. Note too that an unescaped dot matches any character, so if you want it to match a dot only, you need to escape it.
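The answers above can be combined into a short sketch that checks both inputs from the question and also retrieves the part in front of the domain. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixDemo {
    // Escaped dot plus $: "google.com" must be the last thing in the input.
    private static final Pattern ENDS_WITH_DOMAIN = Pattern.compile("google\\.com$");
    // Same check, capturing everything in front of the domain in group 1.
    private static final Pattern WITH_PREFIX = Pattern.compile("^(.*)google\\.com$");

    static boolean endsWithDomain(String input) {
        return ENDS_WITH_DOMAIN.matcher(input).find();
    }

    // Returns the text before "google.com", or null if the input does not end with it.
    static String prefixBeforeDomain(String input) {
        Matcher m = WITH_PREFIX.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(endsWithDomain("mail.google.com"));       // true
        System.out.println(endsWithDomain("mail.google.com.co.uk")); // false
        System.out.println(prefixBeforeDomain("mail.google.com"));   // mail.
        // An unescaped dot would also accept other characters in its place.
        System.out.println(Pattern.compile("google.com$").matcher("mail.googleXcom").find()); // true
    }
}
```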
