Scenario:
Pattern whitespace = Pattern.compile("^\\s");
Matcher matcher = whitespace.matcher(" WhiteSpace");
Pattern whitespace2 = Pattern.compile("^\\s\\s");
Matcher matcher2 = whitespace2.matcher(" WhiteSpace");
I am trying to match whitespace at the beginning of a line, and I want a matcher to succeed only for an exact number of whitespace characters. My string is " WhiteSpace".
The problem is that both matcher and matcher2 match this string.
What I want is:
A pattern that matches exactly one leading whitespace character and does not match
a string with two leading whitespace characters. In the scenario below both matcher.find() and matcher2.find() are true, but matcher.find() should be false and matcher2.find() should be true.
I want matcher to be true for " WhiteSpace" and false for "  WhiteSpace" (two spaces).
I want matcher2 to be true for "  WhiteSpace" (two spaces).
What I want to do is:
I have a string "  two whitespaces" (two leading spaces).
Both conditions below are true. matcher should be false.
matcher2 should be true.
Pattern whitespace = Pattern.compile("^\\s");
Matcher matcher = whitespace.matcher("  two whitespaces");
Pattern whitespace2 = Pattern.compile("^\\s\\s");
Matcher matcher2 = whitespace2.matcher("  two whitespaces");
if (matcher.find()) {
    // XXXXXXXXXXX
} else if (matcher2.find()) {
    // YYYYYYYYYYY
}
If you want to ensure that one whitespace character is not followed by another, without including that following character in the match (whether or not it is whitespace), you can use a negative lookahead, (?!...).
So a pattern that matches a single whitespace character at the start of a line only if no other whitespace follows it may look like
Pattern whitespace = Pattern.compile("^\\s(?!\\s)");
This can be adapted to any number of spaces
Pattern whitespace = Pattern.compile("^\\s{3}(?!\\s)");
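As a rough sketch of how the lookahead approach could be generalized, the helper below builds the pattern for an arbitrary count. The class name LeadingWhitespace and the method startsWithExactly are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

class LeadingWhitespace {
    // Hypothetical helper: true only if s starts with exactly n whitespace characters.
    static boolean startsWithExactly(String s, int n) {
        // \s{n} consumes n whitespace characters;
        // (?!\s) rejects a further whitespace character without consuming it.
        Pattern p = Pattern.compile("^\\s{" + n + "}(?!\\s)");
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithExactly(" WhiteSpace", 1));  // true
        System.out.println(startsWithExactly("  WhiteSpace", 1)); // false
        System.out.println(startsWithExactly("  WhiteSpace", 2)); // true
    }
}
```

If the same count is checked many times, compiling the Pattern once and reusing it would avoid recompiling on every call.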
A pattern may be overkill here*. Using Character.isWhitespace gives simpler code:
String in = " your input here";
int wsPrefix=0;
for ( ; wsPrefix < in.length() && Character.isWhitespace(in.charAt(wsPrefix)) ;
wsPrefix++ ) {}
System.out.println("wsPrefix = " + wsPrefix);
* For it is said:
"Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems."
-- Jamie Zawinski, 1997
I am having trouble with Java Pattern and Matcher. I've included a very simplified example of what I'm trying to do.
I had expected the pattern ".\b" to find the last character of the first word (or "4" in the example), but as I step through the code, m.find() always returns false. What am I missing here?
Why does the following Java code always print out "Not Found"?
Pattern p = Pattern.compile(".\b");
Matcher m = p.matcher("102939384 is a word");
int ixEndWord = 0;
if (m.find()) {
ixEndWord = m.end();
System.out.println("Found: " + ixEndWord);
} else {
System.out.println("Not Found");
}
You need to escape special characters in the regex: ".\\b"
Basically, in a String the backslash has to be escaped. So "\\" becomes the character '\'.
So the String ".\\b" becomes the literal String ".\b", which will be used by the Pattern.
To expand upon AntonH's comment, whenever you want the "\" character to appear in a regex, you have to escape it so that it actually appears in the string you are passing in.
As is, ".\b" is the string of a dot . followed by the special backspace character represented by \b, compared to ".\\b", which is the regex .\b.
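The difference can be checked directly. The following is a small sketch using the input from the question; the class WordBoundaryEscape and the helper firstMatchEnd are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryEscape {
    // Returns the end index of the first match, or -1 if there is none.
    static int firstMatchEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "102939384 is a word";
        // ".\b" in Java source is a two-character string: '.' and backspace (U+0008).
        System.out.println((int) ".\b".charAt(1));        // 8
        System.out.println(firstMatchEnd(".\b", input));  // -1, the input has no backspace
        // ".\\b" in Java source is the regex .\b (any character, then a word boundary).
        System.out.println(firstMatchEnd(".\\b", input)); // 9, just after the '4'
    }
}
```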
I have a regex that should match a line and delete it, along with everything below it (keeping everything above it).
Two-part question:
1) Why won't this pattern match the given String text below?
2) How can I be sure to just match on a single line and not multiple lines?
- The pattern has to be found on the same single line.
String text = "Keep this.\n\n\nPlease match junkhere this t-h-i-s is missing.\n"
+ "Everything should be deleted here but don't match this on this line" + "\n\n";
Pattern p = Pattern.compile("^(Please(\\s)(match)(\\s)(.*?)\\sthis\\s(.*))$", Pattern.DOTALL );
Matcher m = p.matcher(text);
if (m.find()) {
text = (m.replaceAll("")).replaceAll("[\n]+$", ""); // remove everything at and below "Please match ... this"
System.out.println(text);
}
Expected Output:
Keep this.
You are complicating your life...
First, as I said in the comment, use Pattern.MULTILINE.
Then, to truncate the string from the beginning of the match, use .substring():
final Pattern p = Pattern.compile("^Please\\s+match\\b.*?this",
Pattern.MULTILINE);
final Matcher m = p.matcher(input);
return m.find() ? input.substring(0, m.start()) : input;
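For reference, here is a self-contained sketch of the same idea applied to the text from the question. The class name TruncateAtLine and the method truncate are assumptions for illustration. The final trimming of trailing newlines is an extra step added here so that the result matches the expected output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TruncateAtLine {
    // With MULTILINE, ^ matches at the start of every line.
    // Without DOTALL, . does not cross a line break.
    private static final Pattern MARKER =
            Pattern.compile("^Please\\s+match\\b.*?this", Pattern.MULTILINE);

    // Keeps everything above the first matching line, then drops trailing newlines.
    static String truncate(String input) {
        Matcher m = MARKER.matcher(input);
        String kept = m.find() ? input.substring(0, m.start()) : input;
        return kept.replaceAll("\\n+$", "");
    }

    public static void main(String[] args) {
        String text = "Keep this.\n\n\nPlease match junkhere this t-h-i-s is missing.\n"
                + "Everything should be deleted here but don't match this on this line" + "\n\n";
        System.out.println(truncate(text)); // Keep this.
    }
}
```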
Remove DOTALL to make sure the match stays on a single line, and convert \s to a literal space:
Pattern p = Pattern.compile("^(Please( )(match)( )(.*?) this (.*))$");
DOTALL makes a dot match newlines as well.
\s can match any whitespace, including newlines.
I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
StringBuffer sb = new StringBuffer();
while(m.find()){
m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere in this forum, as Java regex apparently does not recognize \s).
The problem I am facing with this code is that the search stops after the first match and exits the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
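The regex explained above can be tried on the sample string from the question. This is only a sketch, and the class ReplyPrefix and the method strip are hypothetical names.

```java
import java.util.regex.Pattern;

class ReplyPrefix {
    // (?i) makes the match case-insensitive; ^ anchors it to the start of the string;
    // the group repeats, so a whole run of "Re:" prefixes is consumed in a single match.
    private static final Pattern PREFIX = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");

    static String strip(String title) {
        return PREFIX.matcher(title).replaceAll("");
    }

    public static void main(String[] args) {
        String title = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(strip(title)); // This is a Re: sample string.
    }
}
```

Because the pattern is anchored with ^, a loop over find() can match at most once. That is why the repetition has to live inside the regex as (...)+ rather than in a while loop.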
In your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
e* means "zero or more e" and :? makes the colon optional, so it matches strings like:
\p{Z}Reee\p{Z}: or
R\p{Z}\p{Z}\p{Z}
which make no sense for what you are trying to do.
You'd be better off with a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
or, to make @Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which, as the documentation says, is exactly the same as yourString.replaceAll())
(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and @Doorknob for thinking about the (?i) case insensitivity part ;-) )
I am trying to find a pattern in a string in Java. Here is the code:
String line = "10011011001;0110,1001,1001,0,10,11";
String regex ="[A-Za-z]?"; //[A-Za-z2-9\W]?
//create a pattern obj
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
boolean a = m.find();
System.out.println("The value of a is::"+a +" asdsd "+m.group(0));
I am expecting the boolean value to be false, but instead it always returns true. Any idea where I am going wrong?
The ? makes the entire character class optional. So your regex essentially means "find any character* ... or not". And the "or not" part means it matches the empty string.
* not really "any", just the ASCII letters A-Z and a-z.
[A-Za-z]? means "zero or one letters". It will always match somewhere in the string; even if there aren't any letters, it will match zero of them.
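A quick way to see the empty match is to print what was actually matched. The sketch below uses the input from the question; OptionalQuantifier and firstMatch are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalQuantifier {
    // Returns the text of the first match, or null if find() fails.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "10011011001;0110,1001,1001,0,10,11";
        // With ?, the regex succeeds at index 0 by matching zero letters.
        System.out.println("[" + firstMatch("[A-Za-z]?", line) + "]"); // []
        // Without ?, at least one letter is required, so find() fails.
        System.out.println(firstMatch("[A-Za-z]", line));              // null
    }
}
```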
The regexes below should work.
[A-Za-z]? means "once or not at all", so it can match nothing.
Reference:
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html
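String line = "10011011001;0110,1001,1001,0,10,11";
String regex = "[A-Za-z]"; // finds a single letter
// Alternatives (use one at a time):
// String regex = "[A-Za-z]+$"; // finds a run of letters at the end of the string
// String regex = "[^0-9,;]";   // finds any character that is not a digit, ',' or ';'
// create a Pattern object
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
boolean a = m.find();
// group(0) throws IllegalStateException when find() returned false, so guard it
System.out.println("The value of a is::" + a + " asdsd " + (a ? m.group(0) : "<no match>"));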
How can I write a regex that matches anything between two specific characters?
like:
ignore me [take:me] ignore me?
How can I match inclusive [take:me]?
The word take:me is dynamic, so I would also like to match [123as d:.-,§""§%]
You can use this regex:
"\\[(.*?)\\]"
This link should help you to understand why it works.
Pattern pattern = Pattern.compile("\\[(.*?)\\]");
Matcher matcher = pattern.matcher("ignore me [take:me] ignore me");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
This will print take:me.
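To get the brackets included, as the question asks, group(0) (the whole match) can be used instead of group(1). The sketch below also loops over find() to collect every bracketed segment. BracketExtractor and extract are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketExtractor {
    private static final Pattern BRACKETED = Pattern.compile("\\[(.*?)\\]");

    // inclusive = true keeps the brackets (group 0); false returns only the content (group 1).
    static List<String> extract(String input, boolean inclusive) {
        List<String> out = new ArrayList<>();
        Matcher m = BRACKETED.matcher(input);
        while (m.find()) {
            out.add(inclusive ? m.group(0) : m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "ignore me [take:me] ignore me [and:me]";
        System.out.println(extract(input, true));  // [[take:me], [and:me]]
        System.out.println(extract(input, false)); // [take:me, and:me]
    }
}
```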
If you want to match &([take:me]) you should use this:
&\\(\\[(.*?)\\]\\)
Note that you should escape characters that have a special meaning in regex (like ( and )).
Escaping is done by adding a backslash, but because a backslash in a Java string literal is written as \\, you add \\ before any character that has a special meaning. So by writing \\( you are telling the regex engine:
"Take ( as a regular character, not the special character".
Try (?<=c)(.+)(?=c) where c is the character you're using
The java.util.regex.Matcher class is used to search through a text for multiple occurrences of a regular expression. You can also use a Matcher to search for the same regular expression in different texts.
The Matcher class has a lot of useful methods. For a full list, see the official JavaDoc for the Matcher class. I will cover the core methods here. Here is a list of the methods covered:
Creating a Matcher
Creating a Matcher is done via the matcher() method in the Pattern class. Here is an example:
String text =
"This is the text to be searched " +
"for occurrences of the http:// pattern.";
String patternString = ".*http://.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
matches()
The matches() method in the Matcher class matches the regular expression against the whole text passed to the Pattern.matcher() method, when the Matcher was created. Here is an example:
boolean matches = matcher.matches();
If the regular expression matches the whole text, then the matches() method returns true. If not, the matches() method returns false.
You cannot use the matches() method to search for multiple occurrences of a regular expression in a text. For that, you need to use the find(), start() and end() methods.
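The contrast between matches() and find() can be sketched as follows, reusing the text from the example above. The class MatchesVsFind and its two helper methods are illustrative names only.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // matches() succeeds only if the regex covers the entire text.
    static boolean wholeTextMatches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // find() succeeds if the regex matches anywhere in the text.
    static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "This is the text to be searched "
                + "for occurrences of the http:// pattern.";
        System.out.println(wholeTextMatches(".*http://.*", text)); // true
        System.out.println(wholeTextMatches("http://", text));     // false
        System.out.println(containsMatch("http://", text));        // true
    }
}
```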
lookingAt()
The lookingAt() method works like the matches() method with one major difference. The lookingAt() method only matches the regular expression against the beginning of the text, whereas matches() matches the regular expression against the whole text. In other words, if the regular expression matches the beginning of a text but not the whole text, lookingAt() will return true, whereas matches() will return false.
Here is an example:
String text =
"This is the text to be searched " +
"for occurrences of the http:// pattern.";
String patternString = "This is the";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
System.out.println("lookingAt = " + matcher.lookingAt());
System.out.println("matches = " + matcher.matches());
find() + start() + end()
The find() method searches for occurrences of the regular expressions in the text passed to the Pattern.matcher(text) method, when the Matcher was created. If multiple matches can be found in the text, the find() method will find the first, and then for each subsequent call to find() it will move to the next match.
The methods start() and end() will give the indexes into the text where the found match starts and ends.
Here is an example:
String text =
"This is the text which is to be searched " +
"for occurrences of the word 'is'.";
String patternString = "is";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
int count = 0;
while(matcher.find()) {
count++;
System.out.println("found: " + count + " : "
+ matcher.start() + " - " + matcher.end());
}
This example will find the pattern "is" four times in the searched string. The output printed will be this:
found: 1 : 2 - 4
found: 2 : 5 - 7
found: 3 : 23 - 25
found: 4 : 70 - 72
You can also refer to these tutorials:
Tutorial 1
You can also use lookaround assertions. This way the brackets are not included in the match itself.
(?<=\\[).*?(?=\\])
(?<=\\[) is a positive lookbehind assertion. It is true when the character "[" comes immediately before the match.
(?=\\]) is a positive lookahead assertion. It is true when the character "]" comes immediately after the match.
.*? matches any character zero or more times, but as few times as possible, because of the modifier ?. It changes the matching behaviour of the quantifier from "greedy" to "lazy".
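A short sketch of the lookaround version, which also shows why the lazy quantifier matters when a line contains more than one bracket pair. LookaroundDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The brackets are asserted by the lookarounds but are not part of the match.
        System.out.println(firstMatch("(?<=\\[).*?(?=\\])", "ignore me [take:me] ignore me")); // take:me
        // Lazy .*? stops at the first closing bracket.
        System.out.println(firstMatch("(?<=\\[).*?(?=\\])", "[a] and [b]")); // a
        // Greedy .* runs on to the last closing bracket.
        System.out.println(firstMatch("(?<=\\[).*(?=\\])", "[a] and [b]"));  // a] and [b
    }
}
```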