Pattern is not matching when it contains a new line

Here is my code
Pattern pbold = Pattern.compile(".*\\* *(.*?) *\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();

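What you need is the embedded flag expression (?s), which turns on DOTALL mode. By default . matches any character except a line terminator, so .*? cannot get past a new line; with (?s) in effect, . also matches line terminators such as \n.
Do not confuse (?s) with \s, the predefined whitespace character class, which matches:
A space character
A tab character
A carriage return character
A new line character
A vertical tab character
A form feed character
For more info about these special characters, please consult The Java Tutorials - Regular Expressions - Predefined Character Classes.
The code below matches the case you need: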
String s = "abc021\n" +
"34-+\n" +
"*\n" +
"a\n" +
"p\n" +
"p\n" +
"l\n" +
"e\n" +
"*\n" +
"fga32\n" +
"49";
Pattern pbold = Pattern.compile(".*\\* *((?s).*?) *\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();
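A minimal runnable sketch of the difference, using the same input as above. The class `DotAllDemo` and its helper `boldText` are illustrative names, not part of the question; the helper returns group 1 of the first match, or null when find() fails.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    // Returns group 1 of the first match of regex in s, or null if find() fails.
    static String boldText(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "abc021\n34-+\n*\na\np\np\nl\ne\n*\nfga32\n49";
        // Without (?s), '.' stops at '\n', so no match can span the lines between the asterisks.
        System.out.println(boldText(".*\\* *(.*?) *\\*.*", s));
        // With (?s) before the lazy group, '.' also matches line terminators there.
        System.out.println(boldText(".*\\* *((?s).*?) *\\*.*", s));
    }
}
```

Putting (?s) at the very start of the pattern, or passing Pattern.DOTALL to Pattern.compile, enables the mode for the whole expression.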
There is also a similar question here:
Regular expression does not match newline obtained from Formatter object

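Pass the flags to Pattern.compile(), not to matcher(), which only takes the input. CASE_INSENSITIVE and MULTILINE are the counterparts of the i and m flags; DOTALL is the one that lets . match a new line:
Pattern pbold = Pattern.compile(".*\\* *(.*?) *\\*.*", Pattern.MULTILINE|Pattern.CASE_INSENSITIVE|Pattern.DOTALL);
Matcher mbold = pbold.matcher(s);
mbold.find();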
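To see which of these flags actually matters for this problem, a small hypothetical check (`FlagDemo.finds` is an illustrative helper) compiles the question's pattern with one flag at a time against a short input whose asterisks sit on different lines:

```java
import java.util.regex.Pattern;

class FlagDemo {
    static final String REGEX = ".*\\* *(.*?) *\\*.*";

    // True if REGEX finds a match in s when compiled with the given flags.
    static boolean finds(String s, int flags) {
        return Pattern.compile(REGEX, flags).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "*\nbold\n*";
        // MULTILINE only changes what ^ and $ match; '.' still stops at '\n'.
        System.out.println(finds(s, Pattern.MULTILINE));        // false
        // CASE_INSENSITIVE does not affect line terminators either.
        System.out.println(finds(s, Pattern.CASE_INSENSITIVE)); // false
        // DOTALL lets '.' match '\n', so the match can span lines.
        System.out.println(finds(s, Pattern.DOTALL));           // true
    }
}
```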

This regular expression might solve your problem...
Pattern pbold = Pattern.compile(".*\\*[ \n]*(.*?)[ \n]*\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();
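Note that (.*?) still cannot cross a line break, so this only helps when the new lines sit directly next to the asterisks. If this doesn't solve it, please explain what you are trying to match with this expression.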
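A different technique, not the one in this answer, avoids the flag question entirely: a negated character class such as [^*] matches any character except an asterisk, including \n, so the text between the asterisks can span lines without DOTALL. A minimal sketch; `BetweenAsterisks` and `boldText` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenAsterisks {
    // \s* trims whitespace (including new lines) next to each asterisk;
    // [^*]*? can cross line breaks because only '*' is excluded.
    private static final Pattern BOLD = Pattern.compile("\\*\\s*([^*]*?)\\s*\\*");

    // Returns the trimmed text between the first pair of asterisks, or null.
    static String boldText(String s) {
        Matcher m = BOLD.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "abc021\n34-+\n*\na\np\np\nl\ne\n*\nfga32\n49";
        System.out.println(boldText(s));
    }
}
```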

Related

Matching a whole word with leading or trailing special symbols like dollar in a string

I can replace dollar signs by using Matcher.quoteReplacement. I can replace words by adding boundary characters:
from = "\\b" + from + "\\b";
outString = line.replaceAll(from, to);
But I can't seem to combine them to replace words with dollar signs.
Here's an example. I am trying to replace "$temp4" (NOT $temp40) with "register1".
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = Matcher.quoteReplacement(from);
from = "\\b" + from + "\\b"; //do whole word replacement
outString = line.replaceAll(from, to);
System.out.println(outString);
Output:
"add, $temp4, $temp40, 42"
How do I get it to replace $temp4 and only $temp4?

Use unambiguous word boundaries, (?<!\w) and (?!\w), instead of \b, which is context dependent:
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
The (?<!\w) is a negative lookbehind that fails the match if there is a word char immediately to the left of the current location, and (?!\w) is a negative lookahead that fails the match if there is a word char immediately to the right of the current location. The Pattern.quote(from) is necessary to escape any special chars in the from variable.
See the Java demo:
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
outString = line.replaceAll(from, to);
System.out.println(outString);
// => add, register1, $temp40, 42

Matcher.quoteReplacement() is for the replacement string (to), not the regex (from). To include a string literal in the regex, use Pattern.quote():
from = Pattern.quote(from);
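Pattern.quote on its own does not fix the word-boundary problem (there is still no \b between a space and $), so a minimal sketch combines it with the lookarounds from the first answer and uses Matcher.quoteReplacement for what it is meant for, the replacement string. `WholeTokenReplace.replaceToken` is a hypothetical helper name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeTokenReplace {
    // Replaces `from` only where it is not glued to other word characters.
    // Pattern.quote escapes regex metacharacters (like $) in the search text;
    // Matcher.quoteReplacement escapes $ and \ in the replacement text.
    static String replaceToken(String line, String from, String to) {
        String regex = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
        return line.replaceAll(regex, Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String line = "add, $temp4, $temp40, 42";
        System.out.println(replaceToken(line, "$temp4", "register1"));
        // A replacement containing '$' works too, because of quoteReplacement;
        // passed raw, "$reg1" would be read as an (illegal) group reference.
        System.out.println(replaceToken(line, "$temp4", "$reg1"));
    }
}
```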

$ has special meaning in regex (it means “end of input”). To remove any special meaning from characters in your target, wrap it in the regex quote/unquote expressions \Q...\E. Also, because $ is not a word character, the word boundary won't work, so use lookarounds instead:
line = line.replaceAll("(?<!\\S)\\Q" + from + "\\E(?![^ ,])", to);
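A minimal runnable sketch of this approach; `QuoteDemo.replaceToken` is a hypothetical wrapper around the line above. One caveat worth knowing: a hand-written \Q...\E breaks if `from` itself contains \E, whereas Pattern.quote produces the same wrapper and handles that case.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // \Q...\E quotes everything between the markers, so $ is literal here.
    // (?<!\S) requires whitespace or the start of input before the token;
    // (?![^ ,]) requires a space, a comma, or the end of input after it.
    static String replaceToken(String line, String from, String to) {
        return line.replaceAll("(?<!\\S)\\Q" + from + "\\E(?![^ ,])", to);
    }

    public static void main(String[] args) {
        String line = "add, $temp4, $temp40, 42";
        System.out.println(replaceToken(line, "$temp4", "register1"));
        // Pattern.quote builds the \Q...\E wrapper for you.
        System.out.println(Pattern.quote("$temp4"));
    }
}
```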

Normally, Pattern.quote is the way to go to escape characters that may be specially interpreted by the regex engine.
However, the regular expression is still incorrect, because there is no word boundary before the $ in line; space and $ are both non-word characters. You need to place the word boundary after the $ character. There is no need for Pattern.quote here, because you're escaping things yourself.
String from = "\\$\\btemp4\\b";
Or more simply, because you know there is a word boundary between $ and temp4 already:
String from = "\\$temp4\\b";
The from variable can be constructed from the expression to replace. If from has "$temp4", then you can escape the dollar sign and add a word boundary.
from = "\\" + from + "\\b";
Output:
add, register1, $temp40, 42
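A short sketch that lists where \b actually matches makes the point above concrete; `BoundaryPositions.boundaries` is an illustrative helper, not a library method. Each number is an index in the input string, and there is no boundary at index 2, between the space and the $.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryPositions {
    // Collects every index at which \b (a zero-width word boundary) matches.
    static List<Integer> boundaries(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("x $temp4,")); // [0, 1, 3, 8]
        // Escaped dollar plus a trailing \b: only $temp4 is replaced.
        System.out.println("add, $temp4, $temp40, 42".replaceAll("\\$temp4\\b", "register1"));
    }
}
```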

Java Pattern / Matcher not finding word break

I am having trouble with Java Pattern and Matcher. I've included a very simplified example of what I'm trying to do.
I had expected the pattern ".\b" to find the last character of the first word (or "4" in the example), but as I step through the code, m.find() always returns false. What am I missing here?
Why does the following Java code always print out "Not Found"?
Pattern p = Pattern.compile(".\b");
Matcher m = p.matcher("102939384 is a word");
int ixEndWord = 0;
if (m.find()) {
ixEndWord = m.end();
System.out.println("Found: " + ixEndWord);
} else {
System.out.println("Not Found");
}

You need to escape the backslash in the Java string literal: ".\\b"
In a Java string literal the backslash itself has to be escaped, so "\\" becomes the single character '\'.
So the literal ".\\b" produces the string .\b, which is what the Pattern receives.

To expand upon AntonH's comment, whenever you want the "\" character to appear in a regular expression, you have to escape it in the Java string literal you are passing in.
As is, ".\b" is the string of a dot . followed by the special backspace character represented by \b, compared to ".\\b", which is the regex .\b.
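A runnable sketch of both literals side by side, using the question's input; `EndOfWord.endOfFirstWord` is a hypothetical helper that returns m.end() of the first match, or -1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndOfWord {
    // Returns the end index of the first match of regex in s, or -1 if none.
    static int endOfFirstWord(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String s = "102939384 is a word";
        // "\b" in a Java literal is the backspace character U+0008, not a boundary.
        System.out.println(endOfFirstWord(".\b", s));  // -1
        // "\\b" reaches the regex engine as \b, the word-boundary assertion.
        System.out.println(endOfFirstWord(".\\b", s)); // 9
    }
}
```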

How to add character "-" and "`" to the string regex?

I'm using the following code:
Matcher mather = Pattern.compile("(\\p{Alnum}*" + subtext + "\\p{Alnum}*)").matcher(ssb.toString());
But if the string is "fefrefewre-rfrefrf" or "fefrefewre`rfrefrf", my match is only "fefrefewre".
I need the match to be "fefrefewre-rfrefrf" or "fefrefewre`rfrefrf".
How do I add the characters "-" and "`" to the regex?
For example, subtext = "fefref".

It looks like you just want to match the '-' and '`' symbols in addition to "\p{Alnum}".
I think this is the most straightforward solution:
Matcher mather = Pattern.compile("((\\p{Alnum}|[\\-`])*" + subtext + "(\\p{Alnum}|[\\-`])*)").matcher(ssb.toString());

Rather than using POSIX character classes that you don't seem to understand well, you could just add the characters you want to allow to a [] character class:
Matcher mather = Pattern.compile("[a-zA-Z0-9`-]*" + subtext + "[a-zA-Z0-9`-]*").matcher(ssb.toString());
The - has to be escaped in a character class unless it is at the start or end of it.
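A runnable sketch of the character-class approach; `HyphenBacktick.matchAround` is a hypothetical helper. Wrapping subtext in Pattern.quote is an addition not in the answer, assumed here so that regex metacharacters in subtext are matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenBacktick {
    // Letters, digits, backtick and hyphen; the hyphen is last in the class,
    // so it is taken literally rather than as a range operator.
    private static final String WORD_CHARS = "[a-zA-Z0-9`-]*";

    // Returns the first run of allowed characters that contains subtext, or null.
    static String matchAround(String text, String subtext) {
        Pattern p = Pattern.compile(WORD_CHARS + Pattern.quote(subtext) + WORD_CHARS);
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchAround("fefrefewre-rfrefrf", "fefref"));
        System.out.println(matchAround("fefrefewre`rfrefrf", "fefref"));
    }
}
```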

How to provide regular expression for matching $$

I have String str = "$$\\frac{6}{8}$$"; and I want to match strings starting with '$$' and ending with '$$'.
How to write the regular expression for this?

Try using the regex:
^\$\$.*\$\$$
which in Java will be:
^\\$\\$.*\\$\\$$
A $ is a regex metacharacter used as end anchor. To mean a literal $ you need to escape it with a backslash \.
In Java \ is the escape character in a String and also in the regular expression. So to make a \ reach the regex engine you need to have \\ in the String.
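A minimal sketch of the escaped pattern in use; `DisplayMath.isDisplayMath` is a hypothetical helper. String.matches already requires the whole string to match, so ^ and the final $ are redundant there, but they matter if you switch to Matcher.find().

```java
class DisplayMath {
    // "\\$" in the Java literal becomes \$ in the regex: a literal dollar sign.
    // The unescaped final $ is the end-of-input anchor.
    static boolean isDisplayMath(String s) {
        return s.matches("^\\$\\$.*\\$\\$$");
    }

    public static void main(String[] args) {
        System.out.println(isDisplayMath("$$\\frac{6}{8}$$")); // true
        System.out.println(isDisplayMath("$\\frac{6}{8}$"));   // false
    }
}
```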

Use this regex string:
"^$$.*$$$"
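The ^ anchors the expression to the start of the string being matched, and the last $ anchors it to the end. All other $ characters are taken literally. In Java, however, Pattern treats every unescaped $ as an end anchor, so this pattern cannot match a string that starts with $$; there the literal dollars must be escaped as \\$.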

You may want something like this:
final String str = "$$\\frac{6}{8}$$";
final String latex = "A display math formula " + str + " and once again " + str + " and another one " + "$$42.$$";
final Pattern pattern = Pattern.compile("\\$\\$([^$]|\\$[^$])+\\$\\$");
final Matcher m = pattern.matcher(latex);
while (m.find()) {
System.out.println(m.group());
}

Can you help with regular expressions in Java?

I have a bunch of strings which may or may not have random symbols and numbers in them. Some examples are:
contains(reserved[j])){
close();
i++){
letters[20]=word
I want to find any character that is NOT a letter, and replace it with a white space, so the above examples look like:
contains reserved j
close
i
letters word
What is the best way to do this?

It depends what you mean by "not a letter", but assuming you mean that letters are a-z or A-Z then try this:
s = s.replaceAll("[^a-zA-Z]", " ");
If you want to collapse multiple symbols into a single space then add a plus at the end of the regular expression.
s = s.replaceAll("[^a-zA-Z]+", " ");
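A runnable sketch applying the collapsing version to the sample strings from the question; `NonLettersToSpaces.clean` is a hypothetical helper. The trim() call is an addition, assumed here because a trailing run of symbols otherwise leaves a trailing space that the expected output in the question does not show.

```java
class NonLettersToSpaces {
    // Replaces each run of non-letters with one space, then trims the
    // spaces left behind by runs at the start or end of the string.
    static String clean(String s) {
        return s.replaceAll("[^a-zA-Z]+", " ").trim();
    }

    public static void main(String[] args) {
        String[] samples = {"contains(reserved[j])){", "close();", "i++){", "letters[20]=word"};
        for (String s : samples) {
            System.out.println(clean(s));
        }
    }
}
```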

yourInputString = yourInputString.replaceAll("[^\\p{Alpha}]", " ");
^ denotes "all characters except"
\p{Alpha} denotes all alphabetic characters
See Pattern for details.

I want to find any character that is NOT a letter
That will be [^\p{Alpha}]+. The [] denote a character class. The \p{Alpha} matches any alphabetic character, uppercase or lowercase; by default it is equivalent to [\p{Upper}\p{Lower}], i.e. [a-zA-Z]. The ^ at the start of the class negates it. The + matches one or more such characters in sequence.
and replace it with a white space
That will be " ".
Summarized:
string = string.replaceAll("[^\\p{Alpha}]+", " ");
Also see the java.util.regex.Pattern javadoc for a concise overview of available patterns. You can learn more about regexes at the great site http://regular-expression.info.
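One nuance about "[a-zA-Z]" above: in Java the POSIX classes such as \p{Alpha} are US-ASCII only by default, so accented letters count as "not a letter". A minimal sketch contrasting the default with the Pattern.UNICODE_CHARACTER_CLASS flag; `AlphaScope` and its methods are hypothetical names, and the sample input is made up for illustration.

```java
import java.util.regex.Pattern;

class AlphaScope {
    // Default: \p{Alpha} is POSIX/US-ASCII only, i.e. [a-zA-Z].
    static String asciiOnly(String s) {
        return s.replaceAll("[^\\p{Alpha}]+", " ");
    }

    // With UNICODE_CHARACTER_CLASS, \p{Alpha} follows the Unicode Alphabetic property.
    static String unicodeAware(String s) {
        return Pattern.compile("[^\\p{Alpha}]+", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s)
                .replaceAll(" ");
    }

    public static void main(String[] args) {
        String s = "caf\u00e9(menu)"; // "café(menu)"
        System.out.println(asciiOnly(s));
        System.out.println(unicodeAware(s));
    }
}
```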

Use the regexp /[^a-zA-Z]/, which means everything that is not in the a-z/A-Z character ranges.
In ruby I would do:
"contains(reserved[j]))".gsub(/[^a-zA-Z]/, " ")
=> "contains reserved j "
In Java it should be something like:
import java.util.regex.*;
public class ReplaceNonLetters {
    public static void main(String[] args) {
        String inputStr = "contains(reserved[j])){";
        String patternStr = "[^a-zA-Z]";
        String replacementStr = " ";
        // Compile regular expression
        Pattern pattern = Pattern.compile(patternStr);
        // Replace all occurrences of pattern in input (one space per non-letter)
        Matcher matcher = pattern.matcher(inputStr);
        String output = matcher.replaceAll(replacementStr);
        System.out.println(output);
    }
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
