How to provide regular expression for matching $$ - java

I have String str = "$$\\frac{6}{8}$$"; and I want to match strings that start with '$$' and end with '$$'.
How to write the regular expression for this?

Try using the regex:
^\$\$.*\$\$$
which in Java will be:
^\\$\\$.*\\$\\$$
A $ is a regex metacharacter used as an end anchor. To mean a literal $, you need to escape it with a backslash \.
In Java \ is the escape character in a String and also in the regular expression. So to make a \ reach the regex engine you need to have \\ in the String.
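A minimal sketch of that pattern in use, assuming the goal is to test whether the whole string is wrapped in $$ (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class DisplayMathCheck {
    // In the Java source, "\\$" reaches the regex engine as \$ (a literal dollar sign);
    // the final unescaped $ is the end-of-input anchor.
    private static final Pattern DISPLAY_MATH = Pattern.compile("^\\$\\$.*\\$\\$$");

    static boolean isDisplayMath(String s) {
        // matches() requires the whole input to match the pattern
        return DISPLAY_MATH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String str = "$$\\frac{6}{8}$$";
        System.out.println(isDisplayMath(str));              // true
        System.out.println(isDisplayMath("$\\frac{6}{8}$")); // false: only one $ on each side
    }
}
```

Since . does not match line terminators by default, a formula that spans several lines would additionally need Pattern.DOTALL.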

Use this regex string:
"^$$.*$$$"
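The ^ anchors the expression to the start of the string being matched, and the last $ anchors it to the end. This relies on the other $ characters being taken literally, which is how POSIX basic regular expressions treat a $ that is not at the end of the pattern. Java's java.util.regex does not do this: every unescaped $ is an end anchor, so in Java this pattern cannot match a string such as str, and the inner dollars need to be escaped as \\$.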
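In java.util.regex every unescaped $ is an anchor. A quick way to see this is to compare the unescaped pattern with the escaped one; this sketch only uses String.matches, and the class name is hypothetical:

```java
class UnescapedDollarDemo {
    // Unescaped: every $ is an end anchor in java.util.regex.
    static final String UNESCAPED = "^$$.*$$$";
    // Escaped: \\$ is a literal dollar sign, only the last $ is an anchor.
    static final String ESCAPED = "^\\$\\$.*\\$\\$$";

    public static void main(String[] args) {
        String str = "$$\\frac{6}{8}$$";
        System.out.println(str.matches(UNESCAPED)); // false
        System.out.println("".matches(UNESCAPED));  // true: all anchors match at position 0 of an empty string
        System.out.println(str.matches(ESCAPED));   // true
    }
}
```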

You may want something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String str = "$$\\frac{6}{8}$$";
final String latex = "A display math formula " + str + " and once again " + str + " and another one " + "$$42.$$";
// $$, then one or more of: a non-$ char, or a single $ followed by a non-$ char, then $$
final Pattern pattern = Pattern.compile("\\$\\$([^$]|\\$[^$])+\\$\\$");
final Matcher m = pattern.matcher(latex);
while (m.find()) {
    System.out.println(m.group());
}
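If you need the formula bodies rather than the whole $$...$$ blocks, a variant could use a reluctant quantifier plus a capturing group. This is a different technique from the alternation above, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DisplayMathExtractor {
    // .+? is reluctant: it stops at the first following $$, so adjacent blocks stay separate.
    private static final Pattern BLOCK = Pattern.compile("\\$\\$(.+?)\\$\\$");

    static List<String> formulaBodies(String text) {
        List<String> bodies = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            bodies.add(m.group(1)); // group 1 is the text between the delimiters
        }
        return bodies;
    }

    public static void main(String[] args) {
        String str = "$$\\frac{6}{8}$$";
        String latex = "A display math formula " + str + " and once again " + str
                + " and another one " + "$$42.$$";
        System.out.println(formulaBodies(latex)); // [\frac{6}{8}, \frac{6}{8}, 42.]
    }
}
```

Unlike the [^$] alternation, . does not match a newline by default, so this version will not find a formula that spans lines unless Pattern.DOTALL is added.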

Related

Using Pattern and Matcher to search for special characters (Example: $)

Apologies if this has already been answered.
I am using the following code to search for a substring:
String subject = "ABC";
String subString = "AB";
Pattern pattern = Pattern.compile(subString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(subject);
while (matcher.find()){
//Matched
}
But when my subject string contains a $ in the beginning, it does not work since it is a special character.
String subject = "$ABC";
String subString = "$";
How does one handle that?
By escaping the special character in subString, like this:
String subString = "\\$";
or by telling the Pattern to treat subString as a literal, like this:
Pattern pattern = Pattern.compile(subString, Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
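Both options can be sketched side by side. The sketch also shows why the unescaped "$" misbehaves: find() still succeeds, but only with an empty match at the end of the input. The class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarSearch {
    // Unescaped "$" is the end anchor: returns where the (empty) match starts, or -1.
    static int unescapedMatchStart(String subject) {
        Matcher m = Pattern.compile("$").matcher(subject);
        return m.find() ? m.start() : -1;
    }

    // Option 1: escape the metacharacter; "\\$" in Java source is the regex \$.
    static boolean findEscaped(String subject) {
        return Pattern.compile("\\$", Pattern.CASE_INSENSITIVE).matcher(subject).find();
    }

    // Option 2: keep the raw text and let Pattern.LITERAL turn off all metacharacters.
    // CASE_INSENSITIVE still applies when combined with LITERAL.
    static boolean findLiteral(String subject, String subString) {
        Pattern p = Pattern.compile(subString, Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
        return p.matcher(subject).find();
    }

    public static void main(String[] args) {
        String subject = "$ABC";
        System.out.println(unescapedMatchStart(subject)); // 4: the end of the input, not the $
        System.out.println(findEscaped(subject));         // true
        System.out.println(findLiteral(subject, "$ab"));  // true (case-insensitive)
    }
}
```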
There are a few metacharacters in regex. The ones supported by Java's regex engine are
( ) [ ] { } \ ^ $ | ? * + . < > - = !
So $ is indeed a metacharacter here. A metacharacter conveys a special meaning to the regex engine and hence can't be used literally. To use one literally, you have to escape it with the escape character, which is the backslash \.
So
String subject = "$ABC";
String subString = "\\$";
would do: only the pattern (subString) is escaped, never the text being searched. The regex engine itself needs just a single backslash, like other regex engines; Java source code needs a double backslash because \ is also the escape character inside a Java string literal, so "\\$" is the two-character regex \$.
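A small sketch of that rule, using String.matches (the class and constant names are hypothetical):

```java
class EscapeDemo {
    // Java literal for the two-character regex prefix \$ followed by ABC
    static final String REGEX = "\\$ABC";

    public static void main(String[] args) {
        String subject = "$ABC"; // the text being searched: no escaping

        System.out.println(REGEX.length());           // 5: the characters \ $ A B C
        System.out.println(subject.matches(REGEX));   // true
        // This subject really starts with a backslash, so it does not match
        System.out.println("\\$ABC".matches(REGEX));  // false
    }
}
```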

Matching a whole word with leading or trailing special symbols like dollar in a string

I can replace dollar signs by using Matcher.quoteReplacement. I can replace words by adding boundary characters:
from = "\\b" + from + "\\b";
outString = line.replaceAll(from, to);
But I can't seem to combine them to replace words with dollar signs.
Here's an example. I am trying to replace "$temp4" (NOT $temp40) with "register1".
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = Matcher.quoteReplacement(from);
from = "\\b" + from + "\\b"; //do whole word replacement
outString = line.replaceAll(from, to);
System.out.println(outString);
Outputs
"add, $temp4, $temp40, 42"
How do I get it to replace $temp4 and only $temp4?
Use unambiguous word boundaries, (?<!\w) and (?!\w), instead of \b, which is context dependent:
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
(?<!\w) is a negative lookbehind that fails the match if there is a word char immediately to the left of the current location, and (?!\w) is a negative lookahead that fails the match if there is a word char immediately to the right of the current location. Pattern.quote(from) is necessary to escape any special chars in the from variable.
Here is a Java example:
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
outString = line.replaceAll(from, to);
System.out.println(outString);
// => add, register1, $temp40, 42
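The same idea can be wrapped in a reusable helper. This sketch also applies Matcher.quoteReplacement to the replacement text, so a $ in to is not read as a group reference; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenReplacer {
    // Replaces `from` only where it is not glued to another word character on either side.
    // Pattern.quote escapes the search text; Matcher.quoteReplacement escapes the replacement.
    static String replaceWholeToken(String line, String from, String to) {
        String regex = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
        return line.replaceAll(regex, Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String line = "add, $temp4, $temp40, 42";
        System.out.println(replaceWholeToken(line, "$temp4", "register1")); // add, register1, $temp40, 42
        // A $ in the replacement stays literal instead of being treated as a group reference
        System.out.println(replaceWholeToken(line, "$temp4", "$r1"));       // add, $r1, $temp40, 42
    }
}
```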
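Matcher.quoteReplacement() is for the replacement string (to), not the regex (from). To include a string literal in the regex, use Pattern.quote():
from = Pattern.quote(from);
Note that this fixes the quoting only: a \b placed in front of the quoted $temp4 still cannot match between a space and $, because both are non-word characters, so the leading word boundary has to change as well.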
$ has special meaning in regex (it means “end of input”). To remove the special meaning from any characters in your target, wrap it in the regex quote/unquote expressions \Q...\E. Also, because $ is not a “word” character, the word boundary won't work, so use lookarounds instead:
line = line.replaceAll("(?<!\\S)\\Q" + from + "\\E(?![^ ,])", to);
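A sketch of that replacement as a helper, alongside Pattern.quote, which produces the same \Q...\E wrapper for ordinary input. The class and method names are hypothetical, and the hand-written \Q...\E assumes from never contains "\E" itself (Pattern.quote handles that case):

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // (?<!\S): preceded by whitespace or the start of the string.
    // (?![^ ,]): followed by a space, a comma, or the end of the string.
    static String replaceDelimited(String line, String from, String to) {
        return line.replaceAll("(?<!\\S)\\Q" + from + "\\E(?![^ ,])", to);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("$temp4")); // \Q$temp4\E
        System.out.println(replaceDelimited("add, $temp4, $temp40, 42", "$temp4", "register1"));
        // add, register1, $temp40, 42
    }
}
```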
Normally, Pattern.quote is the way to go to escape characters that may be specially interpreted by the regex engine.
However, the regular expression is still incorrect, because there is no word boundary before the $ in line; space and $ are both non-word characters. You need to place the word boundary after the $ character. There is no need for Pattern.quote here, because you're escaping things yourself.
String from = "\\$\\btemp4\\b";
Or more simply, because you know there is a word boundary between $ and temp4 already:
String from = "\\$temp4\\b";
The from variable can be constructed from the expression to replace: if from holds "$temp4", you can escape the dollar sign and append a word boundary.
from = "\\" + from + "\\b";
Output:
add, register1, $temp40, 42
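A sketch of that construction, assuming from always starts with exactly one $ followed by word characters (the class and method names are hypothetical):

```java
class ManualEscape {
    // Builds the regex \$temp4\b from "$temp4" by prefixing a backslash and appending \b
    static String toRegex(String from) {
        return "\\" + from + "\\b";
    }

    public static void main(String[] args) {
        String line = "add, $temp4, $temp40, 42";
        String from = toRegex("$temp4");
        System.out.println(from);                               // \$temp4\b
        System.out.println(line.replaceAll(from, "register1")); // add, register1, $temp40, 42
        // Nothing checks the character before the $, so a token glued to a preceding word is replaced too
        System.out.println("x$temp4".replaceAll(from, "register1")); // xregister1
    }
}
```

If that last case matters, a (?<!\w) lookbehind in front of the \$ rules it out.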

How to find and skip special characters at the start and end of the word

I'm new to regex and am using the following code to find out whether a word contains special characters at the start or end.
String s = "K-factor:";
String regExp = "^[^<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]*$";
Matcher matcher = Pattern.compile(regExp).matcher(s);
while (matcher.find()) {
System.out.println("Start: "+ matcher.start());
System.out.println("End: "+ matcher.end());
System.out.println("Group: "+ matcher.group());
s = s.substring(0, matcher.start());
}
I would like to find out whether there is any special character (: in this sample code) at the start or end of the string, so that I can skip it.
I get neither a compile-time error nor any output.
Note that your regex only matches a whole string that contains none of the chars you defined in the character class. The string in question does not match that pattern since it contains :.
You might consider splitting the pattern into two parts to check for the unwanted chars at the start or end using an alternation group:
String regExp = "^[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]|[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]$";
Here, the pattern has a ^<special_char_class>|<special_char_class>$ structure: ^ anchors the match at the start, $ anchors the match at the string end, and | is the alternation operator. Note that I removed the ^ from the start of the character classes to make them positive rather than negated, so that they match the chars/ranges defined in the class.
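One way to use that alternation for skipping the character is replaceAll with an empty replacement. This is a sketch, not the asker's loop; the class and method names are hypothetical, and it strips at most one special character at each end:

```java
class EdgeTrimmer {
    // The special-character class from the answer, anchored once at the start and once at the end
    static final String EDGE_SPECIAL =
        "^[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]|[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]$";

    // Removes one special character at the start and/or one at the end, if present
    static String stripEdges(String s) {
        return s.replaceAll(EDGE_SPECIAL, "");
    }

    public static void main(String[] args) {
        System.out.println(stripEdges("K-factor:"));  // K-factor
        System.out.println(stripEdges("(K-factor)")); // K-factor
    }
}
```

To strip runs of such characters, each character class could be followed by +.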
Alternatively, since you seem to just match a string if it contains a non-letter at the start/end, you may use
String regExp = "^\\P{L}|\\P{L}$";
that is Unicode letter aware or - ASCII only:
String regExp = "^\\P{Alpha}|\\P{Alpha}$";
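A sketch comparing the two variants; the class and method names are hypothetical. \P{L} treats any Unicode letter as a letter, while \P{Alpha} (without extra flags) only treats [a-zA-Z] as letters:

```java
import java.util.regex.Pattern;

class NonLetterEdges {
    static final Pattern UNICODE = Pattern.compile("^\\P{L}|\\P{L}$");
    static final Pattern ASCII = Pattern.compile("^\\P{Alpha}|\\P{Alpha}$");

    // find() is enough: either alternative only needs to match somewhere
    static boolean hasNonLetterEdge(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNonLetterEdge(UNICODE, "K-factor:"));   // true
        System.out.println(hasNonLetterEdge(UNICODE, "\u00C9clair")); // false: É is a Unicode letter
        System.out.println(hasNonLetterEdge(ASCII, "\u00C9clair"));   // true: É is not in [a-zA-Z]
    }
}
```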

Pattern is not matching when it contains a new line

Here is my code
Pattern pbold = Pattern.compile(".*\\* *(.*?) *\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();
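What you need is the embedded flag (?s), which turns on DOTALL mode so that . also matches line terminators. By default . matches any character except a line terminator, so .*? cannot cross the newlines in s.
(?s) should not be confused with \s, the predefined character class that matches a whitespace character:
A space character
A tab character
A carriage return character
A new line character
A vertical tab character
A form feed character
For more info about these special characters, please consult The Java Tutorials - Regular Expressions - Predefined Character Classes.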
The code below matches the case you need:
String s = "abc021\n" +
"34-+\n" +
"*\n" +
"a\n" +
"p\n" +
"p\n" +
"l\n" +
"e\n" +
"*\n" +
"fga32\n" +
"49";
Pattern pbold = Pattern.compile(".*\\* *((?s).*?) *\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();
There is also a similar question here:
Regular expression does not match newline obtained from Formatter object
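Use flags like below. They are passed to Pattern.compile (Pattern.matcher only takes the input), and DOTALL is the one that lets . match newlines:
Pattern pbold = Pattern.compile(".*\\* *(.*?) *\\*.*", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher mbold = pbold.matcher(s);
mbold.find();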
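A sketch comparing no flag, the Pattern.DOTALL argument, and the inline (?s) flag on the question's input. The pattern is simplified to the part between the asterisks, and the class, field, and method names are hypothetical. For this pattern only DOTALL matters: MULTILINE changes ^ and $, and case sensitivity is irrelevant here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {
    static final String S = "abc021\n34-+\n*\na\np\np\nl\ne\n*\nfga32\n49";

    static final Pattern PLAIN = Pattern.compile("\\*(.*?)\\*");
    static final Pattern FLAG = Pattern.compile("\\*(.*?)\\*", Pattern.DOTALL);
    static final Pattern INLINE = Pattern.compile("(?s)\\*(.*?)\\*");

    // Returns the text between the first pair of asterisks, or null if there is no match
    static String between(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(between(PLAIN, S));                       // null: . stops at \n
        System.out.println(between(FLAG, S).replace("\n", "|"));     // |a|p|p|l|e|
        System.out.println(between(FLAG, S).equals(between(INLINE, S))); // true
    }
}
```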
This regular expression might solve your problem...
Pattern pbold = Pattern.compile(".*\\*[ \n]*(.*?)[ \n]*\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();
If this doesn't solve it, please elaborate on what you are trying to get with this expression.

How to create a java regular expression pattern that would match a string only at certain position?

I would like to create a regular expression pattern that succeeds only if the pattern string is not followed by any other text in the input string. Here is what I tried:
Pattern p = Pattern.compile("google.com");//I want to know the right format
String input1 = "mail.google.com";
String input2 = "mail.google.com.co.uk";
Matcher m1 = p.matcher(input1);
Matcher m2 = p.matcher(input2);
boolean found1 = m1.find();
boolean found2 = m2.find();//This should be false because "google.com" is followed by ".co.uk" in input2 string
Any help would be appreciated!
Your pattern should be google\.com$. The $ character matches at the end of the input (or, in MULTILINE mode, at the end of each line). Read about regex boundary matchers for details.
Here is how to match and also capture the part before google.com.
The raw regex pattern is:
^(.*)google\.com$
^ - match beginning of string
(.*) - capture everything in a group up to the next match
google - matches google literal
\. - matches a literal .; the dot has to be escaped with \
com - matches com literal
$ - matches end of string
Note: In Java the \ in the String literal has to be escaped as well! ^(.*)google\\.com$
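A sketch of that pattern with matches() and group(1); the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainSuffix {
    // Java literal for ^(.*)google\.com$ : group 1 is whatever precedes "google.com"
    private static final Pattern SUFFIX = Pattern.compile("^(.*)google\\.com$");

    // Returns the prefix before a trailing "google.com", or null if the input does not end with it
    static String prefix(String input) {
        Matcher m = SUFFIX.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefix("mail.google.com"));       // mail.
        System.out.println(prefix("mail.google.com.co.uk")); // null
    }
}
```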
You should use google\.com$. The $ character matches the end of the input.
Pattern p = Pattern.compile("google\\.com$");
String input2 = "mail.google.com.co.uk";
Matcher m2 = p.matcher(input2);
boolean found2 = m2.find();
System.out.println(found2);
Output = false
Pattern p = Pattern.compile("google\\.com$");
The dollar sign means the match has to occur at the end of the line/string being tested. Note too that an unescaped dot will match any character, so if you want it to match a dot only, you need to escape it. In a Java string literal that is written \\. (a single \. does not compile).
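A sketch of the unescaped-dot pitfall; the input mail.googleXcom is a made-up example, and the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    static boolean foundAtEnd(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "mail.googleXcom";
        // Unescaped dot matches any character, so "googleXcom" counts
        System.out.println(foundAtEnd("google.com$", input));   // true
        // Escaped dot only accepts a literal '.'
        System.out.println(foundAtEnd("google\\.com$", input)); // false
    }
}
```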
