How to add character "-" and "`" to the string regex? - java

I'm using the following code:
Matcher mather = Pattern.compile("(\\p{Alnum}*" + subtext + "\\p{Alnum}*)").matcher(ssb.toString());
But if the string is "fefrefewre-rfrefrf" or "fefrefewre`rfrefrf", the match is only "fefrefewre".
I need the match to be "fefrefewre-rfrefrf" or "fefrefewre`rfrefrf".
How do I add the characters "-" and "`" to the regex?
For example, subtext = "fefref".

It looks like you just want to match the '-' and '`' symbols in addition to the "\p{Alnum}" characters.
I think this is the most straightforward solution:
Matcher mather = Pattern.compile("((\\p{Alnum}|[\\-`])*" + subtext + "(\\p{Alnum}|[\\-`])*)").matcher(ssb.toString());

Rather than using POSIX character classes that you may not be familiar with, you could just add the characters you want to allow to a [] character class:
Matcher mather = Pattern.compile("[a-zA-Z0-9`-]*" + subtext + "[a-zA-Z0-9`-]*").matcher(ssb.toString());
The - has to be escaped in a character class unless it is at the start or end of it, which is why it is placed last here.
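A minimal sketch of the character-class approach, kept self-contained so it can be run on its own. The class name HyphenBacktickDemo and the helper findWord are made up for illustration, and wrapping subtext in Pattern.quote is an added assumption (it only matters if subtext may contain regex metacharacters):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HyphenBacktickDemo {
    // Returns the first run of letters, digits, '`' and '-' that contains subtext,
    // or null if there is none. (Hypothetical helper, not from the original post.)
    static String findWord(String text, String subtext) {
        // '-' is the last character in the class, so it is literal and needs no escape.
        String wordChar = "[\\p{Alnum}`-]";
        Pattern p = Pattern.compile(wordChar + "*" + Pattern.quote(subtext) + wordChar + "*");
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findWord("xx fefrefewre-rfrefrf yy", "fefref")); // fefrefewre-rfrefrf
        System.out.println(findWord("xx fefrefewre`rfrefrf yy", "fefref")); // fefrefewre`rfrefrf
    }
}
```

Putting \p{Alnum} inside the brackets keeps the original character set and simply extends it, instead of spelling out a-zA-Z0-9 by hand.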

Related

Matching a whole word with leading or trailing special symbols like dollar in a string

I can replace dollar signs by using Matcher.quoteReplacement. I can replace words by adding boundary characters:
from = "\\b" + from + "\\b";
outString = line.replaceAll(from, to);
But I can't seem to combine them to replace words with dollar signs.
Here's an example. I am trying to replace "$temp4" (NOT $temp40) with "register1".
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = Matcher.quoteReplacement(from);
from = "\\b" + from + "\\b"; //do whole word replacement
outString = line.replaceAll(from, to);
System.out.println(outString);
Outputs
"add, $temp4, $temp40, 42"
How do I get it to replace $temp4 and only $temp4?
Use unambiguous word boundaries, (?<!\w) and (?!\w), instead of \b, which is context-dependent:
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
The (?<!\w) is a negative lookbehind that fails the match if there is a word char immediately to the left of the current location, and (?!\w) is a negative lookahead that fails the match if there is a word char immediately to the right of the current location. The Pattern.quote(from) is necessary to escape any special chars in the from variable.
See the Java demo:
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
outString = line.replaceAll(from, to);
System.out.println(outString);
// => add, register1, $temp40, 42
Matcher.quoteReplacement() is for the replacement string (to), not the regex (from). To include a string literal in the regex, use Pattern.quote():
from = Pattern.quote(from);
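Both quoting methods can be combined in one small helper. This is only a sketch: the class WholeTokenReplace and the method replaceToken are hypothetical names, and the lookarounds assume that "whole word" means "not adjacent to a \w character":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WholeTokenReplace {
    // Replaces every occurrence of the literal token `from` that is not glued to a word character.
    static String replaceToken(String line, String from, String to) {
        // Pattern.quote protects the regex side: '$' in `from` is taken literally.
        String regex = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
        // Matcher.quoteReplacement protects the replacement side: '$' or '\' in `to` is taken literally.
        return line.replaceAll(regex, Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String line = "add, $temp4, $temp40, 42";
        System.out.println(replaceToken(line, "$temp4", "register1")); // add, register1, $temp40, 42
        System.out.println(replaceToken(line, "$temp4", "$r1"));       // add, $r1, $temp40, 42
    }
}
```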
$ has special meaning in regex (it means "end of input"). To remove any special meaning from characters in your target, wrap it in regex quote/unquote expressions \Q...\E. Also, because $ is not a "word" character, the word boundary won't work, so use lookarounds instead:
line = line.replaceAll("(?<!\\S)\\Q" + from + "\\E(?![^ ,])", to);
Normally, Pattern.quote is the way to go to escape characters that may be specially interpreted by the regex engine.
However, the regular expression is still incorrect, because there is no word boundary before the $ in line; space and $ are both non-word characters. You need to place the word boundary after the $ character. There is no need for Pattern.quote here, because you're escaping things yourself.
String from = "\\$\\btemp4\\b";
Or more simply, because you know there is a word boundary between $ and temp4 already:
String from = "\\$temp4\\b";
The from variable can be constructed from the expression to replace. If from has "$temp4", then you can escape the dollar sign and add a word boundary.
from = "\\" + from + "\\b";
Output:
add, register1, $temp40, 42

Java regex: match a character that is not preceded by a specific word

I'm trying to add double quotes to an XML string, but only in specific places.
Here is an example of the XML content:
<opr:sec name=display>
<opr:fld name=fieldName>Value1</opr:fld>
<opr:fld name=someName>value2</opr:fld>
I need to add double quotes, as in name="fieldName"; the field names are different on each line.
The opening double quote is simple: it goes right after name=.
For the closing double quote I thought of using the > sign, but I need to avoid the one after fld at the end.
How do I match a character that is not preceded by a specific text?
Here is a simpler way to do what you want.
Use this regex :
name=([^>]*)>
And replace it by :
name="$1">
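In Java this regex and replacement can be passed straight to String.replaceAll. The sketch below uses a made-up class name (QuoteAttributes) and assumes, as in the sample, that name= is unquoted and is the last attribute before the closing >:

```java
final class QuoteAttributes {
    // Wraps the value after name= in double quotes.
    static String quoteNames(String xml) {
        // [^>]* captures everything up to the next '>'; $1 in the replacement refers to that capture.
        return xml.replaceAll("name=([^>]*)>", "name=\"$1\">");
    }

    public static void main(String[] args) {
        String xml = "<opr:sec name=display>\n"
                   + "<opr:fld name=fieldName>Value1</opr:fld>";
        System.out.println(quoteNames(xml));
        // <opr:sec name="display">
        // <opr:fld name="fieldName">Value1</opr:fld>
    }
}
```

Because [^>] cannot run past the first >, the closing tag at the end of the line is never touched, so no lookbehind is needed.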
You can use capturing groups: split your line into 3 groups and reconstruct it from the pieces:
String line = "<opr:fld name=fieldName>Value1</opr:fld>";
String regex = "(.*name=)(.*)(>.*>)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(line);
if (matcher.matches()) { // group() throws IllegalStateException if the match failed
    String result = matcher.group(1) + "\"" + matcher.group(2) + "\"" + matcher.group(3);
    System.out.println(result);
}

Pattern is not matching when it contains a new line

Here is my code
Pattern pbold = Pattern.compile(".*\\* *(.*?) *\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();
What you need is the embedded flag (?s), which turns on DOTALL mode. By default . does not match line terminators; with (?s) it matches any character, including a new line.
Do not confuse it with the whitespace metacharacter \s, which matches:
A space character
A tab character
A carriage return character
A new line character
A vertical tab character
A form feed character
For more info about these special characters, please consult The Java Tutorials - Regular Expressions - Predefined Character Classes.
The code below matches the case you need:
String s = "abc021\n" +
"34-+\n" +
"*\n" +
"a\n" +
"p\n" +
"p\n" +
"l\n" +
"e\n" +
"*\n" +
"fga32\n" +
"49";
Pattern pbold = Pattern.compile(".*\\* *((?s).*?) *\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();
There is also a similar question here:
Regular expression does not match newline obtained from Formatter object
Pass the flags to Pattern.compile, like below (DOTALL is the one that lets . match a new line; Matcher takes no flags):
Pattern pbold = Pattern.compile(".*\\* *(.*?) *\\*.*", Pattern.MULTILINE|Pattern.CASE_INSENSITIVE|Pattern.DOTALL);
Matcher mbold = pbold.matcher(s);
mbold.find();
This regular expression might solve your problem...
Pattern pbold = Pattern.compile(".*\\*[ \n]*(.*?)[ \n]*\\*.*");
Matcher mbold = pbold.matcher(s);
mbold.find();
If this doesn't solve it, please elaborate on what you are trying to get with this expression.
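To see the effect of DOTALL in isolation, here is a small sketch. The class DotallDemo, the method betweenStars and the simplified pattern (\s* in place of the leading and trailing .* and spaces) are assumptions made for the illustration, not the asker's exact regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotallDemo {
    // Returns the text between the first pair of '*' characters, trimmed of surrounding
    // whitespace, or null when no such pair can be matched with the given flags.
    static String betweenStars(String s, int flags) {
        Pattern p = Pattern.compile("\\*\\s*(.*?)\\s*\\*", flags);
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "abc021\n34-+\n*\na\np\np\nl\ne\n*\nfga32\n49";
        // Without DOTALL the '.' cannot cross a line break, so nothing is found.
        System.out.println(betweenStars(s, 0));                    // null
        // With DOTALL the group spans the lines between the two stars.
        System.out.println(betweenStars(s, Pattern.DOTALL).replace("\n", "")); // apple
    }
}
```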

String.split() at a meta character +

I'm making a simple program that will deal with equations taken from a String input.
When I run it, however, I get an exception because I am trying to replace "+" with " +" so that I can split the string at the spaces. How should I use
the String replaceAll method to replace these special characters? Below are the exception and my code:
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0
+
^
public static void parse(String x){
String z = "x^2+2=2x-1";
String[] lrside = z.split("=",4);
System.out.println("Left side: " + lrside[0] + " / Right Side: " + lrside[1]);
String rightside = lrside[0];
String leftside = lrside[1];
rightside.replaceAll("-", " -");
rightside.replaceAll("+", " +");
leftside.replaceAll("-", " -");
leftside.replaceAll("+", " +");
List<String> rightt = Arrays.asList(rightside.split(" "));
List<String> leftt = Arrays.asList(leftside.split(" "));
System.out.println(leftt);
System.out.println(rightt);
}
replaceAll accepts a regular expression as its first argument.
+ is a special character which denotes a quantifier meaning one or more occurrences. Therefore it should be escaped to specify the literal character +:
rightside = rightside.replaceAll("\\+", " +");
(Strings are immutable so it is necessary to assign the variable to the result of replaceAll);
An alternative to this is to use a character class which removes the metacharacter status:
rightside = rightside.replaceAll("[+]", " +");
The simplest solution though would be to use the replace method which uses non-regex String literals:
rightside = rightside.replace("+", " +");
I had a similar problem with regex = "?". It happens for all special characters that have some meaning in a regex. So you need to put "\\" in front of the special character in your regex:
rightside = rightside.replaceAll("\\+", " +");
String#replaceAll expects a regex as input, and + on its own is not a valid pattern; \\+ is. So use rightside = rightside.replaceAll("\\+", " +");
The reason behind this is that some characters are reserved in regex. So when you split on them using the Java split() method, you have to escape them.
For example, if you want to split by + or * or dot (.), you have to write split("\\+") or split("\\*") or split("\\.") as needed.
The reason for my long explanation on regex is that you may face this in other places too.
For example, the same issue will occur with the replaceAll and replaceFirst methods of Java, because they also work on regex. The replace method takes literal text and is not affected.
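A short sketch that puts the literal and regex-based variants side by side. EscapeMetaDemo, spaceOutSigns and splitOnLiteral are hypothetical names; Pattern.quote is used here as a general way to escape a delimiter whose content is not known in advance:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class EscapeMetaDemo {
    // String.replace works on literal text, so "+" and "-" need no escaping.
    static String spaceOutSigns(String expr) {
        return expr.replace("+", " +").replace("-", " -");
    }

    // String.split takes a regex, so the delimiter is quoted to strip any special meaning.
    static String[] splitOnLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(spaceOutSigns("x^2+2"));                           // x^2 +2
        System.out.println(Arrays.toString(splitOnLiteral("a+b+c", "+")));    // [a, b, c]
        System.out.println(Arrays.toString(splitOnLiteral("1.2.3", ".")));    // [1, 2, 3]
        // Same result as spaceOutSigns for '+', but through the regex engine:
        System.out.println("x^2+2".replaceAll("\\+", " +"));                  // x^2 +2
    }
}
```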

Why doesn't this code work properly?

Why does this code:
String keyword = "pattern";
String text = "sometextpatternsometext";
String patternStr = "^.*" + keyword + ".*$"; //
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
System.out.println("start = " + start + ", end = " + end);
}
start = 0, end = 23
not work properly?
But this code:
String keyword = "pattern";
String text = "sometext pattern sometext";
String patternStr = "\\b" + keyword + "\\b"; //
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
System.out.println("start = " + start + ", end = " + end);
}
start = 9, end = 16
works fine.
It does work. Your pattern
^.*pattern.*$
says to match:
start at the beginning
accept any number of characters
followed by the string pattern
followed by any number of characters
until the end of the string
The result is the entire input string. If you wanted to find only the word pattern, then the regex would be just the word by itself, or as you found, bracketed with word-boundary metacharacters.
It is not that the first example didn't work, it is that you inadvertently asked it to match more than you meant.
The .* expressions expand to contain all the characters before "pattern" and all the characters after pattern, so the whole expression matches the whole line.
With your second example, \b is a zero-width word boundary that consumes no characters, so the expression matches exactly the word pattern (start = 9, end = 16) and nothing around it.
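The thing to look at is your regex: "^.*" + keyword + ".*$"
The expression .* is greedy: it first grabs as many characters as there are in the string, and then backtracks until the keyword can be matched. So the keyword is found, but the match as a whole still runs from ^ to $.
You can make the first .* reluctant (lazy), i.e. add a question mark after .*:
"^.*?" + keyword + ".*$"
This time .*? matches the minimum number of characters before your keyword, so the first occurrence of the keyword is used instead of the last. The overall match is still the whole string; to get only the keyword's position, use the keyword by itself or put it in a capturing group.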
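The behaviour of the greedy and reluctant forms can be checked directly. In this sketch the class GreedyLazyDemo and its two helpers are made up for illustration, and the capturing group around the keyword is an addition used only to expose where the keyword itself was matched:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyLazyDemo {
    // Start and end of the first overall match, or null if there is none.
    static int[] span(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    // Start index of capturing group 1 in the first match, or -1 if there is no match.
    static int groupStart(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        String text = "sometextpatternsometext";
        int[] greedy = span("^.*pattern.*$", text);
        int[] lazy = span("^.*?pattern.*$", text);
        int[] bare = span("pattern", text);
        System.out.println(greedy[0] + ", " + greedy[1]); // 0, 23
        System.out.println(lazy[0] + ", " + lazy[1]);     // 0, 23
        System.out.println(bare[0] + ", " + bare[1]);     // 8, 15

        // With two occurrences, greedy picks the last keyword and lazy picks the first.
        String twice = "a pattern b pattern c";
        System.out.println(groupStart("^.*(pattern).*$", twice));  // 12
        System.out.println(groupStart("^.*?(pattern).*$", twice)); // 2
    }
}
```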
