Sample string
astabD (tabD) tabD .tabD tabD. (tabD tabD)
I need to replace tabD with something like temp.tabD for every occurrence in the above string except the first and second ones.
For this I tried replaceAll with word boundaries:
str.replaceAll("\\b" + "tabD" + "\\b", "temp.tabD");
It works except for the second occurrence. I would appreciate any help, since '(' and ')' are also keywords, and only an occurrence enclosed in both has to be ignored.
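You may use the following (it replaces the matches with an empty string, as in the demo output below; pass "temp.tabD" as the second argument to get the replacement from the question):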
.replaceAll("\\b(?<!\\((?=\\w+\\)))tabD\\b", "")
Or, if tabD comes in from user input:
String s = "astabD (tabD) tabD .tabD tabD. (tabD tabD)";
String word = "tabD";
String wordRx = Pattern.quote(word);
s = s.replaceAll("(?<!\\w)(?<!\\((?=" + wordRx + "\\)))" + wordRx + "(?!\\w)", "");
See the regex demo.
Details
\b - a word boundary ((?<!\w) is an unambiguous left word boundary)
(?<!\((?=\w+\))) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a ( that is followed by 1+ word chars (\w+ has to match the tabD word) and then a ) (NOTE: if your IDE warns that the + is inside a lookbehind, that is an IDE bug: the + is inside a lookahead here, and + / * quantifiers are allowed in lookaheads)
tabD - the word to find
\b - a word boundary ((?!\w) is an unambiguous right word boundary)
Java demo:
String s = "astabD (tabD) tabD .tabD tabD. (tabD tabD)";
System.out.println(s.replaceAll("\\b(?<!\\((?=\\w+\\)))tabD\\b", ""));
// => astabD (tabD)  . . ( )
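The answer above replaces with an empty string. A minimal self-contained sketch of the user-input variant that instead produces the temp.tabD replacement the question asks for could look like this. The class and method names are hypothetical, and wrapping the replacement in Matcher.quoteReplacement is an addition so that a $ or \ in the replacement text is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TabDReplace {
    // Replaces every standalone occurrence of `word`, except one written as "(word)".
    // (?<!\w) and (?!\w) act as word boundaries; the second lookbehind skips "(word)".
    static String replaceExceptParenthesized(String s, String word, String replacement) {
        String wordRx = Pattern.quote(word);
        String regex = "(?<!\\w)(?<!\\((?=" + wordRx + "\\)))" + wordRx + "(?!\\w)";
        // quoteReplacement: treat $ and \ in the replacement literally
        return s.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String s = "astabD (tabD) tabD .tabD tabD. (tabD tabD)";
        System.out.println(replaceExceptParenthesized(s, "tabD", "temp.tabD"));
        // => astabD (tabD) temp.tabD .temp.tabD temp.tabD. (temp.tabD temp.tabD)
    }
}
```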
Related
I'm new to regex and have been trying to work this out on my own but I don't seem to get it working. I have an input that contains start and end flags and I want to replace a certain char, but only if it's between the flags.
For example, say the start flag is START, the end flag is END, the char I'm trying to replace is ", and I would be replacing it with \".
I would write input.replaceAll(regex, "\\\\\"");
I tried making a regex to only match the correct " chars but so far I have only been able to get it to match all chars between the flags and not just the " chars. -> (?<=START)(.*)(?=END)
Example input:
This " is START an " example input END string ""
START This is a "" second example END
This" is "a START third example END " "
Expected output:
This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "
Find all characters between START and END, and for those characters replace " with \".
To achieve this, apply a replacer function to all matches of characters between START and END:
string = Pattern.compile("(?<=START).*?(?=END)").matcher(string)
.replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
which produces your expected output.
Some notes on how this works.
This first step is to match all characters between START and END, which uses look arounds with a reluctant quantifier:
(?<=START).*?(?=END)
The ? after the .* changes the match from greedy (as many chars as possible while still matching) to reluctant (as few chars as possible while still matching). This prevents the middle quote in the following input from being altered:
START a"b END c"d START e"f END
A greedy quantifier will match from the first START all the way past the next END to the last END, incorrectly including c"d.
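To see the difference concretely, here is a small sketch that runs the same replacer once with a greedy and once with a reluctant body on the input above. The class and method names are hypothetical, and Matcher.replaceAll(Function) requires Java 9 or later.

```java
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Escapes every " inside the text matched by (?<=START)<body>(?=END).
    static String escapeQuotes(String input, String body) {
        return Pattern.compile("(?<=START)" + body + "(?=END)")
                .matcher(input)
                // "\\\\\"" is \\" here; the replacement syntax turns it into \"
                .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
    }

    public static void main(String[] args) {
        String input = "START a\"b END c\"d START e\"f END";
        System.out.println(escapeQuotes(input, ".*"));   // greedy: c"d is escaped too
        System.out.println(escapeQuotes(input, ".*?"));  // reluctant: c"d is left alone
    }
}
```

With .* the single match runs from the first START to the last END, so c"d becomes c\"d; with .*? there are two separate matches and c"d is untouched.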
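The next step is, for each match, to replace " with \". The full match is group 0, or just MatchResult#group(), and we don't need regex for this replacement: a plain String#replace is enough (and yes, replace() replaces all occurrences). Note that the string returned by the replacer is still processed as a replacement string (backslashes and $ are special), which is why the code replaces " with \\" (the Java literal "\\\\\"") rather than with \".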
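Matcher.replaceAll(Function) still treats backslashes and $ in the returned string as replacement syntax, which is why the code above needs "\\\\\"". An alternative that avoids counting backslashes is to build the text with a plain \" and pass it through Matcher.quoteReplacement. The following sketch (hypothetical class and method names, Java 9+) does that and runs on the three example lines from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeBetweenFlags {
    private static final Pattern BETWEEN_FLAGS = Pattern.compile("(?<=START).*?(?=END)");

    // Escapes every " between START and END; quoteReplacement makes the
    // returned text literal, so the backslash only needs to be written once.
    static String escapeQuotes(String input) {
        return BETWEEN_FLAGS.matcher(input)
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group().replace("\"", "\\\"")));
    }

    public static void main(String[] args) {
        String[] lines = {
            "This \" is START an \" example input END string \"\"",
            "START This is a \"\" second example END",
            "This\" is \"a START third example END \" \""
        };
        for (String line : lines) {
            System.out.println(escapeQuotes(line));
        }
    }
}
```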
For now I've been able to solve it by creating capture groups and repeatedly replacing the match until there are no more matches left. In this case I even had to insert a replacement identifier, because replacing " with \" would keep the " char there and create an infinite loop. When there are no more matches left, I replace my identifier, and I'm now getting the expected result.
I still feel like there has to be a much cleaner way to do this using only one replace statement...
Code that worked for me:
class Playground {
    public static void main(String[] args) {
        String input = "\"ThSTARTis is a\" te\"\"stEND \" !!!";
        String regex = "(.*START.+)\"+(.*END+.*)";
        while (input.matches(regex)) {
            input = input.replaceAll(regex, "$1---replace---$2");
        }
        String result = input.replace("---replace---", "\\\"");
        System.out.println(result);
    }
}
Output:
"ThSTARTis is a\" te\"\"stEND " !!!
I would love any suggestions as to how I could solve this in a better/cleaner way.
Another option is to make use of the \G anchor with 2 capture groups. In the replacement use the 2 capture groups followed by \"
(?:(START)(?=.*END)|\G(?!^))((?:(?!START|END)(?>\\+\"|[^\r\n\"]))*)\"
Explanation
(?: Non capture group
(START)(?=.*END) Capture group 1, match START and assert there is END to the right
| Or
\G(?!^) Assert the current position at the end of the previous match
) Close non capture group
( Capture group 2
(?: Non capture group
(?!START|END) Negative lookahead, assert not START or END directly to the right
(?>\\+\"|[^\r\n\"]) Atomic group: match 1+ backslashes followed by ", or match any char except ", a carriage return or a newline
)* Close the non capture group and optionally repeat it
) Close group 2
\" Match "
See a Java regex demo and a Java demo
For example:
String regex = "(?:(START)(?=.*END)|\\G(?!^))((?:(?!START|END)(?>\\\\+\\\"|[^\\r\\n\\\"]))*)\\\"";
String string = "This \" is START an \" example input END string \"\"\n"
+ "START This is a \"\" second example END\n"
+ "This\" is \"a START third example END \" \"";
String subst = "$1$2\\\\\"";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
String result = matcher.replaceAll(subst);
System.out.println(result);
Output
This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "
I can replace dollar signs by using Matcher.quoteReplacement. I can replace words by adding boundary characters:
from = "\\b" + from + "\\b";
outString = line.replaceAll(from, to);
But I can't seem to combine them to replace words with dollar signs.
Here's an example. I am trying to replace "$temp4" (NOT $temp40) with "register1".
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = Matcher.quoteReplacement(from);
from = "\\b" + from + "\\b"; //do whole word replacement
outString = line.replaceAll(from, to);
System.out.println(outString);
Outputs
"add, $temp4, $temp40, 42"
How do I get it to replace $temp4 and only $temp4?
Use unambiguous word boundaries, (?<!\w) and (?!\w), instead of \b, whose meaning depends on the context:
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
See the regex demo.
The (?<!\w) is a negative lookbehind that fails the match if there is a word char immediately to the left of the current location, and (?!\w) is a negative lookahead that fails the match if there is a word char immediately to the right of the current location. The Pattern.quote(from) is necessary to escape any special chars in the from variable, such as $.
See the Java demo:
String line = "add, $temp4, $temp40, 42";
String to = "register1";
String from = "$temp4";
String outString;
from = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)";
outString = line.replaceAll(from, to);
System.out.println(outString);
// => add, register1, $temp40, 42
Matcher.quoteReplacement() is for the replacement string (to), not the regex (from). To include a string literal in the regex, use Pattern.quote():
from = Pattern.quote(from);
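Putting both quoting methods in their place, a minimal helper could look like the sketch below (the class and method names are hypothetical): Pattern.quote protects the regex, Matcher.quoteReplacement protects the replacement, and the (?<!\w)/(?!\w) lookarounds from the previous answer stand in for \b.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordReplace {
    // Replaces `from` only where it is not glued to other word characters.
    static String replaceWholeWord(String line, String from, String to) {
        String regex = "(?<!\\w)" + Pattern.quote(from) + "(?!\\w)"; // literal regex
        return line.replaceAll(regex, Matcher.quoteReplacement(to));  // literal replacement
    }

    public static void main(String[] args) {
        String line = "add, $temp4, $temp40, 42";
        System.out.println(replaceWholeWord(line, "$temp4", "register1")); // add, register1, $temp40, 42
        System.out.println(replaceWholeWord(line, "$temp4", "$r1"));       // add, $r1, $temp40, 42
    }
}
```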
$ has special meaning in regex (it means "end of input"). To remove any special meaning from characters in your target, wrap it in the regex quote/unquote expressions \Q...\E. Also, because $ is not a "word" character, the word boundary won't work, so use lookarounds instead:
line = line.replaceAll("(?<!\\S)\\Q" + from + "\\E(?![^ ,])", to);
Normally, Pattern.quote is the way to go to escape characters that may be specially interpreted by the regex engine.
However, the regular expression is still incorrect, because there is no word boundary before the $ in line; space and $ are both non-word characters. You need to place the word boundary after the $ character. There is no need for Pattern.quote here, because you're escaping things yourself.
String from = "\\$\\btemp4\\b";
Or more simply, because you know there is a word boundary between $ and temp4 already:
String from = "\\$temp4\\b";
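The from variable can be constructed from the expression to replace. If from holds "$temp4", you can escape the dollar sign and append a word boundary (this assumes from starts with a single special character such as $ and ends with a word character):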
from = "\\" + from + "\\b";
Output:
add, register1, $temp40, 42
I have a regexp to check whether some text contains a word (as a whole word, respecting word boundaries):
String regexp = ".*\\bSOME_WORD_HERE\\b.*";
but this regexp returns false when SOME_WORD starts with # (a hashtag).
Example, without #
String text = "some text and test word";
String matchingWord = "test";
boolean contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// now contains == true;
But with hashtag `contains` was false. Example:
text = "some text and #test word";
matchingWord = "#test";
contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// contains == false, but I expect true
The \b# pattern matches a # that is preceded by a word character: a letter, digit or underscore.
If you need to match a # that is not preceded by a word char, use a negative lookbehind (?<!\w). Similarly, to make sure the trailing boundary also works when the search word ends with a non-word char, use a (?!\w) negative lookahead instead of \b:
text.matches("(?s).*(?<!\\w)" + matchingWord + "(?!\\w).*");
Using Pattern.quote(matchingWord) is a good idea if your matchingWord can contain special regex metacharacters.
Alternatively, if you plan to match your search words between whitespace or the start/end of the string, you can use (?<!\S) as the initial boundary and (?!\S) as the trailing one:
text.matches("(?s).*(?<!\\S)" + matchingWord + "(?!\\S).*");
And one more thing: the .* in the .matches call is not the best regex solution. A regex like "(?<!\\S)" + matchingWord + "(?!\\S)" used with Matcher#find() will be processed more efficiently, since the engine does not have to match and backtrack through the surrounding .*, but you will need to create a Matcher object for that.
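A sketch of the find()-based check suggested above, using the whitespace boundaries and Pattern.quote (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class ContainsWord {
    // find() looks for the term anywhere, so no leading/trailing .* is needed.
    // (?<!\S) and (?!\S) require whitespace or a string edge around the term;
    // Pattern.quote keeps regex metacharacters in the term literal.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)")
                .matcher(text)
                .find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("some text and #test word", "#test"));    // true
        System.out.println(containsWord("some text and #testing word", "#test")); // false
        System.out.println(containsWord("some text and test word", "test"));      // true
    }
}
```

In real code the compiled Pattern would usually be cached rather than rebuilt on every call.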
If you are looking for words with a leading '#', simply remove the leading '#' from the search word and use the following regex (matches() must match the entire string, hence the .* on both sides):
text.matches(".*#\\b" + matchingWordWithoutLeadingHash + "\\b.*");
I am writing a function that allows users to search a field of text for search terms that they can enter, and mark them up in some way such as highlighting. What I currently have is:
String text = "This is my (simple) test.";
String searchExpression = "(?i)\\b(" + Pattern.quote(searchTerm) + ")\\b";
String replaceExpression = markupToken + "$1" + markupToken;
String newText = text.replaceAll(searchExpression, replaceExpression);
This works great if the search term is "simple"; however, if the user searches for "(simple)", it will not match. If I remove Pattern.quote or the \b's, this works fine.
Is there a way to modify the searchExpression that it will work in both of these scenarios?
Your regex fails because \b (word boundary) cannot match before ( and after ) here: neither the parentheses nor the spaces around them are word characters.
You can tweak your regex as this:
String searchExpression = "(?i)(?<!\\w(?=\\w))(" + Pattern.quote(searchTerm) +
")(?!(?<=\\w)\\w)";
i.e. use lookarounds on either side, which means there must not be a word character right before the match if the search term starts with a word character, nor right after it if the search term ends with a word character.
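A self-contained sketch of the whole highlighting function with this pattern could look like this. The highlight method and the "**" token are hypothetical stand-ins for the question's function and markupToken, and Matcher.quoteReplacement is added so that a token containing $ or \ is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Highlighter {
    // Wraps each case-insensitive occurrence of searchTerm in markupToken.
    // (?<!\w(?=\w)) forbids a word char before the match only if the term starts with one;
    // (?!(?<=\w)\w) forbids a word char after the match only if the term ends with one.
    static String highlight(String text, String searchTerm, String markupToken) {
        String searchExpression = "(?i)(?<!\\w(?=\\w))(" + Pattern.quote(searchTerm)
                + ")(?!(?<=\\w)\\w)";
        String token = Matcher.quoteReplacement(markupToken);
        return text.replaceAll(searchExpression, token + "$1" + token);
    }

    public static void main(String[] args) {
        String text = "This is my (simple) test.";
        System.out.println(highlight(text, "simple", "**"));   // This is my (**simple**) test.
        System.out.println(highlight(text, "(simple)", "**")); // This is my **(simple)** test.
        System.out.println(highlight(text, "is", "**"));       // This **is** my (simple) test.
    }
}
```

The last call shows that the boundary check still applies to ordinary words: the "is" inside "This" is not highlighted.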
I have seen how to replace "," with "." by using ".$"|",$", but this logic does not work with letters.
I need to replace the last letter of a word with another letter, for every word in the string EXAMPLE_TEST, using Java.
This is my code:
Pattern replace = Pattern.compile("n$");//here got the real problem
matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST=matcher2.replaceAll("k");
I also tried "//n$", "\n$", etc.
Please help me find a solution.
input text=>njan ayman
output text=> njak aymak
Instead of the end-of-string anchor $, use a word boundary \b:
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"
You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
String s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the matched word character
(?=\\s|$) - followed by a whitespace char or the end of the input (lookahead)
The above is only an example! If you want to replace every comma that is followed by whitespace or the end of the input with a period, the middle line should be changed to:
s = s.replaceAll("(,)(?=\\s|$)", "\\.");
Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as following:
(?=.\\b)\\w
It translates to "a word character (\w) at a position where the next single character (.) is followed by a word boundary (\b)", i.e. the last character of each word.
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of all words, but leaves any following non-alphanumeric characters in place. You can restrict which characters are removed/replaced by changing the . to something else.
However, the other answers are perfectly good and might achieve exactly what you are looking for.
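As an illustration of changing the . as described above, here is a sketch (hypothetical class and method names) that replaces only a trailing oldLetter in each word. It assumes oldLetter is a single word character, since \w has to match it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastLetter {
    // (?=X\b)\w: a word char that is X and is immediately followed by a word boundary,
    // i.e. X as the last letter of a word. Other final letters are left alone.
    static String replaceLastLetter(String s, String oldLetter, String newLetter) {
        return s.replaceAll("(?=" + Pattern.quote(oldLetter) + "\\b)\\w",
                Matcher.quoteReplacement(newLetter));
    }

    public static void main(String[] args) {
        System.out.println(replaceLastLetter("njan ayman aowkdwo", "n", "k")); // njak aymak aowkdwo
    }
}
```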
// oldLetter and newLetter are placeholders for the two letters, e.g. "n" and "k"
if (word.endsWith(oldLetter)) {
    word = word.substring(0, word.length() - 1) + newLetter;
}
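The snippet above handles a single word. A sketch that applies the same endsWith/substring idea to every word of the question's input might look like this; the names are hypothetical, and it assumes words are separated by spaces and carry no trailing punctuation (a word like "ayman." would not be changed).

```java
class LastLetterNoRegex {
    // Splits on spaces (limit -1 keeps empty pieces, so repeated spaces survive),
    // swaps oldLetter at the end of each word, and joins the words back together.
    static String replaceLastLetter(String text, String oldLetter, String newLetter) {
        String[] words = text.split(" ", -1);
        for (int i = 0; i < words.length; i++) {
            if (words[i].endsWith(oldLetter)) {
                words[i] = words[i].substring(0, words[i].length() - oldLetter.length()) + newLetter;
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(replaceLastLetter("njan ayman", "n", "k")); // njak aymak
    }
}
```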