I want to match
String firstName = "CHRIS JAMES MR";
String prefix = "MR"; //mr Mr mR
firstName.matches("^(.*?)(?i)" + prefix + "$")
The current regex returns true even for
JOAmR
JOHN HOMR
where it shouldn't.
So basically I'm trying to match
[anyNumberOfChar][anyNumberOfWhiteSpaces][anyCaseOf "MR"]
I'm using Java's String.matches.
You could use a word boundary \b
^(.*?)(?i)\bMR$
firstName.matches("^(.*?)(?i)\\b" + prefix + "$");
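A minimal sketch of the word-boundary check applied to the inputs from the question. The class and method names (PrefixSuffixCheck, endsWithWord) are made up for illustration, and Pattern.quote is an addition that keeps the suffix literal in case it ever contains regex metacharacters.

```java
import java.util.regex.Pattern;

class PrefixSuffixCheck {
    // Hypothetical helper: true when `name` ends with `suffix` as a whole word, ignoring case.
    static boolean endsWithWord(String name, String suffix) {
        // \b requires a word boundary right before the suffix, so "HOMR" is rejected.
        return name.matches("^(.*?)(?i)\\b" + Pattern.quote(suffix) + "$");
    }

    public static void main(String[] args) {
        System.out.println(endsWithWord("CHRIS JAMES MR", "MR")); // true
        System.out.println(endsWithWord("Chris James mr", "MR")); // true
        System.out.println(endsWithWord("JOAmR", "MR"));          // false
        System.out.println(endsWithWord("JOHN HOMR", "MR"));      // false
    }
}
```

Note that a string consisting of the suffix alone ("MR") also passes, because the start of the string counts as a word boundary.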
Sample string
astabD (tabD) tabD .tabD tabD. (tabD tabD)
I need to replace tabD with something like temp.tabD for every occurrence in the above string except the first and second ones.
For this I tried replaceAll with a word boundary:
str = str.replaceAll("\\b" + "tabD" + "\\b", "temp.tabD");
This works except for the second occurrence. I would appreciate any help: '(' and ')' are also keywords, and only an occurrence wrapped in both of them has to be ignored.
You may use
.replaceAll("\\b(?<!\\((?=\\w+\\)))tabD\\b", "")
Or, if tabD comes in from user input:
String s = "astabD (tabD) tabD .tabD tabD. (tabD tabD)";
String word = "tabD";
String wordRx = Pattern.quote(word);
s = s.replaceAll("(?<!\\w)(?<!\\((?=" + wordRx + "\\)))" + wordRx + "(?!\\w)", "");
See the regex demo.
Details
\b - a word boundary ((?<!\w) is an unambiguous left word boundary)
(?<!\((?=\w+\))) - a negative lookbehind that fails the match if right before the current location there is a ( that is followed with 1+ word chars (\w+ is required to match the tabD word) and then a ) (NOTE: if your IDE complains that the + is inside a lookbehind, that is an IDE bug, since the + is in a lookahead here and + / * quantifiers are allowed in lookaheads)
tabD - the word to find
\b - a word boundary ((?!\w) is an unambiguous right word boundary)
Java demo:
String s = "astabD (tabD) tabD .tabD tabD. (tabD tabD)";
System.out.println(s.replaceAll("\\b(?<!\\((?=\\w+\\)))tabD\\b", ""));
// => astabD (tabD)  . . ( )
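The demo above deletes the matches. Since the question asks for temp.tabD as the replacement, here is a sketch of the same user-input-safe pattern with that replacement. The class and method names (TabReplace, qualify) are hypothetical, and Matcher.quoteReplacement is an extra precaution in case the replacement text contains $ or \.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TabReplace {
    // Hypothetical helper: replaces whole-word occurrences of `word`,
    // skipping those wrapped exactly as "(word)".
    static String qualify(String input, String word, String replacement) {
        String wordRx = Pattern.quote(word);
        String regex = "(?<!\\w)(?<!\\((?=" + wordRx + "\\)))" + wordRx + "(?!\\w)";
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String s = "astabD (tabD) tabD .tabD tabD. (tabD tabD)";
        System.out.println(qualify(s, "tabD", "temp.tabD"));
        // => astabD (tabD) temp.tabD .temp.tabD temp.tabD. (temp.tabD temp.tabD)
    }
}
```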
I have a regexp to check whether some text contains a word (as a whole word, using word boundaries):
String regexp = ".*\\bSOME_WORD_HERE\\b.*";
but this regexp returns false when SOME_WORD_HERE starts with # (a hashtag).
Example, without #
String text = "some text and test word";
String matchingWord = "test";
boolean contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// now contains == true;
But with a hashtag, `contains` is false. Example:
text = "some text and #test word";
matchingWord = "#test";
contains = text.matches(".*\\b" + matchingWord + "\\b.*");
// contains == false, but I expect true
The \b# pattern matches a # that is preceded with a word character: a letter, digit or underscore.
If you need to match # that is not preceded with a word char, use a negative lookbehind (?<!\w). Similarly, to make sure the trailing \b matches if a non-word char is there, use (?!\w) negative lookahead:
text.matches("(?s).*(?<!\\w)" + matchingWord + "(?!\\w).*");
Using Pattern.quote(matchingWord) is a good idea if your matchingWord can contain special regex metacharacters.
Alternatively, if you plan to match your search words only between whitespace or at the start/end of the string, you can use (?<!\S) as the initial boundary and (?!\S) as the trailing one:
text.matches("(?s).*(?<!\\S)" + matchingWord + "(?!\\S).*");
And one more thing: the .* in the .matches is not the best regex solution. A regex like "(?<!\\S)" + matchingWord + "(?!\\S)" with Matcher#find() will be processed in a much more optimized way, but you will need to initialize the Matcher object for that.
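A minimal sketch of the Matcher#find() variant with whitespace boundaries. The class and method names (TokenSearch, containsToken) are made up for illustration, and Pattern.quote is added so that the search word is treated literally.

```java
import java.util.regex.Pattern;

class TokenSearch {
    // Hypothetical helper: true when `token` occurs delimited by whitespace
    // or by the start/end of the text.
    static boolean containsToken(String text, String token) {
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(token) + "(?!\\S)");
        // find() searches for the pattern anywhere, so no leading/trailing .* is needed.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsToken("some text and #test word", "#test"));    // true
        System.out.println(containsToken("some text and test word", "test"));      // true
        System.out.println(containsToken("some text and #testing word", "#test")); // false
        // The original \b-based check fails for the hashtag:
        System.out.println("some text and #test word".matches(".*\\b#test\\b.*")); // false
    }
}
```

Whitespace boundaries are stricter than word boundaries: a token directly followed by punctuation, such as "#test.", is not reported with this pattern.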
If you are looking for words with a leading '#', simply remove the leading '#' from the search word and use the following regex (String.matches has to match the whole string, hence the surrounding .*):
text.matches(".*#\\b" + matchingWordWithoutLeadingHash + "\\b.*");
I am writing a function that allows users to search a field of text for search terms that they can enter, and mark them up in some way such as highlighting. What I currently have is:
String text = "This is my (simple) test.";
String searchExpression = "(?i)\\b(" + Pattern.quote(searchTerm) + ")\\b";
String replaceExpression = markupToken + "$1" + markupToken;
String newText = text.replaceAll(searchExpression, replaceExpression);
This works great if search term is "simple"; however, if the user searches for "(simple)" it will not successfully match. If I remove Pattern.quote or the \b's this works fine.
Is there a way to modify the searchExpression that it will work in both of these scenarios?
Your regex fails because \b (a word boundary) cannot match before ( or after ) when a non-word character such as a space is on the other side, since ( and ) are not word characters themselves.
You can tweak your regex as this:
String searchExpression = "(?i)(?<!\\w(?=\\w))(" + Pattern.quote(searchTerm) +
")(?!(?<=\\w)\\w)";
That is, use lookarounds on either side: there must not be a word character right before the match if the search term starts with a word character, and there must not be one right after it if the search term ends with a word character.
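A sketch of the lookaround-based expression wrapped in a helper and run against both search terms from the question. The names Highlighter and highlight are hypothetical, "*" stands in for whatever markupToken is in the real code, and Matcher.quoteReplacement is an added safeguard for tokens containing $ or \.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Highlighter {
    // Hypothetical helper: wraps every match of `searchTerm` in `markupToken`.
    static String highlight(String text, String searchTerm, String markupToken) {
        String searchExpression = "(?i)(?<!\\w(?=\\w))(" + Pattern.quote(searchTerm)
                + ")(?!(?<=\\w)\\w)";
        String token = Matcher.quoteReplacement(markupToken);
        // $1 re-inserts the matched text with its original casing.
        return text.replaceAll(searchExpression, token + "$1" + token);
    }

    public static void main(String[] args) {
        String text = "This is my (simple) test.";
        System.out.println(highlight(text, "simple", "*"));   // This is my (*simple*) test.
        System.out.println(highlight(text, "(simple)", "*")); // This is my *(simple)* test.
        System.out.println(highlight("A simpleton", "simple", "*")); // A simpleton
    }
}
```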
I found the following regex in one of the Android Source file:
String regex = "\\s+(?i)src=\"cid(?-i):\\Q" + attachment.mContentId + "\\E\"";
if (string.matches(regex)) {
    System.out.println("Matched");
} else {
    System.out.println("Not Found");
}
NOTE: attachment.mContentId will basically have values like C4EA83841E79F643970AF3F20725CB04#gmail.com
I made a sample code as below:
String content = "Hello src=\"cid:something#gmail.com\" is present";
String contentId = "something#gmail.com";
String regex = "\\s+(?i)src=\"cid(?-i):\\Q" + contentId + "\\E\"";
if(content.matches(regex))
System.out.println("Present");
else
System.out.println("Not Present");
This always gives "Not Present" as output.
But when I am doing the below:
System.out.println(content.replaceAll(regex, " Replaced Value"));
the matching part of the output is replaced with the new value. If it is "Not Present", how can replaceAll find it and replace it? Please clear up my confusion.
Can anybody say what kind of content in the string will make control reach the if branch?
String regex = "\\s+(?i)src=\"cid(?-i):\\Q" + attachment.mContentId + "\\E\"";
Break it down:
\\s+ - Match 1 or more whitespace characters
(?i) - Turn on case-insensitive matching for the subsequent string
src=\"cid - match src="cid
(?-i) - Turn off case-insensitive matching
: - Obviously a colon
\\Q - Treat everything up to \\E as literal characters, not control characters; special regex characters are disabled until \\E
attachment.mContentId - whatever your string is
\\E - End the literal quoting sandwich started by \\Q
\" - End quote
So it will match a string like src="cid:YOUR-STRING-LITERAL"
Or, to use your own example, something like this string will match (there are leading white space characters):
src="cid:C4EA83841E79F643970AF3F20725CB04#gmail.com"
For your update
The problem you're running into is that java.lang.String.matches() does not do what you expect.
String.matches() (like Matcher.matches()) tries to match the entire string against the regular expression, whereas replaceAll replaces every substring that matches it.
If you use this regex:
String regex = "\\s+(?i)src=\"cid(?-i):\\Q" + attachment.mContentId + "\\E\"";
And this input:
String content = "Hello src=\"cid:something#gmail.com\" is present";
content will never match the regex because the entire string doesn't match the regular expression.
What you want to do is use Matcher.find - this should work for you.
String content = "Hello src=\"cid:something#gmail.com\" is present";
String contentId = "something#gmail.com";
Pattern pattern = Pattern.compile("\\s+(?i)src=\"cid(?-i):\\Q" + contentId + "\\E\"");
Matcher m = pattern.matcher(content);
if(m.find())
System.out.println("Present");
else
System.out.println("Not Present");
IDEone example: https://ideone.com/8RTf0e
That regex will match any
src="cid:contentId"
where only contentId has to match case-sensitively.
For instance giving your example contentId (C4EA83841E79F643970AF3F20725CB04#gmail.com) these strings will match:
SrC="CiD:C4EA83841E79F643970AF3F20725CB04#gmail.com"
src="cid:C4EA83841E79F643970AF3F20725CB04#gmail.com"
SRC="CID:C4EA83841E79F643970AF3F20725CB04#gmail.com"
while these will not match:
src="cid:c4Ea83841e79F643970aF3f20725Cb04#GmaiL.com"
src="cid:C4EA83841E79F643970AF3F20725CB04#GMAIL.COM"
Also the contentId part is escaped (\Q ... \E) so that the regex engine will not consider special characters inside it.
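The case-sensitivity rules above can be checked with a short sketch. The class and method names (CidMatch, cidPattern) and the surrounding `<img ...>` text are assumptions for illustration; Pattern.quote is used here as an equivalent of writing \Q ... \E by hand.

```java
import java.util.regex.Pattern;

class CidMatch {
    // Hypothetical helper: builds the pattern from the question for a given content id.
    static Pattern cidPattern(String contentId) {
        // (?i) makes src="cid case-insensitive; (?-i) switches back before the id.
        return Pattern.compile("\\s+(?i)src=\"cid(?-i):" + Pattern.quote(contentId) + "\"");
    }

    public static void main(String[] args) {
        String id = "C4EA83841E79F643970AF3F20725CB04#gmail.com";
        Pattern p = cidPattern(id);

        // Any casing of src="cid is accepted.
        System.out.println(p.matcher("<img SrC=\"CiD:" + id + "\">").find());               // true
        // The id itself must match exactly.
        System.out.println(p.matcher("<img src=\"cid:" + id.toLowerCase() + "\">").find()); // false
        // String.matches needs the whole string to match, so surrounding text makes it fail.
        System.out.println(("Hello src=\"cid:" + id + "\" is present").matches(p.pattern())); // false
    }
}
```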
After running this
Names.replaceAll("^(\\w)\\w+", "$1.")
I have a String Like
Names = F.DA, ABC, EFG
I want a String format like
F.DA, A.BC & E.FG
How do I do that ?
Update :
If I had a name Like
Robert Filip, Robert Morris, Cirstian Jed
I want like
R.Filip, R.Morris & C.Jed
I would also be happy if you could suggest a good resource on Java regex.
You need to re-assign the result back to names: since Strings are immutable, replaceAll does not replace in place but returns a new String:
names = names.replaceAll(", (?=[^,]*$)", " & ");
Following should work for you:
String names = "Robert Filip, Robert Morris, Cirstian Jed, S.Smith";
String repl = names.replaceAll("((?:^|[^A-Z.])[A-Z])[a-z]*\\s(?=[A-Z])", "$1.")
.replaceAll(", (?=[^,]*$)", " & ");
System.out.println(repl); //=> R.Filip, R.Morris, C.Jed & S.Smith
Explanation:
The 1st replaceAll call matches either the start of the string or a character that is neither a capital letter nor a dot, followed by a capital letter (both captured in group #1), then 0 or more lowercase letters and a whitespace character that must be followed by a capital letter. It replaces the match with group #1 followed by a dot ($1.).
The 2nd replaceAll call matches the last comma-plus-space, i.e. one that is not followed by another comma before the end of the string, and replaces it with the literal string " & ".
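The two steps can be sketched as a small helper and run on the input from the question's update. The names NameAbbreviator and abbreviate are made up for illustration; the regexes are the ones from the answer.

```java
class NameAbbreviator {
    // Hypothetical helper combining both replaceAll steps.
    static String abbreviate(String names) {
        // Step 1: shorten each first name to its initial plus a dot.
        String initials = names.replaceAll("((?:^|[^A-Z.])[A-Z])[a-z]*\\s(?=[A-Z])", "$1.");
        // Step 2: turn the last ", " into " & ".
        return initials.replaceAll(", (?=[^,]*$)", " & ");
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("Robert Filip, Robert Morris, Cirstian Jed"));
        // => R.Filip, R.Morris & C.Jed
        System.out.println(abbreviate("Robert Filip"));
        // => R.Filip
    }
}
```

A single name contains no comma, so the second step leaves it unchanged.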
Try this
String names = "Amal.PM , Rakesh.KR , Ajith.N";
names = names.replaceAll(" , (?=[^,]*$)", " & ");
System.out.println("New String : "+names);