I am having trouble with Java Pattern and Matcher. I've included a very simplified example of what I'm trying to do.
I had expected the pattern ".\b" to find the last character of the first word (or "4" in the example), but as I step through the code, m.find() always returns false. What am I missing here?
Why does the following Java code always print out "Not Found"?
Pattern p = Pattern.compile(".\b");
Matcher m = p.matcher("102939384 is a word");
int ixEndWord = 0;
if (m.find()) {
ixEndWord = m.end();
System.out.println("Found: " + ixEndWord);
} else {
System.out.println("Not Found");
}
You need to escape special characters in the regex: ".\\b"
Basically, in a String the backslash has to be escaped. So "\\" becomes the character '\'.
So the String ".\\b" becomes the literal String ".\b", which will be used by the Pattern.
To expand upon AntonH's comment, whenever you want the "\" character to appear in a regex, you have to escape it so that it actually appears in the string you are passing in.
As is, ".\b" is the string of a dot . followed by the special backspace character represented by \b, compared to ".\\b", which is the regex .\b.
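A minimal sketch of the difference, using the input from the question. The class name EscapeDemo and the helper firstMatchEnd are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Returns the end index of the first match, or -1 if nothing matches.
    static int firstMatchEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "102939384 is a word";
        // ".\b" is a dot followed by the backspace character (U+0008),
        // which never occurs in the input.
        System.out.println(firstMatchEnd(".\b", input));   // prints -1
        // ".\\b" is the regex .\b: any character followed by a word boundary.
        System.out.println(firstMatchEnd(".\\b", input));  // prints 9
    }
}
```

The second call stops right after the "4", which is the last character of the first word.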
Related
I can't find the correct way to remove substrings that case-insensitively equal "null", replacing them with an empty string, in a huge input data string that contains many lines and uses ; as a separator.
To simplify here is an example of what I am looking for:
Input string
Steve;nuLL;2;null\n
null;nullo;nUll;Marc\n
....
Expected Output
Steve;;2;\n
;nullo;;Marc\n
...
Code
Matcher matcher = Pattern.compile("(?i)(^|;)(null)(;|$)").matcher(dataStr);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(sb, matcher.group(1) + "" + matcher.group(3));
}
return sb.toString();
Can this be solved by using regex?
EDIT:
From the Java code above, only the first match is ever replaced, not every occurrence in the line and in the data stream. For whatever reason, matcher.find() only succeeds once.
return dataStr.replaceAll("(?smi)\\bnull\\b", "");
\b is the word boundary.
(?i) is a command with i=ignore case.
((?s) is DOTALL, . matching newline characters too.)
(?m) is MULTILINE.
You forgot appendTail, which appends everything after the last replacement.
If the string contains more than one line, add the MULTILINE option so that ^ and $ also match at line breaks. See the javadoc of Pattern.
while (matcher.find()) {
matcher.appendReplacement(sb, matcher.group(1) + "" + matcher.group(3));
}
matcher.appendTail(sb);
Alternatively with lambda:
String result = matcher.replaceAll(mr -> mr.group(1) + mr.group(3));
where mr is a freely named MatchResult provided by replaceAll (this overload requires Java 9 or later).
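Putting the two fixes together (the MULTILINE flag and appendTail), a complete version of the loop might look like the sketch below. The class and method names are invented for the example, and Matcher.quoteReplacement is an extra precaution that the question's code does not have.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NullFieldLoop {
    // (?i) ignores case, (?m) lets ^ and $ match at every line break.
    private static final Pattern NULL_FIELD =
            Pattern.compile("(?im)(^|;)(null)(;|$)");

    static String stripNullFields(String dataStr) {
        Matcher matcher = NULL_FIELD.matcher(dataStr);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // Keep the delimiters (groups 1 and 3), drop the "null" between them.
            matcher.appendReplacement(sb,
                    Matcher.quoteReplacement(matcher.group(1) + matcher.group(3)));
        }
        // Copy whatever follows the last match; without this the tail is lost.
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "Steve;nuLL;2;null\nnull;nullo;nUll;Marc\n";
        System.out.print(stripNullFields(data));
    }
}
```

One caveat: because the pattern consumes the ; on both sides, two null fields in a row share a delimiter and only the first is replaced, so "a;null;null;b" becomes "a;;null;b".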
You probably want to replace null only when it is followed by a separator, a newline, or the end of the input, like:
first.replaceAll("(?i)(null)(?=;|\\n|$)", "")
You don't need anything fancy:
str = str.replaceAll("(?i)\\bnull\\b", "");
(?i) means "ignore case". \b means "word boundary". Embedded newlines are irrelevant.
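For comparison, here is a sketch that puts the word-boundary version next to a lookaround variant. The lookaround pattern is not from any of the answers; it is one possible way to match null only when it fills a whole field. The class and method names are hypothetical.

```java
class NullFieldLookaround {
    // Matches "null" only when it is a complete field: preceded by a line
    // start or ';' and followed by ';' or a line end. Only "null" is consumed.
    static String stripWholeFields(String s) {
        return s.replaceAll("(?im)(?<=^|;)null(?=;|$)", "");
    }

    // The word-boundary version from the answer above.
    static String stripWords(String s) {
        return s.replaceAll("(?i)\\bnull\\b", "");
    }

    public static void main(String[] args) {
        String data = "Steve;nuLL;2;null\nnull;nullo;nUll;Marc\n";
        System.out.print(stripWords(data));
        System.out.print(stripWholeFields(data));
        // The two differ when "null" is only part of a field:
        System.out.println(stripWords("not-null;x"));       // prints not-;x
        System.out.println(stripWholeFields("not-null;x")); // prints not-null;x
    }
}
```

Both give the expected output for the sample data. The word-boundary pattern is shorter, but it also removes null next to any non-word character, such as the hyphen in "not-null".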
New to regex and using following code to find if a word contains special characters at the end/start.
String s = "K-factor:";
String regExp = "^[^<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]*$";
Matcher matcher = Pattern.compile(regExp).matcher(s);
while (matcher.find()) {
System.out.println("Start: "+ matcher.start());
System.out.println("End: "+ matcher.end());
System.out.println("Group: "+ matcher.group());
s = s.substring(0, matcher.start());
}
I would like to find out whether there is any special character (: in this sample code) at the start or end of the string, and then skip that character.
There is neither a compile-time error nor any output.
Note that your regex matches a whole string that does not contain the chars you defined in the character class. The string in question does not match that pattern since it contains :.
You might consider splitting the pattern into two parts to check for the unwanted chars at the start or end using an alternation group:
String regExp = "^[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]|[<>{}\"/|;:.,~!?##$%^=&*\\]\\\\()\\[0-9_+]$";
Here, the pattern has a ^<special_char_class>|<special_char_class>$ structure, ^ anchors the match at start, $ anchors the match at the string end, and | is the alternation operator. Note I removed the ^ from the start of the character class to make them positive rather than negated, so that they could match those chars/ranges defined in the class.
Alternatively, since you seem to just want to match a string if it contains a non-letter at the start/end, you may use
String regExp = "^\\P{L}|\\P{L}$";
which is Unicode letter aware, or, for ASCII only:
String regExp = "^\\P{Alpha}|\\P{Alpha}$";
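As a rough illustration with the string from the question: the sketch below checks for a non-letter at either edge and, as an extra step that is not part of the answer, trims such characters. The names EdgeCharCheck, hasNonLetterEdge and trimNonLetterEdges are made up.

```java
import java.util.regex.Pattern;

class EdgeCharCheck {
    private static final Pattern EDGE_NON_LETTER =
            Pattern.compile("^\\P{L}|\\P{L}$");

    // True if the first or the last character is not a Unicode letter.
    static boolean hasNonLetterEdge(String s) {
        return EDGE_NON_LETTER.matcher(s).find();
    }

    // Assumption: "skipping" the character means removing every leading and
    // trailing non-letter, hence the added + quantifiers.
    static String trimNonLetterEdges(String s) {
        return s.replaceAll("^\\P{L}+|\\P{L}+$", "");
    }

    public static void main(String[] args) {
        System.out.println(hasNonLetterEdge("K-factor:"));   // prints true
        System.out.println(trimNonLetterEdges("K-factor:")); // prints K-factor
    }
}
```

The hyphen in the middle is untouched because neither alternative can match away from the edges.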
I have seen "," replaced with "." by using ".$"|",$", but this logic does not work with letters.
I need to replace the last letter of every word in the string EXAMPLE_TEST with another letter, using Java.
This is my code:
Pattern replace = Pattern.compile("n$"); // here is the real problem
Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST = matcher2.replaceAll("k");
I also tried "//n$", "\n$", etc.
Please help me find a solution.
Input text => njan ayman
Output text => njak aymak
Instead of the end of string $ anchor, use a word boundary \b
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"
You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
String s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the matched word character
(?=\\s|$) - which is followed by a space or at the end of the line (lookahead)
The above is only an example! If you want to switch every comma with a period, the middle line should be changed to:
s = s.replaceAll("(,)(?=\\s|$)", "\\.");
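The word-boundary and lookahead approaches agree on the sample input but differ once punctuation follows a word. A small sketch of that difference (class and method names are invented here):

```java
class LastLetterDemo {
    // Replaces an "n" that is followed by a word boundary.
    static String withWordBoundary(String s) {
        return s.replaceAll("n\\b", "k");
    }

    // Replaces an "n" only when whitespace or the end of input follows.
    static String withWhitespaceLookahead(String s) {
        return s.replaceAll("n(?=\\s|$)", "k");
    }

    public static void main(String[] args) {
        System.out.println(withWordBoundary("njan ayman"));          // prints njak aymak
        System.out.println(withWhitespaceLookahead("njan ayman"));   // prints njak aymak
        System.out.println(withWordBoundary("njan, ayman."));        // prints njak, aymak.
        System.out.println(withWhitespaceLookahead("njan, ayman.")); // prints njan, ayman.
    }
}
```

Which one is right depends on whether a word followed by a comma or period should still count as ending in "n".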
Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as following:
(?=.\\b)\\w
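It translates to something like "a word character (\w) that is immediately followed by a word boundary": the lookahead (?=.\b) asserts that the next single character (.) ends at a word boundary (\b), and \w then consumes that character.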
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of all words, but leaves following non-alphanumeric characters. You can specify only specific characters to remove/replace by changing the . to something else.
However, the other answers are perfectly good and might achieve exactly what you are looking for.
// oldLetter and newLetter are Strings holding one letter each
if (word.endsWith(oldLetter)) {
word = word.substring(0, word.length() - 1) + newLetter;
}
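A regex-free version along these lines could be applied to every word in the string. This is only a sketch: it assumes words are separated by single spaces, and the names replaceLastLetter, oldLetter and newLetter are hypothetical.

```java
class EndsWithDemo {
    // oldLetter and newLetter are assumed to hold exactly one letter each.
    static String replaceLastLetter(String text, String oldLetter, String newLetter) {
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (words[i].endsWith(oldLetter)) {
                // Cut off the old ending and append the new letter.
                words[i] = words[i].substring(0, words[i].length() - oldLetter.length())
                        + newLetter;
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(replaceLastLetter("njan ayman", "n", "k")); // prints njak aymak
    }
}
```

Unlike the word-boundary regex, this treats "njan," as ending in a comma, so punctuation attached to a word prevents the replacement.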
I have the following Java method and have some conditions for the parameter searchPattern:
public boolean checkPatternMatching(String sourceToScan, String searchPattern) {
boolean patternFounded;
if (sourceToScan == null) {
patternFounded = false;
} else {
Pattern pattern = Pattern.compile(Pattern.quote(searchPattern),
Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(sourceToScan);
patternFounded = matcher.find();
}
return patternFounded;
}
I want to accept all letters (both uppercase and lowercase) and only (!) the special characters "-", ":" and "=". For all other values, the method must return false.
How can I implement this logic for the parameter "searchPattern"?
Try searchPattern = "[a-zA-Z:=-]"
Try this pattern: [a-zA-Z=,_!:]+ (the + is needed because matches() has to consume the whole input, not just one character)
String pattern ="[a-zA-Z=,_!:]+";
String input="hello_:,!=";
if(input.matches(pattern)){
System.out.println("true");
}else{
System.out.println("false");
}
"[[a-zA-Z]!=:\\s-]+"
The square brackets denote a character class, which matches any single character listed inside the brackets. The + means one or more characters from the class, and the \\s is for whitespace. The hyphen goes last so that it is taken literally instead of forming a range.
So if you want just letters and spaces, as per your comment in the original post:
"[[a-zA-Z]\\s]+"
Use searchPattern as [a-zA-Z!:=-]+
searchPattern = "^[A-Za-z!=:-]+$"
^ means "begins with"
$ means "ends with"
[A-Za-z!=:-] is a character class that contains any letter or the symbols !, =, :, -
+ means "1 or more" of the preceding
This will work if the string will solely contain those symbols, ie no spaces or anything else.
If you want a string that contains the given symbols and may also contain whitespace, use:
searchPattern = "^[A-Za-z!=:\\s-]+$"
\\s stands for white-space character
Finally, if you want to simply see if a string contains any one of these symbols, you can use:
searchPattern = "[A-Za-z!=:-]"
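One thing none of the suggestions mention is that the method in the question wraps searchPattern in Pattern.quote, which turns the whole pattern into a literal string, so a character class passed in would never act as one. A minimal sketch of both behaviours follows; the names AllowedCharsCheck, containsOnlyAllowed and quotedFind are invented, and the allowed set is taken from the question (letters plus "-", ":" and "=").

```java
import java.util.regex.Pattern;

class AllowedCharsCheck {
    // Letters plus ':', '=' and '-'; the trailing hyphen is a literal.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z:=-]+");

    // True only if every character of s is in the allowed set.
    static boolean containsOnlyAllowed(String s) {
        return s != null && ALLOWED.matcher(s).matches();
    }

    // Mirrors the question's method: the pattern is quoted, so it is
    // searched for as plain text rather than interpreted as a regex.
    static boolean quotedFind(String source, String searchPattern) {
        return Pattern.compile(Pattern.quote(searchPattern), Pattern.CASE_INSENSITIVE)
                .matcher(source)
                .find();
    }

    public static void main(String[] args) {
        System.out.println(containsOnlyAllowed("key:Value=a-b")); // prints true
        System.out.println(containsOnlyAllowed("key_1"));         // prints false
        System.out.println(quotedFind("abc", "[a-z]"));           // prints false
        System.out.println(quotedFind("x[a-z]y", "[a-z]"));       // prints true
    }
}
```

So to use any of the patterns above with the original method, the Pattern.quote call would have to go, and find() would need to become matches() (or the pattern would need the ^ and $ anchors).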
I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can be of any combinations, for ex. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample sequence of string : Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a java regular expression that will identify and strip off all occurrences of Re:, but only the ones at the beginning of the string and not the ones occurring within the string.
So the output should look like This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
StringBuffer sb = new StringBuffer();
while(m.find()){
m.appendReplacement(sb,REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere in this forum, as Java regex does not identify \s).
The problem I am facing with this code is that the search stops at the first match, and escapes the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
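Applied to the sample title from the question, this could be wrapped up as follows. Because the pattern is anchored with ^ it can match at most once, which is also why the while loop in the question exits after the first find(); the + inside the pattern is what swallows all the repeated prefixes in that single match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ReplyPrefixStripper {
    private static final Pattern RE_PREFIX =
            Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");

    // Removes every leading "Re:" variant; later occurrences are untouched.
    static String stripReplyPrefixes(String title) {
        // The anchored pattern matches at most once, so replaceFirst suffices.
        return RE_PREFIX.matcher(title).replaceFirst("");
    }

    public static void main(String[] args) {
        String title = "Re: Re : Re : re : RE: This is a Re: sample string.";
        System.out.println(stripReplyPrefixes(title)); // prints This is a Re: sample string.
    }
}
```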
in your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
it matches strings like:
\p{Z}Reee\p{Z}: or
R\p{Z}\p{Z}\p{Z}
which makes no sense for what you are trying to do.
You'd be better off using a regex like the following:
yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Or, to make @Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
yourString = m.replaceAll("");
(which is, as the documentation says, exactly the same thing as yourString.replaceAll())
(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and @Doorknob for thinking about the (?i) case insensitivity part ;-) )