How to match regex with dollar amounts and phrases in Java?


I have this regex
Pattern pa = Pattern.compile("\\b(\\$|hello|world|foo|blah blargh)\\b");
Matcher m = pa.matcher("$");
boolean b = m.matches();
System.out.println(b);
This prints false, but I'm not sure why.
https://coderpad.io/GWFMKYQQ --> coderpad if it helps.

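The point is that the meaning of the \b word boundary depends on context: next to a word character (i.e. a letter, digit or underscore), the character on its other side must be a non-word character or the start/end of the string. Next to a non-word character such as $, it requires a word character on its other side, which also rules out the start and the end of the string. With the input "$" there is no word character on either side, so neither \b can match and matches() returns false.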
So, if your intent is to match $ only if it is not enclosed with word characters, use unambiguous (?<!\w) and (?!\w) lookarounds:
Pattern pa = Pattern.compile("(?<!\\w)(\\$|hello|world|foo|blah blargh)(?!\\w)");
(?<!\w) will fail the match if the $ is preceded with a word character, and (?!\w) negative lookahead will fail the match if $ is followed with a word character.
NOTE: If you add (?U) (or Pattern.UNICODE_CHARACTER_CLASS flag), \w and \b will become Unicode aware (it might be important in some cases).
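To make the difference concrete, here is a minimal sketch that runs the pattern from the question and the lookaround version side by side. The class and method names are made up for illustration; only the two patterns come from the text above.

```java
import java.util.regex.Pattern;

class DollarBoundaryDemo {
    // The pattern from the question: \b on both sides of the alternation.
    static final Pattern WITH_WORD_BOUNDARY =
        Pattern.compile("\\b(\\$|hello|world|foo|blah blargh)\\b");

    // The lookaround version suggested in the answer.
    static final Pattern WITH_LOOKAROUNDS =
        Pattern.compile("(?<!\\w)(\\$|hello|world|foo|blah blargh)(?!\\w)");

    // Hypothetical helper: true if the whole input matches the \b pattern.
    static boolean matchesWithBoundary(String input) {
        return WITH_WORD_BOUNDARY.matcher(input).matches();
    }

    // Hypothetical helper: true if the whole input matches the lookaround pattern.
    static boolean matchesWithLookarounds(String input) {
        return WITH_LOOKAROUNDS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWithBoundary("$"));        // false
        System.out.println(matchesWithBoundary("hello"));    // true
        System.out.println(matchesWithLookarounds("$"));     // true
        System.out.println(matchesWithLookarounds("hello")); // true
    }
}
```

The word alternatives behave the same under both patterns; only the $ alternative changes, because it is the only one that starts and ends with a non-word character.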

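I did a bit of research on this, and it turns out that \b treats the dollar sign as a non-word character, so it only matches next to a $ when there is a word character on the other side. You can match a dollar sign after a space (or at the start of the input) by using the regular expression below: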
Pattern.compile("(\\s|^)\\$")
Then strip the leading whitespace from the match with another regular expression:
Pattern.compile("\\S+")
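Alternatively, since Java's regex engine supports lookbehind, you can just use this (note that it requires a whitespace character before the $, so it will not match a $ at the very start of the input):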
Pattern.compile("(?<=\\s)\\$")
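A small sketch of how the lookbehind behaves on a few inputs. The second pattern, (?<!\S)\$, is not part of this answer; it is an assumed variant that also accepts a $ at the start of the input. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class DollarAfterSpaceDemo {
    // From the answer: "$" only when a whitespace character precedes it.
    static final Pattern AFTER_SPACE = Pattern.compile("(?<=\\s)\\$");

    // Assumed variant (not from the answer): "$" when NOT preceded by a
    // non-whitespace character, which also allows the start of the input.
    static final Pattern AFTER_SPACE_OR_START = Pattern.compile("(?<!\\S)\\$");

    static boolean findsAfterSpace(String input) {
        return AFTER_SPACE.matcher(input).find();
    }

    static boolean findsAfterSpaceOrStart(String input) {
        return AFTER_SPACE_OR_START.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(findsAfterSpace("pay $5"));        // true
        System.out.println(findsAfterSpace("$5"));            // false
        System.out.println(findsAfterSpaceOrStart("$5"));     // true
        System.out.println(findsAfterSpaceOrStart("US$5"));   // false
    }
}
```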

Related

Regex not matching against ampersand

I'm trying to match the following regex:
\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|&|and)\.?\b
In other words, a word boundary followed by any of the strings above (optionally followed by a period character) followed by a word boundary.
I'm trying to match this in Java, but the ampersand will not match. For example:
Pattern p = Pattern.compile(
"\\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|&|and)\\.?\\b",
Pattern.CASE_INSENSITIVE);
String result = p.matcher("mr one and mrs.two and three & four").replaceAll(" ");
System.out.println("["+result+"]");
The output of this is: [ one two three & four]
I've also tried this at regex101, and again the ampersand does not match: https://regex101.com/r/klkmwl/1
Escaping the ampersand does not make a difference, and I've tried using the hex escape sequence \x26 instead of ampersand (as suggested in this question). Why is this not matching?

Your regex will match an ampersand only if it is located between word chars, e.g. three&four, see this regex demo. This happens because \b before a non-word char requires a word char to appear immediately before it. Also, as there is a \b after the optional dot, a dot or ampersand that ends the match must be immediately followed by a word char.
You need to re-write the pattern so that the word boundaries are applied to the words rather than symbols:
Pattern p = Pattern.compile(
"(?:\\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|and)\\b|&)\\.?",
Pattern.CASE_INSENSITIVE);
See the regex demo online.
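As a quick check, the rewritten pattern can be run against the sample string from the question. This is only a sketch; the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

class TitleStripDemo {
    // Word boundaries wrap only the words; "&" is matched without them.
    static final Pattern TITLES = Pattern.compile(
        "(?:\\b(?:mr|mrs|ms|miss|messrs|mmes|dr|prof|rev|sr|jr|and)\\b|&)\\.?",
        Pattern.CASE_INSENSITIVE);

    // Replaces every title, "and" or "&" (plus an optional dot) with a space.
    static String strip(String input) {
        return TITLES.matcher(input).replaceAll(" ");
    }

    // Hypothetical convenience: collapse the runs of spaces left behind.
    static String stripAndNormalize(String input) {
        return strip(input).trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String sample = "mr one and mrs.two and three & four";
        System.out.println("[" + strip(sample) + "]");
        System.out.println("[" + stripAndNormalize(sample) + "]"); // [one two three four]
    }
}
```

Because the ampersand alternative has no boundary at all, this version also removes the & in three&four; whether that is wanted depends on the input.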

The problem is due to the use of word boundaries. \b only matches next to a non-word character like & when there is a word character on the other side, so there is no word boundary between a space and &.
In place of word boundary you can use lookarounds:
(?<!\w)(?:[jsdm]r|mr?s|miss|messrs|mmes|prof|rev|&|and)\.?(?!\w)
(?<!\w): Make sure that previous character is not a word character
(?!\w): Make sure that next character is not a word character
Note that I tweaked your regex to make it shorter: [jsdm]r covers jr, sr, dr and mr, and mr?s covers ms and mrs.
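A rough sketch of the lookaround approach in Java follows. It assumes the title list is meant to contain rev, as in the question; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class TitleLookaroundDemo {
    // Lookarounds replace \b, so "&" can match when it stands alone.
    static final Pattern TITLES = Pattern.compile(
        "(?<!\\w)(?:[jsdm]r|mr?s|miss|messrs|mmes|prof|rev|&|and)\\.?(?!\\w)",
        Pattern.CASE_INSENSITIVE);

    // Replaces every match with a single space.
    static String strip(String input) {
        return TITLES.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("three & four") + "]");
        System.out.println("[" + strip("three&four") + "]"); // [three&four]
        System.out.println("[" + strip("mrs.two") + "]");    // [ .two]
    }
}
```

Two behaviours are worth noticing. An ampersand glued to word characters (three&four) is left alone, because the lookbehind rejects it. And in mrs.two the trailing (?!\w) stops the optional dot from being consumed, since the dot is followed by a word character, so only mrs is replaced.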

Find regular expression of length specified and starting and ending also specified in Java

I want to find all the words of length 3 starting with 'l' and ending with 'f'.
Here's my code:
Pattern pt = Pattern.compile("\\bl.+?f{3}\\b");
Matcher mt = pt.matcher("#Java life! Go ahead Java,lyf,fly,luf,loof");
while(mt.find()) {
System.out.println(mt.group());
}
It shows nothing. I also tried Pattern pt = Pattern.compile("l.+?f{3}"); but I still don't get the expected output.
The output should be:
lyf luf

You can use a word boundary \b, then match l, a single word character \w and then f, ending with a word boundary \b.
\bl\wf\b
Explanation
Match a word boundary \b
Match l
Match a word character \w (\w is a shorthand character class that matches the ASCII characters [A-Za-z0-9_])
Match f
Match a word boundary \b

The regex you need is
\bl\wf\b
Explanation:
Since your word must be three characters long, there can only be one character between l and f, so that's why I didn't put a quantifier there.
Your regex is wrong because
f{3} means 3 f's, not 3 characters long in total
. matches any character (except line terminators by default), including non-word characters. Use \w instead.
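Plugged into the loop from the question, the corrected pattern can be sketched like this. The class name and the findAll helper are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeLetterWordDemo {
    // l, exactly one word character, f, with word boundaries on both sides.
    static final Pattern L_ANY_F = Pattern.compile("\\bl\\wf\\b");

    // Collects every match in the order it appears in the text.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = L_ANY_F.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // prints [lyf, luf]
        System.out.println(findAll("#Java life! Go ahead Java,lyf,fly,luf,loof"));
    }
}
```

Here life and loof are skipped because they are four characters long, and fly is skipped because its l is not at a word boundary.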

word boundary that rejects leading/end non-alphanumeric character

Right now I'm learning regular expressions in Java and I have a question about word boundaries. When I looked for word boundaries in Java regular expressions, I found \b, which accepts a word bordered by non-word characters, so this regex
\b123\b
accepts the string 123 456 but rejects 456123456. Now I found that a string like !$###%123^^%$# or "123" is still accepted by the regex above. Is there any word boundary or pattern that rejects a word bordered by non-alphanumeric characters (except spaces), like the examples above?

You want to use \s instead of \b. That will look for a whitespace character rather than a word boundary.
If you want your first example of 123 456 to be a match, however, then you will also need to use anchors to accept 123 at the immediate start or end of the string. This can be accomplished via (\s|^)123(\s|$). The caret ^ matches the start of the string and $ matches the end of the string.

(?<!\S)123(?!\S)
(?<!\S) matches a position that is not preceded by a non-whitespace character. (negative lookbehind)
(?!\S) matches a position that is not followed by a non-whitespace character. (negative lookahead)
I know this seems gratuitously complicated, but that's because \b conceals a lot of complexity. It's equivalent to this:
(?<=\w)(?!\w)|(?=\w)(?<!\w)
...meaning a position that's preceded by a word character and not followed by one, or a position that's followed by a word character and not preceded by one.
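The following sketch compares \b123\b with the whitespace-based lookarounds on the strings from the question. The class and method names are invented for illustration.

```java
import java.util.regex.Pattern;

class WhitespaceBoundaryDemo {
    // Bounded by word boundaries: any non-word neighbour is accepted.
    static final Pattern WORD_BOUNDED = Pattern.compile("\\b123\\b");

    // Bounded by whitespace or the start/end of the input only.
    static final Pattern SPACE_BOUNDED = Pattern.compile("(?<!\\S)123(?!\\S)");

    static boolean wordBounded(String input) {
        return WORD_BOUNDED.matcher(input).find();
    }

    static boolean spaceBounded(String input) {
        return SPACE_BOUNDED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wordBounded("!$###%123^^%$#"));  // true
        System.out.println(spaceBounded("!$###%123^^%$#")); // false
        System.out.println(spaceBounded("\"123\""));        // false
        System.out.println(spaceBounded("123 456"));        // true
        System.out.println(spaceBounded("456123456"));      // false
    }
}
```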

How can I remove all leading and trailing punctuation?

I want to remove all the leading and trailing punctuation in a string. How can I do this?
Basically, I want to preserve punctuation in between words, and I need to remove all leading and trailing punctuation.
., #, _, &, /, - are allowed if surrounded by letters or digits
\' is allowed if preceded by a letter or digit
I tried
Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
Matcher m = p.matcher(term);
boolean a = m.find();
if(a)
term=term.replaceAll("(^\\p{Punct})", "");
but it didn't work!!

OK. So basically you want to find some pattern in your string and act if the pattern is matched.
Doing this the naive way would be tedious. The naive solution could involve something like
while (myString.startsWith(".") || myString.startsWith(",") || myString.startsWith(";") /* || ... */)
    myString = myString.substring(1);
If you wanted to do a slightly more complex task, it could even be impossible to do it the way I mentioned.
That's why we use regular expressions. It's a "language" with which you can define a pattern, and the computer can tell whether a string matches that pattern. To learn about regular expressions, just type it into Google. One of the first links: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial
As for your problem, you could try this:
myString = myString.replaceFirst("^[^a-zA-Z]+", "");
The meaning of the regex:
the first ^ means that in this pattern, what comes next has to be at the start of the string.
The [] define a set of chars. In this case, those are chars that are NOT (the second ^) letters (a-zA-Z).
The + sign means that the thing before it must occur at least once and may be repeated any number of times.
You can use a similar regex to remove trailing chars.
myString = myString.replaceAll("[^a-zA-Z]+$", "");
the $ means "at the end of the string"
Note that [^a-zA-Z] also strips leading and trailing digits; use [^a-zA-Z0-9] if digits should be kept.
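Put together, the two replacements could look like the sketch below. It assumes digits should be kept (as the question asks), so it uses [^a-zA-Z0-9] rather than the character class shown above; the class and method names are hypothetical.

```java
class TrimNonAlphanumericDemo {
    // Removes leading, then trailing, runs of characters that are not
    // ASCII letters or digits. Assumption: digits count as content.
    static String trim(String input) {
        // Strings are immutable, so the result of each call must be kept.
        String withoutLeading = input.replaceFirst("^[^a-zA-Z0-9]+", "");
        return withoutLeading.replaceFirst("[^a-zA-Z0-9]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trim("...hello-world!!")); // hello-world
        System.out.println(trim("#42."));             // 42
    }
}
```

Punctuation between letters or digits is untouched because both patterns are anchored to an end of the string.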

You could use a regular expression:
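import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Optional leading punctuation, the captured core text, optional trailing punctuation.
private static final Pattern PATTERN =
    Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");

public static String trimPunctuation(String s) {
    Matcher m = PATTERN.matcher(s);
    // Return the input unchanged if nothing matches (e.g. it spans several lines).
    return m.find() ? m.group(1) : s;
}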
The boundary matchers ^ and $ ensure the whole input is matched.
A dot . matches any single character.
A star * means "match the preceding thing zero or more times".
The parentheses () define a capturing group whose value is retrieved by calling Matcher.group(1).
The ? in (.*?) means you want the match to be non-greedy, otherwise the trailing punctuation would be included in the group.
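To see what the reluctant quantifier changes, this sketch compares the pattern above with a greedy copy of it. The greedy pattern and all names here are illustrative and not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {
    // The pattern from the answer: the group matches as little as possible.
    static final Pattern RELUCTANT =
        Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");

    // Same pattern with a greedy group, for comparison only.
    static final Pattern GREEDY =
        Pattern.compile("^\\p{Punct}*(.*)\\p{Punct}*$");

    // Returns group 1, or the input unchanged if the pattern does not match.
    static String core(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : input;
    }

    public static void main(String[] args) {
        System.out.println(core(RELUCTANT, "--foo.bar!!")); // foo.bar
        System.out.println(core(GREEDY, "--foo.bar!!"));    // foo.bar!!
    }
}
```

In the greedy version the group swallows the trailing punctuation first, and the final \p{Punct}* is satisfied by matching nothing.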

Use this tutorial on patterns. You have to create a regex that matches a string starting with a letter or digit and ending with a letter or digit, and then call inputString.matches("regex").

java regex until certain word/text/characters

Please consider the following text :
That is, it matches at any position that has a non-word character to the left of it, and a word character to the right of it.
How can I get the following result :
That is, it matches at any position that has a non-word character to the
That is, everything up to the word left.

input.replaceAll("^(.*?)\\bleft.*$", "$1");
^ anchors to the beginning of the string
.*? matches as little as possible of any character
\b matches a word boundary
left matches the string literal "left"
.* matches the remainder of the string
$ anchors to the end of the string
$1 replaces the matched string with group 1 in ()
If you want to use any word (not just "left"), be careful to escape it. You can use Pattern.quote(word) to escape the string.
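A minimal sketch of the same idea with an arbitrary stop word, escaped via Pattern.quote. The class name, method name and parameters are hypothetical. Note that a regex-aware method such as replaceFirst or replaceAll is needed, since String.replace treats its first argument as literal text.

```java
import java.util.regex.Pattern;

class TextBeforeWordDemo {
    // Returns everything before the first occurrence of `word` that starts
    // at a word boundary, or the input unchanged if there is none.
    static String before(String input, String word) {
        // Pattern.quote makes regex metacharacters in `word` literal.
        String regex = "^(.*?)\\b" + Pattern.quote(word) + ".*$";
        return input.replaceFirst(regex, "$1");
    }

    public static void main(String[] args) {
        String text = "That is, it matches at any position that has a non-word "
            + "character to the left of it, and a word character to the right of it.";
        System.out.println("[" + before(text, "left") + "]");
    }
}
```

The result keeps the space before the stop word; call trim() on it if that is not wanted.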

The answer is actually /(.*)\Wleft\w/ but it won't match anything in
That is, it matches at any position that has a non-word character to the left of it, and a word character to the right of it.

String result = inputString.replaceAll("(.*?)left.*", "$1");
