I want a regular expression that finds a word only when it is enclosed in $ signs, that is, the token must start and end with a $ sign. I have tried the expression below:
final String regex = "\\$\\w+\\$";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
This should give me a count of 1. But my regular expression also matches the second occurrence ($efg$ inside cde$efg$hij), which it should not, because that token does not start and end with a $ sign.
You may use non-word boundaries:
final String regex = "\\B\\$\\w+\\$\\B";
The pattern will only match if $abc$ is neither preceded nor followed by word chars. See the regex demo.
See Java demo:
String regex = "\\B\\$\\w+\\$\\B";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
while (matcher.find()){
System.out.println(matcher.group(0));
} // => $abc$
Besides non-word boundaries, you may use whitespace boundaries if you only want to match in between whitespace chars or start/end of string:
String regex = "(?<!\\S)\\$\\w+\\$(?!\\S)";
Or, use unambiguous word boundaries (as I call them):
String regex = "(?<!\\w)\\$\\w+\\$(?!\\w)";
The (?<!\\w) negative lookbehind will fail the match if a word char is found immediately to the left of the current location, and the (?!\\w) negative lookahead will fail the match if a word char is found immediately to the right of the current location.
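To see how the three boundary styles behave side by side, here is a minimal sketch. The class name BoundaryDemo, the helper count and the extra input "($abc$)" are my own illustration and not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: counts how many times the regex matches in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "$abc$ cde$efg$hij pqr";
        // No boundaries: $abc$ and $efg$ both match.
        System.out.println(count("\\$\\w+\\$", input));                 // 2
        // Each boundary style rejects $efg$ because word chars surround it.
        System.out.println(count("\\B\\$\\w+\\$\\B", input));           // 1
        System.out.println(count("(?<!\\S)\\$\\w+\\$(?!\\S)", input));  // 1
        System.out.println(count("(?<!\\w)\\$\\w+\\$(?!\\w)", input));  // 1

        // The styles differ when punctuation touches the token.
        String wrapped = "($abc$)";
        System.out.println(count("(?<!\\S)\\$\\w+\\$(?!\\S)", wrapped)); // 0
        System.out.println(count("(?<!\\w)\\$\\w+\\$(?!\\w)", wrapped)); // 1
    }
}
```

So the whitespace boundaries are the strictest choice: they also reject tokens wrapped in punctuation, which the word-based checks accept.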
For me, the problem was extracting the fields between dollar signs.
List<String> getFieldNames(@NotNull String str) {
final String regex = "\\$(\\w+)\\$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
List<String> fields = new ArrayList<>();
while (matcher.find()) {
fields.add(matcher.group(1));
}
return fields;
}
This will return the list of words between dollar signs.
This is my input string
String inputString = "fff.fre def $fff$ £45112,662 $0.33445533 abc,def 12,34";
I tried below regex to split
String[] tokens = inputString.split("(?![$£](?=(\\d)*[.,]?(\\d)*))[\\p{Punct}\\s]");
but it does not preserve the comma and dot when they are surrounded by digits. Basically, I don't want to split on a comma or dot if it is part of a price value.
Output I get is
token==>fff
token==>fre
token==>def
token==>$fff$
token==>£45112
token==>662
token==>$0
token==>33445533
token==>abc
token==>def
token==>12
token==>34
Expected output
token==>fff
token==>fre
token==>def
token==>$fff$
token==>£45112,662
token==>$0.33445533
token==>abc
token==>def
token==>12
token==>34
Instead of split, you may use this simpler regex to get all the desired matches:
[$£]\w+[$£]?|[^\p{Punct}\h]+
RegEx Demo
RegEx Breakup:
[$£]: Match $ or £
\w+: Match 1+ word chars
[$£]?: Match optional $ or £
|: OR
[^\p{Punct}\h]+: Match 1+ of any chars that are not horizontal whitespace or punctuation
Code:
final String regex = "[$£]\\w+[$£]?|[^\\p{Punct}\\h]+";
final String string = "fff.fre def $fff$ £45112,662 $0.33445533 abc,def 12,34";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("token==>" + matcher.group());
}
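As far as I can tell, the pattern above still stops at the comma or dot inside a price, because \w+ does not cover those characters. One possible tweak, sketched below, puts an extra alternative for a currency sign followed by digit runs in front. The class name PriceTokens, the method tokens and the definition of a "price" are my assumptions, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceTokens {
    // Assumption: a price is $ or the pound sign, then digits,
    // optionally followed by more digit runs joined by '.' or ','.
    static final Pattern TOKEN = Pattern.compile(
            "[$\u00A3]\\d+(?:[.,]\\d+)*"   // price, separators kept
            + "|[$\u00A3]\\w+[$\u00A3]?"   // other currency-prefixed words such as $fff$
            + "|[^\\p{Punct}\\h]+");       // any other run without punctuation or horizontal whitespace

    // Hypothetical helper: returns all tokens in order of appearance.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "fff.fre def $fff$ \u00A345112,662 $0.33445533 abc,def 12,34";
        for (String t : tokens(s)) {
            System.out.println("token==>" + t);
        }
    }
}
```

The price alternative has to come first, since Java tries alternatives left to right and would otherwise take the shorter currency-prefixed word match.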
Let's say I have a string:
String sentence = "My nieces are Cara:8 Sarah:9 Tara:10";
And I would like to find all their respective names and ages with the following pattern matcher:
String regex = "My\\s+nieces\\s+are((\\s+(\\S+):(\\d+))*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(sentence);
I understand something like
matcher.find(0); // resets "pointer"
String niece = matcher.group(2);
String nieceName = matcher.group(3);
String nieceAge = matcher.group(4);
would give me my last niece (" Tara:10", "Tara", "10").
How would I collect all of my nieces instead of only the last, using only one regex/pattern?
I would like to avoid using split string.
Another idea is to use the \G anchor that matches where the previous match ended (or at start).
String regex = "(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)";
If My\s+nieces\s+are matches
\G will chain matches from there
(?!\A) neg. lookahead prevents \G from matching at \A start
\s+(\S+):(\d+) using two capturing groups for extraction
See this demo at regex101 or a Java demo at tio.run
Matcher m = Pattern.compile(regex).matcher(sentence);
while (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
}
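If you need the results in a collection rather than printed, the same \G pattern can feed a map. This is only a sketch: the class name NieceCollector and the method collect are made up for illustration, and it assumes every age fits in an int.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NieceCollector {
    // Same pattern as above: either continue from the previous match (\G, but not
    // at the very start) or begin right after the "My nieces are" prefix.
    static final Pattern NIECE = Pattern.compile(
            "(?:\\G(?!\\A)|My\\s+nieces\\s+are)\\s+(\\S+):(\\d+)");

    // Hypothetical helper: maps each name to its age, in order of appearance.
    static Map<String, Integer> collect(String sentence) {
        Map<String, Integer> nieces = new LinkedHashMap<>();
        Matcher m = NIECE.matcher(sentence);
        while (m.find()) {
            nieces.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return nieces;
    }

    public static void main(String[] args) {
        System.out.println(collect("My nieces are Cara:8 Sarah:9 Tara:10"));
        // {Cara=8, Sarah=9, Tara=10}
    }
}
```

Without the prefix nothing is collected, because \G alone is not allowed to start the chain at the beginning of the input.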
You can't iterate over repeating groups, but you can match each group individually, calling find() in a loop to get the details of each one. If they need to be back-to-back, you can iteratively bound your matcher to the last index, like this:
Matcher matcher = Pattern.compile("My\\s+nieces\\s+are").matcher(sentence);
if (matcher.find()) {
int boundary = matcher.end();
matcher = Pattern.compile("^\\s+(\\S+):(\\d+)").matcher(sentence);
while (matcher.region(boundary, sentence.length()).find()) {
System.out.println(matcher.group());
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
boundary = matcher.end();
}
}
I have a String which contains " Dear user BAL= 1,234/ ".
I want to extract 1,234 from the String using a regular expression. The value can be 1,23, 1,2345, 5,213 or 500.
final Pattern p=Pattern.compile("((BAL)=*(\\s{1}\\w+))");
final Matcher m = p.matcher(text);
if(m.find())
return m.group(3);
else
return "";
This returns 3.
What regular expression should I make? I am new to regular expressions.
You search in your regex for word characters \w+ but you should search for digits with \d+.
Additionally there is the comma, so you need to match that as well.
I'd use
/.*BAL=\s([\d,]+(?=/)).*/
as pattern and get only the number in the resulting group.
Explanation:
.* match anything before
BAL= match the string "BAL="
\s match a whitespace
( start matching group
[\d,]+ matches every digit or comma one or more times
(?=/) match the former only if followed by a slash
) end matching group
.* matches anything thereafter
This is untested, but it should work like this:
final Pattern p=Pattern.compile(".*BAL=\\s([\\d,]+(?=/)).*");
final Matcher m = p.matcher(text);
if(m.find())
return m.group(1);
else
return "";
According to an online tester, the pattern above matches the text:
BAL= 1,234/
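Since find() searches anywhere in the input, the surrounding .* is not strictly needed. Here is a shorter sketch under my own assumptions: the class name BalanceExtractor and the method extract are hypothetical, and I dropped the slash lookahead so that a value at the very end of the text is accepted too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BalanceExtractor {
    // "BAL=", optional whitespace, then a run of digits and commas in group 1.
    static final Pattern BALANCE = Pattern.compile("BAL=\\s*([\\d,]+)");

    // Hypothetical helper: returns the balance text, or "" when none is found.
    static String extract(String text) {
        Matcher m = BALANCE.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extract(" Dear user BAL= 1,234/ ")); // 1,234
        System.out.println(extract("Dear user BAL= 500"));      // 500
    }
}
```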
If it didn't have to be extracted by the regular expression you could simply do:
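// trim first (a leading space would produce an empty first element),
// then split on any whitespace into a 4-element array
String[] foo = text.trim().split("\\s+");
return foo[3]; // "1,234/" (the trailing slash is still attached)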
If String X contains String Y, then return the entire word that contains String Y. The idea is that the regex needs to determine what the entire word is; I assume it will look for whitespace.
String whole = "BARA BERE";
String part = "BAR";
if (whole.contains(part)) {
result = whole.replaceAll("\\bBAR", "");
System.out.println(result);
}
The output should be: BARA
Q1: What is the regex in this case?
Q2: What will be the regex, if the words are delimited by new lines?
If you're searching for a word, you shouldn't be using .replaceAll() but .find(). Since you specified (at least that's my interpretation) that a "word" should end at the nearest whitespace character, you can do this:
Pattern regex = Pattern.compile("\\bBAR\\S*");
Matcher regexMatcher = regex.matcher(whole);
if (regexMatcher.find()) {
part = regexMatcher.group();
}
\S* matches zero or more non-whitespace characters (which also excludes newlines).
If you want to allow spaces but forbid newlines within a "word", use [^\r\n] instead of \S.
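A small sketch of the difference between the two character classes; the class name WordVsLine, the helper firstMatch and the multi-line input are my own, chosen only to illustrate Q2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordVsLine {
    // Hypothetical helper: returns the first match of the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String whole = "BARA BERE\nFOO";
        // \S* stops at the first whitespace char of any kind.
        System.out.println(firstMatch("\\bBAR\\S*", whole));        // BARA
        // [^\r\n]* runs on to the end of the line, spaces included.
        System.out.println(firstMatch("\\bBAR[^\\r\\n]*", whole));  // BARA BERE
    }
}
```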
Match with this regex:
(?=\S*?BAR)\S+
This asserts that the non-whitespace sequence includes the word "BAR".
String whole = "foo foobar barfoo baz";
String part = "foo";
Matcher matcher = Pattern.compile("(?=\\S*?" + part + ")\\S+").matcher(whole);
while (matcher.find()) {
System.out.println(matcher.group());
}
You get:
foo
foobar
barfoo
You can also quote a literal section with \Q\E if your string contains meta-characters:
(?=\S*?\Q%s\E)\S+
String whole = "Dr Smith.";
String part = "th.";
Matcher matcher = Pattern.compile(String.format("(?=\\S*?\\Q%s\\E)\\S+", part)).matcher(whole);
while (matcher.find()) {
System.out.println(matcher.group());
}
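Instead of writing \Q and \E by hand you could also let Pattern.quote() build the literal part; it also copes with a search string that itself contains \E. A sketch, where the class name ContainingWords and the method wordsContaining are just names I picked:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContainingWords {
    // Hypothetical helper: returns every whitespace-delimited word that contains
    // 'part', treating 'part' as literal text rather than as a regex.
    static List<String> wordsContaining(String whole, String part) {
        Pattern p = Pattern.compile("(?=\\S*?" + Pattern.quote(part) + ")\\S+");
        Matcher m = p.matcher(whole);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsContaining("Dr Smith.", "th."));  // [Smith.]
        // The dot is literal, so "Smithy" is not a match.
        System.out.println(wordsContaining("Dr Smithy", "th."));  // []
    }
}
```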
The regular expression must be
\bBAR\w*?\b
\b asserts a word boundary.
This assumes the part string is BAR.
\w*? matches any number of word characters in a lazy manner, that is, until it finds the next word boundary \b.
I have a String "REC/LESS FEES/CODE/AU013423".
What regular expression would match "REC" and "AU013423" (anything that is not surrounded by slashes /)?
I am using /^>*/, which works and matches the string within the slashes, i.e. with it I am able to find "/LESS FEES/CODE/", but I want to negate this to find the reverse, i.e. REC and AU013423.
Need help on this. Thanks
If you know that you're only looking for alphanumeric data, you can use the regex ([A-Z0-9]+)/.*/([A-Z0-9]+). If this matches, you will have two groups containing the first and final text strings.
This code prints RECAU013423:
final String s = "REC/LESS FEES/CODE/AU013423";
final Pattern regex = Pattern.compile("([A-Z0-9]+)/.*/([A-Z0-9]+)", Pattern.CASE_INSENSITIVE);
final Matcher matcher = regex.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
You can tweak the regex groups as necessary to cover the valid characters.
Here's another option:
String s = "REC/LESS FEES/CODE/AU013423";
String[] results = s.split("/.*/");
System.out.println(Arrays.toString(results));
// [REC, AU013423]
^[^/]+|[^/]+$
matches anything that occurs before the first or after the last slash in the string (or the entire string if there is no slash present).
To iterate over all matches in a string in Java:
Pattern regex = Pattern.compile("^[^/]+|[^/]+$");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
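For completeness, a runnable sketch that collects those matches into a list; the class name SlashEnds and the method outerParts are hypothetical names of mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlashEnds {
    // Text before the first slash, or text after the last slash.
    static final Pattern OUTER = Pattern.compile("^[^/]+|[^/]+$");

    // Hypothetical helper: returns the matched parts in order of appearance.
    static List<String> outerParts(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = OUTER.matcher(subject);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(outerParts("REC/LESS FEES/CODE/AU013423")); // [REC, AU013423]
        System.out.println(outerParts("NOSLASH"));                     // [NOSLASH]
    }
}
```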