Store the whole word using RegEx - java

If String X contains String Y, I want to return the entire word that contains String Y. The regex needs to determine where the whole word ends; I assume it should look for whitespace.
String whole = "BARA BERE";
String part = "BAR";
if (whole.contains(part)) {
result = whole.replaceAll("\\bBAR", "");
System.out.println(result);
}
The output should be: BARA
Q1: What is the regex in this case?
Q2: What will be the regex, if the words are delimited by new lines?

If you're searching for a word, you shouldn't be using .replaceAll() but .find(). Since you specified (at least that's my interpretation) that a "word" should end at the nearest whitespace character, you can do this:
Pattern regex = Pattern.compile("\\bBAR\\S*");
Matcher regexMatcher = regex.matcher(whole);
if (regexMatcher.find()) {
part = regexMatcher.group();
}
\S* matches zero or more non-whitespace characters (which also excludes newlines).
If you want to allow spaces but forbid newlines within a "word", use [^\r\n] instead of \S.
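For Q2, where each line counts as one "word", the same idea can be sketched as follows. This is only an illustration: the class and method names are made up, and it assumes the part should be taken literally (hence Pattern.quote).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineWordFinder {
    // Hypothetical helper: returns the first "word" that starts with part,
    // where a word runs to the end of its line; returns null if nothing matches.
    static String findLineWord(String whole, String part) {
        // Pattern.quote treats part literally; [^\r\n]* stops at the first line break
        Pattern regex = Pattern.compile("\\b" + Pattern.quote(part) + "[^\\r\\n]*");
        Matcher m = regex.matcher(whole);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // made-up input: three newline-delimited "words", one containing a space
        System.out.println(findLineWord("BERE\nBARA X\nFOO", "BAR")); // BARA X
    }
}
```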

Match with this regex:
(?=\S*?BAR)\S+
This asserts that the non-whitespace sequence includes the word "BAR".
String whole = "foo foobar barfoo baz";
String part = "foo";
Matcher matcher = Pattern.compile("(?=\\S*?" + part + ")\\S+").matcher(whole);
while (matcher.find()) {
System.out.println(matcher.group());
}
You get:
foo
foobar
barfoo
You can also quote a literal section with \Q\E if your string contains meta-characters:
(?=\S*?\Q%s\E)\S+
String whole = "Dr Smith.";
String part = "th.";
Matcher matcher = Pattern.compile(String.format("(?=\\S*?\\Q%s\\E)\\S+", part)).matcher(whole);
while (matcher.find()) {
System.out.println(matcher.group());
}
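As an alternative to writing \Q and \E by hand, Pattern.quote() produces the quoted literal for you and also copes with a part that itself contains \E. A minimal sketch, with a made-up class and method name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContainingWords {
    // Hypothetical helper: collects every whitespace-delimited word
    // that contains part, with part taken literally.
    static List<String> wordsContaining(String whole, String part) {
        Pattern p = Pattern.compile("(?=\\S*?" + Pattern.quote(part) + ")\\S+");
        Matcher m = p.matcher(whole);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsContaining("Dr Smith.", "th."));             // [Smith.]
        System.out.println(wordsContaining("foo foobar barfoo baz", "foo")); // [foo, foobar, barfoo]
    }
}
```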

The regular expression should be
\bBAR\w*?\b
(assuming the part string is BAR)
\b asserts a word boundary.
\w*? lazily matches any number of word characters, that is, until it reaches the next word boundary \b.
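Note that a word-boundary pattern and a whitespace-delimited pattern can disagree about where the "word" ends. The sketch below compares the two on a made-up hyphenated input; the helper name is an assumption for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordEndComparison {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String whole = "BAR-X BERE"; // made-up input with a hyphenated word
        // \w stops at the hyphen, because '-' is not a word character
        System.out.println(firstMatch("\\bBAR\\w*?\\b", whole)); // BAR
        // \S runs on until the next whitespace character
        System.out.println(firstMatch("\\bBAR\\S*", whole));     // BAR-X
    }
}
```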

Related

regex find string between 2 characters, seperated by comma

I am new to regular expressions and I want to find a string between two characters.
I tried the code below but it always returns false. What is wrong with it?
public static void main(String[] args) {
String input = "myFunction(hello ,world, test)";
String patternString = "\\(([^]]+)\\)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
Input:
myFunction(hello,world,test), where myFunction can be any characters; anything may precede the opening (.
Output:
hello
world
test
You could make use of the \G anchor, which asserts the position at the end of the previous match, and capture your values in a group:
(?:\bmyFunction\(|\G(?!^))([^,]+)(?:\h*,\h*)?(?=[^)]*\))
In Java:
String regex = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
Explanation
(?: Non capturing group
\bmyFunction\( Word boundary to prevent the match being part of a larger word, match myFunction and an opening parentheses (
| Or
\G(?!^) Assert position at the end of previous match, not at the start of the string
) Close non capturing group
([^,]+) Capture in a group matching 1+ times not a comma
(?:\h*,\h*)? Optionally match a comma surrounded by 0+ horizontal whitespace chars
(?=[^)]*\)) Positive lookahead, assert what is on the right is a closing parenthesis )
For example:
String patternString = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
String input = "myFunction(hello ,world, test)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Result
hello
world
test
I'd suggest doing this in two steps:
Step 1: Capture all the content between ( and )
Use the regex: ^\S+\((.*)\)$
The first and the only capturing group will contain the required text.
Step 2: Split the captured string on the comma (,), which yields each comma-separated parameter on its own.
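The two steps could look roughly like this. It is a sketch under a few assumptions: the class and method names are invented, whitespace around the commas is treated as insignificant, and an input that does not look like a call yields an empty list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ArgumentSplitter {
    // Step 1 pattern: everything between the first ( and the final )
    private static final Pattern CALL = Pattern.compile("^\\S+\\((.*)\\)$");

    // Hypothetical helper: returns the comma-separated parameters of input.
    static List<String> arguments(String input) {
        Matcher m = CALL.matcher(input);
        if (!m.find()) {
            return new ArrayList<>(); // not of the form name(...)
        }
        String inner = m.group(1).trim();
        if (inner.isEmpty()) {
            return new ArrayList<>(); // empty parameter list
        }
        // Step 2: split on commas, swallowing surrounding whitespace
        return Arrays.asList(inner.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        System.out.println(arguments("myFunction(hello ,world, test)")); // [hello, world, test]
    }
}
```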
This may give you an idea:
([\w]+),([\w]+),([\w]+)
DEMO: https://rubular.com/r/9HDIwBTacxTy2O

Regex to find a word between $$ sign

I want a regular expression to find a word between $ signs only. It must start and end with a $ sign. I have tried the expression below:
final String regex = "\\$\\w+\\$";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
This should give me a count of 1, but my regular expression also matches the second occurrence (in cde$efg$hij), which it should not, as that word does not start and end with a $ sign.
You may use non-word boundaries:
final String regex = "\\B\\$\\w+\\$\\B";
The pattern will only match if $abc$ is neither preceded nor followed by a word char.
In Java:
String regex = "\\B\\$\\w+\\$\\B";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
while (matcher.find()){
System.out.println(matcher.group(0));
} // => $abc$
Besides non-word boundaries, you may use whitespace boundaries if you only want to match in between whitespace chars or start/end of string:
String regex = "(?<!\\S)\\$\\w+\\$(?!\\S)";
Or, use unambiguous word boundaries (as I call them):
String regex = "(?<!\\w)\\$\\w+\\$(?!\\w)";
The (?<!\\w) negative lookbehind will fail the match if a word char is found immediately to the left of the current location, and the (?!\\w) negative lookahead will fail the match if a word char is found immediately to the right of the current location.
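The three boundary styles agree on the input from the question but differ once punctuation is involved. The sketch below counts matches for each; the second input, "($abc$)", and all names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DollarBoundaries {
    static final String NON_WORD = "\\B\\$\\w+\\$\\B";              // non-word boundaries
    static final String WHITESPACE = "(?<!\\S)\\$\\w+\\$(?!\\S)";   // whitespace boundaries
    static final String NO_WORD_CHAR = "(?<!\\w)\\$\\w+\\$(?!\\w)"; // no word char on either side

    // Counts how many times regex matches in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "$abc$ cde$efg$hij pqr";
        String wrapped = "($abc$)"; // made-up input: punctuation around the token
        System.out.println(count(NON_WORD, text) + " " + count(NON_WORD, wrapped));         // 1 1
        System.out.println(count(WHITESPACE, text) + " " + count(WHITESPACE, wrapped));     // 1 0
        System.out.println(count(NO_WORD_CHAR, text) + " " + count(NO_WORD_CHAR, wrapped)); // 1 1
    }
}
```

Only the whitespace-boundary version rejects the token wrapped in parentheses, so the choice depends on whether adjacent punctuation should be allowed.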
The problem was extracting fields between dollar signs for me.
List<String> getFieldNames(@NotNull String str) {
final String regex = "\\$(\\w+)\\$";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
List<String> fields = new ArrayList<>();
while (matcher.find()) {
fields.add(matcher.group(1));
}
return fields;
}
This returns the list of words found between dollar signs.
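Be aware that the plain \$(\w+)\$ pattern also picks up fields embedded in a longer word, as in the original question. If that is not wanted, the capture group can be combined with lookarounds. A sketch with invented names, not the answerer's own code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldNames {
    static final String LOOSE = "\\$(\\w+)\\$";                   // pattern from the answer above
    static final String STRICT = "(?<!\\w)\\$(\\w+)\\$(?!\\w)";   // assumed stricter variant

    // Collects group 1 of every match of regex in str.
    static List<String> extract(String regex, String str) {
        Matcher matcher = Pattern.compile(regex).matcher(str);
        List<String> fields = new ArrayList<>();
        while (matcher.find()) {
            fields.add(matcher.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "$abc$ cde$efg$hij pqr";
        System.out.println(extract(LOOSE, input));  // [abc, efg]
        System.out.println(extract(STRICT, input)); // [abc]
    }
}
```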

Regex to match the beginning and the end of a string in Java

I want to extract a certain part of a string using a regex in Java. I currently have this pattern:
pattern = "^\\a.+\\sed$\n";
It is supposed to match a string that starts with "a" and ends with "sed", but it is not working. Did I miss something?
I removed the \n at the end of the pattern so that it now ends with "$".
It still doesn't get a match, although the regex looks legitimate to me.
What I want to extract is the "a sed" from the temp string.
String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
pattern = "(?s)^a.*sed$";
pr = Pattern.compile(pattern);
math = pr.matcher(temp);
UPDATE
You want to match a sed, so you can use a\\s+sed if there is only whitespace between a and sed:
String s = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
Pattern pattern = Pattern.compile("a\\s+sed");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(0));
}
Now, if there can be anything between a and sed, use a tempered greedy token, (?:(?!a|sed).)*, which consumes one character at a time as long as neither a nor sed starts at that position:
Pattern pattern = Pattern.compile("(?s)a(?:(?!a|sed).)*sed");
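To see what the tempered greedy token buys you, compare it with a plain greedy a.*sed on the string from the question. The helper name below is made up for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemperedTokenDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
        // greedy: starts at the very first a and runs to the last sed
        System.out.println(firstMatch("a.*sed", temp));                  // afsgdhgd gfgshfdgadh a sed
        // tempered: restarts whenever another a appears before sed
        System.out.println(firstMatch("(?s)a(?:(?!a|sed).)*sed", temp)); // a sed
    }
}
```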
ORIGINAL ANSWER
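The main problem with your regex is the \n at the end. $ matches at the end of the string (or just before a final line terminator), so a \n after it can only match if the string itself ends with a newline, which yours does not. Also, \\s matches a whitespace character, but you need a literal s; likewise, \\a matches the bell character (U+0007), not a literal a.
You need to replace \\a and \\s with a literal a and s, remove the \n, and make . match a newline. It is also advisable to use the * quantifier to allow zero characters in between: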
pattern = "(?s)^a.*sed$";
The regex matches:
^ - start of string
a - a literal a
.* - zero or more of any character (the (?s) modifier makes . match any character, including a newline)
sed - a literal letter sequence sed
$ - end of string
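The effect of the (?s) modifier can be checked with a short sketch. The two-line input is made up for illustration, as are the class and method names.

```java
import java.util.regex.Pattern;

final class DotallDemo {
    // True only if regex matches the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "a line\nused"; // made-up input spanning two lines
        String temp = "afsgdhgd gfgshfdgadh a sed afdsgdhgdsfgdfagdfhh";
        System.out.println(wholeMatch("^a.*sed$", twoLines));     // false: . stops at the newline
        System.out.println(wholeMatch("(?s)^a.*sed$", twoLines)); // true: . now crosses the newline
        System.out.println(wholeMatch("(?s)^a.*sed$", temp));     // false: temp does not end with sed
    }
}
```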
Your temp string cannot match the pattern (?s)^a.*sed$, because this pattern says that your temp string must begin with the character a and end with the sequence sed, which is not the case. Your string has trailing characters after the "sed" sequence.
If you only want to extract that a...sed portion of the whole string, try using the unanchored pattern "a.*sed" and use the find() method of the Matcher class:
Pattern pattern = Pattern.compile("a.*sed");
Matcher m = pattern.matcher(temp);
if (m.find())
{
System.out.println("Found string "+m.group());
System.out.println("From "+m.start()+" to "+m.end());
}

Regular expression java to extract the balance from a string

I have a String which contains " Dear user BAL= 1,234/ ".
I want to extract 1,234 from the String using the regular expression. It can be 1,23, 1,2345, 5,213 or 500
final Pattern p=Pattern.compile("((BAL)=*(\\s{1}\\w+))");
final Matcher m = p.matcher(text);
if(m.find())
return m.group(3);
else
return "";
This returns 3.
What regular expression should I make? I am new to regular expressions.
You search in your regex for word characters \w+ but you should search for digits with \d+.
Additionally there is the comma, so you need to match that as well.
I'd use
/.*BAL=\s([\d,]+(?=/)).*/
as pattern and get only the number in the resulting group.
Explanation:
.* match anything before
BAL= match the string "BAL="
\s match a whitespace
( start matching group
[\d,]+ matches any digit or comma one or more times
(?=/) match the former only if followed by a slash
) end matching group
.* matches anything thereafter
This is untested, but it should work like this:
final Pattern p=Pattern.compile(".*BAL=\\s([\\d,]+(?=/)).*");
final Matcher m = p.matcher(text);
if(m.find())
return m.group(1);
else
return "";
According to an online tester, the pattern above matches the text:
BAL= 1,234/
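Because find() searches anywhere in the input, the leading and trailing .* are not strictly needed. A pared-down sketch follows; it assumes the trailing slash does not have to be enforced and that the whitespace after BAL= is optional, and the names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BalanceExtractor {
    // Digits and commas following "BAL=" and optional whitespace
    private static final Pattern BALANCE = Pattern.compile("BAL=\\s*([\\d,]+)");

    // Hypothetical helper: returns the balance text, or "" if none is found.
    static String balance(String text) {
        Matcher m = BALANCE.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(balance(" Dear user BAL= 1,234/ ")); // 1,234
        System.out.println(balance(" Dear user BAL= 500/ "));   // 500
    }
}
```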
If it didn't have to be extracted by the regular expression you could simply do:
// trim first, otherwise the leading space yields an empty first element
String[] foo = text.trim().split("\\s+");
// foo[3] is "1,234/", so drop the trailing slash
return foo[3].replace("/", "");

RegEX: how to match string which is not surrounded

I have a String "REC/LESS FEES/CODE/AU013423".
What could be the regEx expression to match "REC" and "AU013423" (anything that is not surrounded by slashes /)
I am using /^>*/, which works and matches the string within the slashes, i.e. with it I am able to find "/LESS FEES/CODE/", but I want to negate this to find the reverse, i.e. REC and AU013423.
Need help on this. Thanks
If you know that you're only looking for alphanumeric data, you can use the regex ([A-Z0-9]+)/.*/([A-Z0-9]+). If it matches, its two groups contain the first and the final text strings.
This code prints RECAU013423
final String s = "REC/LESS FEES/CODE/AU013423";
final Pattern regex = Pattern.compile("([A-Z0-9]+)/.*/([A-Z0-9]+)", Pattern.CASE_INSENSITIVE);
final Matcher matcher = regex.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
You can tweak the regex groups as necessary to cover valid characters
Here's another option:
String s = "REC/LESS FEES/CODE/AU013423";
String[] results = s.split("/.*/");
System.out.println(Arrays.toString(results));
// [REC, AU013423]
^[^/]+|[^/]+$
matches anything that occurs before the first or after the last slash in the string (or the entire string if there is no slash present).
To iterate over all matches in a string in Java:
Pattern regex = Pattern.compile("^[^/]+|[^/]+$");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
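Filled in, the loop could collect the matches into a list, for example. This is a sketch only, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OuterSegments {
    private static final Pattern OUTER = Pattern.compile("^[^/]+|[^/]+$");

    // Hypothetical helper: returns the text before the first slash
    // and the text after the last slash.
    static List<String> outerSegments(String subject) {
        Matcher m = OUTER.matcher(subject);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(outerSegments("REC/LESS FEES/CODE/AU013423")); // [REC, AU013423]
        System.out.println(outerSegments("REC"));                         // [REC]
    }
}
```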
