I have a String which contains " Dear user BAL= 1,234/ ".
I want to extract 1,234 from the String using a regular expression. The value can also be 1,23, 1,2345, 5,213 or 500.
final Pattern p=Pattern.compile("((BAL)=*(\\s{1}\\w+))");
final Matcher m = p.matcher(text);
if(m.find())
    return m.group(3);
else
    return "";
This returns 3.
What regular expression should I make? I am new to regular expressions.
You search in your regex for word characters \w+ but you should search for digits with \d+.
Additionally there is the comma, so you need to match that as well.
I'd use
/.*BAL=\s([\d,]+(?=/)).*/
as pattern and get only the number in the resulting group.
Explanation:
.* match anything before
BAL= match the string "BAL="
\s match a whitespace
( start matching group
[\d,]+ matches any digit or comma one or more times
(?=/) match the former only if followed by a slash
) end matching group
.* matches anything thereafter
This is untested, but it should work like this:
final Pattern p=Pattern.compile(".*BAL=\\s([\\d,]+(?=/)).*");
final Matcher m = p.matcher(text);
if(m.find())
    return m.group(1);
else
    return "";
According to an online tester, the pattern above matches the text:
BAL= 1,234/
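For reference, the snippet above can be wrapped into a small self-contained class. This is only a sketch: the names BalanceExtractor and extractBalance are made up for illustration, and the leading and trailing .* are dropped on the assumption that find() is used, since find() already looks for a match anywhere in the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BalanceExtractor {
    // Same capturing group as in the pattern above; the surrounding .* is
    // omitted because find() scans the whole input anyway.
    private static final Pattern BAL = Pattern.compile("BAL=\\s([\\d,]+(?=/))");

    // Hypothetical helper: returns the number after "BAL= ", or "" if absent.
    static String extractBalance(String text) {
        Matcher m = BAL.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String[] samples = {
            " Dear user BAL= 1,234/ ",
            " Dear user BAL= 1,23/ ",
            " Dear user BAL= 1,2345/ ",
            " Dear user BAL= 500/ "
        };
        for (String s : samples) {
            System.out.println(extractBalance(s));
        }
    }
}
```

With the sample values from the question this should print 1,234, 1,23, 1,2345 and 500, one per line.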
If it didn't have to be extracted by the regular expression you could simply do:
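// trim first (the text starts with a space), then split on any whitespace into a 4-element array
String[] foo = text.trim().split("\\s+");
// the last element is "1,234/", so drop the trailing slash
return foo[3].replace("/", "");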
Related
I have a requirement where a string needs to be matched and a further value then extracted from that string.
I will receive a request header whose value is a DN from an SSL certificate. I need to match a specific string, 1.2.3.47, in the header and extract the text that follows it.
Sample String passed to method:
O=ABC Bank Plc/1.2.3.47=ABC12-PQR-121878, CN=7ltM2wQ3bqlDJdBEURGAMq, L=INDIA, C=INDIA, E=xyz#gmail.com
Here is my code:
private String extractDN(String dnHeader) {
    // declared outside the if blocks so it is in scope for the return
    String id = "";
    if(!ValidatorUtil.isEmpty(dnHeader)){
        String tokens[]=dnHeader.split(",");
        if(tokens[0].contains("1.2.3.47")){
            int index=tokens[0].lastIndexOf("1.2.3.47");
            // skip the 8 characters of "1.2.3.47" plus the "="
            id=tokens[0].substring(index+9);
            System.out.println(id);
        }
    }
    return id;
}
Can a regex pattern be used here to match and extract value? Is there any better way to achieve this? Please help.
If you want to use a pattern, and you know that the value always starts with a forward slash followed by groups of digits separated by dots and then an equals sign, you could use a capturing group:
/[0-9]+(?:\\.[0-9]+)+=([^,]+)
/ Match /
[0-9]+ Match 1+ digit 0-9
(?: Non capturing group
\\.[0-9]+ match . and 1+ digits 0-9
)+ Close non capturing group and repeat 1+ times
= Match =
([^,]+) Capture group 1, match 1+ times any char except a ,
For example
final String regex = "/[0-9]+(?:\\.[0-9]+)+=([^,]+)";
final String string = "O=ABC Bank Plc/1.2.3.47=ABC12-PQR-121878, CN=7ltM2wQ3bqlDJdBEURGAMq, L=INDIA, C=INDIA, E=xyz#gmail.com";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Output
ABC12-PQR-121878
If you want a more precise match, you could also specify the start of the pattern:
\\bO=\\w+(?:\\h+\\w+)*/[0-9]+(?:\\.[0-9]+)+=([^,]+)
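Since the question asks for one specific identifier (1.2.3.47) rather than any dotted number, a variant that matches only that identifier might look like the sketch below. The class name DnExtractor, the method extractValue and the oid parameter are hypothetical; Pattern.quote is used so that the dots are matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DnExtractor {
    // Hypothetical helper: returns the value that follows "/<oid>=" up to the
    // next comma, or "" when the identifier is not present.
    static String extractValue(String dnHeader, String oid) {
        if (dnHeader == null || dnHeader.isEmpty()) {
            return "";
        }
        // Pattern.quote makes the dots in the identifier match literally.
        Pattern p = Pattern.compile("/" + Pattern.quote(oid) + "=([^,]+)");
        Matcher m = p.matcher(dnHeader);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String dn = "O=ABC Bank Plc/1.2.3.47=ABC12-PQR-121878, CN=7ltM2wQ3bqlDJdBEURGAMq, L=INDIA, C=INDIA, E=xyz#gmail.com";
        System.out.println(extractValue(dn, "1.2.3.47")); // ABC12-PQR-121878
    }
}
```

Unlike the substring approach in the question, this does not depend on the identifier being in the first comma-separated token.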
I am new to regular expressions and I want to find a string between two characters.
I tried the code below, but it always returns false. What is wrong with it?
public static void main(String[] args) {
    String input = "myFunction(hello ,world, test)";
    String patternString = "\\(([^]]+)\\)";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }
}
Input:
myFunction(hello,world,test), where myFunction can be any characters. Any characters may also come before the opening (.
Output:
hello
world
test
You could make use of the \G anchor, which asserts the position at the end of the previous match, and capture your values in a group:
(?:\bmyFunction\(|\G(?!^))([^,]+)(?:\h*,\h*)?(?=[^)]*\))
In Java:
String regex = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
Explanation
(?: Non capturing group
\bmyFunction\( Word boundary to prevent the match being part of a larger word, match myFunction and an opening parenthesis (
| Or
\G(?!^) Assert position at the end of previous match, not at the start of the string
) Close non capturing group
([^,]+) Capture in a group matching 1+ times not a comma
(?:\h*,\h*)? Optionally match a comma surrounded by 0+ horizontal whitespace chars
(?=[^)]*\)) Positive lookahead, assert what is on the right is a closing parenthesis )
For example:
String patternString = "(?:\\bmyFunction\\(|\\G(?!^))([^,]+)(?:\\h*,\\h*)?(?=[^)]*\\))";
String input = "myFunction(hello ,world, test)";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}
Result
hello
world
test
I'd suggest doing this in two steps:
Step 1: Capture all the content between ( and )
Use the regex: ^\S+\((.*)\)$
The first and the only capturing group will contain the required text.
Step 2: Split the captured string on the comma character, which yields each comma-separated parameter on its own.
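The two steps could be sketched in Java roughly as follows. The class and method names (ArgumentSplitter, arguments) are made up for illustration, and trimming each parameter is an assumption based on the expected output in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArgumentSplitter {
    // Step 1 pattern: everything between the first ( and the final ).
    private static final Pattern CALL = Pattern.compile("^\\S+\\((.*)\\)$");

    // Hypothetical helper: returns the trimmed parameters, or an empty list
    // when the input does not look like name(...).
    static List<String> arguments(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CALL.matcher(input);
        if (!m.find()) {
            return result;
        }
        // Step 2: split the captured text on commas and trim each piece.
        for (String part : m.group(1).split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String arg : arguments("myFunction(hello ,world, test)")) {
            System.out.println(arg);
        }
    }
}
```

For the input from the question this should print hello, world and test on separate lines.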
This may give you an idea:
([\w]+),([\w]+),([\w]+)
DEMO: https://rubular.com/r/9HDIwBTacxTy2O
I want a regular expression that finds a word between $ signs only. It must start and end with a $ sign. I have tried the expression below:
final String regex = "\\$\\w+\\$";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
This should give me a count of 1, but my regular expression also matches a second occurrence (inside cde$efg$hij), which it should not, because that word does not start and end with a $ sign.
You may use non-word boundaries:
final String regex = "\\B\\$\\w+\\$\\B";
The pattern will only match if $abc$ is neither preceded nor followed by word chars.
In Java:
String regex = "\\B\\$\\w+\\$\\B";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("$abc$ cde$efg$hij pqr");
while (matcher.find()){
    System.out.println(matcher.group(0));
} // => $abc$
Besides non-word boundaries, you may use whitespace boundaries if you only want to match in between whitespace chars or start/end of string:
String regex = "(?<!\\S)\\$\\w+\\$(?!\\S)";
Or, use unambiguous word boundaries (as I call them):
String regex = "(?<!\\w)\\$\\w+\\$(?!\\w)";
The (?<!\\w) negative lookbehind fails the match if a word char is found immediately to the left of the current location, and the (?!\\w) negative lookahead fails the match if a word char is found immediately to the right of the current location.
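To see how the three variants differ, a small comparison such as the one below may help. It is only a sketch: the class name DollarBoundaries, the count helper and the second test input ($abc$ wrapped in parentheses) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarBoundaries {
    // Hypothetical helper: counts how many times regex matches in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String[] regexes = {
            "\\B\\$\\w+\\$\\B",           // non-word boundaries
            "(?<!\\S)\\$\\w+\\$(?!\\S)",  // whitespace boundaries
            "(?<!\\w)\\$\\w+\\$(?!\\w)"   // lookaround on word chars
        };
        String[] inputs = {"$abc$ cde$efg$hij pqr", "($abc$)"};
        for (String input : inputs) {
            for (String regex : regexes) {
                System.out.println(input + "  " + regex + "  " + count(regex, input));
            }
        }
    }
}
```

All three variants should find one match in the first input. For the second input only the whitespace variant should find none, because the opening parenthesis directly before the $ is not whitespace.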
For me, the problem was extracting the fields between dollar signs.
List<String> getFieldNames(@NotNull String str) {
    final String regex = "\\$(\\w+)\\$";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(str);
    List<String> fields = new ArrayList<>();
    while (matcher.find()) {
        fields.add(matcher.group(1));
    }
    return fields;
}
This returns the list of words between dollar signs.
If String X contains String Y, then return the entire word that contains String Y. The idea is that the regex needs to determine what the entire word is; I assume that it will look for whitespace.
String whole = "BARA BERE";
String part = "BAR";
if (whole.contains(part)) {
    result = whole.replaceAll("\\bBAR", "");
    System.out.println(result);
}
The output should be: BARA
Q1: What is the regex in this case?
Q2: What will be the regex, if the words are delimited by new lines?
If you're searching for a word, you shouldn't be using .replaceAll() but .find(). Since you specified (at least that's my interpretation) that a "word" should end at the nearest whitespace character, you can do this:
Pattern regex = Pattern.compile("\\bBAR\\S*");
Matcher regexMatcher = regex.matcher(whole);
if (regexMatcher.find()) {
    part = regexMatcher.group();
}
\S* matches zero or more non-whitespace characters (which also excludes newlines).
If you want to allow spaces but forbid newlines within a "word", use [^\r\n]* instead of \S*.
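The difference between the two character classes can be checked with a short sketch like this one. The class name WordFinder, the firstMatch helper and the extra line FOO in the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    // Hypothetical helper: returns the first match of regex in input, or "".
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String text = "BARA BERE\nFOO";
        // Word ends at the nearest whitespace character.
        System.out.println(firstMatch("\\bBAR\\S*", text));        // BARA
        // Word ends only at a line break, so the space is kept.
        System.out.println(firstMatch("\\bBAR[^\\r\\n]*", text));  // BARA BERE
    }
}
```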
Match with this regex:
(?=\S*?BAR)\S+
This asserts that the non-whitespace sequence includes the word "BAR".
String whole = "foo foobar barfoo baz";
String part = "foo";
Matcher matcher = Pattern.compile("(?=\\S*?" + part + ")\\S+").matcher(whole);
while (matcher.find()) {
    System.out.println(matcher.group());
}
You get:
foo
foobar
barfoo
You can also quote a literal section with \Q\E if your string contains meta-characters:
(?=\S*?\Q%s\E)\S+
String whole = "Dr Smith.";
String part = "th.";
Matcher matcher = Pattern.compile(String.format("(?=\\S*?\\Q%s\\E)\\S+", part)).matcher(whole);
while (matcher.find()) {
    System.out.println(matcher.group());
}
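As an alternative to writing \Q and \E by hand, Pattern.quote can build the quoted section and also copes with a search string that itself contains \E. The sketch below assumes that approach; the names QuotedWordSearch and wordsContaining are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedWordSearch {
    // Hypothetical helper: returns every whitespace-delimited word in whole
    // that contains part, treating part as literal text.
    static List<String> wordsContaining(String whole, String part) {
        String regex = "(?=\\S*?" + Pattern.quote(part) + ")\\S+";
        Matcher matcher = Pattern.compile(regex).matcher(whole);
        List<String> words = new ArrayList<>();
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsContaining("Dr Smith.", "th."));          // [Smith.]
        System.out.println(wordsContaining("foo foobar barfoo baz", "foo")); // [foo, foobar, barfoo]
    }
}
```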
The regular expression should be
\bBAR\w*?\b
assuming the part string is BAR.
\b asserts a word boundary.
\w*? matches any number of word characters lazily, that is, only until it reaches the next word boundary \b.
Given this input string:
String input = "some text ERA-00924: table does not exists</div";
How can I match everything between 'ERA-00924' and the first '<' character with a Java regular expression?
I am currently able to capture the 'ERA-00924' part with the following:
Pattern pattern = Pattern.compile("(ERA-\\d\\d\\d\\d\\d)");
Matcher matcher = pattern.matcher(input);
if( matcher.find() )
{
    String target = matcher.group();
}
But I am struggling to match all the way to the first '<' character (but not including).
You can use this regex:
ERA-\\d{5}([^<]*)
And use group 1 for your value using:
matcher.group(1)
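Put together with the input from the question, this could look like the sketch below. The class name ErrorMessageExtractor and the method textAfterCode are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ErrorMessageExtractor {
    // Group 1 captures everything after the error code up to, but not
    // including, the first '<'.
    private static final Pattern ERA = Pattern.compile("ERA-\\d{5}([^<]*)");

    // Hypothetical helper: returns the text after the code, or "" if no code is found.
    static String textAfterCode(String input) {
        Matcher matcher = ERA.matcher(input);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        String input = "some text ERA-00924: table does not exists</div";
        System.out.println(textAfterCode(input)); // ": table does not exists"
    }
}
```

If the error code should be part of the result as well, matcher.group() (group 0) returns the whole match, including ERA-00924.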