Finding a number between two parentheses using a regular expression - Java

In a line I may have (123,456)
I want to find it using a pattern in Java. What I did is:
Pattern pattern = Pattern.compile("\\W");
Matcher matcher = pattern.matcher("(");
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
}
Input: This is test (123,456)
Output: Start index: 0 End index: 1 (
Why??

I am not sure how \W is going to match it. \W matches a single non-word character, so it cannot pick out the parenthesized number on its own.
In a Java string literal you will also have to escape the backslashes.
Round brackets need to be escaped, as by default they are used for grouping.
Maybe the regex you meant was:
Pattern pattern = Pattern.compile("\\([,\\d]+\\)");
Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
String matched = matcher.group();
//Do something with it
}
Explanation:
\\( # Match (
[,\\d]+ # Match 1+ digits/commas. Don't be surprised if it matches (,,,,,,)
\\) # Match )
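As a minimal sketch of the pattern above run against the asker's input line: note that the original code passed "(" to pattern.matcher instead of the input line, which is why it reported 0 to 1. The class and method names here are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenNumberFinder {
    // "(" followed by one or more digits or commas, followed by ")".
    private static final Pattern PAREN_NUMBER = Pattern.compile("\\([,\\d]+\\)");

    // Returns every match formatted as "start-end:text" (hypothetical helper).
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = PAREN_NUMBER.matcher(input);
        while (matcher.find()) {
            hits.add(matcher.start() + "-" + matcher.end() + ":" + matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // The matcher is created on the whole input line, not on "(".
        System.out.println(findAll("This is test (123,456)")); // [13-22:(123,456)]
    }
}
```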

To do it in one line:
String num = str.replaceAll(".*\\(([\\d,]+)\\).*", "$1");
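One thing to keep in mind with the one-liner: when the input contains no parenthesized number, replaceAll finds nothing to replace and returns the input unchanged. The sketch below illustrates that; the class and method names are assumptions made for the example.

```java
class OneLineExtract {
    // Replaces the whole line with the contents of the first capture group.
    static String extract(String str) {
        return str.replaceAll(".*\\(([\\d,]+)\\).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("This is test (123,456)")); // 123,456
        // No match: the original string comes back untouched.
        System.out.println(extract("no brackets here"));       // no brackets here
    }
}
```

If you need to tell "no number found" apart from a real result, the Matcher loop above is probably the safer choice.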

Related

Regular Expression in Java. Splitting a string using pattern and matcher

I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using a regular expression with Pattern and Matcher. The string needs to be split on /, but a / surrounded by single quotes ('/') must be skipped. For example, "One/Two/Three'/'3/Four" needs to be split as ["One", "Two", "Three'/'3", "Four"], but without using the .split method.
I am currently using the code below:
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
But it always says "NO MATCH". What am I doing wrong, and how can I fix it?
Thanks in advance
To get the matches without using split, you might use
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by optionally matching any char except ' or /
)* Close group and optionally repeat it
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
Edit
If you do not want to split, you might also use a pattern that never matches a / except when it is surrounded by single quotes:
[^/]+(?:(?<=')/(?=')[^/]*)*
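A short sketch of the lookaround variant in Java, run on the asker's sample string. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSlashTokens {
    // Non-slash characters, optionally continued across a "/" that has
    // a single quote directly before and directly after it.
    private static final Pattern TOKEN =
            Pattern.compile("[^/]+(?:(?<=')/(?=')[^/]*)*");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("One/Two/Three'/'3/Four"));
        // [One, Two, Three'/'3, Four]
    }
}
```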
Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find())
System.out.println(m.group());
output:
One
Two
Three'/'3
Four
Here is a simple pattern matching all the desired /, so you can split by them:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows. There are 4 cases:
1. / is surrounded by ', i.e. '/'
2. / is preceded by ', i.e. '/
3. / is followed by ', i.e. /'
4. / is surrounded by characters other than '
You only want to exclude case 1, so we need a regex for the other three cases. I have written three similar regexes and combined them with alternation.
Explanation of the first part (the other two are analogous):
(?<=[^']) - positive lookbehind, asserts that what precedes is different from ' (negated character class [^'])
\/ - matches / literally
(?=') - positive lookahead, asserts that what follows is '
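For illustration, the three-way alternation can be handed to Pattern.split. This is only a sketch: the class name is made up, and since / needs no escaping in Java regexes, the \/ is written as a plain /.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SlashSplitter {
    // Cases 2, 3 and 4 from the list above; case 1 ('/') is never matched.
    private static final Pattern SEPARATOR = Pattern.compile(
            "(?<=[^'])/(?=')|(?<=')/(?=[^'])|(?<=[^'])/(?=[^'])");

    static List<String> split(String line) {
        return Arrays.asList(SEPARATOR.split(line));
    }

    public static void main(String[] args) {
        System.out.println(split("One/Two/Three'/'3/Four"));
        // [One, Two, Three'/'3, Four]
    }
}
```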
Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\\d)|[^/]+"; // the backslash must be doubled inside a Java string literal
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while(m.find()) {
System.out.println("Found value: " + m.group() );
found = true;
}
if(!found) {
System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four
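As for why the original code always printed "NO MATCH": Matcher.matches() succeeds only if the entire input matches the pattern, whereas find() scans for the next occurrence. The sketch below keeps the asker's own separator regex and rebuilds the pieces from the match positions, without calling split. The class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindBasedSplit {
    // The asker's original separator pattern, unchanged.
    private static final Pattern SEPARATOR = Pattern.compile("(?<!')/|/(?!')");

    // Collects the text between consecutive separator matches.
    static List<String> splitWithFind(String line) {
        List<String> parts = new ArrayList<>();
        Matcher m = SEPARATOR.matcher(line);
        int from = 0;
        while (m.find()) {
            parts.add(line.substring(from, m.start()));
            from = m.end();
        }
        parts.add(line.substring(from)); // trailing piece after the last separator
        return parts;
    }

    public static void main(String[] args) {
        String line = "Test1/Test2/Tt";
        // matches() wants the whole line to be a single "/", so it is false.
        System.out.println(SEPARATOR.matcher(line).matches()); // false
        System.out.println(splitWithFind(line));               // [Test1, Test2, Tt]
        System.out.println(splitWithFind("One/Two/Three'/'3/Four"));
        // [One, Two, Three'/'3, Four]
    }
}
```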

Java reg expression capture string

I have the following string:
"(1)name1:content1(2)name2:content2(3)name3:content3...(n)namen:contentn"
what I want to do is to capture each of the name_i and content_i, how can I do this? I should mention that name_i is unknown. For example name1 could be "abc", name2 could be "xyz".
What I have tried:
String regex = "\\(\\d\\)(.*):(.*)(?=\\(\\d\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
System.out.println(matcher.group(0));
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
But the results are not very good. I also tried matcher.matches(), but nothing is returned.
You may use
String s = "(1)name1:content1(2)name2:content2(3)name3:content3...(4)namen:content4";
Pattern pattern = Pattern.compile("\\(\\d+\\)([^:]+):([^(]*(?:\\((?!\\d+\\))[^(]*)*)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Details
\\(\\d+\\) - matches (x) substring where x is 1 or more digits
([^:]+) - Group 1: one or more chars other than :
: - a colon
([^(]*(?:\\((?!\\d+\\))[^(]*)*) - Group 2:
[^(]* - zero or more chars other than (
(?:\\((?!\\d+\\))[^(]*)* - zero or more sequences of:
\\((?!\\d+\\)) - a ( that is not followed with 1+ digits and )
[^(]* - 0+ chars other than (
This will work if your names and contents consist only of word characters (letters, digits and underscore):
public static void test(String input){
String regexpp = "\\(\\d+\\)(\\w+):(\\w+)";
Pattern p = Pattern.compile(regexpp);
Matcher m = p.matcher(input);
while(m.find()){
System.out.println("Name: " + m.group(1));
System.out.println("Content: " + m.group(2));
}
}
Output:
Name: name1
Content: content1
Name: name2
Content: content2
Name: name3
Content: content3
Name: name99
Content: content99
Your expression matches greedily: the first group's .* runs past the first colon and only backtracks as far as the last colon that still lets the rest of the pattern match, so a single match swallows several entries. You can use non-greedy matching (using the question mark as in *?) to make each group stop as early as possible.
String regex = "\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\))";
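To make the difference concrete, here is a small sketch comparing the greedy and non-greedy versions on a three-entry string. It also shows that the lookahead (?=\\(\\d\\)) leaves the last entry unmatched, because nothing follows it; the third regex, which adds |$ to the lookahead, is my own assumption about one way to pick it up, not part of the answer. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Runs the regex and returns each match as "group1=group2".
    static List<String> pairs(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "(1)name1:content1(2)name2:content2(3)name3:content3";

        // Greedy: group 1 swallows the first two entries.
        System.out.println(pairs("\\(\\d\\)(.*):(.*)(?=\\(\\d\\))", s));
        // [name1:content1(2)name2=content2]

        // Non-greedy: correct pairs, but the last entry has no "(n)" after it.
        System.out.println(pairs("\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\))", s));
        // [name1=content1, name2=content2]

        // Assumed tweak: also accept end of input in the lookahead.
        System.out.println(pairs("\\(\\d\\)(.*?):(.*?)(?=\\(\\d\\)|$)", s));
        // [name1=content1, name2=content2, name3=content3]
    }
}
```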

Java pattern for [j-*]

Please help me with pattern matching. I want to build a pattern that matches the words starting with j- or c- in a string such as the following:
[j-test] is a [c-test]'s name with [foo] and [bar]
The pattern needs to find [j-test] and [c-test] (brackets inclusive).
What I have tried so far?
String template = "[j-test] is a [c-test]'s name with [foo] and [bar]";
Pattern patt = Pattern.compile("\\[[*[j|c]\\-\\w\\-\\+\\d]+\\]");
Matcher m = patt.matcher(template);
while (m.find()) {
System.out.println(m.group());
}
And it's giving output like
[j-test]
[c-test]
[foo]
[bar]
which is wrong. Please help me, thanks for your time on this thread.
Inside a character class, you don't need alternation to match j or c. A character class already means "match any single character from the ones inside it", so [jc] by itself matches either j or c (a | inside the class is just a literal pipe).
Also, you don't need to spell out what comes after j- or c-: you don't care about the rest, as long as the token starts with j- or c-.
Simply use this pattern:
Pattern patt = Pattern.compile("\\[[jc]-[^\\]]*\\]");
To explain:
Pattern patt = Pattern.compile("(?x) " // Embedded flag for Pattern.COMMENTS
+ "\\[ " // Match starting `[`
+ " [jc] " // Match j or c
+ " - " // then a hyphen
+ " [^ " // A negated character class
+ " \\]" // Match any character except ]
+ " ]* " // 0 or more times
+ "\\] "); // till the closing ]
Using the (?x) flag makes the regex engine ignore whitespace in the pattern. It is often helpful for writing readable regexes.
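Putting the suggested pattern together with the asker's template gives something like the sketch below. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTagFinder {
    // "[", then j or c, then "-", then anything up to the closing "]".
    private static final Pattern TAG = Pattern.compile("\\[[jc]-[^\\]]*\\]");

    static List<String> findTags(String template) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(template);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String template = "[j-test] is a [c-test]'s name with [foo] and [bar]";
        System.out.println(findTags(template)); // [[j-test], [c-test]]
    }
}
```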

Why doesn't this code work properly?

Why is it that this code:
String keyword = "pattern";
String text = "sometextpatternsometext";
String patternStr = "^.*" + keyword + ".*$"; //
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
System.out.println("start = " + start + ", end = " + end);
}
start = 0, end = 23
doesn't work properly.
But, this code:
String keyword = "pattern";
String text = "sometext pattern sometext";
String patternStr = "\\b" + keyword + "\\b"; //
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
System.out.println("start = " + start + ", end = " + end);
}
start = 9, end = 16
works fine.
It does work. Your pattern
^.*pattern.*$
says to match:
start at the beginning
accept any number of characters
followed by the string pattern
followed by any number of characters
until the end of the string
The result is the entire input string. If you wanted to find only the word pattern, then the regex would be just the word by itself, or as you found, bracketed with word-boundary metacharacters.
It is not that the first example didn't work, it is that you inadvertently asked it to match more than you meant.
The .* expressions expand to contain all the characters before "pattern" and all the characters after pattern, so the whole expression matches the whole line.
With your second example, \b only asserts a word boundary before and after "pattern" and consumes no characters, so the expression matches just pattern (start = 9, end = 16).
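The points above can be checked with a small sketch that reports the span of the first match for a few regexes. Pattern.quote is used here so that the keyword is treated literally; that, and the class and method names, are assumptions added for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordSpan {
    // Returns "start-end" of the first match, or "none" if there is no match.
    static String span(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.start() + "-" + m.end() : "none";
    }

    public static void main(String[] args) {
        String glued = "sometextpatternsometext";
        String spaced = "sometext pattern sometext";
        String keyword = Pattern.quote("pattern");

        System.out.println(span("^.*" + keyword + ".*$", glued));      // 0-23
        System.out.println(span(keyword, glued));                      // 8-15
        System.out.println(span("\\b" + keyword + "\\b", spaced));     // 9-16
        // No word boundary exists around the keyword in the glued text.
        System.out.println(span("\\b" + keyword + "\\b", glued));      // none
    }
}
```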
The issue is in your regex: "^.*" + keyword + ".*$"
The expression .* is greedy: it first consumes as many characters as there are in the string, then backtracks until your keyword can match. Combined with the trailing .*$, the match covers the whole string.
If you want the leading part to be as short as possible, make it non-greedy (reluctant), i.e. add a question mark after .*:
"^.*?" + keyword + ".*$"
This time .*? matches the minimum number of characters followed by your keyword, although the overall match still spans the whole string because of ^ and .*$.
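A rough sketch of what the question mark changes in practice: the overall match is the same either way, but if the keyword is wrapped in a capturing group, a greedy prefix lands on the last occurrence and a reluctant prefix on the first. The capturing group, Pattern.quote and all names below are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordGroup {
    // prefix is either ".*" (greedy) or ".*?" (reluctant).
    // Returns "start-end keyword@index" or "no match".
    static String locate(String prefix, String keyword, String text) {
        String regex = "^" + prefix + "(" + Pattern.quote(keyword) + ").*$";
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        if (!m.find()) {
            return "no match";
        }
        return m.start() + "-" + m.end() + " keyword@" + m.start(1);
    }

    public static void main(String[] args) {
        // Single occurrence: both variants agree.
        System.out.println(locate(".*", "pattern", "sometextpatternsometext"));  // 0-23 keyword@8
        System.out.println(locate(".*?", "pattern", "sometextpatternsometext")); // 0-23 keyword@8
        // Two occurrences: greedy finds the last, reluctant the first.
        System.out.println(locate(".*", "pattern", "patternXpattern"));  // 0-15 keyword@8
        System.out.println(locate(".*?", "pattern", "patternXpattern")); // 0-15 keyword@0
    }
}
```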

Java and regular expression, substring

I am totally lost when it comes to regular expressions.
I get generated strings like:
Your number is (123,456,789)
How can I filter out 123,456,789?
You can use this regex for extracting the number including the commas
\(([\d,]*)\)
The first captured group will have your match. Code will look like this
String subjectString = "Your number is (123,456,789)";
Pattern regex = Pattern.compile("\\(([\\d,]*)\\)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
String resultString = regexMatcher.group(1);
System.out.println(resultString);
}
Explanation of the regex
"\\(" + // Match the character “(” literally
"(" + // Match the regular expression below and capture its match into backreference number 1
"[\\d,]" + // Match a single character present in the list below
// A single digit 0..9
// The character “,”
"*" + // Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
")" +
"\\)" // Match the character “)” literally
This will get you started http://www.regular-expressions.info/reference.html
String str="Your number is (123,456,789)";
str = str.replaceAll(".*\\((.*)\\).*","$1");
or you can make the replacement a bit faster by doing:
str = str.replaceAll(".*\\(([\\d,]*)\\).*","$1");
Try:
"\\(([^)]+)\\)"
or
int start = text.indexOf('(')+1;
int end = text.indexOf(')', start);
String num = text.substring(start, end);
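Note that the indexOf version throws a StringIndexOutOfBoundsException when either parenthesis is missing, because indexOf then returns -1. A guarded variant might look like the sketch below; the class name, method name and the use of Optional are assumptions, not part of the original answer.

```java
import java.util.Optional;

class BetweenParens {
    // Returns the text between the first "(" and the next ")", if both exist.
    static Optional<String> between(String text) {
        int open = text.indexOf('(');
        if (open < 0) {
            return Optional.empty(); // no opening parenthesis
        }
        int close = text.indexOf(')', open + 1);
        if (close < 0) {
            return Optional.empty(); // no closing parenthesis after it
        }
        return Optional.of(text.substring(open + 1, close));
    }

    public static void main(String[] args) {
        System.out.println(between("Your number is (123,456,789)").orElse("not found")); // 123,456,789
        System.out.println(between("Your number is unknown").orElse("not found"));       // not found
    }
}
```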
private void showHowToUseRegex()
{
final Pattern MY_PATTERN = Pattern.compile("Your number is \\((\\d+),(\\d+),(\\d+)\\)");
final Matcher m = MY_PATTERN.matcher("Your number is (123,456,789)");
if (m.matches()) {
Log.d("xxx", "0:" + m.group(0));
Log.d("xxx", "1:" + m.group(1));
Log.d("xxx", "2:" + m.group(2));
Log.d("xxx", "3:" + m.group(3));
}
}
You'll see the first group is the whole string, and the next 3 groups are your numbers.
String str = "Your number is (123,456,789)";
str = new String(str.substring(16,str.length()-1));
