How to capture multiple groups in regex? - java

I am trying to capture the following word and number:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when the text is:
stxt:usa
The regex did not work. I tried to apply an OR condition using |, but that did not work either:
stxt:(.*?),|city:(\d.*)$

You may use
(stxt|city):([^,]+)
See the regex demo (note the \n added only for the sake of the demo, you do not need it in real life).
Pattern details:
(stxt|city) - either the stxt or the city substring (you may add \b before the ( to only match a whole word) (Group 1)
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
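The loop above can also feed a map, which makes the "city is optional" case explicit. This is only a sketch: the class name KeyValueDemo and the helper parse are hypothetical and not part of the original answer; the pattern is the one shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueDemo {
    // Same pattern as in the answer: Group 1 is the key, Group 2 is the value.
    private static final Pattern PAIR = Pattern.compile("(stxt|city):([^,]+)");

    // Hypothetical helper: collects every key/value pair found in the input.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("stxt:usa,city:14")); // both keys present
        System.out.println(parse("stxt:usa"));         // only stxt present
    }
}
```

A missing key is simply absent from the map, so the caller can check for city without needing a second regex.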

Looking at your string, you could also find the word/digits after the colon.
:(\w+)


Ignore creating beginnings of words in a regular expression

I'm trying to parse all the links in a message.
My Java code looks like this:
Pattern URLPATTERN = Pattern.compile(
"([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})((?:[?&](?:\\w+)=(?:\\w+))+|[--:\\w?#%&+~#=]+)?",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
Matcher matcher = Patterns.URLPATTERN.matcher(message);
ArrayList<int[]> links = new ArrayList<>();
while (matcher.find())
links.add(new int[] {matcher.start(1), matcher.end()});
[...]
The problem now is that the links sometimes start with a colour code that looks like this: [&§]{1}[a-z0-9]{1}
An example could be: Please use Google: §ehttps://google.com, and don't ask me.
The regular expression, which I found somewhere on the internet, matches ehttps://google.com, but it should only match https://google.com.
How can I change the regular expression above to exclude the following pattern but still match the link that follows just after the colour code?
[&§]{1}[a-z0-9]{1}
You can add a (?:[&§][a-z0-9])? pattern (matching an optional sequence of a & or § and then an ASCII letter or digit) at the beginning of your regex:
Pattern URLPATTERN = Pattern.compile(
"(?:[&§][a-z0-9])?([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})((?:[?&]\\w+=\\w+)+|[--:\\w?#%&+~#=]+)?", Pattern.CASE_INSENSITIVE);
See the regex demo.
When the regex finds §ehttps://google.com, the §e is matched with the optional non-capturing group (?:[&§][a-z0-9])?, that is why it is "excluded" from the Group 1 value.
There is no need to use Pattern.MULTILINE | Pattern.DOTALL with your regex: the pattern contains no unescaped . and no ^/$.
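As a rough check of the pattern above, the following sketch collects the link text from Group 1 up to the end of each match. The class name LinkFinderDemo and the helper findLinks are hypothetical; the section sign is written as \u00A7 only to avoid source-encoding problems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkFinderDemo {
    // Pattern from the answer, with the section sign spelled as a Unicode escape.
    private static final Pattern URL_PATTERN = Pattern.compile(
        "(?:[&\u00A7][a-z0-9])?([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})"
            + "((?:[?&]\\w+=\\w+)+|[--:\\w?#%&+~#=]+)?",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns each link without a leading colour code.
    static List<String> findLinks(String message) {
        List<String> links = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(message);
        while (matcher.find()) {
            // start(1) skips the optional colour code; end() covers Group 2 if present.
            links.add(message.substring(matcher.start(1), matcher.end()));
        }
        return links;
    }

    public static void main(String[] args) {
        String message = "Please use Google: \u00A7ehttps://google.com, and don't ask me.";
        System.out.println(findLinks(message));
    }
}
```

Using matcher.start(1) rather than matcher.start() is what drops the colour code, which mirrors the index pair stored in the question's code.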

Regular expression to match a line and extract file name in Java

I have a String in the following format:
Index: /aap/guru/asdte/atsAPI.tcl
===================================================================
RCS file: /autons/atsAPI.tcl,v
retrieving revision 1.41
Index: /aap/guru/asdte/atsAPI1.tcl
===================================================================
RCS file: /autons/atsAPI1.tcl,v
retrieving revision 1.41
What I want is to match a line that starts with Index: and then get the file name from the path.
That is, first get Index: /aap/guru/asdte/atsAPI.tcl and then extract atsAPI.tcl as the final result.
Currently I match twice: first the whole line, then the file name within it.
My question is how to do it with a single regular expression in Java.
Current code:
String line = "Index: /aap/guru/asdte/atsAPI.tcl\r\n===================================================================\r\nRCS file: /autons/atsAPI.tcl,v\r\nretrieving revision 1.41\r\n\r\nIndex: /aap/guru/asdte/atsAPI1.tcl\r\n===================================================================\r\nRCS file: /autons/atsAPI1.tcl,v\r\nretrieving revision 1.41";
Pattern regex1 = Pattern.compile("Index:.*?\\n", Pattern.DOTALL);
Pattern regex2 = Pattern.compile("[^*/]+$");
Matcher matcher1 = regex1.matcher(line);
while (matcher1.find()) {
String s = matcher1.group(0);
Matcher matcher2 = regex2.matcher(s);
while (matcher2.find()) {
System.out.println(matcher2.group(0));
}
}
how to do it in a single regular expression in java.
Use a capturing group as shown below.
Regular Expression:
^Index:.*\/(.*)
Now the filename can be obtained by using matcher.group(1) and is represented by the last part (.*) in the regex
^ anchors the match at the start of a line (see the note on MULTILINE below)
Index: matches the literal as-is
.* matches anything (greedy)
\/ matches a slash /
(.*) matches the filename in a capturing group
Make sure (?m) or Pattern.MULTILINE flag is set so that the matching is multi line and matches the starting anchor ^ at the start of every line.
EDIT: Modify your code to use only one regex, like this:
Pattern pattern = Pattern.compile("^Index:.*\\/(.*)", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
// Output:
atsAPI.tcl
atsAPI1.tcl
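Try ^Index.+\/([^\.]+\.\w+)$ with the multiline flag (Pattern.MULTILINE or (?m) in Java), or Index.+\/([^\.]+\.\w+) without it. Java has no g flag; calling matcher.find() in a loop returns all matches. The only capturing group is for the name of the file.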
Try the following regex, the answer is in the first match group:
Index:.*?\/([\w]+\.[\w]*)
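A minimal sketch of how this last pattern could be used from Java, assuming the sample text from the question. The class name IndexFileNames and the helper extract are hypothetical, and [\w] is written as the equivalent \w.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexFileNames {
    // Lazy .*? steps forward slash by slash until "name.ext" follows a slash.
    private static final Pattern INDEX_LINE = Pattern.compile("Index:.*?/(\\w+\\.\\w*)");

    // Hypothetical helper: returns the file name of every "Index:" line.
    static List<String> extract(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = INDEX_LINE.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "Index: /aap/guru/asdte/atsAPI.tcl\r\n"
            + "RCS file: /autons/atsAPI.tcl,v\r\n"
            + "Index: /aap/guru/asdte/atsAPI1.tcl\r\n"
            + "RCS file: /autons/atsAPI1.tcl,v";
        System.out.println(extract(text));
    }
}
```

Because . does not match line terminators by default, each match stays on its own Index: line and the RCS file: lines are ignored.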

Regex not filter by delimiters

I want to create a regular expression that matches when my numbers are separated by a comma.
For example:
1 OK
1,2,3 OK
1\n2,3 OK
1,\n Not OK
1,,2 Not OK
1,\n2 Not Ok
So far I have created this expression:
\d+(([,.|\n])+\d+)*
If I change the last * to + so that the group must occur at least once:
\d+(([,.|\n])+\d+)+
then all the previous scenarios work, but not this one:
1 Not OK // but it should be OK
I'm using matcher.find():
Matcher matcher = Pattern.compile(pattern).matcher(number);
if (matcher.find()) {
System.out.println("total number:" + matcher.group(0));
}
Any idea what I'm doing wrong in my regex?
You can use this regex:
^\d+(?:(?:,|\n)\d+)*$
Java regex:
Pattern p = Pattern.compile("^\\d+(?:(?:,|\\n)\\d+)*$");
PS: To match a literal \n (a backslash followed by the letter n) you will need:
^\d+(?:(?:,|\\n)\d+)*$
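The scenarios from the question can be checked with a small sketch like the one below. It assumes \n in the inputs is a real newline character, and it uses matcher.matches() instead of find() with anchors, so the whole input has to fit the pattern. The class name NumberListValidator and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

class NumberListValidator {
    // Digits, then any number of "separator + digits" pairs; separator is a comma or a newline.
    private static final Pattern NUMBER_LIST = Pattern.compile("\\d+(?:[,\\n]\\d+)*");

    // matches() requires the entire input to match, so no ^ or $ is needed.
    static boolean isValid(String input) {
        return NUMBER_LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"1", "1,2,3", "1\n2,3", "1,\n", "1,,2", "1,\n2"};
        for (String input : inputs) {
            // Show the newline visibly in the output.
            System.out.println(input.replace("\n", "\\n") + " -> " + isValid(input));
        }
    }
}
```

One reason to prefer matches() here: with find(), the $ anchor also matches just before a trailing line terminator, so an input such as "1\n" would be accepted.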

Substring between lines using Regular Expression Java

I have the following string:
abc test ...
interface
somedata ...
xxx ...
!
sdfff as ##
example
yyy sdd ## .
!
I need to find the content between a line containing the word "interface" or "example" and a line "!".
The required output would be something like this:
String[] output= {"somedata ...\nxxx ...\n","yyy sdd ## .\n"} ;
I can do this manually using substring and iteration, but I want to achieve it with a regular expression.
Is it possible?
This is what I have tried:
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?m)\ninterface(.*?)\n!\n");
Matcher m =pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Is this correct? Please suggest the right way of doing it.
Edit:
A small change: I want to find the content between a line "interface" or "example" and a line "!".
Can this also be achieved using regex?
You could use (?s) DOTALL modifier.
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?s)(?<=\\ninterface\\n).*?(?=\\n!\\n)");
Matcher m =pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Output:
somedata
xxx
yyy
Note that the input in your example is different.
(?<=\\ninterface\\n) Asserts that the match must be preceded by the characters which are matched by the pattern present inside the positive lookbehind.
(?=\\n!\\n) Asserts that the match must be followed by the characters which are matched by the pattern present inside the positive lookahead.
Update:
Pattern pattern = Pattern.compile("(?s)(?<=\\n(?:example|interface)\\n).*?(?=\\n!\\n)");
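For comparison, here is a sketch of an alternative that uses a capturing group with the (?m) and (?s) flags instead of lookarounds. It is not the answer's pattern: it anchors on line starts with ^, so a block on the very first line is also found, and it keeps the trailing newline that the question's expected output shows. The class name BlockExtractor and the helper extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockExtractor {
    // (?m): ^ and $ match at line boundaries; (?s): . also matches newlines.
    // Group 1 is everything between the marker line and the "!" line.
    private static final Pattern BLOCK =
        Pattern.compile("(?ms)^(?:interface|example)\\n(.*?)^!$");

    // Hypothetical helper: returns the body of every block found in the text.
    static List<String> extract(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(text);
        while (matcher.find()) {
            blocks.add(matcher.group(1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "abc test ...\ninterface\nsomedata ...\nxxx ...\n!\n"
            + "sdfff as ##\nexample\nyyy sdd ## .\n!\n";
        for (String block : extract(sample)) {
            System.out.print(block);
            System.out.println("----");
        }
    }
}
```

This sketch assumes the marker line consists of exactly "interface" or "example" and that lines end with \n; other line endings would need the pattern adjusted.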

How to extract CSS color using regex?

I have a CSS style that I need to extract the color from using a Java regex.
e.g.
color:#000;
I need to extract everything between : and ;. Can anyone give an example?
I'm not sure how to apply it to Java, but one regex to do this would be:
^color:\s*(#[0-9a-f]+);?$
To just extract from : up to ; do something like:
Pattern pattern = Pattern.compile("[^:]*:(.*);");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
String value = matcher.group(1);
System.out.println("'" + value+ "'"); // do something with value
}
[^:]* - any number of chars that are not ':'
: - one ':'
(...) - a capturing group
.* - any number of any character
; - the terminating ';'
Use color:(.*); to accept only values for 'color'.
/(?<=:).+(?=;)/
That will do it for you.
I'm not sure how you implement regex in Java, though.
www.regexr.com can help you test out your regex in real time.
The expression
":(#.+);"
should do it
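Putting these suggestions together in Java, a minimal sketch might look like the following. It assumes the value is a hex colour of 3 to 8 hex digits and adds a negative lookbehind so that background-color is not picked up; both are assumptions beyond the answers above. The class name CssColorExtractor and the method extractColor are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CssColorExtractor {
    // (?<![\w-]) rejects "color" when it is the tail of a longer property name.
    // Group 1 captures the hex value between ':' and ';'.
    private static final Pattern COLOR =
        Pattern.compile("(?<![\\w-])color:\\s*(#[0-9a-fA-F]{3,8})\\s*;");

    // Hypothetical helper: returns the first hex colour of a "color" declaration, if any.
    static Optional<String> extractColor(String css) {
        Matcher matcher = COLOR.matcher(css);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractColor("color:#000;"));
        System.out.println(extractColor("background-color:#fff; color: #1a2B3c;"));
        System.out.println(extractColor("color:red;"));
    }
}
```

find() is used rather than matches() so the declaration can appear anywhere in a longer style string; named colours such as red are deliberately not handled in this sketch.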
