Substring between lines using a regular expression in Java

Hi, I have the following string:
abc test ...
interface
somedata ...
xxx ...
!
sdfff as ##
example
yyy sdd ## .
!
I need to find the content between a line containing the word "interface" or "example" and a line "!".
The required output would be something like this:
String[] output= {"somedata ...\nxxx ...\n","yyy sdd ## .\n"} ;
I can do this manually using substring() and iteration, but I want to achieve it with a regular expression.
Is it possible?
This is what I have tried:
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?m)\ninterface(.*?)\n!\n");
Matcher m =pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Am I on the right track? Please suggest the right way of doing it.
Edit:
A small change: I want to find the content between a line "interface" or "example" and a line "!".
Can we achieve this with a regex too?

You could use the (?s) (DOTALL) modifier, so that . also matches line breaks:
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?s)(?<=\\ninterface\\n).*?(?=\\n!\\n)");
Matcher m = pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Output:
somedata
xxx
yyy
Note that this sample input differs from the one in your question.
(?<=\\ninterface\\n) - positive lookbehind: asserts that the match is preceded by a newline, the word interface and another newline.
(?=\\n!\\n) - positive lookahead: asserts that the match is followed by a newline, a ! and another newline.
Update: to accept either "interface" or "example" as the opening line:
Pattern pattern = Pattern.compile("(?s)(?<=\\n(?:example|interface)\\n).*?(?=\\n!\\n)");
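The following is a minimal sketch of the whole task, not the answer's exact approach: it swaps the lookarounds for (?m) line anchors combined with (?s), and collects the blocks into a String[] shaped like the required output in the question (each block keeps its trailing newline). It assumes \n line endings and that the opening line is exactly interface or example; the class BetweenLines and its extract() method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenLines {
    // (?m): ^ and $ match at line boundaries; (?s): '.' also matches '\n'.
    // Group 1 = everything after an "interface"/"example" line,
    // up to the next line that is exactly "!".
    static final Pattern BLOCK = Pattern.compile("(?ms)^(?:interface|example)$\\n(.*?)^!$");

    static String[] extract(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group(1)); // keeps the block's trailing '\n'
        }
        return blocks.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String sample = "abc test ...\ninterface\nsomedata ...\nxxx ...\n!\n"
                + "sdfff as ##\nexample\nyyy sdd ## .\n!\n";
        for (String block : extract(sample)) {
            System.out.print(block);
            System.out.println("---");
        }
    }
}
```

The lazy (.*?) stops at the first following line that is exactly !, so consecutive blocks are not merged.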

Related

Ignore creating beginnings of words in a regular expression

I'm trying to parse all the links in a message.
My Java code looks like this:
Pattern URLPATTERN = Pattern.compile(
"([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})((?:[?&](?:\\w+)=(?:\\w+))+|[--:\\w?#%&+~#=]+)?",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
Matcher matcher = Patterns.URLPATTERN.matcher(message);
ArrayList<int[]> links = new ArrayList<>();
while (matcher.find())
links.add(new int[] {matcher.start(1), matcher.end()});
[...]
The problem is that the links sometimes start with a color code that looks like this: [&§]{1}[a-z0-9]{1}
An example could be: Please use Google: §ehttps://google.com, and don't ask me.
With the regex I found somewhere on the internet, it matches ehttps://google.com, but it should only match https://google.com.
How can I change the regular expression above to exclude the following pattern but still match the link that follows right after the color code?
[&§]{1}[a-z0-9]{1}
You can add a (?:[&§][a-z0-9])? pattern (an optional sequence of & or § followed by an ASCII letter or digit) at the beginning of your regex:
Pattern URLPATTERN = Pattern.compile(
"(?:[&§][a-z0-9])?([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})((?:[?&]\\w+=\\w+)+|[--:\\w?#%&+~#=]+)?", Pattern.CASE_INSENSITIVE);
When the regex finds §ehttps://google.com, the §e is matched by the optional non-capturing group (?:[&§][a-z0-9])?, which is why it is "excluded" from the Group 1 value (and therefore from matcher.start(1)).
There is no need to use Pattern.MULTILINE | Pattern.DOTALL with your regex, since the pattern contains no unescaped . and no ^ or $.
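A self-contained sketch applying this pattern to the example message. The class ColorCodeLinks and the helper links() are hypothetical; like the question's code, it takes each link from matcher.start(1) to matcher.end(), so the color code consumed by the optional group is skipped. The § is written as the escape \u00A7 only so the source compiles regardless of file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColorCodeLinks {
    // Optional color code ('&' or the section sign, then a letter/digit),
    // followed by the link itself in Groups 1 and 2.
    static final Pattern URLPATTERN = Pattern.compile(
            "(?:[&\u00A7][a-z0-9])?([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})((?:[?&]\\w+=\\w+)+|[--:\\w?#%&+~#=]+)?",
            Pattern.CASE_INSENSITIVE);

    // Returns each link, starting at Group 1 so that a leading color code is left out.
    static List<String> links(String message) {
        List<String> result = new ArrayList<>();
        Matcher matcher = URLPATTERN.matcher(message);
        while (matcher.find()) {
            result.add(message.substring(matcher.start(1), matcher.end()));
        }
        return result;
    }

    public static void main(String[] args) {
        String message = "Please use Google: \u00A7ehttps://google.com, and don't ask me.";
        System.out.println(links(message)); // [https://google.com]
    }
}
```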

Regular expression to match a line and extract file name in Java

I have a String in the following format:
Index: /aap/guru/asdte/atsAPI.tcl
===================================================================
RCS file: /autons/atsAPI.tcl,v
retrieving revision 1.41
Index: /aap/guru/asdte/atsAPI1.tcl
===================================================================
RCS file: /autons/atsAPI1.tcl,v
retrieving revision 1.41
I want to match a line starting with Index: and then get the file name from the path.
That is, first get Index: /aap/guru/asdte/atsAPI.tcl and then extract atsAPI.tcl as the final result.
Currently I match twice: first the whole line, then the file name.
My question is: how can I do it with a single regular expression in Java?
My current code is:
String line = "Index: /aap/guru/asdte/atsAPI.tcl\r\n===================================================================\r\nRCS file: /autons/atsAPI.tcl,v\r\nretrieving revision 1.41\r\n\r\nIndex: /aap/guru/asdte/atsAPI1.tcl\r\n===================================================================\r\nRCS file: /autons/atsAPI1.tcl,v\r\nretrieving revision 1.41";
Pattern regex1 = Pattern.compile("Index:.*?\\n", Pattern.DOTALL);
Pattern regex2 = Pattern.compile("[^*/]+$");
Matcher matcher1 = regex1.matcher(line);
while (matcher1.find()) {
String s = matcher1.group(0);
Matcher matcher2 = regex2.matcher(s);
while (matcher2.find()) {
System.out.println(matcher2.group(0));
}
}
To do it with a single regular expression in Java, use a capturing group as shown below.
Regular Expression:
^Index:.*\/(.*)
The file name can then be obtained with matcher.group(1); it is the part captured by the last (.*) in the regex.
^ anchors the match at the start of a line
Index: matches the literal as-is
.* matches anything (greedily), so the following \/ ends up at the last slash on the line
\/ matches a slash /
(.*) captures the file name in Group 1
Make sure the (?m) inline flag or Pattern.MULTILINE is set, so that the starting anchor ^ matches at the start of every line rather than only at the start of the input.
EDIT: Modify your code to use only one regex, like this:
Pattern pattern = Pattern.compile("^Index:.*\\/(.*)", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
// Output:
atsAPI.tcl
atsAPI1.tcl
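To see why the MULTILINE flag matters, here is a small sketch (class and method names are hypothetical) that runs the same pattern on the question's CRLF input with and without Pattern.MULTILINE. Without the flag, ^ only matches at the very start of the input, so only the first file name is found. The \/ escape is written as a plain / here; Java accepts either form. Because . does not match \r or \n, the greedy .* never leaves the current line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexFileNames {
    static final String SAMPLE = "Index: /aap/guru/asdte/atsAPI.tcl\r\n"
            + "===================================================================\r\n"
            + "RCS file: /autons/atsAPI.tcl,v\r\n"
            + "retrieving revision 1.41\r\n\r\n"
            + "Index: /aap/guru/asdte/atsAPI1.tcl\r\n"
            + "===================================================================\r\n"
            + "RCS file: /autons/atsAPI1.tcl,v\r\n"
            + "retrieving revision 1.41";

    // Collects Group 1 (the text after the last '/') of every "Index:" line found.
    static List<String> fileNames(String text, int flags) {
        Pattern pattern = Pattern.compile("^Index:.*/(.*)", flags);
        List<String> names = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fileNames(SAMPLE, Pattern.MULTILINE)); // [atsAPI.tcl, atsAPI1.tcl]
        System.out.println(fileNames(SAMPLE, 0));                 // [atsAPI.tcl]
    }
}
```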
Try ^Index.+\/([^\.]+\.\w+)$ with the g and m flags (in Java: Pattern.MULTILINE plus a find() loop), or Index.+\/([^\.]+\.\w+) without the m flag. The only capturing group holds the file name.
Try the following regex; the file name is in the first capturing group:
Index:.*?\/([\w]+\.[\w]*)
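A quick sketch (hypothetical class and helper names, trimmed sample input) that checks the two alternative regexes above in Java. Java has no g flag; the equivalent is calling find() in a loop, and m corresponds to Pattern.MULTILINE. The lazy .*? in the last pattern first stops at the first slash and then widens through backtracking until ([\w]+\.[\w]*) can match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndexAlternatives {
    static final String SAMPLE = "Index: /aap/guru/asdte/atsAPI.tcl\r\n"
            + "RCS file: /autons/atsAPI.tcl,v\r\n"
            + "Index: /aap/guru/asdte/atsAPI1.tcl\r\n"
            + "RCS file: /autons/atsAPI1.tcl,v";

    static final Pattern ANCHORED = Pattern.compile("^Index.+\\/([^\\.]+\\.\\w+)$", Pattern.MULTILINE);
    static final Pattern UNANCHORED = Pattern.compile("Index.+\\/([^\\.]+\\.\\w+)");
    static final Pattern LAZY = Pattern.compile("Index:.*?\\/([\\w]+\\.[\\w]*)");

    // The find() loop plays the role of the g flag: it visits every match in turn.
    static List<String> group1(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Pattern p : new Pattern[] {ANCHORED, UNANCHORED, LAZY}) {
            System.out.println(group1(p, SAMPLE)); // [atsAPI.tcl, atsAPI1.tcl]
        }
    }
}
```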

How to capture multiple groups in regex?

I am trying to capture the following word and number:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when the text is:
stxt:usa
the regex does not match. I tried to add an OR condition using |, but that did not work either:
stxt:(.*?),|city:(\d.*)$
You may use
(stxt|city):([^,]+)
Pattern details:
(stxt|city) - either the substring stxt or city (Group 1); you may add \b before the ( to match only whole words
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
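If you want the values keyed by name, a minimal sketch (hypothetical class KeyValueGroups, assuming each key appears at most once) can collect the two groups into a map. It includes the optional \b mentioned above, and the same code handles both stxt:usa,city:14 and stxt:usa, because each key/value pair is found by a separate find() call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueGroups {
    // \b: the key must start at a word boundary.
    // Group 1 = key, Group 2 = value up to the next comma.
    static final Pattern PAIR = Pattern.compile("\\b(stxt|city):([^,]+)");

    static Map<String, String> parse(String s) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps the input order
        Matcher matcher = PAIR.matcher(s);
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("stxt:usa,city:14")); // {stxt=usa, city=14}
        System.out.println(parse("stxt:usa"));         // {stxt=usa}
    }
}
```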
Looking at your string, you could also just capture the word or digits after each colon:
:(\w+)
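A sketch of this simpler variant (the class ColonValues is a hypothetical name). Note that it returns only the values, so you no longer know which key each one belonged to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ColonValues {
    // Group 1 = the run of word characters right after a colon; \w stops at ','.
    static final Pattern VALUE = Pattern.compile(":(\\w+)");

    static List<String> values(String s) {
        List<String> result = new ArrayList<>();
        Matcher matcher = VALUE.matcher(s);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(values("stxt:usa,city:14")); // [usa, 14]
        System.out.println(values("stxt:usa"));         // [usa]
    }
}
```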

Using regex pattern in Java

I created a regex pattern that works perfectly, but I can't get it working in Java:
(\\"|[^" ])+|"(\\"|[^"])*"
applied to
robocopy "C:\test" "C:\test2" /R:0 /MIR /NP
gives (as it should)
[0] => robocopy
[1] => "C:\test"
[2] => "C:\test2"
[3] => /R:0
[4] => /MIR
[5] => /NP
in group 0 according to http://myregextester.com/index.php
Now, how do I get those 6 values in Java?
I tried
Pattern p = Pattern.compile(" (\\\"|[^\" ])+ | \"(\\\"|[^\"])*\" ");
Matcher m = p.matcher(command);
System.out.println(m.matches()); // returns false
but the pattern doesn't even match anything at all?
Update
The original perl regex was:
(\\"|[^" ])+|"(\\"|[^"])*"
Since the Java compiler processes the string literal before the regex engine sees it, you need to double every backslash in the expression and escape every double quote with a backslash. Also drop the spaces you added around the alternatives: without the Pattern.COMMENTS flag they are matched literally.
Pattern p = Pattern.compile("(\\\\\"|[^\" ])+|\"(\\\\\"|[^\"])*\"");
The matches() method matches the regex against the whole string; it returns true only if the entire string matches.
What you are looking for is the find() method, which moves to the next match; then get the matched substring with group().
It is usually done by iterating:
while (m.find()) {
String token = m.group();
// post-processing of token
}
matches() tries to match the pattern against the entire string. In your case, use the find() method of the Matcher object instead.
So the fix is:
System.out.println(m.find()); // true if the pattern occurs anywhere in the input
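Putting the answers together, a minimal sketch (the class CommandTokens and its tokens() helper are hypothetical) that compiles the correctly escaped pattern and collects every find() result, reproducing the six values from the question. It also prints matches() for comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommandTokens {
    // The regex (\\"|[^" ])+|"(\\"|[^"])*" with every backslash doubled and every quote escaped.
    static final Pattern TOKEN = Pattern.compile("(\\\\\"|[^\" ])+|\"(\\\\\"|[^\"])*\"");

    static List<String> tokens(String command) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(command);
        while (m.find()) {         // find() moves to the next match anywhere in the input
            result.add(m.group()); // group() is the whole match, i.e. group 0
        }
        return result;
    }

    public static void main(String[] args) {
        String command = "robocopy \"C:\\test\" \"C:\\test2\" /R:0 /MIR /NP";
        List<String> parts = tokens(command);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("[" + i + "] => " + parts.get(i));
        }
        // false: no single token covers the spaces between the arguments
        System.out.println(TOKEN.matcher(command).matches());
    }
}
```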

How to split this string using Java Regular Expressions

I want to split the string
String fields = "name[Employee Name], employeeno[Employee No], dob[Date of Birth], joindate[Date of Joining]";
into:
name
employeeno
dob
joindate
I wrote the following Java code for this, but it prints only name; the other matches are not printed.
String fields = "name[Employee Name], employeeno[Employee No], dob[Date of Birth], joindate[Date of Joining]";
Pattern pattern = Pattern.compile("\\[.+\\]+?,?\\s*" );
String[] split = pattern.split(fields);
for (String string : split) {
System.out.println(string);
}
What am I doing wrong here?
Thank you
This part:
\\[.+\\]
matches the first [; the .+ then gobbles up the rest of the string (if there are no line breaks in it) and backtracks only as far as needed for \\] to match the last ].
You need to make the .+ reluctant by placing a ? after it:
Pattern pattern = Pattern.compile("\\[.+?\\]+?,?\\s*");
Also, \\]+? could simply be \\]: the reluctant +? matches a single ] here anyway.
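A small sketch (the class SplitFields is a hypothetical name) contrasting the question's greedy pattern with the reluctant one from this answer on the same input. Pattern.split() drops the trailing empty string left after the last match, which is why the greedy version yields just one element.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitFields {
    static final String FIELDS =
            "name[Employee Name], employeeno[Employee No], dob[Date of Birth], joindate[Date of Joining]";
    // Greedy: .+ runs to the end of the string, then backtracks to the LAST ']'.
    static final Pattern GREEDY = Pattern.compile("\\[.+\\]+?,?\\s*");
    // Reluctant: .+? stops at the FIRST ']' after each '['.
    static final Pattern RELUCTANT = Pattern.compile("\\[.+?\\]+?,?\\s*");

    public static void main(String[] args) {
        System.out.println(Arrays.toString(GREEDY.split(FIELDS)));    // [name]
        System.out.println(Arrays.toString(RELUCTANT.split(FIELDS))); // [name, employeeno, dob, joindate]
    }
}
```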
The error is that you are matching greedily. You can change it to a non-greedy match by putting ? after .+:
Pattern.compile("\\[.+?\\],?\\s*")
There's an online regular expression tester at http://gskinner.com/RegExr/?2sa45 that will help you a lot when you try to understand regular expressions and how they are applied to a given input.
Would it be better to use negated character classes to match the square brackets? \[(\w+\s)+\w+[^\]]\]
For a good explanation, see the question "How does using a negated character class work internally (without backtracking)?"
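Along the lines of these comments, here is a sketch using a negated character class. This is a variation, not a pattern taken from any answer above: [^\]]* cannot run past a ], so the engine does not need to backtrack to find the closing bracket. The class SplitFieldsNegated is a hypothetical name.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitFieldsNegated {
    // '[', then any run of non-']' characters, then ']', an optional comma and whitespace.
    static final Pattern BRACKETED = Pattern.compile("\\[[^\\]]*\\],?\\s*");

    static String[] fieldNames(String fields) {
        return BRACKETED.split(fields);
    }

    public static void main(String[] args) {
        String fields = "name[Employee Name], employeeno[Employee No], dob[Date of Birth], joindate[Date of Joining]";
        System.out.println(Arrays.toString(fieldNames(fields))); // [name, employeeno, dob, joindate]
    }
}
```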
