I'm trying to parse all the links in a message.
My Java code looks like this:
Pattern URLPATTERN = Pattern.compile(
"([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})((?:[?&](?:\\w+)=(?:\\w+))+|[--:\\w?#%&+~#=]+)?",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);
Matcher matcher = Patterns.URLPATTERN.matcher(message);
ArrayList<int[]> links = new ArrayList<>();
while (matcher.find())
links.add(new int[] {matcher.start(1), matcher.end()});
[...]
The problem now is that the links sometimes start with a colour code that looks like this: [&§]{1}[a-z0-9]{1}
An example could be: Please use Google: §ehttps://google.com, and don't ask me.
With the regular expression above, which I found somewhere on the internet, the match is ehttps://google.com, but it should only be https://google.com
How can I change the regular expression above to exclude the following pattern but still match the link that follows right after the colour code?
[&§]{1}[a-z0-9]{1}
You can add a (?:[&§][a-z0-9])? pattern (matching an optional sequence of a & or § and then an ASCII letter or digit) at the beginning of your regex:
Pattern URLPATTERN = Pattern.compile(
"(?:[&§][a-z0-9])?([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})((?:[?&]\\w+=\\w+)+|[--:\\w?#%&+~#=]+)?", Pattern.CASE_INSENSITIVE);
When the regex finds §ehttps://google.com, the §e is matched with the optional non-capturing group (?:[&§][a-z0-9])?, that is why it is "excluded" from the Group 1 value.
There is no need to use Pattern.MULTILINE | Pattern.DOTALL with your regex, since there is no unescaped . and no ^/$ in the pattern.
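The effect of the optional non-capturing group can be checked with a small, self-contained sketch. The class name ColourCodeLinks and the helper extractLinks are made up for illustration; the section sign is written as \u00A7 only so that the source file encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
final class ColourCodeLinks {
    // The leading optional group consumes a colour code such as "&a" or the
    // section sign followed by "e", so it never ends up in Group 1.
    static final Pattern URL_PATTERN = Pattern.compile(
            "(?:[&\u00A7][a-z0-9])?"
            + "([--:\\w?#%&+~#=]*\\.[a-z]{2,4}/{0,2})"
            + "((?:[?&]\\w+=\\w+)+|[--:\\w?#%&+~#=]+)?",
            Pattern.CASE_INSENSITIVE);

    // Returns every link, taken from the start of Group 1 to the end of the match.
    static List<String> extractLinks(String message) {
        List<String> links = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(message);
        while (matcher.find()) {
            links.add(message.substring(matcher.start(1), matcher.end()));
        }
        return links;
    }

    public static void main(String[] args) {
        String message = "Please use Google: \u00A7ehttps://google.com, and don't ask me.";
        System.out.println(extractLinks(message)); // prints [https://google.com]
    }
}
```

Using matcher.start(1) rather than matcher.start() is what keeps the colour code out of the result, because the whole match still begins at the colour code.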
I have a String in the following format:
Index: /aap/guru/asdte/atsAPI.tcl
===================================================================
RCS file: /autons/atsAPI.tcl,v
retrieving revision 1.41
Index: /aap/guru/asdte/atsAPI1.tcl
===================================================================
RCS file: /autons/atsAPI1.tcl,v
retrieving revision 1.41
What I want is to match each line that starts with Index: and then get the file name from the path.
That is, first get Index: /aap/guru/asdte/atsAPI.tcl and then extract atsAPI.tcl as the final result.
Currently I match twice: first the whole line, and then the file name.
My question is how to do this with a single regular expression in Java.
My current code is:
String line = "Index: /aap/guru/asdte/atsAPI.tcl\r\n===================================================================\r\nRCS file: /autons/atsAPI.tcl,v\r\nretrieving revision 1.41\r\n\r\nIndex: /aap/guru/asdte/atsAPI1.tcl\r\n===================================================================\r\nRCS file: /autons/atsAPI1.tcl,v\r\nretrieving revision 1.41";
Pattern regex1 = Pattern.compile("Index:.*?\\n", Pattern.DOTALL);
Pattern regex2 = Pattern.compile("[^*/]+$");
Matcher matcher1 = regex1.matcher(line);
while (matcher1.find()) {
String s = matcher1.group(0);
Matcher matcher2 = regex2.matcher(s);
while (matcher2.find()) {
System.out.println(matcher2.group(0));
}
}
how to do it in a single regular expression in java.
Use a capturing group as shown below.
Regular Expression:
^Index:.*\/(.*)
Now the filename can be obtained by using matcher.group(1) and is represented by the last part (.*) in the regex
^ anchors the match at the start of a line
Index: matches the literal as-is
.* matches anything (greedy)
\/ matches a slash /
(.*) matches the filename in a capturing group
Make sure the (?m) or Pattern.MULTILINE flag is set, so that ^ matches at the start of every line rather than only at the start of the input.
EDIT: Modify your code to use only one regex, like this:
Pattern pattern = Pattern.compile("^Index:.*\\/(.*)", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
// Output:
atsAPI.tcl
atsAPI1.tcl
Try ^Index.+\/([^\.]+\.\w+)$ with the multiline flag (Pattern.MULTILINE in Java; there is no separate g flag, since Matcher.find() is simply called in a loop), or Index.+\/([^\.]+\.\w+) without it. The only capturing group holds the name of the file.
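For reference, here is a minimal Java sketch of that anchored variant applied to the input from the question. The class name IndexFileNames and the helper fileNames are hypothetical; the regex itself is the one given above, only escaped for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class IndexFileNames {
    // MULTILINE makes ^ and $ match at line boundaries; Java treats \r\n as one line terminator.
    static final Pattern INDEX_LINE =
            Pattern.compile("^Index.+\\/([^\\.]+\\.\\w+)$", Pattern.MULTILINE);

    // Collects Group 1 (the file name) of every matching "Index" line.
    static List<String> fileNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher matcher = INDEX_LINE.matcher(text);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String line = "Index: /aap/guru/asdte/atsAPI.tcl\r\n"
                + "===================================================================\r\n"
                + "RCS file: /autons/atsAPI.tcl,v\r\nretrieving revision 1.41\r\n\r\n"
                + "Index: /aap/guru/asdte/atsAPI1.tcl\r\n"
                + "===================================================================\r\n"
                + "RCS file: /autons/atsAPI1.tcl,v\r\nretrieving revision 1.41";
        System.out.println(fileNames(line)); // prints [atsAPI.tcl, atsAPI1.tcl]
    }
}
```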
Try the following regex, the answer is in the first match group:
Index:.*?\/([\w]+\.[\w]*)
I am trying to capture the word and the number from the following text:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when the text is:
stxt:usa
the regex does not work. I tried to apply an OR condition using |, but that did not work either:
stxt:(.*?),|city:(\d.*)$
You may use
(stxt|city):([^,]+)
Pattern details:
(stxt|city) - either the stxt or the city substring (you may add \b before the ( to match only a whole word) (Group 1)
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
Looking at your string, you could also find the word/digits after the colon.
:(\w+)
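A quick way to check that :(\w+) covers both inputs from the question is a sketch like the following; the class name ColonValues and the helper values are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class ColonValues {
    // Captures the run of word characters that follows each colon.
    static final Pattern VALUE = Pattern.compile(":(\\w+)");

    static List<String> values(String s) {
        List<String> result = new ArrayList<>();
        Matcher matcher = VALUE.matcher(s);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(values("stxt:usa,city:14")); // prints [usa, 14]
        System.out.println(values("stxt:usa"));         // prints [usa]
    }
}
```

Note that this variant ignores the key names, so it only fits if every colon in the input is followed by a value you want.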
Hi, I have the following string:
abc test ...
interface
somedata ...
xxx ...
!
sdfff as ##
example
yyy sdd ## .
!
I need to find the content between a line containing the word "interface" or "example" and a line containing "!".
The required output would be something like this:
String[] output= {"somedata ...\nxxx ...\n","yyy sdd ## .\n"} ;
I can do this manually using substring and iteration, but I want to achieve it with a regular expression.
Is it possible?
This is what I have tried
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?m)\ninterface(.*?)\n!\n");
Matcher m =pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Am I right? Please suggest the right way of doing it.
Edit:
A small change: I want to find the content between a line "interface" or "example" and a line "!".
Can we achieve this too using a regex?
You could use the (?s) DOTALL modifier together with lookarounds.
String sample="abc\ninterface\nsomedata\nxxx\n!\nsdfff\ninterface\nyyy\n!\n";
Pattern pattern = Pattern.compile("(?s)(?<=\\ninterface\\n).*?(?=\\n!\\n)");
Matcher m =pattern.matcher(sample);
while (m.find()) {
System.out.println(m.group());
}
Output:
somedata
xxx
yyy
Note that the input in your example is different.
(?<=\\ninterface\\n) Asserts that the match must be preceded by the characters which are matched by the pattern present inside the positive lookbehind.
(?=\\n!\\n) Asserts that the match must be followed by the characters which are matched by the pattern present inside the positive lookahead.
Update:
Pattern pattern = Pattern.compile("(?s)(?<=\\n(?:example|interface)\\n).*?(?=\\n!\\n)");
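To see the updated pattern against the input from the question, here is a small sketch. SectionExtractor and sections are hypothetical names. Note one assumption baked into the lookbehind: the marker line must be preceded by a newline, so a marker on the very first line of the input would not be found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class SectionExtractor {
    // (?s) lets . cross line breaks; the lookarounds keep the marker lines out of the match.
    static final Pattern SECTION =
            Pattern.compile("(?s)(?<=\\n(?:example|interface)\\n).*?(?=\\n!\\n)");

    static List<String> sections(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SECTION.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "abc test ...\ninterface\nsomedata ...\nxxx ...\n!\n"
                + "sdfff as ##\nexample\nyyy sdd ## .\n!\n";
        // Two sections are found: "somedata ...\nxxx ..." and "yyy sdd ## ."
        for (String section : sections(sample)) {
            System.out.println("[" + section + "]");
        }
    }
}
```

The matches do not include the trailing newline shown in the required output; if it is needed, it can be appended afterwards.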
This works just fine for a normal string literal ("hello"):
"([^"]*)"
But I also want my regex to match literal such as "hell\"o".
This is what I have been able to come up with, but it doesn't work:
("(?=(\\")*)[^"]*")
Here I tried to look ahead for \".
How about
Pattern.compile("\"((\\\\\"|[^\"])*)\"")
where, inside the Java string literal,
\" matches a literal "
\\\\ matches a literal \
\\\\\" matches a literal \"
or
Pattern.compile("\"((?:\\\\\"|[^\"])*)\"")//
if you don't want to add more capturing groups.
This regex accepts \" or any character other than " between quotation marks.
Demo:
String input = "ab \"cd\" ef \"gh \\\"ij\"";
Matcher m = Pattern.compile("\"((?:\\\\\"|[^\"])*)\"").matcher(input);
while (m.find())
System.out.println(m.group(1));
Output:
cd
gh \"ij
Use this method:
"((?:[^"\\\\]*|\\\\.)*)"
[^"\\\\]* will now no longer match \ either, while the other alternative lets you match any escaped character.
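As a sketch of the same idea in compilable Java, the version below uses a slight variation: the inner * is dropped, so each iteration consumes exactly one ordinary character or one escape pair, which avoids nesting one * inside another. QuotedStrings and contents are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class QuotedStrings {
    // Regex: "((?:[^"\\]|\\.)*)"
    // Either one character that is neither a quote nor a backslash, or a backslash plus any character.
    static final Pattern QUOTED = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the raw contents (escapes left as written) of every quoted literal.
    static List<String> contents(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // The input text is: ab "cd" ef "gh \"ij" kl "end\\"
        String input = "ab \"cd\" ef \"gh \\\"ij\" kl \"end\\\\\"";
        for (String s : contents(input)) {
            System.out.println(s);
        }
        // prints:
        // cd
        // gh \"ij
        // end\\
    }
}
```

Because every backslash is consumed together with the character after it, a literal ending in an escaped backslash (the last case) still closes at the right quote.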
Try this one:
Pattern pattern = Pattern.compile("((?:\\\\\"|[^\"])*)");
\\\\\" matches \" (in a Java string literal the regex backslash itself has to be written as \\\\), or
[^\"] matches anything but "
Who can help me translate the XML Schema pattern "[0-9]+-([0-9]|K)" into a Java regular expression?
Here is the pattern with a snippet showing how to use it.
\\ is there to escape the \ in a Java string. \d represents [0-9]. The - does not need to be escaped outside a character class. Note that | is a literal character inside [...], so the alternative "a digit or K" is written as the character class [\\dK].
Pattern p = Pattern.compile("\\d+-[\\dK]"); // the string is the pattern
Matcher m = p.matcher(whatYouWantToMatch);
boolean b = m.matches();
At least in this case the pattern is compatible with Java regex as it is:
String s = "test cases here";
s.matches("[0-9]+-([0-9]|K)"); // works OK.
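A small sketch to back this up: XML Schema patterns always have to match the whole value, and Matcher.matches() (like String.matches()) has the same whole-input behaviour, so the pattern can be reused unchanged. The class name XsdPatternCheck and the sample values are made up for the example. The second pattern is included only to show that putting | inside a character class also accepts a literal |.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class XsdPatternCheck {
    // Same text as the XML Schema pattern; no escaping is needed for Java here.
    static final Pattern STRICT = Pattern.compile("[0-9]+-([0-9]|K)");

    // A character class containing '|' treats it as a literal, which is usually not intended.
    static final Pattern PIPE_IN_CLASS = Pattern.compile("\\d+-[\\d|K]");

    // matches() succeeds only if the entire input matches, like an XSD pattern facet.
    static boolean isValid(String value) {
        return STRICT.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("12345678-K"));     // true
        System.out.println(isValid("12345678-5"));     // true
        System.out.println(isValid("12345678-k"));     // false, no CASE_INSENSITIVE flag
        System.out.println(isValid("id 12345678-5"));  // false, whole input must match
        System.out.println(PIPE_IN_CLASS.matcher("12-|").matches()); // true, '|' is literal in [...]
    }
}
```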