Why doesn't my regex match - java

I have written the following code:
public static void main(String[] args) {
// String to be scanned to find the pattern.
String line = "'7858','1194','FSP,FRB,FWF,FBVS,FRRC','15'\n"
+ "'7859','1194','FIRM','21'";
String pattern = "^'*','*','*','*'$";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
System.out.println("Found value: " + m.group(1));
} else {
System.out.println("NO MATCH");
}
}
It always prints NO MATCH.
Expected result: 2 rows.
What am I doing wrong?

There are several problems in your code:
A single star (*) matches the character it follows 0 to N times. In your pattern, '* means "zero or more single quotes", so '*' means "zero or more single quotes followed by one more single quote". It does not mean "anything between quotes".
To match arbitrary text you need .* (any character, 0 to N times). The star quantifier is greedy by default, meaning it consumes as many characters as possible, including the closing quotes between your fields. Here you probably want the reluctant form (append a ?, giving .*?), so that each group matches only the text inside one pair of single quotes.
The lines must be matched one by one, so the multi-line input has to be split on the line separator (\n). Alternatively you could compile the pattern with Pattern.MULTILINE so that ^ and $ match at each line boundary, but I think splitting is simpler here.
group(0) is the entire match; capturing groups are numbered from 1. Your pattern contains no capturing groups, so m.group(1) would throw an IndexOutOfBoundsException even if the pattern matched. With four groups, they are numbered 1 to 4.
Here is your code, corrected as explained above:
public static void main(String[] args) {
    String line = "'7858','1194','FSP,FRB,FWF,FBVS,FRRC','15'\n" +
            "'7859','1194','FIRM','21'";
    Pattern r = Pattern.compile("'(.*?)','(.*?)','(.*?)','(.*?)'");
    String[] lines = line.split("\n");
    for (String l : lines) {
        System.out.println("Line : " + l);
        Matcher m = r.matcher(l);
        if (m.find()) {
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
            System.out.println("Found value: " + m.group(4));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
And here is the result:
Line : '7858','1194','FSP,FRB,FWF,FBVS,FRRC','15'
Found value: 7858
Found value: 1194
Found value: FSP,FRB,FWF,FBVS,FRRC
Found value: 15
Line : '7859','1194','FIRM','21'
Found value: 7859
Found value: 1194
Found value: FIRM
Found value: 21
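The multi-line option mentioned above can be sketched as follows. This is only an illustration, assuming the same four-field rows as in the question; the class name MultilineRows and the helper parseRows are made up for this example. With Pattern.MULTILINE, ^ and $ match at every line boundary, so find() can walk the whole input without splitting it first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineRows {
    // With MULTILINE, ^ and $ match at the start and end of each line,
    // not only at the start and end of the whole input.
    private static final Pattern ROW = Pattern.compile(
            "^'(.*?)','(.*?)','(.*?)','(.*?)'$", Pattern.MULTILINE);

    // Hypothetical helper: returns one String[4] per matching line.
    static List<String[]> parseRows(String input) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(input);
        while (m.find()) {
            rows.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "'7858','1194','FSP,FRB,FWF,FBVS,FRRC','15'\n"
                + "'7859','1194','FIRM','21'";
        for (String[] row : parseRows(input)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Whether this is preferable to splitting first is a matter of taste; splitting keeps the pattern simpler, while the MULTILINE version avoids building an intermediate array.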

"^'*','*','*','*'$" does not match anything because '* searches for as many 's as possible. It does not match what you want.
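Also, ^ and $ won't work as you expect: by default they match only at the start and end of the whole input, not at each line, unless you compile the pattern with Pattern.MULTILINE.
I think that this regex is what you need:
'[0-9A-Z,]*','[0-9A-Z,]*','[0-9A-Z,]*','[0-9A-Z,]*'
Here I have used the character class [0-9A-Z,] to match digits, uppercase letters and commas. Add parentheses around each class if you need to capture the values.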

You could try with this expression:
(?<=^|[\r\n]+)'([^']*)','([^']*)','([^']*)','([^']*)'(?=[\r\n]+|$)
Breakdown:
(?<=^|[\r\n]+) is a positive look-behind checking for either the start of the input or a sequence of linebreak characters
'([^']*)' matches and captures one of your groups. You could use '(.*?)' (i.e. a reluctant quantifier) instead, but the former version is safer since it won't match if your input lines contain more than 4 groups
(?=[\r\n]+|$) is a positive look-ahead checking that your groups are followed by either a sequence of linebreak characters or the end of the input sequence
I also made the following assumptions about your code:
Your input contains multiple lines which you can't or don't want to split (otherwise String[] lines = input.split("[\\r\\n]+") would be better).
A matching line always consists of 4 groups which you want to access using group(1) etc.
Your groups can contain any character except a single quote. If a group is only allowed to contain certain characters (e.g. digits), it would be safer to reflect that in the expression (e.g. '[0-9]+')
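The point about '([^']*)' being safer than '(.*?)' can be checked with a small sketch. It assumes you are free to split the input first (the input.split variant mentioned above) and then test each line with matches(); the class name StrictRowMatch and the five-field sample line are invented for the illustration.

```java
import java.util.regex.Pattern;

class StrictRowMatch {
    // Each field may contain anything except a single quote.
    static final Pattern STRICT =
            Pattern.compile("'([^']*)','([^']*)','([^']*)','([^']*)'");
    // Reluctant version: a field can grow across quotes when needed.
    static final Pattern RELUCTANT =
            Pattern.compile("'(.*?)','(.*?)','(.*?)','(.*?)'");

    public static void main(String[] args) {
        // Second line deliberately has five fields instead of four.
        String input = "'7859','1194','FIRM','21'\r\n'a','b','c','d','e'";
        for (String line : input.split("[\\r\\n]+")) {
            System.out.println(line
                    + "  strict=" + STRICT.matcher(line).matches()
                    + "  reluctant=" + RELUCTANT.matcher(line).matches());
        }
    }
}
```

On the five-field line the strict pattern rejects the input, while the reluctant one still matches because its last group stretches to cover d','e.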

Related

Java Regex. group excluding delimiters

I'm trying to split my string using a regex. The result should include zero-length matches before and after every delimiter. For example, if the delimiter is ^ and my string is ^^^, I expect to get 4 zero-length groups.
I cannot use just regex = "([^\\^]*)" because it produces an extra zero-length match after every true match between delimiters.
So I decided to match the non-delimiter characters that follow either the beginning of the line or a delimiter. It works perfectly on https://regex101.com/ (sorry, I couldn't find a share option on that site for my example), but in IntelliJ IDEA it skips one match.
My code is now:
final String regex = "(^|\\^)([^\\^]*)";
final String string = "^^^^";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find())
System.out.println("[" + matcher.start(2) + "-" + matcher.end(2) + "]: \"" + matcher.group(2) + "\"");
and I expect 5 empty-string matches. But I have only 4:
[0-0]: ""
[2-2]: ""
[3-3]: ""
[4-4]: ""
The question is why does it skip [1-1] match and how can I fix it?
Your regex matches either the start of the string or a ^ (capturing that into Group 1) and then any 0+ chars other than ^ into Group 2. In the first match, found at the start of the string, Group 1 holds an empty string (the start-of-string position) and Group 2 also holds an empty string (the first char is ^, and [^^]* can match an empty string before a non-matching char). The whole match is zero-length, so the regex engine advances the search position by one char to avoid finding the same empty match again. After the first match, the search therefore resumes at the position after the first ^. The second match then consists of the second ^ and the empty string after it. Hence the first ^ is never matched; it is skipped.
The solution is a simple split one:
String[] result = string.split("\\^", -1);
The negative second argument (the limit) makes split keep the trailing empty strings, which are discarded by default.
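To make the effect of the limit argument concrete, here is a minimal sketch that only counts the resulting fields. The class SplitLimitDemo and its helper fieldCount are hypothetical names for this example; a limit of 0 behaves like the one-argument split.

```java
class SplitLimitDemo {
    // Hypothetical helper: number of fields produced for a given limit.
    static int fieldCount(String s, int limit) {
        return s.split("\\^", limit).length;
    }

    public static void main(String[] args) {
        // Limit 0 (the default): trailing empty strings are removed,
        // so a string made only of delimiters yields an empty array.
        System.out.println(fieldCount("^^^^", 0));
        // Negative limit: every field is kept, including trailing empties.
        System.out.println(fieldCount("^^^^", -1));
        // Same contrast with some non-empty fields in front.
        System.out.println(fieldCount("a^b^^", 0));
        System.out.println(fieldCount("a^b^^", -1));
    }
}
```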
See a Java demo:
String str = "^^^^";
String[] result = str.split("\\^", -1);
System.out.println("Number of items: " + result.length);
for (String s : result) {
    System.out.println("\"" + s + "\"");
}
Output:
Number of items: 5
""
""
""
""
""

Regular Expressions - Find hexadecimal numbers' matches excluding the 0 of the next hexadecimal number

Goal - I need to retrieve all hexadecimal numbers from my input String.
Example inputs and matches --
1- Input = "0x480x8600x89dfh0x89BABCE" (The "" are not included in the input).
should produce following matches:
0x48 ( as opposed to 0x480)
0x860
0x89df
0x89BABCE
I have tried this Pattern:
"0[xX][\\da-fA-F]+"
But it results in the following matches:
0x480
0x89df
0x89BABCE
2- Input = "0x0x8600x89dfh0x89BABCE" (The "" are not included in the input).
Should produce following matches:
0x860
0x89df
0x89BABCE
Is such a regex possible?
I know that I can first split my input using the String.split("0[xX]"), and then for each String I can write logic to retrieve the first valid match, if there is one.
But I want to know if I can achieve the desired result using just a Pattern and a Matcher.
Here's my current code.
package toBeDeleted;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        String pattern = "0[xX][\\da-fA-F]+";
        // Same input as example 1 above.
        String input = "0x480x8600x89dfh0x89BABCE";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println("Starting at : " + m.start()
                    + ", Ending at : " + m.end()
                    + ", element matched : " + m.group());
        }
    }
}
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
-- Jamie Zawinski
If you just use .split("0x"), and tack 0x back on each (non-empty) result, you'll be done.
You can use a negative lookahead to accept a 0 digit only when it is not followed by an x:
String pattern = "0[xX]([1-9a-fA-F]|0(?![xX]))+";
This does not produce 0x0 as a match for the second example. However, you did state that matches should exclude the 0 that starts the next hex number, so I am not sure why it would be matched.
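A quick way to check the lookahead pattern against both example inputs from the question is sketched below. The class HexFinder and the method findHex are names made up for this illustration, and the capturing group has been turned into a non-capturing one, which does not change what is matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexFinder {
    // A '0' digit is accepted only when it is NOT followed by x/X,
    // i.e. when it does not start the next 0x literal.
    private static final Pattern HEX =
            Pattern.compile("0[xX](?:[1-9a-fA-F]|0(?![xX]))+");

    // Hypothetical helper: collects every match in order of appearance.
    static List<String> findHex(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = HEX.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findHex("0x480x8600x89dfh0x89BABCE"));
        // prints [0x48, 0x860, 0x89df, 0x89BABCE]
        System.out.println(findHex("0x0x8600x89dfh0x89BABCE"));
        // prints [0x860, 0x89df, 0x89BABCE]
    }
}
```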

Java regex to parse any number of Markdown-style links

I'm trying to parse a string for any occurrences of markdown style links, i.e. [text](link). I'm able to get the first of the links in a string, but if I have multiple links I can't access the rest. Here is what I've tried, you can run it on ideone:
Pattern p;
try {
p = Pattern.compile("[^\\[]*\\[(?<text>[^\\]]*)\\]\\((?<link>[^\\)]*)\\)(?:.*)");
} catch (PatternSyntaxException ex) {
System.out.println(ex);
throw(ex);
}
Matcher m1 = p.matcher("Hello");
Matcher m2 = p.matcher("Hello [world](ladies)");
Matcher m3 = p.matcher("Well, [this](that) has [two](too many) keys.");
System.out.println("m1 matches: " + m1.matches()); // false
System.out.println("m2 matches: " + m2.matches()); // true
System.out.println("m3 matches: " + m3.matches()); // true
System.out.println("m2 text: " + m2.group("text")); // world
System.out.println("m2 link: " + m2.group("link")); // ladies
System.out.println("m3 text: " + m3.group("text")); // this
System.out.println("m3 link: " + m3.group("link")); // that
System.out.println("m3 end: " + m3.end()); // 44 - I want 18
System.out.println("m3 count: " + m3.groupCount()); // 2 - I want 4
System.out.println("m3 find: " + m3.find()); // false - I want true
I know I can't have repeating groups, but I figured find would work, however it does not work as I expected it to. How can I modify my approach so that I can parse each link?
Can't you go through the matches one by one and do the next match from an index after the previous match? You can use this regex:
\[(?<text>[^\]]*)\]\((?<link>[^\)]*)\)
find() looks for the next match anywhere in the input, even if that match is only a substring; each call to find() returns the next match. matches() tries to match the entire string and fails if the whole string does not match. Use something like this:
while (m.find()) {
String s = m.group(1);
// s now contains the text captured by group 1
}
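Putting the two points together (drop the leading and trailing "match everything" parts, and loop with find()), a sketch for the Markdown case could look like this. The class MarkdownLinks and the method findLinks are hypothetical names; the pattern is the shortened one given above, keeping the named groups text and link from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkdownLinks {
    // Matches a single [text](link) occurrence and nothing around it.
    private static final Pattern LINK =
            Pattern.compile("\\[(?<text>[^\\]]*)\\]\\((?<link>[^)]*)\\)");

    // Hypothetical helper: returns {text, link} pairs in order of appearance.
    static List<String[]> findLinks(String input) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) { // each call resumes after the previous match
            links.add(new String[] {m.group("text"), m.group("link")});
        }
        return links;
    }

    public static void main(String[] args) {
        for (String[] l : findLinks("Well, [this](that) has [two](too many) keys.")) {
            System.out.println(l[0] + " -> " + l[1]);
        }
    }
}
```

Note that groupCount() still reports 2 here: it counts the groups in the pattern, not the number of matches found, so the number of links is the number of successful find() calls.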
The regular expression I used to match what you need (without groups) is \[\w+\]\(.+\)
It is deliberately simple. Basically it does the following:
Match an opening square bracket: \[
Followed by word chars (at least 1): \w+
Then the closing square bracket: \]
This matches patterns like [blabla]
Then the same with parentheses:
Match an opening parenthesis: \(
Followed by any chars (at least 1): .+
Then the closing parenthesis: \)
So it matches (ble...ble...). Note that .+ is greedy, so with several links on one line it runs to the last closing parenthesis; use [^)]+ to stop at the first one.
If you want to store the matches in groups, add parentheses like this:
(\[\w+\])(\(.+\)) so that the text and the link are captured separately.
Hope this helps.
I've tried it on regexplanet.com and it works.
Update: workaround .*(\[\w+\])(\(.+\))*.*

Java pattern for [j-*]

Please help me with pattern matching. I want to build a pattern that matches the words starting with j- or c- in a string, for example:
[j-test] is a [c-test]'s name with [foo] and [bar]
The pattern needs to find [j-test] and [c-test] (brackets inclusive).
What I have tried so far?
String template = "[j-test] is a [c-test]'s name with [foo] and [bar]";
Pattern patt = Pattern.compile("\\[[*[j|c]\\-\\w\\-\\+\\d]+\\]");
Matcher m = patt.matcher(template);
while (m.find()) {
System.out.println(m.group());
}
And it's giving output like
[j-test]
[c-test]
[foo]
[bar]
which is wrong. Please help me, thanks for your time on this thread.
Inside a character class you don't need alternation to match j or c. A character class already means "match any single character from the ones listed", so [jc] matches either j or c. (In your pattern, [j|c] also matches a literal |.)
Also, you don't need to spell out what follows j- or c-, since you only care that the content starts with j- or c-.
Simply use this pattern:
Pattern patt = Pattern.compile("\\[[jc]-[^\\]]*\\]");
To explain:
Pattern patt = Pattern.compile("(?x) " // Embedded flag for Pattern.COMMENTS
+ "\\[ " // Match starting `[`
+ " [jc] " // Match j or c
+ " - " // then a hyphen
+ " [^ " // A negated character class
+ " \\]" // Match any character except ]
+ " ]* " // 0 or more times
+ "\\] "); // till the closing ]
The (?x) flag makes the regex ignore whitespace in the pattern, which often helps in writing readable regexes.
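For completeness, here is a small sketch that runs the compact pattern against the template from the question. The class BracketTagFinder and the method findTags are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTagFinder {
    // "[", then j or c, then "-", then anything up to the closing "]".
    private static final Pattern TAG = Pattern.compile("\\[[jc]-[^\\]]*\\]");

    // Hypothetical helper: returns the matched tags, brackets included.
    static List<String> findTags(String template) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(template);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String template = "[j-test] is a [c-test]'s name with [foo] and [bar]";
        System.out.println(findTags(template)); // prints [[j-test], [c-test]]
    }
}
```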

How to split a string which contains multiple key value pairs

I have a string:
Single line : Some text
Multi1: multi (Va1) Multi2 : multi (Va2) Multi3 : multi (Val3)
Dots....20/12/2013 (EOY)
and I am trying to retrieve all the key value pairs. My first attempt
(Single line|Multi[0-9]{1}|Dots)( *:? [.] *| *:? )(.)
seems to work but does not handle multiple key value pairs on one line. Is there any way to achieve this?
Try this:
String text = "Single line : Some text\r\n" +
"Multi1: multi (Va1) Multi2 : multi (Va2) Multi3 : multi (Val3)\r\n" +
"Dots....20/12/2013 (EOY)";
Pattern pattern = Pattern.compile("(\\p{Alnum}[\\p{Alnum}\\s/]+?)\\s?(:|\\.+)\\s?(\\p{Alnum}[\\p{Alnum}\\s/]+?)(?=($|\\()|(\\s\\())", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group(1) + "-->" + matcher.group(3));
}
Output:
Single line-->Some text
Multi1-->multi
Multi2-->multi
Multi3-->multi
Dots-->20/12/2013
Explanation:
I am limiting the keys and values to "starts with an alphanumeric, then contains any number of alphanumerics, spaces or slashes".
I am limiting the separator to "optional space, :, optional space" or "optional space, any number of consecutive dots, optional space".
I am using groups 1 and 3 to capture the key and the value in the Pattern.
Group 2 is used to provide the alternate separators as above.
Finally, the Pattern is delimited at the end, either with a new line, or with an open round bracket, or with a space followed by an open round bracket.
Note that Java only restricts lookbehind groups to a bounded length; lookaheads may contain quantifiers, so the repetition in the final lookahead could also be written as (?=$|\s?\().
You can use this pattern:
public static void main(String[] args) {
String s = "Single line : Some text\n"
+ "Multi1: multi (Va1) Multi2 : multi (Va2) "
+ "Multi3 : multi (Val3)\n"
+ "Dots....20/12/2013 (EOY)";
String wd = "[^\\s.:]+(?:[^\\S\\n]+[^\\s.:]+)*";
Pattern p = Pattern.compile("(?<key>" + wd + ")"
+ "\\s*(?::|\\.+)\\s*"
+ "(?<value>" + wd + "(?:\\s*\\([^)]+\\))?)"
+ "(?!\\s*:)(?=\\s|$)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group("key")+"->"+m.group("value"));
}
}
I don't recall the exact syntax, but I think it's something like this:
while (matcher.find()) {
String match = matcher.group();
}
The idea is to iterate over the current line and keep asking the matcher: "while you are still finding something, give me the part of this line that matched." Since you have multiple matches on the same line, each call to find() should return the next one. Here is the JavaDoc for Matcher as a reference.
This is sadly another reason why Java is really not well-suited for this sort of thing. Before anyone downvotes me, understand that I say that as a criticism of the Java APIs here, not the language.
