How can I know how many groups I will get using regex? - java

Let's say that I'm getting some text and need to run a regex on it, as follows:
String aTagContent = " title='111' title='222' ";
Pattern pattern = Pattern.compile("\\s{1,}(title=){1}+(.){1,}'{1}");
Matcher matcher = pattern.matcher(aTagContent);
The data is found/matched using find().
How can I know how many groups I should expect to get from this regex?
I know that there is matcher.groupCount(), so that is not the answer I'm looking for.
What I'm actually asking is:
How will this text be split? How can I know that without using matcher.groupCount()?

Matcher.groupCount() returns the number of groups in your Pattern, not in the result.
Matcher.matches() tries to match the entire input string against your pattern, whereas Matcher.find() sequentially tries to match only part of your input string. The latter is typically used in a while-loop, so the number of matches isn't known in advance.
You can remove the trivial {1} quantifier; it only makes your pattern more verbose. Also, {1,} can be replaced by +. The opening quote is missing from your pattern, so (.){1,} greedily swallows everything up to the last quote and you get one long match instead of one match per attribute. Maybe something like this works for you:
Pattern pattern = Pattern.compile("\\s+(title)='([^']+)'");
Matcher matcher = pattern.matcher(" title='111' title='222' ");
while (matcher.find()) {
System.out.println("attribute: " + matcher.group(1) + ", value: " + matcher.group(2));
}
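To see the difference between groupCount() and the number of matches in practice, here is a minimal sketch using the pattern above. The class name, the countMatches helper and the three-attribute input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: counts matches by looping over find().
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int matches = 0;
        while (matcher.find()) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\s+(title)='([^']+)'");
        // Illustrative input with three attributes instead of two.
        String input = " title='111' title='222' title='333' ";

        // groupCount() depends only on the pattern: two capturing groups.
        System.out.println("groups in pattern: " + pattern.matcher(input).groupCount()); // 2
        // The number of matches depends on the input and is known only after searching.
        System.out.println("matches in input: " + countMatches(pattern, input)); // 3
    }
}
```

Each successful find() exposes the same two groups, so the total number of captured values is the match count times groupCount().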
Can you consider using String.split("\\s") first and iterate over the returned String array? At least you'll know the number of attribute-value pairs in advance.
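A rough sketch of that split-first idea follows. It assumes the attribute values never contain whitespace, and it trims first and splits on \\s+ (rather than \\s) so that no empty tokens appear. The class and method names are invented for this example.

```java
import java.util.Arrays;

public class SplitAttributes {
    // Hypothetical helper: splits the trimmed input on runs of whitespace.
    static String[] splitAttributes(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return one empty token, so handle this case explicitly.
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] pairs = splitAttributes(" title='111' title='222' ");
        // The number of attribute-value pairs is known before any regex matching.
        System.out.println(pairs.length + " pairs: " + Arrays.toString(pairs));
    }
}
```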

Related

Regex in java to extract specific pattern

I want to match the pattern (including the square brackets, equals, quotes)
[fixedtext="sometext"]
What would be a correct regex expression?
Anything can occur inside quotes. 'fixedtext' is fixed.
Your basic solution (although I'd be skeptical of this, per the comments) is essentially:
"\\[fixedtext=\\\"(.*)\\\"\\]"
which resolves to:
"\[fixedtext=\"(.*)\"\]"
Simple escaping of [] and quotes. The (.*) says capture everything in quotes as a capture group (matcher.group(1)).
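Note that (.*) is greedy: for a string such as '[fixedtext="abc\"]def"]' it captures abc\"]def, but if a line holds two such blocks it captures everything from the first opening quote to the last closing one. A reluctant (.*?) avoids that, but would then return abc\ instead of abc\"]def.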
If you know the ending bracket ends the line, then use:
"\\[fixedtext=\\\"(.*)\\\"\\]$"
(add the $ at the end to mark end of line) and that should be fairly reliable.
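The following sketch compares the greedy (.*) with a reluctant (.*?) on two kinds of input, so you can see which trade-off fits your data. The class name, the firstCapture helper and the two-block sample are assumptions for illustration; the regex is written with a plain \" in the Java literal, which matches the same text as the \\\" form above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedTextDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstCapture(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String greedy = "\\[fixedtext=\"(.*)\"\\]";
        String reluctant = "\\[fixedtext=\"(.*?)\"\\]";

        String escaped = "[fixedtext=\"abc\\\"]def\"]";          // [fixedtext="abc\"]def"]
        String twoBlocks = "[fixedtext=\"a\"] [fixedtext=\"b\"]"; // two blocks on one line

        System.out.println(firstCapture(greedy, escaped));      // abc\"]def
        System.out.println(firstCapture(reluctant, escaped));   // abc\
        System.out.println(firstCapture(greedy, twoBlocks));    // a"] [fixedtext="b
        System.out.println(firstCapture(reluctant, twoBlocks)); // a
    }
}
```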
My suggestion is using named-capturing groups.
You can find more details here:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Here's an example for your input:
String input = "[fixedtext=\"sometext\"]";
Pattern pattern = Pattern.compile("\\[(?<field>.*)=\"(?<value>.*)\"]");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
System.out.println(matcher.group("field"));
System.out.println(matcher.group("value"));
} else {
System.err.println(input + " doesn't match " + pattern);
}

Split string between words and quotation marks

I currently have this string:
"display_name":"test","game":"test123"
and I want to split the string so I can get the value test. I have looked all over the internet and tried some things, but I couldn't get it to work.
I found that splitting on quotation marks can be done using this regex: \"([^\"]*)\". So I tried this regex: display_name:\":\"([^\"]*)\"game\", but this returned null. I hope that someone can explain to me why my regex didn't work and how it should be done.
You forgot to include the comma and the opening quote (,") before game, and you also need to remove the extra colon after display_name:
display_name\":\"([^\"]*)\",\"game\"
or
\"display_name\":\"([^\"]*)\",\"game\"
Now, print the group index 1.
Matcher m = Pattern.compile("\"display_name\":\"([^\"]*)\",\"game\"").matcher(str);
while (m.find()) {
    System.out.println(m.group(1));
}
I think you could do it more easily, like this:
/(\w+)/g
This little regex matches every run of word characters, so it picks up all the names and values in your string.
Your Java code should be something like:
Pattern pattern = Pattern.compile("(\\w+)");
Matcher matcher = pattern.matcher(yourText);
while (matcher.find()) {
    System.out.println("Result: " + matcher.group(1));
}
I also want to note, as @AbishekManoharan did, that this looks like JSON.

Check if String contains regex match

What I want is to check whether there is a number followed by spaces and another number, without any "," in between, anywhere in the String.
Currently I am doing this:
Pattern pattern = Pattern.compile("[0-9][\" \"]+[0-9]");
Matcher matcher = pattern.matcher(input);
if(matcher.find()) return false;
and it works just fine. But I was wondering whether there is a simpler way of achieving this?
Since it's an assignment, I won't write out the code, but an alternative solution is:
Split the string on the , token (using String.split()).
For each member of the resulting split array:
    Trim the leading and trailing spaces from the member.
    If the trimmed member is an integer (I'll let you figure out how to determine that):
        It doesn't meet the criteria you specified.
    Else:
        It's possible that the token could meet your criteria (of containing multiple integers and spaces but no commas). There are several ways you could determine this: do a split on " ", use a while loop, or maybe something else. I'll let you figure that out.
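For readers who want something to compare their own attempt against, the steps above can be sketched roughly as follows. This is one possible reading of the outline, not the answerer's code: the class and method names are invented, "integer" is taken to mean a run of ASCII digits, and a token counts as a hit when a digit is followed by spaces and then another digit.

```java
public class UnseparatedNumbers {
    // Assumption for this sketch: only ASCII 0-9 count as digits.
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns true if the text is non-empty and consists only of digits.
    static boolean isInteger(String text) {
        if (text.isEmpty()) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            if (!isDigit(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns true if some comma-separated member contains a digit,
    // then one or more spaces, then another digit.
    static boolean hasUnseparatedNumbers(String input) {
        for (String member : input.split(",")) {
            String trimmed = member.trim();
            if (trimmed.isEmpty() || isInteger(trimmed)) {
                continue; // empty or a single integer: cannot meet the criteria
            }
            // trimmed has no leading or trailing spaces, so every part is non-empty.
            String[] parts = trimmed.split(" +");
            for (int i = 0; i + 1 < parts.length; i++) {
                String left = parts[i];
                String right = parts[i + 1];
                if (isDigit(left.charAt(left.length() - 1)) && isDigit(right.charAt(0))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasUnseparatedNumbers("1, 2, 3")); // false
        System.out.println(hasUnseparatedNumbers("1 2, 3"));  // true
    }
}
```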
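Your solution is good enough. You could also approach it the positive way and validate the whole string instead, like this (as written, the pattern accepts only single digits 1-9 separated by commas, so adapt it to your input):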
Pattern pattern = Pattern.compile("^[1-9](,[1-9])*$");
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) return true;

Extract substring after a certain pattern

I have the following string:
http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true
How can I extract the part after 30/? In this case, it's 32531a5d-b0b1-4a8b-9029-b48f0eb40a34. I have other strings that share the same part up to 30/; after that, each string has a different id up to the next /, and that id is what I want.
You can do it like this (note that this prints everything after 30/, not just the id up to the next /):
String s = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
System.out.println(s.substring(s.indexOf("30/")+3, s.length()));
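If only the id up to the next / is wanted, the same indexOf/substring idea can be extended with a second indexOf call. A minimal sketch, with a made-up class and method name, and assuming "/30/" is a safe marker for these URLs:

```java
public class ExtractId {
    // Hypothetical helper: returns the text between the first occurrence of
    // marker and the next '/', or null if the marker is not present.
    static String segmentAfter(String url, String marker) {
        int start = url.indexOf(marker);
        if (start < 0) {
            return null;
        }
        start += marker.length();
        int end = url.indexOf('/', start);
        // No further slash: take the rest of the string.
        return end < 0 ? url.substring(start) : url.substring(start, end);
    }

    public static void main(String[] args) {
        String s = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
        System.out.println(segmentAfter(s, "/30/")); // 32531a5d-b0b1-4a8b-9029-b48f0eb40a34
    }
}
```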
The split function of the String class won't help you in this case, because it discards the delimiter and that's not what we want here. You need a pattern that looks behind. The lookbehind syntax is:
(?<=X)Y
which matches any Y that is preceded by X.
So in your case you need this pattern:
(?<=30/).*
Compile the pattern, match it against your input, find the match, and retrieve it:
String input = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
Matcher matcher = Pattern.compile("(?<=30/).*").matcher(input);
matcher.find();
System.out.println(matcher.group());
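The pattern above returns everything after 30/. A variant that stops at the next slash could look like the sketch below. The class name, the extractId helper and the stricter /30/ lookbehind are my own choices for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindId {
    // Lookbehind for "/30/", then one or more characters that are not a slash.
    private static final Pattern ID_AFTER_30 = Pattern.compile("(?<=/30/)[^/]+");

    // Hypothetical helper: returns the id, or null if the input has no "/30/" segment.
    static String extractId(String input) {
        Matcher matcher = ID_AFTER_30.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
        System.out.println(extractId(input)); // 32531a5d-b0b1-4a8b-9029-b48f0eb40a34
    }
}
```

Checking the result of find() before calling group() also avoids the IllegalStateException that group() throws when no match was found.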
Just for this one, or do you want a generic way to do it?
String[] out = mystring.split("/");
return out[out.length - 2];
I think the / is definitely the delimiter you are searching for.
I can't see the problem you are talking about Alex
EDIT : Ok, Python got me with indexes.
A regular expression is the answer, I think. However, how the expression is written depends on the format of the data (URL) you want to process. For example:
Pattern pat = Pattern.compile("/Content/SiteFiles/30/([a-z0-9\\-]+)/.*");
Matcher m = pat.matcher("http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true");
if (m.find()) {
System.out.println(m.group(1));
}

pattern.matcher() vs pattern.matches()

I am wondering why the results of the java regex pattern.matcher() and pattern.matches() differ when provided the same regular expression and same string
String str = "hello+";
Pattern pattern = Pattern.compile("\\+");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println("I found the text " + matcher.group() + " starting at "
+ "index " + matcher.start() + " and ending at index " + matcher.end());
}
System.out.println(java.util.regex.Pattern.matches("\\+", str));
The results of the above are:
I found the text + starting at index 5 and ending at index 6
false
I found that using an expression to match the full string works fine in case of matches(".*\\+").
pattern.matcher(String s) returns a Matcher that can find occurrences of the pattern in the String s. Pattern.matches(String regex, CharSequence input) tests whether the entire input matches the pattern.
In brief (just to remember the difference):
pattern.matcher - test if the string contains-a pattern
pattern.matches - test if the string is-a pattern
Matcher.find() attempts to find the next subsequence of the input sequence that matches the pattern.
Pattern.matches(String regex, CharSequence input) compiles the regex into a Pattern, creates a Matcher for the input, and returns the result of Matcher.matches().
Matcher.matches() attempts to match the entire region (string) against the pattern (regex).
So, in your case, Pattern.matches("\\+", str) returns false, since str.equals("+") is false.
From the Javadoc, see the "if, and only if, the entire region" part:
/**
* Attempts to match the entire region against the pattern.
*
* <p> If the match succeeds then more information can be obtained via the
* <tt>start</tt>, <tt>end</tt>, and <tt>group</tt> methods. </p>
*
* @return <tt>true</tt> if, and only if, <b>the entire region</b> sequence
* matches this matcher's pattern
*/
public boolean matches() {
return match(from, ENDANCHOR);
}
So if your String was just "+", you'd get a true result.
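A small sketch that puts the two behaviours side by side; the class and helper names are invented for the example.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    private static final Pattern PLUS = Pattern.compile("\\+");

    // True if the pattern occurs anywhere in the input.
    static boolean containsPlus(String input) {
        return PLUS.matcher(input).find();
    }

    // True only if the entire input matches the pattern.
    static boolean isExactlyPlus(String input) {
        return PLUS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsPlus("hello+"));  // true: a + occurs at index 5
        System.out.println(isExactlyPlus("hello+")); // false: the whole string is not "+"
        System.out.println(isExactlyPlus("+"));      // true
    }
}
```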
matches() tries to match the expression against the entire string. That is, it checks whether the entire string fits the pattern or not.
Conceptually, think of it like this: it implicitly adds a ^ at the start and a $ at the end of your pattern.
For String str = "hello+", if you want matches() to return true, you need a pattern like ".*\\+.*".
I hope this answers your question.
Pattern.matches tests the whole String; in your case you should use:
System.out.println(java.util.regex.Pattern.matches(".*\\+", str));
meaning any string followed by a + symbol.
I think your question should really be "When should I use the Pattern.matches() method?", and the answer is "Never." Were you expecting it to return an array of the matched substrings, like .NET's Matches methods do? That's a perfectly reasonable expectation, but no, Java has nothing like that.
If you just want to do a quick-and-dirty match, adorn the regex with .* at either end, and use the string's own matches() method:
System.out.println(str.matches(".*\\+.*"));
If you want to extract multiple matches, or access information about a match afterward, create a Matcher instance and use its methods, like you did in your question. Pattern.matches() is nothing but a wasted opportunity.
Matcher matcher = pattern.matcher(text);
In this case, a Matcher instance is returned, which performs match operations on the input text by interpreting the pattern. We can then call matcher.find() repeatedly to step through each occurrence of the pattern in the input text.
(java.util.regex.Pattern.matches("\\+", str))
Here, the Matcher object is created implicitly, and a boolean is returned that tells whether the whole text matches the pattern. This works the same as the str.matches(regex) method in String.
The code equivalent to java.util.regex.Pattern.matches("\\+", str) would be:
Pattern.compile("\\+").matcher(str).matches();
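To check that equivalence yourself, here is a short sketch with the three full-match forms wrapped in helper methods. The class and method names are hypothetical; only the calls inside them are the standard API.

```java
import java.util.regex.Pattern;

public class EquivalentMatches {
    static boolean viaPatternMatches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    static boolean viaStringMatches(String regex, String input) {
        return input.matches(regex);
    }

    static boolean viaExplicitMatcher(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"hello+", "+"};
        for (String input : inputs) {
            // All three calls test the whole input, so they always agree.
            System.out.println(input + " -> "
                    + viaPatternMatches("\\+", input) + " "
                    + viaStringMatches("\\+", input) + " "
                    + viaExplicitMatcher("\\+", input));
        }
    }
}
```

The first two forms recompile the regex on every call, so when the same pattern is used many times it is usually worth keeping the compiled Pattern around, as in the third form.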
The find method finds the next occurrence of the pattern in the string (the first one when it is first called).
