I am trying to match the word Salvage in this string, but the code is not picking it up. Where am I going wrong?
//String to match
String titleString = "<td><i>Salvage</i></td>";
System.out.println(titleString);
//Template
String template = ">(.*)</a>";
//
Pattern p=Pattern.compile(template);
Matcher matcher = p.matcher(titleString);
System.out.println(matcher.group(1));
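Put a call to matcher.find() just before matcher.group(1).
group() returns the group from the last match. Since no match has been attempted yet, there is nothing to return, and the call throws an IllegalStateException. Note also that the template ends with </a>, while the string contains </i>, so the pattern will not match this input even after find() is added.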
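A minimal sketch of the corrected flow, assuming the goal is the text inside the <i>...</i> pair. The class name SalvageExample, the helper extractItalic, and the pattern <i>(.*?)</i> are illustrative choices, not part of the original question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SalvageExample {
    // Hypothetical helper: returns the text inside the first <i>...</i> pair,
    // or null when the input contains no such pair.
    public static String extractItalic(String html) {
        Matcher matcher = Pattern.compile("<i>(.*?)</i>").matcher(html);
        // find() has to succeed before group(1) may be read.
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractItalic("<td><i>Salvage</i></td>")); // prints "Salvage"
    }
}
```

Returning null on a failed match is only one option; throwing or returning an Optional would work as well.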
Related
I have a String which looks like "<name><address> and <Phone_1>". I need to get a result like:
1) <name>
2) <address>
3) <Phone_1>
I have tried using the regex "<(.*)>", but it returns just one result.
The regex you want is
<([^<>]+?)><([^<>]+?)> and <([^<>]+?)>
This puts the three values you want into the three capture groups. The full code would then look something like this:
Matcher m = Pattern.compile("<([^<>]+?)><([^<>]+?)> and <([^<>]+?)>").matcher(string);
if (m.find()) {
String name = m.group(1);
String address = m.group(2);
String phone = m.group(3);
}
The pattern .* in a regex is greedy. It will match as many characters as possible between the first < it finds and the last possible > it can find. In the case of your string it finds the first <, then looks for as much text as possible until a >, which it will find at the very end of the string.
You want a non-greedy or "lazy" pattern, which will match as few characters as possible: simply <(.+?)>. The question mark after the quantifier is the syntax for non-greedy matching.
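The difference can be checked with a small sketch that runs both patterns against the string from the question. The class GreedyVsLazy and the helper captures are hypothetical names introduced here for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: collects group(1) of every match of regex in input.
    public static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<name><address> and <Phone_1>";
        // Greedy: a single match that runs from the first '<' to the last '>'.
        System.out.println(captures("<(.*)>", input));  // [name><address> and <Phone_1]
        // Lazy: each match stops at the nearest '>'.
        System.out.println(captures("<(.+?)>", input)); // [name, address, Phone_1]
    }
}
```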
This will work if you have a dynamic number of groups.
Pattern p = Pattern.compile("(<\\w+>)");
Matcher m = p.matcher("<name><address> and <Phone_1>");
while (m.find()) {
System.out.println(m.group());
}
I have a string containing a number. Something like "Incident #492 - The Title Description".
I need to extract the number from this string.
Tried
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(theString);
String substring =m.group();
But I am getting an error:
java.lang.IllegalStateException: No match found
What am I doing wrong?
What is the correct expression?
I'm sorry for such a simple question, but I searched a lot and still have not found how to do this (maybe because it's too late here...)
You are getting this exception because you need to call find() on the matcher before accessing groups:
Matcher m = p.matcher(theString);
while (m.find()) {
String substring =m.group();
System.out.println(substring);
}
There are two things wrong here:
1) The pattern you're using is not ideal for your scenario: used with matches(), it only checks whether the whole string consists of digits. Also, since it doesn't contain a capture group, a call to group() is equivalent to calling group(0), which returns the entire match.
2) You need to be certain that the matcher has a match before you call group().
Let's start with the regex, which is currently "\\d+".
Used with matches(), that will only ever match a string that consists entirely of digits. What you care about is specifically the number in that string, so you want an expression that:
Doesn't care about what's in front of it
Doesn't care about what's after it
Only matches on one occurrence of numbers, and captures it in a group
To do that, you'd use this expression:
.*?(\\d+).*
The last part is to ensure that the matcher can find a match, and that it gets the correct group. That's accomplished by this:
if (m.matches()) {
String substring = m.group(1);
System.out.println(substring);
}
All together now:
Pattern p = Pattern.compile(".*?(\\d+).*");
final String theString = "Incident #492 - The Title Description";
Matcher m = p.matcher(theString);
if (m.matches()) {
String substring = m.group(1);
System.out.println(substring);
}
You need to invoke one of the Matcher methods, like find, matches or lookingAt to actually run the match.
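As a rough illustration of how the three methods differ, the sketch below applies the same \d+ pattern to the string from the question. The class MatcherMethods and the helper firstNumber are hypothetical names, and the string "492 and more" is made up for the lookingAt() case:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethods {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns the first run of digits in s, or null.
    public static String firstNumber(String s) {
        Matcher m = DIGITS.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "Incident #492 - The Title Description";
        // matches(): the whole string must match the pattern.
        System.out.println(DIGITS.matcher(s).matches());                // false
        // lookingAt(): the match must start at the beginning of the string.
        System.out.println(DIGITS.matcher(s).lookingAt());              // false
        System.out.println(DIGITS.matcher("492 and more").lookingAt()); // true
        // find(): the match may start anywhere in the string.
        System.out.println(firstNumber(s));                             // 492
    }
}
```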
Given a token in the format "word_suffix", I want to match and capture the "suffix" part.
For instance, in "Peter_NNP" I want to capture "NNP". I wrote:
String p="Peter_NNP";
Matcher matcher=Pattern.compile(".+_(.*\\s)").matcher(p);
System.out.println(matcher.group(1));
Instead of printing "NNP" as I would expect, it throws the following exception:
Exception in thread "main" java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Unknown Source)
Note that "word" and "suffix" part can be made of any character.
You need to call find() before you can read the match group. Also, your capture group requires whitespace at the end of the string, and "Peter_NNP" has none; .* is enough here.
String s = "Peter_NNP";
Pattern p = Pattern.compile(".+_(.*)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(1)); //=> "NNP"
}
But, I would think a simple split would be fine here:
String s = "Peter_NNP";
String[] parts = s.split("_");
System.out.println(parts[1]); //=> "NNP"
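Since the question says the word part can contain any character, parts[1] would return the wrong piece for a token such as "New_York_NNP". One possible variant, assuming the suffix is whatever follows the last underscore (which is also what the greedy .+_(.*) captures), is sketched below. SuffixExample and suffix are hypothetical names:

```java
public class SuffixExample {
    // Hypothetical helper: returns the text after the last underscore,
    // or null when the token contains no underscore at all.
    public static String suffix(String token) {
        int i = token.lastIndexOf('_');
        return i < 0 ? null : token.substring(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(suffix("Peter_NNP"));    // NNP
        System.out.println(suffix("New_York_NNP")); // NNP
    }
}
```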
Just to add to hwnd's answer: if you want to capture everything after the first underscore (even if there is no character before the underscore), use a reluctant .*? in front of it. Thanks to hwnd for helping me understand this.
String s="_NNP";
Matcher matcher=Pattern.compile(".*?_(.*)").matcher(s);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
I am new to regular expressions in Java. I would like to extract a string by using regular expressions.
This is my String: "Hello,World"
I would like to extract the text after ",". The result would be "World". I tried this:
final Pattern pattern = Pattern.compile(",(.+?)");
final Matcher matcher = pattern.matcher("Hello,World");
matcher.find();
But what would be the next step?
You don't need regex for this. You can simply split on the comma and get the second element from the array:
System.out.println("Hello,World".split(",")[1]);
OUTPUT: -
World
But if you want to use regex, you need to remove the ? from your pattern.
A ? after + makes the quantifier reluctant. Here it will only match W and stop there.
You don't need that here. You need the group to match as much as it can.
So use greedy matching instead.
Here's the code with modified Regex: -
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
OUTPUT: -
World
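To see the two quantifiers side by side on this input, here is a small sketch. ReluctantVsGreedy and firstGroup are hypothetical names used only for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantVsGreedy {
    // Hypothetical helper: returns group(1) of the first match, or null.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Reluctant: .+? is satisfied by a single character.
        System.out.println(firstGroup(",(.+?)", "Hello,World")); // W
        // Greedy: .+ consumes the rest of the line.
        System.out.println(firstGroup(",(.+)", "Hello,World"));  // World
    }
}
```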
Extending what you have, you need to remove the ? from your pattern to use greedy matching and then process the matched group:
final Pattern pattern = Pattern.compile(",(.+)"); // removed your '?'
final Matcher matcher = pattern.matcher("Hello,World");
while (matcher.find()) {
String result = matcher.group(1);
// work with result
}
Other answers suggest different approaches to your problem and might offer better solution for what you need.
System.out.println( "Hello,World".replaceAll(".*,(.*)","$1") ); // output is "World"
You are using a reluctant expression, which will only select the single character W. Use a greedy one instead and print the content of the matched group:
final Pattern pattern = Pattern.compile(",(.+)");
final Matcher matcher = pattern.matcher("Hello,World");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
World
See Regex Pattern doc
I want to remove a part of a string following what matches my regex.
I am trying to make a TV show organization program and I want to cut off anything in the name following the season and episode marker in the form SXXEXX where X is a digit.
I grasped the regex model fairly easily to create "[Ss]\d\d[Ee]\d\d" which should match properly.
I want to use the Matcher method end() to get the last index in the string of the match but it does not seem to be working as I think it should.
Pattern p = Pattern.compile("[Ss]\\d\\d[Ee]\\d\\d");
Matcher m = p.matcher(name);
if(m.matches())
return name.substring(0, m.end());
If someone could tell me why this doesn't work and suggest a proper way to do it, that would be great. Thanks.
matches() tries to match the whole string against the pattern. If you want to find your pattern within a string, use find(), which searches for the next match in the string.
Your code could be quite the same:
if(m.find())
return name.substring(0, m.end());
matches() matches against the entire string; try find() instead.
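A self-contained sketch of the find() plus end() approach follows. The class EpisodeTrim and the helper trimAfterMarker are hypothetical names, and returning the name unchanged when no marker is found is an assumption about the desired behavior:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EpisodeTrim {
    static final Pattern MARKER = Pattern.compile("[Ss]\\d\\d[Ee]\\d\\d");

    // Hypothetical helper: cuts off everything after the first SxxExx marker.
    // Assumption: a name without a marker is returned unchanged.
    public static String trimAfterMarker(String name) {
        Matcher m = MARKER.matcher(name);
        // end() is the index just past the last character of the match.
        return m.find() ? name.substring(0, m.end()) : name;
    }

    public static void main(String[] args) {
        System.out.println(trimAfterMarker("a movie S01E02 with some stuff")); // a movie S01E02
    }
}
```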
You could capture the name as well:
String name = "a movie S01E02 with some stuff";
Pattern p = Pattern.compile("(.*[Ss]\\d\\d[Ee]\\d\\d)");
Matcher m = p.matcher(name);
if (m.find())
System.out.println(m.group());
else
System.out.println("No match");
Will capture and print:
a movie S01E02
This should work
.*[Ss]\d\d[Ee]\d\d
In Java (I'm rusty) this will be
String ResultString = null;
Pattern regex = Pattern.compile(".*[Ss]\\d\\d[Ee]\\d\\d");
Matcher regexMatcher = regex.matcher("Title S11E11Blah");
if (regexMatcher.find()) {
ResultString = regexMatcher.group();
}
Hope this helps