I want to match an HTML file:
If the file starts with spaces and then an end tag </sometag>, return true.
Else return false.
I used "(\\s)*</(\\w)*>.*", but it doesn't match \n </p>\n </blockquote> ....
Thanks to Gabe for the help. Gabe is correct: . doesn't match \n by default, so I need to turn DOTALL mode on.
To do that, add (?s) to the beginning of the regex, i.e. (?s)(\\s)*</(\\w)*>.*.
You can also do this:
Pattern p = Pattern.compile("(\\s)*</(\\w)*>");
Matcher m = p.matcher(s);
return m.lookingAt();
It just checks whether the string starts with the pattern, rather than whether the whole string matches the pattern.
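To make the difference concrete, here is a minimal sketch comparing the two approaches on the input from the question. The class and method names are made up for illustration, and it uses \\w+ instead of (\\w)* so that an empty tag name is not accepted (a small assumption on my part, not something the question asks for).

```java
import java.util.regex.Pattern;

class EndTagPrefix {
    // Optional whitespace followed by an end tag such as </p>.
    private static final Pattern END_TAG = Pattern.compile("\\s*</\\w+>");

    // lookingAt() anchors at the start of the input but does not
    // require the rest of the input to match.
    static boolean startsWithEndTag(String s) {
        return END_TAG.matcher(s).lookingAt();
    }

    // Whole-string variant: (?s) turns on DOTALL so '.' also matches '\n'.
    static boolean startsWithEndTagDotAll(String s) {
        return s.matches("(?s)\\s*</\\w+>.*");
    }

    public static void main(String[] args) {
        String html = "\n </p>\n </blockquote> ...";
        System.out.println(startsWithEndTag(html));        // true
        System.out.println(startsWithEndTagDotAll(html));  // true
        // Without (?s) the trailing .* stops at the first '\n', so this is false.
        System.out.println(html.matches("\\s*</\\w+>.*"));
    }
}
```

Both variants should give the same answer here; lookingAt() just avoids scanning the rest of the file.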
When trying to match "??M?E???" with pattern "^([\\?]+)M([\\?]+)E([\\?]+)$", I get "No match", although it works for "?M?E??" fine.
My code snippet is
Pattern p = Pattern.compile("^([?]+)[M]{1}([?]+)[E]{1}([?]+)$");
Matcher m = p.matcher(input);
if ( !m.find() ) {
System.out.println("No Match");
continue ;
}
x = m.group(1).length();
y = m.group(2).length();
z = m.group(3).length();
You should be fine, your regex is good. Note that you don't need to escape ? inside a character class, so you can simply write:
^([?]+)M([?]+)E([?]+)$
Or escape it and take it out of the character class*:
^(\?+)M(\?+)E(\?+)$
* Note that in a Java string literal, \ is written as \\
Now that you've edited your question and posted the actual code, it looks like you have a different regex (please don't do that next time). You should use matches instead of find. The problem must lie in the content of input: it simply doesn't make find return true. Note that find looks for the next subsequence of the input that matches the regex.
What does your code look like? This works fine:
Pattern pattern = Pattern.compile("^([\\?]+)M([\\?]+)E([\\?]+)$");
System.out.println(pattern.matcher("??M?E???").matches());
System.out.println(pattern.matcher("?M?E??").matches());
Output:
true
true
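As a sketch of how the group lengths from the question could be read out with matches(): the class and method names below are hypothetical, and the trailing-space case is only a guess at the kind of input that would produce "No Match", since the question does not show the actual input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionMarkCounts {
    // matches() already requires the whole input to match, so ^ and $ are optional.
    private static final Pattern P = Pattern.compile("(\\?+)M(\\?+)E(\\?+)");

    // Returns the lengths of the three '?' runs, or null if the input does not match.
    static int[] counts(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            m.group(1).length(), m.group(2).length(), m.group(3).length()
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(counts("??M?E???"))); // [2, 1, 3]
        System.out.println(Arrays.toString(counts("?M?E??")));   // [1, 1, 2]
        // Assumed failure case: stray whitespace in the input.
        System.out.println(Arrays.toString(counts("?M?E?? ")));  // null
    }
}
```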
this is my Java code:
String patternParticipants = "([\\w\\.=-]+#[\\w\\.-]+\\.[\\w]{2,3}($|\n))*";
Pattern p = Pattern.compile(patternParticipants, Pattern.MULTILINE);
boolean matchesParticipants = p.matcher(reservation.getParticipants().trim()).matches();
And I want to match the following string:
john.wales#gmail.com
david.chrome#gmail.com
david.mika#gui.co
For some reason, the matcher returns true only if a single email address is given.
I've tried setting MULTILINE, but that doesn't seem to work either. Any ideas?
Strip the newline characters first, and then run your regex on the result.
Use code like this to strip the '\n':
String text = readFileAsString("textfile.txt");
text = text.replace("\n", ""); // Strings are immutable, so keep the returned value
P.S.: This assumes your data is in textfile.txt.
Then use this regex:
"^[a-zA-Z0-9_.+-]+#[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
This looks for a newline (\n) or the end of the string ($) after each address:
static boolean isValidEmailList(String str)
{
return str.matches("([\\w\\.=-]+#[\\w\\.-]+\\.[\\w]{2,3}($|\n))+");
}
Your regular expression will erroneously invalidate many valid email addresses, and allow some invalid addresses. But this is one way to make it work on your multi-line input.
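Another way to handle the multi-line input, sketched here as an alternative rather than as what the answer above does, is to split on line breaks and check each line on its own. The per-address pattern is the one from the question (including the # it uses as separator); the class name and the \r?\n split are my own assumptions, the latter so that Windows line endings are tolerated too.

```java
import java.util.regex.Pattern;

class EmailListCheck {
    // Per-address pattern taken from the question; '#' is kept as written there.
    private static final Pattern ADDRESS =
            Pattern.compile("[\\w.=-]+#[\\w.-]+\\.\\w{2,3}");

    // True only if every line of the (trimmed) input is a single address.
    static boolean isValidEmailList(String str) {
        for (String line : str.trim().split("\\r?\\n")) {
            if (!ADDRESS.matcher(line.trim()).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String list = "john.wales#gmail.com\ndavid.chrome#gmail.com\ndavid.mika#gui.co";
        System.out.println(isValidEmailList(list));             // true
        System.out.println(isValidEmailList("not-an-address")); // false
    }
}
```

Checking line by line also makes it easy to report which address was rejected, if that is ever needed.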
You should use this one for email:
^[_A-Za-z0-9-\+]+(\.[_A-Za-z0-9-]+)*#[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})$
I have the following input:
-- input --
Keep this
Chomp this
ChompHere:
Anything below gets chomped
And I need the output to look like:
-- output (expected) --
Keep this
Right now I get the following based on the code below:
-- output (actual) --
Keep this
Chomp this
ASK: How can I also delete the line before a regex match (Chomp this)?
public void chompPreviousLine() {
String text = "Keep this\n"
+ "Chomp this\nChompHere:\nAnything below gets chomped";
Pattern CHOMP= Pattern.compile("^(ChompHere:(.*))$", Pattern.MULTILINE | Pattern.DOTALL);
Matcher m = CHOMP.matcher(text);
if (m.find()) {
// chomp everything below and one line above!
text = m.replaceAll("");
// but....??? how to delete the previous line ???
text = text.replaceAll("[\n]+$", ""); // delete any remaining \n
System.out.println(text);
}
}
You can modify the regex so that it also gets the previous line:
Pattern CHOMP= Pattern.compile("[^\n]+\nChompHere:(.*)", Pattern.MULTILINE | Pattern.DOTALL);
[^\n]+\n will match any consecutive character that is not an end-of-line character then the end-of-line itself. Since it is before ChompHere in the regex, it will match the complete line before ChompHere.
I have removed the parentheses since you don't really use groups in your algorithm; you are replacing the whole matched text anyway.
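Put together, the modified regex could be used roughly like this. It is a minimal sketch: the class and method names are made up, and MULTILINE is dropped because the pattern no longer contains ^ or $.

```java
import java.util.regex.Pattern;

class ChompPreviousLine {
    // One full line, its line break, then the marker and everything after it.
    // DOTALL lets the trailing .* run across the remaining lines.
    private static final Pattern CHOMP =
            Pattern.compile("[^\n]+\nChompHere:.*", Pattern.DOTALL);

    static String chomp(String text) {
        String cut = CHOMP.matcher(text).replaceFirst("");
        // Drop the line break left behind by the last kept line.
        return cut.replaceAll("\n+$", "");
    }

    public static void main(String[] args) {
        String text = "Keep this\n"
                + "Chomp this\nChompHere:\nAnything below gets chomped";
        System.out.println(chomp(text)); // Keep this
    }
}
```

If the marker is not present, the text comes back unchanged apart from any trailing line breaks.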
You could use a positive look-ahead:
^(.*)(?=ChompHere:)
Depending on whether you want the line break matched or not, you have to add it to the lookahead.
But would a simple parser not be easier for this?
Hello, I have this regex in JavaScript:
var urlregex = new RegExp("((www.)(([a-zA-Z0-9-]){2,}\.){1,4}([a-zA-Z]){2,6}(\/([a-zA-Z-_\/\.0-9#:?=&;,]*)?)?)");
And when I try to put it in a Java String I get this error:
Description Resource Path Location Type
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ ) CreateActivity.java /SupLink/src/com/supinfo/suplink/activities line 43 Java Problem
So I just want to know what I have to change to make it work in Java, in order to do this (this function runs fine):
private boolean IsMatch(String s, String pattern)
{
try {
Pattern patt = Pattern.compile(pattern);
Matcher matcher = patt.matcher(s);
return matcher.matches();
} catch (PatternSyntaxException e){
return false;
}
}
EDIT :
Thank you for your help, now I have this :
private String regex = "((www.)(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)";
But it doesn't match what I really want (regexes are horrible ^^). I would like to match these types of URLs:
www.something.com
something.com
something.com/anotherthing/anything
Can you help me again?
Thanks a lot.
When you create the Java string, you need to escape the backslashes a second time so that Java understands that they are literal backslashes. You can replace all existing backslashes with \\. You also need to escape any Java characters that normally need to be escaped.
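For illustration, here is a sketch of the regex from the question with every backslash doubled for the Java string literal. Making the www. prefix optional (and escaping its dot) is an extra assumption, added so that the three URLs listed in the question are accepted; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class UrlRegexDemo {
    // Regex \. becomes "\\." and \/ becomes "\\/" in Java source.
    private static final Pattern URL = Pattern.compile(
            "((www\\.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}"
            + "(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)");

    // matches() requires the whole string to be a URL of this shape.
    static boolean isUrl(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUrl("www.something.com"));                   // true
        System.out.println(isUrl("something.com"));                       // true
        System.out.println(isUrl("something.com/anotherthing/anything")); // true
        System.out.println(isUrl("nodots"));                              // false
    }
}
```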
Currently your regex requires www at the start. If you want to make it optional, add ? after (www.). You probably also want to escape the . after www. Your regex should probably look like this:
"((www\\.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)"
You should escape the \, something like this:
"((www.)(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)"
I have the following code
private String anchorRegex = "\\<\\s*?a\\s+.*?href\\s*?=\\s*?([^\\s]*?).*?\\>";
private Pattern anchorPattern = Pattern.compile(anchorRegex, Pattern.CASE_INSENSITIVE);
String content = getContentAsString();
Matcher matcher = anchorPattern.matcher(content);
while(matcher.find()) {
System.out.println(matcher.group(1));
}
The call to getContentAsString() returns the HTML content from a web page. The problem I'm having is that the only thing that gets printed in my System.out is a space. Can anyone see what's wrong with my regex?
Regex drives me crazy sometimes.
You need to delimit your capturing group from the following .*?. There are probably double quotes " around the href, so use those:
<\s*a\s+.*?href\s*=\s*"(\S*?)".*?>
Your regex contains:
([^\s]*?).*?
The ([^\s]*?) says to reluctantly find all non-whitespace characters and save them in a group. But the reluctant *? depends on the next part, which is .; any character. So the matching of the href aborts at the first possible chance and it is the .*? which matches the rest of the URL.
The regex you should be using is this:
String anchorRegex = "(?s)<\\s*a\\s+.*?href\\s*=\\s*['\"]([^\\s>]*)['\"]";
This should be able to pull out the href without too much trouble.
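A quick way to try that regex is a sketch like the following. The sample HTML, class name and method name are invented for the example, and CASE_INSENSITIVE is carried over from the question's code. As with any regex over HTML, treat it as a rough extraction rather than a real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractor {
    // The quotes around the value stop the capture group from matching nothing.
    private static final Pattern ANCHOR = Pattern.compile(
            "(?s)<\\s*a\\s+.*?href\\s*=\\s*['\"]([^\\s>]*)['\"]",
            Pattern.CASE_INSENSITIVE);

    // Collects group 1 (the href value) of every match in the input.
    static List<String> hrefs(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a> and "
                + "<a class=\"x\" href='/b'>B</a></p>";
        System.out.println(hrefs(html)); // [http://example.com/a, /b]
    }
}
```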
The link is in capture group 2. This version is expanded (free-spacing) and assumes dot-all mode.
Add Java string escaping as necessary.
(?s)
<a
(?=\s)
(?:[^>"']|"[^"]*"|'[^']*')*? (?<=\s) href \s*=\s* (['"]) (.*?) \1
(?:".*?"|'.*?'|[^>]*?)+
>
Or, not expanded and not dot-all:
<a(?=\s)(?:[^>"']|"[^"]*"|'[^']*')*?(?<=\s)href\s*=\s*(['"])([\s\S]*?)\1(?:"[\s\S]*?"|'[\s\S]*?'|[^>]*?)+>