Regular expression not returning the .group() value

I'm new to Java and to regular expressions. The method seems to be OK, and it finds results in the subject string, but when I try to get the actual string using .group(), it's empty. Here's the code:
public String TestRegularExpression(){
try{
Pattern regex = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(sourceCode);
while (regexMatcher.find()) {
results += "<li>" + regexMatcher.group() + "</li>";
matches ++;
}
} catch (PatternSyntaxException ex) {
results = "<li><strong class='ibm-important'>Syntax error in the regular expression</strong></li>";
}
if(results == null){results = "<li><strong class='ibm-important'>No meta tags found</strong></li>";}
return "<h3>" + h3Title + " (" + matches + " found)</h3><ul>" + results + "</ul>";
}
Any help will be much appreciated!!!

Could it be that you're just not seeing the output? If you write the match directly into HTML without escaping it, the META tag is inserted into the HTML code as markup, and the web browser won't render it as visible text.
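A minimal sketch of that idea, assuming the matches are META tags and that no HTML-escaping helper is already available in the project. The class name, the escapeHtml helper, and the sample pattern and page are all made up for illustration; they are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedMatchDemo {

    // Hypothetical helper: minimal escaping so a match shows up as text
    // instead of being interpreted as markup by the browser.
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;")   // must run first
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Builds the <li> items as in the question, but escapes each match.
    static String listMatches(String regex, String source) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(source);
        StringBuilder items = new StringBuilder();
        while (matcher.find()) {
            items.append("<li>").append(escapeHtml(matcher.group())).append("</li>");
        }
        return items.toString();
    }

    public static void main(String[] args) {
        // Sample input, assumed for the example.
        String page = "<head><META name=\"author\" content=\"me\"></head>";
        System.out.println(listMatches("<meta[^>]*>", page));
        // prints <li>&lt;META name=&quot;author&quot; content=&quot;me&quot;&gt;</li>
    }
}
```

If the escaped version shows the tags in the browser, .group() was returning the right value all along and only the rendering was hiding it.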

Related

Two separate patterns and matchers (java)

I'm working on a simple bot for Discord. The first pattern works fine and I get the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated.
public void onMessageReceived(MessageReceivedEvent event) {
if (event.getMessage().getContent().startsWith("!")) {
String output, newUrl;
String word, strippedWord;
String url = "http://jisho.org/api/v1/search/words?keyword=";
Pattern reading;
Matcher matcher;
word = event.getMessage().getContent();
strippedWord = word.replace("!", "");
newUrl = url + strippedWord;
//Output contains the raw text from jisho
output = getUrlContents(newUrl);
//Searching through the raw text to pull out the first "reading: "
reading = Pattern.compile("\"reading\":\"(.*?)\"");
matcher = reading.matcher(output);
//Searching through the raw text to pull out the first "english_definitions: "
Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
Matcher matcher2 = def.matcher(output);
event.getTextChannel().sendMessage(matcher2.toString());
if (matcher.find() && matcher2.find()) {
event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
}
else {
event.getTextChannel().sendMessage("Word not found").queue();
}
}
}

You have to escape the [ character as \\[ (once for the Java String and once for the regex). Unescaped, [ starts a character class, so the pattern ends up with no capturing group at all. You also forgot the closing \".
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end:
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
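To see the difference between the two patterns without a Discord bot, here is a small sketch that runs both against a hand-written JSON fragment. The sample JSON and the class and method names are assumptions for illustration; the real jisho response is larger and may be formatted differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DefinitionPatternDemo {

    // Pattern from the question: the unescaped [ opens a character class,
    // so the regex compiles but contains no capturing group.
    static final Pattern BROKEN =
            Pattern.compile("\"english_definitions\":[\"(.*?)]");

    // Pattern from the answer: \\[ matches a literal [ and the closing \" is present.
    static final Pattern FIXED =
            Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");

    // Returns the text captured between [" and "], or null when nothing matches.
    static String firstDefinitions(String json) {
        Matcher matcher = FIXED.matcher(json);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Hand-written sample, not an actual API response.
        String json = "{\"reading\":\"ie\",\"english_definitions\":[\"house\",\"home\"]}";

        System.out.println(BROKEN.matcher(json).groupCount()); // prints 0
        System.out.println(BROKEN.matcher(json).find());       // prints false
        System.out.println(firstDefinitions(json));            // prints house","home
    }
}
```

Because the group runs up to the final "], a list with several definitions comes back with the inner quotes and commas but without the outer quotes, which is why re-adding \" at both ends of the output makes sense.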

Need help to form a regex in java

I want to write a regex and count its occurrences in a page source using Java. The value I am trying to search for is given in the program below.
There might be one or more spaces between tags. I am not able to form a regex for this value. Can someone please help me find one?
My program, which checks the regex, is given below:
String regx=""<img height=""1"" width=""1"" style=""border-style:none;"" alt="""" src=""//api.adsymptotic.com/api/s/trackconversion?_pid=12170&_psign=3841da8d95cc1dbcf27a696f27ccab0b&_aid=1376&_lbl=RT_LampsPlus_Retargeting_Pixel""/>";
WebDriver driver = new FirefoxDriver();
driver.navigate().to("abc.xom");
int count=0, found=0;
source = driver.getPageSource();
source = source.replaceAll("\\s+", " ").trim();
pattern = Pattern.compile(regx);
matcher = pattern.matcher(source);
while(matcher.find())
{
count++;
found=1;
}
if(found==0)
{
System.out.println("Maximiser not found");
pixelData[rowNumber][2] = String.valueOf(count) ;
pixelData[rowNumber][3] = "Fail";
}
else
{
System.out.println("Maximiser is found" + count);
pixelData[rowNumber][2] = String.valueOf(count) ;
pixelData[rowNumber][3] = "Pass";
}
count=0; found=0;

Hard to tell without the original text and expected result, but your Pattern clearly won't compile as is.
You should single-escape double quotes (\") and double-escape regex special characters (e.g. \\?) for your code and your Pattern to compile.
Something along the lines of:
String regx="<img height=\"1\" width=\"1\" style=\"border-style:none;\" " +
"alt=\"\" src=\"//api.adsymptotic.com/api/s/trackconversion" +
"\\?_pid=12170&_psign=3841da8d95cc1dbcf27a696f27ccab0b" +
"&_aid=1376&_lbl=RT_LampsPlus_Retargeting_Pixel\"/>";
Also consider scraping markup with an appropriate framework (e.g. jsoup for HTML) instead of regex.
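If the goal is only to count occurrences of a fixed snippet, one alternative to escaping every special character by hand is Pattern.quote, which turns a string into a literal pattern. The sketch below is one possible approach, not the answer's method: it quotes each whitespace-separated piece and joins the pieces with \s+ so that extra spaces are tolerated. The helper names and the shortened example.com tag are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralTagCounter {

    // Hypothetical helper: every non-whitespace piece is matched literally
    // (Pattern.quote), and each gap between pieces becomes \s+.
    static Pattern literalWithFlexibleSpaces(String literal) {
        StringBuilder regex = new StringBuilder();
        for (String piece : literal.trim().split("\\s+")) {
            if (regex.length() > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(piece));
        }
        return Pattern.compile(regex.toString());
    }

    // Counts non-overlapping matches of the pattern in the source text.
    static int countOccurrences(Pattern pattern, String source) {
        Matcher matcher = pattern.matcher(source);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the tracking pixel in the question.
        String tag = "<img height=\"1\" width=\"1\" src=\"//example.com/track?_pid=1&_aid=2\"/>";
        String page = "<p>a</p><img  height=\"1\"   width=\"1\" "
                + "src=\"//example.com/track?_pid=1&_aid=2\"/><p>b</p>";

        Pattern pattern = literalWithFlexibleSpaces(tag);
        System.out.println(countOccurrences(pattern, page)); // prints 1
    }
}
```

This still treats the markup as plain text, so attribute order or quoting changes in the page will break the match; an HTML parser avoids that.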

Regular expression on a string

I have a String like the one below:
String phone = "(123) 456-7890";
Now I would like my program to verify that my input follows the same pattern as the string 'phone'.
I did the following:
if(phone.contains("([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")) {
//display pass
}
else {
//display fail
}
It didn't work. I tried other combinations too, but nothing worked.
Questions:
1. How can I achieve this without using 'Pattern', as above?
2. How can I do this with Pattern? I tried the following:
Pattern pattern = Pattern.compile("(\\d+)");
Matcher match = pattern.matcher(phone);
if (match.find()) {
//Displaypass
}

String#matches checks if a string matches a pattern:
if (phone.matches("\\(\\d{3}\\) \\d{3}-\\d{4}")) {
//Displaypass
}
The pattern is a regular expression. Therefore I had to escape the round brackets, as they have a special meaning in regex (they denote capturing groups).
contains() only checks if a string contains the substring passed to it.
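A short sketch of the difference, assuming the phone format from the question. The class and method names are made up for the example. It also shows that matches() requires the whole string to fit the pattern, while Matcher.find() looks for the pattern anywhere inside the input.

```java
import java.util.regex.Pattern;

class PhoneCheckDemo {

    static final String PHONE_REGEX = "\\(\\d{3}\\) \\d{3}-\\d{4}";

    // True only when the entire input is a phone number in that format.
    static boolean isPhone(String input) {
        return input.matches(PHONE_REGEX);
    }

    // True when a phone number in that format appears somewhere in the input.
    static boolean containsPhone(String input) {
        return Pattern.compile(PHONE_REGEX).matcher(input).find();
    }

    public static void main(String[] args) {
        String phone = "(123) 456-7890";

        // contains() compares plain substrings; the regex text is not in the string.
        System.out.println(phone.contains(PHONE_REGEX)); // prints false
        System.out.println(phone.contains("(123)"));     // prints true

        System.out.println(isPhone(phone));                           // prints true
        System.out.println(isPhone("call (123) 456-7890 now"));       // prints false
        System.out.println(containsPhone("call (123) 456-7890 now")); // prints true
    }
}
```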

I'm not going to dive too deeply into regex syntax, but there definitely is something off with your regex.
"([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
It contains ( and ), and those have a special meaning. Escape them:
"\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
and you'll also have to escape each \ for the final Java string:
"\\([0-9][0-9][0-9]\\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"

You can write it like this:
Pattern pattern = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");
Matcher matcher = pattern.matcher(sPhoneNumber);
if (matcher.matches()) {
System.out.println("Phone Number Valid");
}
For more information you can visit this article.

It appears that your problem is that you didn't escape the parentheses, so your Regex is failing. Try this:
\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]

This works:
String PHONE_REGEX = "[(]\\b[0-9]{3}\\b[)][ ]\\b[0-9]{3}\\b[-]\\b[0-9]{4}\\b";
String phone1 = "(1234) 891-6762";
Boolean b = phone1.matches(PHONE_REGEX);
System.out.println("is phone: " + phone1 + " :Valid = " + b);
String phone2 = "(143) 456-7890";
b = phone2.matches(PHONE_REGEX);
System.out.println("is phone: " + phone2 + " :Valid = " + b);
Output:
is phone: (1234) 891-6762 :Valid = false
is phone: (143) 456-7890 :Valid = true

Find URL in String

Hi, I'm trying to find a URL in a string. I found many topics about this using regex, but I have a problem. Using this pattern:
String regex = "\\b(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www.)" +
"(\\w+:\\w+#)?(([-\\w]+\\.)+(com|org|net|gov" +
"|mil|biz|info|mobi|name|aero|jobs|museum" +
"|travel|[a-z]{2}))(:[\\d]{1,5})?" +
"(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" +
"((\\?([-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?" +
"([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" +
"(&(?:[-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?" +
"([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" +
"(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b";
It works pretty well on most pages, but I have an issue with others. For example:
http://hello.com/hello world
returns
http://hello.com/hello
The problem is that space.
Does anyone have a nice pattern that solves this?
Thanks.
EDIT: this is my code
private ArrayList<String> pullLinks(String text) {
ArrayList<String> links = new ArrayList<String>();
String regex = "\\b(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www.)" +
"(\\w+:\\w+#)?(([-\\w]+\\.)+(com|org|net|gov" +
"|mil|biz|info|mobi|name|aero|jobs|museum" +
"|travel|[a-z]{2}))(:[\\d]{1,5})?" +
"(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" +
"((\\?([-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?" +
"([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" +
"(&(?:[-\\w~!$+|.,*:]|%[a-f\\d{2}])+=?" +
"([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" +
"(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(text);
while(m.find()) {
String urlStr = m.group();
if (urlStr.startsWith("(") && urlStr.endsWith(")"))
{
urlStr = urlStr.substring(1, urlStr.length() - 1);
}
links.add(urlStr);
}
return links;
}

Spaces are not allowed in URLs (they need to be replaced by %20). See for instance the answer to this question:
Spaces in URLs?
If you allow URLs to include spaces anyway, then how would you interpret for instance http://www.google.com/ig is a nice webpage? Clearly the part after /ig should not be included!
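As a rough illustration of the %20 point, the sketch below builds the URL with java.net.URI, whose multi-argument constructor percent-encodes characters that are not legal in a path, and then finds the encoded URL in running text. The class and method names are made up, and the short https?://\S+ pattern is a deliberately simplified stand-in, not the pattern from the question.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceEncodingDemo {

    // Simplified stand-in pattern: a URL runs until the next whitespace.
    static final Pattern SIMPLE_URL = Pattern.compile("https?://\\S+");

    // Builds a URL from its parts; illegal characters such as the space
    // in the path are percent-encoded by the URI constructor.
    static String encode(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns the first URL found in the text, or null if there is none.
    static String firstUrl(String text) {
        Matcher matcher = SIMPLE_URL.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String encoded = encode("http", "hello.com", "/hello world");
        System.out.println(encoded); // prints http://hello.com/hello%20world

        // Once encoded, whitespace is a reliable terminator again.
        System.out.println(firstUrl("see " + encoded + " for details"));
        // prints http://hello.com/hello%20world
    }
}
```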

Space is not a valid URL character.
Also, if you don't use whitespace as your terminator how are you going to find the end of the URL?
Your regex is also failing to account for other top level domains (like .int). I'm not actually sure why it is looking for specific TLDs at all as they are not required to form a valid URL.

How can I change Regex search in java to overlook case

How can I change the following code so it will not care about case?
public static String tagValue(String inHTML, String tag)
throws DataNotFoundException {
String value = null;
Matcher m = null;
int count = 0;
try {
String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor);
m = pattern.matcher(inHTML);
while (m.find()) {
count++;
return inHTML.substring(m.start(), m.end());
// System.out.println(inHTML.substring(m.start(), m.end()));
}
} catch (Exception e) {
throw new DataNotFoundException("Can't Find " + tag + "Tag.");
}
if (count == 0) {
throw new DataNotFoundException("Can't Find " + tag + "Tag.");
}
return inHTML.substring(m.start(), m.end());
}

Give the Pattern.CASE_INSENSITIVE flag to Pattern.compile:
String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
m = pattern.matcher(inHTML);
(Oh, and consider parsing XML/HTML instead of using a regular expression to match a nonregular language.)

You can also compile the pattern with the case-insensitive flag:
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);

First, read Using regular expressions to parse HTML: why not?
To answer your question though, in general, you can just put (?i) at the beginning of the regular expression:
String searchFor = "(?i)" + "<" + tag + ">(.*?)</" + tag + ">";
The Pattern Javadoc explains:
Case-insensitive matching can also be enabled via the embedded flag expression (?i).
Since you're using Pattern.compile you can also just pass the CASE_INSENSITIVE flag:
String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
You should know what case-insensitive means in Java regular expressions.
By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.
It looks like you're matching tags, so you only want US-ASCII.
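The sketch below puts the flag to work on the tag search and then shows the US-ASCII limitation with a single accented letter. The class and method names are made up; quoting the tag name with Pattern.quote is an extra precaution that is not in the original code, and unlike the original this version returns null instead of throwing when nothing is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveTagDemo {

    // Returns the first <tag>...</tag> element regardless of letter case,
    // or null when the tag is not present.
    static String findTag(String html, String tag) {
        String quoted = Pattern.quote(tag); // treat the tag name literally
        Pattern pattern = Pattern.compile(
                "<" + quoted + ">(.*?)</" + quoted + ">",
                Pattern.CASE_INSENSITIVE);
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findTag("<html><TITLE>Hello</Title></html>", "title"));
        // prints <TITLE>Hello</Title>

        // \u00e9 is a lowercase e with acute accent, \u00c9 its uppercase form.
        Pattern asciiOnly = Pattern.compile("\u00e9", Pattern.CASE_INSENSITIVE);
        Pattern unicode = Pattern.compile("\u00e9",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        System.out.println(asciiOnly.matcher("\u00c9").matches()); // prints false
        System.out.println(unicode.matcher("\u00c9").matches());   // prints true
    }
}
```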
