How can I change the following code so it will not care about case?
public static String tagValue(String inHTML, String tag)
throws DataNotFoundException {
String value = null;
Matcher m = null;
int count = 0;
try {
String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor);
m = pattern.matcher(inHTML);
while (m.find()) {
count++;
return inHTML.substring(m.start(), m.end());
// System.out.println(inHTML.substring(m.start(), m.end()));
}
} catch (Exception e) {
throw new DataNotFoundException("Can't Find " + tag + "Tag.");
}
if (count == 0) {
throw new DataNotFoundException("Can't Find " + tag + "Tag.");
}
return inHTML.substring(m.start(), m.end());
}
Give the Pattern.CASE_INSENSITIVE flag to Pattern.compile:
String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
m = pattern.matcher(inHTML);
(Oh, and consider parsing XML/HTML instead of using a regular expression to match a nonregular language.)
You can also compile the pattern with the case-insensitive flag:
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
First, read Using regular expressions to parse HTML: why not?
To answer your question though, in general, you can just put (?i) at the beginning of the regular expression:
String searchFor = "(?i)" + "<" + tag + ">(.*?)</" + tag + ">";
The Pattern Javadoc explains:
Case-insensitive matching can also be enabled via the embedded flag expression (?i).
Since you're using Pattern.compile you can also just pass the CASE_INSENSITIVE flag:
String searchFor = "<" + tag + ">(.*?)</" + tag + ">";
Pattern pattern = Pattern.compile(searchFor, Pattern.CASE_INSENSITIVE);
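As a rough sketch of the two options above (the helper name firstTag and the sample input are made up for illustration), both spellings should find the same match. Pattern.quote is an addition here, not part of the answer: it only ensures that a tag name containing regex metacharacters is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveTagDemo {
    // Hypothetical helper: text inside the first <tag>...</tag>, or null if absent.
    static String firstTag(String html, String tag, boolean useEmbeddedFlag) {
        String quoted = Pattern.quote(tag); // treat the tag name as literal text
        String body = "<" + quoted + ">(.*?)</" + quoted + ">";
        Pattern pattern = useEmbeddedFlag
                ? Pattern.compile("(?i)" + body)                   // embedded flag
                : Pattern.compile(body, Pattern.CASE_INSENSITIVE); // compile-time flag
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null; // group 1 is the text between the tags
    }

    public static void main(String[] args) {
        String html = "<TITLE>Hello</title>";
        System.out.println(firstTag(html, "title", true));
        System.out.println(firstTag(html, "title", false));
    }
}
```

Which form to use is mostly taste: the flag argument keeps the pattern string clean, while (?i) travels with the pattern when it is stored as plain text.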
You should know what case-insensitive means in Java regular expressions.
By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.
It looks like you're matching tags, so you only want US-ASCII.
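To make the US-ASCII point concrete, here is a small sketch (class and method names are illustrative only) comparing CASE_INSENSITIVE alone with CASE_INSENSITIVE | UNICODE_CASE. The accented letters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    // Case-insensitive for US-ASCII letters only (the default behaviour).
    static boolean asciiOnly(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input).matches();
    }

    // Case-insensitive with Unicode-aware case folding.
    static boolean unicodeAware(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiOnly("title", "TITLE"));       // ASCII letters fold
        System.out.println(asciiOnly("\u00e9", "\u00c9"));     // e-acute vs E-acute, ASCII rules
        System.out.println(unicodeAware("\u00e9", "\u00c9"));  // same pair, Unicode rules
    }
}
```

For tag names, which are ASCII, the plain flag is enough, as the answer says.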
Related
I am using a regex to find a word in a string. It works fine for alphanumeric keys but returns the wrong answer when the key contains arithmetic operators.
Matcher oMatcher;
Pattern oPattern;
String key = "a++";
oPattern = Pattern.compile("\\b" + key + "\\b");
oMatcher = oPattern.matcher("max winzer® build-a-chair cocktailsessel »luisa« in runder form, zum selbstgestalten");
if (oMatcher.find()) {
System.out.println("True");
}
You have to escape any potential regex special characters in key with Pattern.quote:
oPattern = Pattern.compile("\\b" + Pattern.quote(key) + "\\b");
//                                 ^^^^^^^^^^^^^
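A small sketch of the difference, using a hypothetical containsWord helper. Without quoting, a++ is read as a possessive quantifier ("one or more a"), so the lone a in build-a-chair matches; with Pattern.quote the key is searched for literally.

```java
import java.util.regex.Pattern;

class QuoteKeyDemo {
    // Hypothetical helper: looks for key between word boundaries, optionally quoted.
    static boolean containsWord(String text, String key, boolean quote) {
        String k = quote ? Pattern.quote(key) : key;
        return Pattern.compile("\\b" + k + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "build-a-chair cocktailsessel";
        System.out.println(containsWord(text, "a++", false)); // key interpreted as regex
        System.out.println(containsWord(text, "a++", true));  // key taken literally
    }
}
```

One caveat worth knowing: \b only matches between a word character and a non-word character. A quoted key that ends in + is therefore found in "a++b" but not in "a++ b", because neither + nor the space is a word character. That is how \b is defined, not a problem with Pattern.quote.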
I am fairly new to regex but I am trying to learn it. I'm not doing anything complicated; I have some XML:
<root>
<friendlyName>Hello, I'm friendly</friendlyName>
<URL>http://localhost</URL>
</root>
I am trying to get the value of friendlyName but it doesn't appear to be working. I've used an online regex tester from https://regex101.com/ which seems to match against what I'm expecting. However, when I try it in Java I get back N/A, N/A being what I return if the string was not found.
Below is my code:
public String getXMLTagValue(String tagName)
{
Pattern pattern = Pattern.compile("<" + tagName + ">(.*?)</" + tagName + ">/s");
Matcher matcher = pattern.matcher(xmlString);
while (matcher.find())
{
return matcher.group();
}
return "N/A";
}
I'm expecting the above code to return Hello, I'm friendly but instead I get N/A.
Your regex is wrongly defined; it must be:
"<" + tagName + ">(.*?)</" + tagName + ">\\s"
and not
"<" + tagName + ">(.*?)</" + tagName + ">/s"
Change
"<" + tagName + ">(.*?)</" + tagName + ">/s"
to
"<" + tagName + ">(.*?)</" + tagName + ">\\s"
Reason:
The \s metacharacter is used to find a whitespace character.
A whitespace character can be:
A space character
A tab character
A carriage return character
A new line character
A vertical tab character
A form feed character
So the correct form is \s, which in a Java string literal is written \\s (because \ is the escape character in Java strings).
Also I (and some others) think that using \\s is not necessary. You can just use this pattern:
"<" + tagName + ">(.*?)</" + tagName + ">"
Start by correcting your XML: <friendlyName> ends with </friendly>, so it is not well formed. The regex is also wrong; you can replace:
"<" + tagName + ">(.*?)</" + tagName + ">/s"
with:
"<" + tagName + ">(.*?)</" + tagName + ">\\s"
but really you don't need the "\\s".
If you want only the text between the tags, you also need to remove the opening and closing tags before returning the result string.
Below is the working code. I also added an improved method that uses javax.xml.parsers.DocumentBuilder to parse the XML instead of a regex.
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// The class name is illustrative; the original snippet showed only the members.
public class XmlTagValue {
    private static String xmlString =
        "<root>"
        + "<friendly>Hello, I'm friendly</friendly>"
        + "<url>http://localhost</url>"
        + "</root>";

    public static void main(String[] args) throws Exception {
        String value = getXMLTagValue("friendly");
        System.out.println(value);
        String out = getXMLTagValueImproved("friendly");
        System.out.println(out);
    }

    // Regex version: match the whole element, then strip the tags.
    public static String getXMLTagValue(String tagName) {
        String openTag = "<" + tagName + ">";
        String closeTag = "</" + tagName + ">";
        Pattern pattern = Pattern.compile(openTag + "(.*?)" + closeTag);
        Matcher matcher = pattern.matcher(xmlString);
        if (matcher.find()) {
            // replace() treats the tags as literal text, unlike replaceAll()
            return matcher.group().replace(openTag, "").replace(closeTag, "");
        }
        return "N/A";
    }

    // Parser version: let a real XML parser find the element.
    public static String getXMLTagValueImproved(String tagName) throws Exception {
        InputSource is = new InputSource(new StringReader(xmlString));
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(is);
        NodeList nl = doc.getDocumentElement().getElementsByTagName(tagName);
        return nl.getLength() > 0 ? nl.item(0).getTextContent() : "N/A";
    }
}
Hope this helps.
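Coming back to the /s in the question's pattern: it looks like the s (dot-all) flag as it appears in delimiter-style notation such as /pattern/s, though that is only a guess about where it came from. In Java, text appended to the pattern is just part of the pattern, so /s means a literal slash followed by s; the flag itself is passed as Pattern.DOTALL. A minimal sketch with a hypothetical tagText helper, which also returns capture group 1 so that only the text between the tags comes back:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTextDemo {
    // Hypothetical helper: text between <tagName> and </tagName>, or "N/A".
    static String tagText(String xml, String tagName) {
        String quoted = Pattern.quote(tagName);
        // DOTALL lets '.' match line terminators, so values may span lines.
        Pattern p = Pattern.compile("<" + quoted + ">(.*?)</" + quoted + ">", Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        return m.find() ? m.group(1) : "N/A";
    }

    public static void main(String[] args) {
        String xml = "<root>\n"
                + "<friendlyName>Hello, I'm friendly</friendlyName>\n"
                + "<URL>http://localhost</URL>\n"
                + "</root>";
        System.out.println(tagText(xml, "friendlyName"));
        System.out.println(tagText(xml, "missing"));
    }
}
```

This is still a regex over XML, so it shares the limitations mentioned above; for anything beyond a quick extraction the parser-based method is the safer choice.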
I have a String like below
String phone = "(123) 456-7890";
Now I would like my program to verify that my input has the same pattern as the string 'phone'.
I did the following
if(phone.contains("([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")) {
//display pass
}
else {
//display fail
}
It didn't work. I tried other combinations too; nothing worked.
Question :
1. How can I achieve this without using 'Pattern' like above?
2. How to do this with pattern. I tried with pattern as below
Pattern pattern = Pattern.compile("(\d+)");
Matcher match = pattern.matcher(phone);
if (match.find()) {
//Displaypass
}
String#matches checks if a string matches a pattern:
if (phone.matches("\\(\\d{3}\\) \\d{3}-\\d{4}")) {
//Displaypass
}
The pattern is a regular expression. Therefore I had to escape the round brackets, as they have a special meaning in regex (they denote capturing groups).
contains() only checks if a string contains the substring passed to it.
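A short sketch of the difference (the class and method names are illustrative): contains() compares its argument as plain text, while matches() treats it as a regex and requires the whole string to match.

```java
class PhoneCheckDemo {
    // Escaped parentheses match literal '(' and ')'; \d{n} matches exactly n digits.
    static final String PHONE_REGEX = "\\(\\d{3}\\) \\d{3}-\\d{4}";

    static boolean isPhone(String s) {
        return s.matches(PHONE_REGEX); // must match the entire string
    }

    public static void main(String[] args) {
        String phone = "(123) 456-7890";
        System.out.println(phone.contains("(123)"));  // literal substring check
        System.out.println(phone.contains("\\d{3}")); // looks for the text \d{3}, not for digits
        System.out.println(isPhone(phone));
        System.out.println(isPhone("call (123) 456-7890")); // extra text, so no full match
    }
}
```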
I'm not going to dive too deeply into regex syntax, but there definitely is something off with your regex.
"([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
It contains ( and ), which have a special meaning in a regex. Escape them:
"\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
and you'll also have to escape each \ for the Java string literal, which gives the final:
"\\([0-9][0-9][0-9]\\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
You can write like:
Pattern pattern = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");
Matcher matcher = pattern.matcher(sPhoneNumber);
if (matcher.matches()) {
System.out.println("Phone Number Valid");
}
For more information you can visit this article.
It appears that your problem is that you didn't escape the parentheses, so your Regex is failing. Try this:
\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]
This works
String PHONE_REGEX = "[(]\\b[0-9]{3}\\b[)][ ]\\b[0-9]{3}\\b[-]\\b[0-9]{4}\\b";
String phone1 = "(1234) 891-6762";
boolean b = phone1.matches(PHONE_REGEX);
System.out.println("is phone: " + phone1 + " :Valid = " + b);
String phone2 = "(143) 456-7890";
b = phone2.matches(PHONE_REGEX);
System.out.println("is phone: " + phone2 + " :Valid = " + b);
Output:
is phone: (1234) 891-6762 :Valid = false
is phone: (143) 456-7890 :Valid = true
I have
String content = "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">\n"
    + "<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">";
I want to get all the ids between "group.php?id=" and "\""
Ex: 180552688740185
Here is my code:
String content1 = "";
Pattern script1 = Pattern.compile("group.php?id=.*?\"");
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
content1 += mscript1.group() + "\n";
}
But for some reason it does not work.
Can you give me some advice?
Why are you using .*? to match the id? .*? will match any character. You only need digits, so just use \\d. Note also that . and ? are regex metacharacters: in your pattern, the ? after php makes the preceding p optional instead of matching a literal question mark, which is why nothing is found.
Also, you need to capture the id and then print it.
// Quote the prefix so that '.' and '?' are treated as literals
String str = Pattern.quote("group.php?id=") + "(\\d*)";
Pattern script1 = Pattern.compile(str);
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
    content1 += mscript1.group(1) + "\n"; // Capture group 1 contains your id
}
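Put together as a self-contained sketch (the class and method names are made up, and \\d+ is used instead of \\d* so that an empty id is never reported, which is a small deviation from the snippet above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIdDemo {
    // Hypothetical helper: collects every numeric id that follows "group.php?id=".
    static List<String> extractIds(String content) {
        // Pattern.quote makes '.' and '?' in the prefix literal characters.
        Pattern p = Pattern.compile(Pattern.quote("group.php?id=") + "(\\d+)");
        Matcher m = p.matcher(content);
        List<String> ids = new ArrayList<>();
        while (m.find()) {
            ids.add(m.group(1)); // group 1 holds only the digits
        }
        return ids;
    }

    public static void main(String[] args) {
        String content = "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">\n"
                + "<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">";
        for (String id : extractIds(content)) {
            System.out.println(id);
        }
    }
}
```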
I'm new to Java and to regular expressions. The method seems to be OK, and it finds results in the subject string, but when I try to get the actual string using .group(), it's empty. Here's the code:
public String TestRegularExpression(){
try{
Pattern regex = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(sourceCode);
while (regexMatcher.find()) {
results += "<li>" + regexMatcher.group() + "</li>";
matches ++;
}
} catch (PatternSyntaxException ex) {
results = "<li><strong class='ibm-important'>Syntax error in the regular expression</strong></li>";
}
if(results == null){results = "<li><strong class='ibm-important'>No meta tags found</strong></li>";}
return "<h3>" + h3Title + " (" + matches + " found)</h3><ul>" + results + "</ul>";
}
Any help will be much appreciated!!!
Couldn't it be that you're just not seeing the output? If you output the match directly to HTML without quoting it, that'll just insert the META tag in the HTML code, and the web browser won't render it.
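If that is the cause, escaping the match before putting it into the page should make it visible. A minimal sketch, assuming a hand-written helper named escapeHtml (not an existing library method) that covers only the four characters that matter here:

```java
class HtmlEscapeDemo {
    // Hypothetical helper: escapes the characters that HTML would interpret as markup.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")   // must come first, or later entities get re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String match = "<meta name=\"description\" content=\"A & B\">";
        // The escaped form is shown as text by the browser instead of being parsed as a tag.
        System.out.println("<li>" + escapeHtml(match) + "</li>");
    }
}
```

In the question's loop, that would mean wrapping regexMatcher.group() in such a call before appending it to results.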