I am unable to parse a multi-line XML message payload using Pattern.compile(regex). However, if I put the same message on a single line, it gives me the expected result. For example, if I parse
<Document> <RGOrdCust50K50F> AccName AccNo AccAddress </RGOrdCust50K50F> </Document>
it gives me the RGOrdCust50K50F tag value as AccName AccNo AccAddress, but if I use multiple lines, like
<Document> <RGOrdCust50K50F>AccNo
AccName
AccAddress </RGOrdCust50K50F></Document>
it throws java.lang.IllegalStateException: No match found.
The test case code I am using is below:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseXMLMessage {
    public static void main(String[] args) {
        String fldName = "RGOrdCust50K50F";
        // A Java string literal cannot span lines, so the line breaks are written as \n
        String message = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> <Document><RGOrdCust50K50F>1234\n"
                + "ABCD\n"
                + "LONDON,UK </RGOrdCust50K50F></Document>";
        String fldValue = getTagValue(fldName, message);
        System.out.println("fldValue:" + fldValue);
    }

    private static String getTagValue(String tagName, String message) {
        String regex = "(?<=<" + tagName + ">).*?(?=</" + tagName + ">)";
        System.out.println("regex:" + regex);
        Pattern pattern = Pattern.compile(regex);
        System.out.println("pattern:" + pattern);
        Matcher matcher = pattern.matcher(message);
        System.out.println("matcher:" + matcher);
        matcher.find(0);
        String tagValue = null;
        try {
            tagValue = matcher.group();
        } catch (IllegalStateException isex) {
            System.out.println("No Tag/Match found " + isex.getMessage());
        }
        return tagValue;
    }
}
As a business requirement, I need the message to be multi-line, but when I make it multi-line I get the exception.
I am unable to fix this issue. Kindly suggest whether there is an issue with the regex I am using. Do I need to use '\n' in the regex to resolve this? Kindly assist.
If you are parsing XML, use an XML parser to do it: your regex will get increasingly complex and fragile as you find more and more situations that it can't handle adequately.
There are a large number of mature and stable XML processing libraries. I tend to stick with what I know, and JDOM has a very shallow learning curve and will handle this sort of processing very easily.
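As an illustration of that advice, here is a minimal sketch that swaps in the JDK's built-in DOM parser (javax.xml.parsers) instead of JDOM, so it runs without extra dependencies; the class and method names are made up for the example. Because the parser returns the element's text content, the line breaks are simply part of the value and no regex flags are involved.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XmlTagValueSketch {
    // Parses the whole message as XML and returns the text content of the first
    // element named tagName (line breaks included), or null if there is none.
    static String getTagValue(String tagName, String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String message = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Document><RGOrdCust50K50F>1234\nABCD\nLONDON,UK </RGOrdCust50K50F></Document>";
        // Prints the three lines of the element's text
        System.out.println(getTagValue("RGOrdCust50K50F", message));
    }
}
```

Unlike the regex, this also copes with attributes on the element, as long as the message is well-formed XML.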
The issue is the '.' metacharacter. See http://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html
. Any character (may or may not match line terminators)
Try the following code (strictly, only Pattern.DOTALL is needed here; Pattern.MULTILINE only changes how ^ and $ match at line breaks):
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE| Pattern.DOTALL);
See also the related question: java regex string matches and multiline delimited with new line
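A minimal runnable sketch of that fix, reusing the question's lookbehind/lookahead regex; the class name and the extra flags parameter are illustrative. Without Pattern.DOTALL the lazy .*? cannot cross the line breaks, so there is no match; with it, the whole multi-line value is returned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllSketch {
    // Same regex as in the question; flags are passed through to Pattern.compile.
    static String getTagValue(String tagName, String message, int flags) {
        String regex = "(?<=<" + tagName + ">).*?(?=</" + tagName + ">)";
        Matcher matcher = Pattern.compile(regex, flags).matcher(message);
        // Check find() instead of calling group() blindly, which throws
        // IllegalStateException when there is no match.
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String message = "<Document><RGOrdCust50K50F>1234\nABCD\nLONDON,UK </RGOrdCust50K50F></Document>";
        System.out.println(getTagValue("RGOrdCust50K50F", message, 0));              // null
        System.out.println(getTagValue("RGOrdCust50K50F", message, Pattern.DOTALL)); // all three lines
    }
}
```

The embedded flag (?s) at the start of the regex has the same effect as passing Pattern.DOTALL.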
Related
I'm successfully reading Outlook email with JavaMail. But when I try to get the link in the email body, it doesn't give me the exact URL; instead it gives the URL with some extra characters like "=3D?*/". I tried the code below, but it didn't help.
public List<String> getUrlsFromMessage(Message message, String linkText) throws Exception {
    // getMessageContent (not shown) returns the message body as an HTML string
    String html = getMessageContent(message);
    List<String> allMatches = new ArrayList<String>();
    // Group 1 is the opening <a ...> tag of each link whose text is linkText;
    // Pattern.quote keeps any regex metacharacters in linkText literal
    Matcher matcher = Pattern.compile("(<a [^>]+>)" + Pattern.quote(linkText) + "</a>").matcher(html);
    while (matcher.find()) {
        String aTag = matcher.group(1);
        allMatches.add(aTag.substring(aTag.indexOf("http"), aTag.indexOf("\">")));
    }
    return allMatches;
}
I also changed the pattern to:
Pattern linkPattern = Pattern.compile("<a\\b[^>]*href=\"([^\"]*)[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
But it still gives me the wrong URL.
Finally I found a workaround to retrieve the exact URL using StringBuilder: I removed the unwanted characters from the string until I got the correct URL. This may not be good coding practice, but it was the only workaround that worked for me.
StringBuilder build = new StringBuilder(link); // link: the URL extracted above
build.deleteCharAt(43);
// Each deletion shifts the later characters one position to the left,
// so deleting index 51 twice removes two adjacent characters.
build.deleteCharAt(51);
build.deleteCharAt(51);
driver.get(build.toString());
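The "=3D" in the extracted URL looks like quoted-printable content transfer encoding, in which "=3D" stands for "=" and an "=" at the end of a line is a soft line break inserted to keep lines short. That diagnosis is an assumption (the thread does not confirm it), but if it holds, decoding the HTML before running the regex is more robust than deleting characters at fixed positions. A minimal decoder sketch, assuming the encoded body is ASCII and the decoded bytes are UTF-8; the class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {
    // Decodes quoted-printable text: "=XX" becomes the byte 0xXX and a soft line
    // break ("=" at the end of a line) is removed. Assumes ASCII input, UTF-8 content.
    static String decode(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < qp.length()) {
            char c = qp.charAt(i);
            if (c != '=') {
                out.write(c);
                i++;
            } else if (qp.startsWith("\r\n", i + 1)) {
                i += 3; // soft line break "=\r\n"
            } else if (qp.startsWith("\n", i + 1)) {
                i += 2; // soft line break "=\n"
            } else if (i + 2 < qp.length()
                    && Character.digit(qp.charAt(i + 1), 16) >= 0
                    && Character.digit(qp.charAt(i + 2), 16) >= 0) {
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 3; // "=XX" escape
            } else {
                out.write(c); // stray '=' that is not an escape: keep it
                i++;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = "<a href=3D\"http://example.com/page?id=3D42&x=3D1\">Li=\r\nnk</a>";
        System.out.println(decode(encoded));
        // prints: <a href="http://example.com/page?id=42&x=1">Link</a>
    }
}
```

The example.com link is invented; the point is that once the body is decoded, the original regex can extract the URL without any index-based patching.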
I'm working on a simple Discord bot. The first pattern works fine and I get the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated.
public void onMessageReceived(MessageReceivedEvent event) {
    if (event.getMessage().getContent().startsWith("!")) {
        String output, newUrl;
        String word, strippedWord;
        String url = "http://jisho.org/api/v1/search/words?keyword=";
        Pattern reading;
        Matcher matcher;
        word = event.getMessage().getContent();
        strippedWord = word.replace("!", "");
        newUrl = url + strippedWord;
        //Output contains the raw text from jisho
        output = getUrlContents(newUrl);
        //Searching through the raw text to pull out the first "reading: "
        reading = Pattern.compile("\"reading\":\"(.*?)\"");
        matcher = reading.matcher(output);
        //Searching through the raw text to pull out the first "english_definitions: "
        Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
        Matcher matcher2 = def.matcher(output);
        event.getTextChannel().sendMessage(matcher2.toString());
        if (matcher.find() && matcher2.find()) {
            event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
            event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
        }
        else {
            event.getTextChannel().sendMessage("Word not found").queue();
        }
    }
}
You have to escape the [ character as \\[ (once for the Java string and once for the regex). You also forgot the closing \".
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end:
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
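A standalone sketch of the corrected pattern, run against a hypothetical, heavily trimmed response string (the real API returns much more; the class and helper names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JishoPatternSketch {
    static final Pattern READING = Pattern.compile("\"reading\":\"(.*?)\"");
    // The answer's corrected pattern: '[' escaped as \\[ and the closing \" added.
    static final Pattern DEF = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");

    // Returns group 1 of the first match, or null if the pattern does not occur.
    static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not real API output.
        String output = "{\"data\":[{\"japanese\":[{\"word\":\"\u72ac\",\"reading\":\"\u3044\u306c\"}],"
                + "\"senses\":[{\"english_definitions\":[\"dog\",\"canine\"]}]}]}";
        System.out.println("Reading: " + firstGroup(READING, output));
        System.out.println("Definition: \"" + firstGroup(DEF, output) + "\"");
        // prints: Definition: "dog","canine"
    }
}
```

Because the lazy (.*?) must be followed by "], the group runs to the end of the definitions array, so with several definitions it captures dog","canine; re-adding the outer quotes as suggested then prints them as a quoted list.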
I want to find a regex and count its occurrences in the page source using Java. The value I am searching for is shown in the program below.
There might be one or more spaces between tags. I am not able to form a regex for this value. Can someone please help me find the regex for it?
My program that checks the regex is below:
String regx=""<img height=""1"" width=""1"" style=""border-style:none;"" alt="""" src=""//api.adsymptotic.com/api/s/trackconversion?_pid=12170&_psign=3841da8d95cc1dbcf27a696f27ccab0b&_aid=1376&_lbl=RT_LampsPlus_Retargeting_Pixel""/>";
WebDriver driver = new FirefoxDriver();
driver.navigate().to("abc.xom");
int count = 0, found = 0;
String source = driver.getPageSource();
// Collapse every run of whitespace into a single space
source = source.replaceAll("\\s+", " ").trim();
Pattern pattern = Pattern.compile(regx);
Matcher matcher = pattern.matcher(source);
while (matcher.find())
{
    count++;
    found = 1;
}
if (found == 0)
{
    System.out.println("Maximiser not found");
    pixelData[rowNumber][2] = String.valueOf(count);
    pixelData[rowNumber][3] = "Fail";
}
else
{
    System.out.println("Maximiser is found" + count);
    pixelData[rowNumber][2] = String.valueOf(count);
    pixelData[rowNumber][3] = "Pass";
}
count = 0; found = 0;
It's hard to tell without the original text and the expected result, but your pattern clearly won't compile as is.
You should escape double quotes once (\") and regex metacharacters twice (e.g. \\?) for both your code and your pattern to compile.
Something along these lines:
String regx="<img height=\"1\" width=\"1\" style=\"border-style:none;\" " +
"alt=\"\" src=\"//api.adsymptotic.com/api/s/trackconversion" +
"\\?_pid=12170&_psign=3841da8d95cc1dbcf27a696f27ccab0b" +
"&_aid=1376&_lbl=RT_LampsPlus_Retargeting_Pixel\"/>";
Also consider scraping markup with an appropriate framework (e.g. jsoup for HTML) instead of regex.
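Rather than escaping every metacharacter by hand, Pattern.quote can wrap each literal piece of the tag. The minimal sketch below (the class, method names and example.com URL are hypothetical) also joins those pieces with \s+, so one or more whitespace characters between attributes still match; it assumes the attribute values themselves contain no spaces.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CountTagSketch {
    // Matches the markup literally, except that each space in it may be
    // one or more whitespace characters (spaces, tabs, newlines) in the page.
    static Pattern literalWithFlexibleSpaces(String literal) {
        String regex = Arrays.stream(literal.trim().split(" +"))
                .map(Pattern::quote)            // escapes ?, ., & and friends
                .collect(Collectors.joining("\\s+"));
        return Pattern.compile(regex);
    }

    static int countOccurrences(Pattern p, String source) {
        Matcher m = p.matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String tag = "<img height=\"1\" width=\"1\" alt=\"\" src=\"//example.com/t?_pid=1&_aid=2\"/>";
        String source = "<body>" + tag + "<p>x</p>\n"
                + "<img  height=\"1\"\n width=\"1\" alt=\"\" src=\"//example.com/t?_pid=1&_aid=2\"/></body>";
        System.out.println(countOccurrences(literalWithFlexibleSpaces(tag), source)); // 2
    }
}
```

With this approach the whitespace-collapsing replaceAll step in the question is no longer needed.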
I want to find all the "code" matches in my input string (with GWT RegExp). When I call the regExp.exec(input) method, it only returns the first match, even when I call it multiple times:
String input = "ff <code>myCode</code> ff <code>myCode2</code> dd <code>myCode3</code>";
String patternStr = "<code[^>]*>(.+?)</code\\s*>";
// Compile and use regular expression
RegExp regExp = RegExp.compile(patternStr);
MatchResult matcher = regExp.exec(input);
boolean matchFound = (matcher != null); // equivalent to regExp.test(input);
if (matchFound) {
    // Get all groups for this match
    for (int i = 0; i < matcher.getGroupCount(); i++) {
        String groupStr = matcher.getGroup(i);
        System.out.println(groupStr);
    }
}
How can I get all the matches?
Edit: As greedybuddha noted, a regex is not really suited to parsing (X)HTML. I gave jsoup a try and it is much more convenient than a regex. My code with jsoup now looks like this; it renames all code tags and applies a CSS class to them:
String input = "ff<code>myCode</code>ff<code>myCode2</code>";
// Jsoup.parse(String, String) takes a base URI as its second argument, not a charset
Document doc = Jsoup.parse(input);
Elements codes = doc.select("code"); // all <code> elements
for (Element code : codes) {
    System.out.println(code.html());
    code.tagName("pre");
    code.addClass("prettify");
}
System.out.println(doc);
Compile the regular expression with the "g" flag for global matching:
RegExp regExp = RegExp.compile(patternStr,"g");
I think you will also want "m" for multiline matching, "gm".
That said, for HTML/XML parsing you should consider using jsoup or another alternative.
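GWT client code cannot use java.util.regex, so the following is not a drop-in replacement; it is a plain-Java sketch of the same global-matching idea that runs on the JVM (the class and method names are hypothetical). Each call to Matcher.find() resumes after the previous match; with the "g" flag, calling regExp.exec(input) repeatedly until it returns null behaves the same way, as in JavaScript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllMatchesSketch {
    // Collects group 1 (the text between the tags) of every match, left to right.
    static List<String> allCodeBodies(String input) {
        Pattern pattern = Pattern.compile("<code[^>]*>(.+?)</code\\s*>");
        Matcher matcher = pattern.matcher(input);
        List<String> bodies = new ArrayList<>();
        while (matcher.find()) {
            bodies.add(matcher.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        String input = "ff <code>myCode</code> ff <code>myCode2</code> dd <code>myCode3</code>";
        System.out.println(allCodeBodies(input)); // [myCode, myCode2, myCode3]
    }
}
```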
Is there a simple way to parse a string with a regex in Java?
I have to adapt an HTML page, so I have to rewrite several strings, e.g.:
href="/browse/PJBUGS-911"
=>
href="PJBUGS-911.html"
The strings differ only in the ID (e.g. 911). My first idea looks like this:
String input = "";
String output = input.replaceAll("href=\"/browse/PJBUGS\\-[0-9]*\"", "href=\"PJBUGS-???.html\"");
I want to replace everything except the ID. How can I do this?
It would be nice if someone could help me :)
You can capture substrings that were matched by your pattern using parentheses, and then use the captured text in the replacement with $n, where n is the number of the pair of parentheses (counting opening parentheses from left to right). For your example:
String output = input.replaceAll("href=\"/browse/PJBUGS-([0-9]*)\"", "href=\"PJBUGS-$1.html\"");
Or if you want:
String output = input.replaceAll("href=\"/browse/(PJBUGS-[0-9]*)\"", "href=\"$1.html\"");
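A runnable sketch of the first replacement applied to a string containing two links (the sample HTML and class name are made up). It uses [0-9]+ instead of [0-9]* so that an href with no ID at all is left unchanged; that is a small deviation from the answer.

```java
public class BackrefReplaceSketch {
    // $1 in the replacement refers to the first capturing group, here the numeric ID.
    static String rewriteLinks(String html) {
        return html.replaceAll("href=\"/browse/PJBUGS-([0-9]+)\"", "href=\"PJBUGS-$1.html\"");
    }

    public static void main(String[] args) {
        String input = "<a href=\"/browse/PJBUGS-911\">911</a> <a href=\"/browse/PJBUGS-12\">12</a>";
        System.out.println(rewriteLinks(input));
        // prints: <a href="PJBUGS-911.html">911</a> <a href="PJBUGS-12.html">12</a>
    }
}
```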
This does not use a regex, but maybe it still solves your problem:
output = "href=\"" + input.substring(input.lastIndexOf('/') + 1, input.lastIndexOf('"')) + ".html\"";
This is how I would do it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRewrite {
    public static void main(String[] args)
    {
        String text = "href=\"/browse/PJBUGS-911\" blahblah href=\"/browse/PJBUGS-111\" " +
                "blahblah href=\"/browse/PJBUGS-34234\"";
        Pattern ptrn = Pattern.compile("href=\"/browse/(PJBUGS-[0-9]+?)\"");
        Matcher mtchr = ptrn.matcher(text);
        while (mtchr.find())
        {
            String match = mtchr.group(0);    // the whole href="..." attribute
            String insMatch = mtchr.group(1); // just PJBUGS-<id>
            String repl = "href=\"" + insMatch + ".html\"";
            System.out.println("orig = <" + match + "> repl = <" + repl + ">");
        }
    }
}
This just shows the matches and their replacements, not the final text, which you can get by using Matcher.replaceAll:
String allRepl = mtchr.replaceAll("href=\"$1.html\"");
If you are only interested in replacing everything, you don't need the loop; I used it just for debugging, to show how the regex does its work.