Why isn't this regex finding a match? - java

Here's my class I'm in the middle of writing:
public class MtomParser {
private static final String HEADER_REGEX = "^\\s*Content-ID:";
public boolean isMtom(String payload) {
return payload.contains("--uuid");
}
public String parseMtom(String mtomResponse) {
while (mtomResponse.matches(HEADER_REGEX)) {
System.out.println("header found");
}
return mtomResponse;
}
}
I'm expecting my input to make this code cause an infinite loop since it should find a match and there's no way to escape the loop. But, mtomResponse.matches(HEADER_REGEX) returns false every time and I'm not sure why. Here's the mtomResponse:
--uuid:b6bd1ef2-63e2-4d8d-8bac-eabbe7588373
Content-Type: application/xop+xml; charset=UTF-8; type="application/soap+xml";
Content-Transfer-Encoding: binary
Content-ID: <root.message#cxf.apache.org>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"><soap:Header/><soap:Body><RetrieveDocumentSetResponse xmlns="urn:ihe:iti:xds-b:2007" xmlns:ns10="http://docs.oasis-open.org/wsrf/bf-2" xmlns:ns11="http://docs.oasis-open.org/wsn/t-1" xmlns:ns12="urn:gov:hhs:fha:nhinc:common:subscriptionb2overridefordocuments" xmlns:ns13="http://nhinc.services.com/schema/auditmessage" xmlns:ns14="urn:oasis:names:tc:emergency:EDXL:DE:1.0" xmlns:ns15="http://www.hhs.gov/healthit/nhin/cdc" xmlns:ns16="urn:gov:hhs:fha:nhinc:common:subscriptionb2overrideforcdc" xmlns:ns2="urn:gov:hhs:fha:nhinc:common:nhinccommon" xmlns:ns3="urn:gov:hhs:fha:nhinc:common:nhinccommonentity" xmlns:ns4="urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0" xmlns:ns5="urn:oasis:names:tc:ebxml-regrep:xsd:rs:3.0" xmlns:ns6="urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0" xmlns:ns7="urn:oasis:names:tc:ebxml-regrep:xsd:lcm:3.0" xmlns:ns8="http://docs.oasis-open.org/wsn/b-2" xmlns:ns9="http://www.w3.org/2005/08/addressing"><ns5:RegistryResponse status="urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success"/><DocumentResponse><HomeCommunityId>urn:oid:422.422</HomeCommunityId><RepositoryUniqueId>422.422</RepositoryUniqueId><DocumentUniqueId>422.422^C4n2hv7z_5Ofa37W</DocumentUniqueId><mimeType>text/xml</mimeType><Document><xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" href="cid:3511c0cc-5e20-46b7-8ae0-406c3b1ea95f-6#urn%3Aihe%3Aiti%3Axds-b%3A2007"/></Document></DocumentResponse></RetrieveDocumentSetResponse></soap:Body></soap:Envelope>
--uuid:b6bd1ef2-63e2-4d8d-8bac-eabbe7588373
Content-Type: text/xml
Content-Transfer-Encoding: binary
Content-ID: <3511c0cc-5e20-46b7-8ae0-406c3b1ea95f-6#urn:ihe:iti:xds-b:2007>
<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:hl7-org:v3" xsi:schemaLocation="urn:hl7-org:v3 http://hit-testing.nist.gov:11080/hitspValidation/schema/cdar2c32/infrastructure/cda/C32_CDA.xsd">
<realmCode code="US"/>
<typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040"/>
In my IDE, searching for the regex ^\s*Content-ID: finds 2 results. So why doesn't this Java code find any matches?

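You need to enable MULTILINE mode so that ^ matches at the start of each line instead of only at the start of the entire string. Also use Matcher.find() rather than String.matches(), because matches() only succeeds when the regex matches the whole input.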
Pattern pattern = Pattern.compile(yourRegex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}
See: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
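To make the difference concrete, here is a minimal sketch. The class name HeaderFinder and the shortened sample payload are made up for illustration; only the regex comes from the question. It counts header lines with find() under MULTILINE and shows that String.matches fails because it has to match the entire input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderFinder {
    // Same regex as in the question; MULTILINE lets ^ match after each line terminator.
    private static final Pattern HEADER =
            Pattern.compile("^\\s*Content-ID:", Pattern.MULTILINE);

    /** Counts how many lines start with a Content-ID header. */
    static int countHeaders(String payload) {
        Matcher matcher = HEADER.matcher(payload);
        int count = 0;
        while (matcher.find()) { // each call continues after the previous match
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the MTOM response.
        String payload = "--uuid:1\nContent-Type: text/xml\nContent-ID: <a>\n<x/>\n"
                + "--uuid:1\nContent-ID: <b>\n<y/>\n";
        System.out.println(countHeaders(payload));                 // prints 2
        System.out.println(payload.matches("^\\s*Content-ID:"));  // prints false
    }
}
```

Because find() advances through the input, the loop terminates once no further header is found, unlike the while (matches(...)) loop in the question.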

Related

StringUtils replace text in between two patterns

Hi, I found the Apache Commons Lang method below really useful
StringUtils.substringBetween(fileContent, "<![CDATA[", "]]>")
to extract information inside
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<envelope>
<xxxx>
<yyyy>
<![CDATA[
<?xml version="1.0" encoding="UTF-8" ?>
<Document >
<eee>
<tt>
<ss>zzzzzzz</ss>
<aa>2021-09-09T10:39:29.850Z</aa>
<aaaa>
<Cd>cccc</Cd>
</aaaa>
<dd>ssss</dd>
<ff></ff>
</tt>
</eee>
</Document>
]]>
</yyyy>
</xxxx>
</envelope>
But now I'm looking for another method or regex that lets me replace a dynamic XML
![CDATA["old_xml"]]
with another XML
![CDATA["new_xml"]]
Any idea how to accomplish this?
Regards.
Instead of StringUtils, you can use String#replaceAll method:
fileContent = fileContent
.replaceAll("(?s)(<!\\[CDATA\\[).+?(]]>)", "$1foo$2");
Explanation:
(?s): Enable DOTALL mode so that . can match line breaks as well in .+?
(<!\\[CDATA\\[): Match opening <![CDATA[ substring and capture in group #1
.+?: Match 1 or more of any character, including line breaks, as few times as possible (lazy)
(]]>): Match closing ]]> substring and capture in group #2
$1foo$2: Replace with foo surrounded with back-references of capture group 1 and 2 on both sides
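When the replacement is itself XML rather than foo, it may contain $ or \ characters, which replaceAll would interpret as group references or escapes. A minimal sketch, assuming a hypothetical helper named replaceCdata, passes the new content through Matcher.quoteReplacement first:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CdataReplacer {
    // (?s) = DOTALL, so the lazy .+? may span line breaks inside the CDATA section.
    private static final Pattern CDATA =
            Pattern.compile("(?s)(<!\\[CDATA\\[).+?(]]>)");

    /** Replaces the body of every CDATA section in fileContent with newXml. */
    static String replaceCdata(String fileContent, String newXml) {
        // quoteReplacement escapes '$' and '\' so newXml is inserted literally.
        String replacement = "$1" + Matcher.quoteReplacement(newXml) + "$2";
        return CDATA.matcher(fileContent).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Hypothetical sample input, loosely modelled on the envelope in the question.
        String input = "<yyyy>\n<![CDATA[\n<Document>\n<ss>old</ss>\n</Document>\n]]>\n</yyyy>";
        System.out.println(replaceCdata(input, "<Document><price>$5</price></Document>"));
    }
}
```

Compiling the pattern once is a design choice for repeated calls; a one-off String#replaceAll with the same regex and replacement string behaves the same way.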
You can use the regex, (\<!\[CDATA\[).*?(\]\]>).
Demo:
public class Main {
public static void main(String[] args) {
String xml = """
...
<data><![CDATA[a < b]]></data>
...
""";
String replacement = "foo";
xml = xml.replaceAll("(\\<!\\[CDATA\\[).*?(\\]\\]>)", "$1" + replacement + "$2");
System.out.println(xml);
}
}
Output:
...
<data><![CDATA[foo]]></data>
...
Explanation of the regex:
( : Start of group#1
\<!\[CDATA\[ : String <![CDATA[
) : End of group#1
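.*? : Any character except a line terminator, as few times as possible (prefix the regex with (?s) if the CDATA content spans several lines)
( : Start of group#2
\]\]> : String ]]>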
) : End of group#2

Regex get text between tags

I am trying to get the text between tags in Java.
`
<td colspan="2" style="font-weight:bold;">HELLO TOTO</td>
<td>Function :</td>
`
I would like to use a regex to extract "HELLO TOTO" but not "Function :"
I already tried something like this
`
String btwTags = "<td colspan=\"2\" style=\"font-weight:bold;\">HELLO TOTO</td>\n" + "<td>Function :</td>";
Pattern pattern = Pattern.compile("<td(.*?)>(.*?)</td>");
Matcher matcher = pattern.matcher(btwTags);
while (matcher.find()) {
String group = matcher.group();
System.out.println(group);
}
`
but the result is the same as the input.
Any ideas?
I also tried the regex (?<=<td>)(.*?)(?=</td>), but it only catches "Function :".
I don't know how to allow for something (such as attributes) after the opening <td ...>.
Thanks in advance.
Don't use regex to parse HTML; it's a very bad idea.
To see why, check this link:
RegEx match open tags except XHTML self-contained tags
You can use Jsoup to achieve this:
String html; // your html code
Document doc = Jsoup.parse(html);
System.out.println(doc.select("td[colspan=2]").text());
You can use a regex for very basic HTML parsing. Here's the easiest Java regex I could find:
"(?i)<td[^>]+>([^<]+)<\\/td>"
It matches the first td tag with attributes and a value. "HELLO TOTO" is in group 1.
Here's an example.
For anything more complex, a parser like Jsoup would be better.
But even a parser could fail if the HTML isn't valid or if the structure for which you wrote the code has been changed.
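As a sketch of how that regex can be applied (the class and method names are hypothetical), the key point is to read group(1) instead of group(), which returns the whole match including the tags:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TdExtractor {
    // Regex from the answer above: a td tag with at least one attribute character,
    // followed by text that contains no '<'.
    private static final Pattern TD_WITH_ATTRS =
            Pattern.compile("(?i)<td[^>]+>([^<]+)<\\/td>");

    /** Returns the text of the first td that carries attributes, or null if there is none. */
    static String firstAttributedCell(String html) {
        Matcher matcher = TD_WITH_ATTRS.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String btwTags = "<td colspan=\"2\" style=\"font-weight:bold;\">HELLO TOTO</td>\n"
                + "<td>Function :</td>";
        System.out.println(firstAttributedCell(btwTags)); // prints HELLO TOTO
    }
}
```

The plain <td> cell is skipped because [^>]+ requires at least one character between <td and >.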
Here is a solution that avoids Pattern and Matcher; I hope it is helpful.
public class Solution{
public static void main(String ...args){
String str = "<td colspan=\"2\" style=\"font-weight:bold;\">HELLO TOTO</td><td>Function :</td>";
String [] garray = str.split(">|</td>");
for(int i = 1;i < garray.length;i+=2){
System.out.println(garray[i]);
}
}
}
Output:
HELLO TOTO
Function :
I am just using the split function to break the string at the given delimiters (note that String.split itself takes a regular expression). Regex can be slow and is often confusing.
Cheers, happy coding.

Changing XML values in Android/Java

I want to take an XML file as input which contains the following:
<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<map>
<int name="count" value="10" />
</map>
and, read and change the value from 10 to any other integer value.
How can I do this in Android/Java? I'm new to Android and Java, and all the tutorials available on the internet are way too complicated.
Thank You
You can change the value by matching the pattern and replacing only the captured digits, as shown below:
String xmlString = "<int name=\"count\" value=\"10\" />";
int newValue = 100;
// Group 2 captures the digits of the value attribute; + requires at least one digit.
Pattern pattern = Pattern.compile("(<int name=\"count\" value=\")([0-9]+)(\" />)");
Matcher matcher = pattern.matcher(xmlString);
if (matcher.find()) {
    // Replace by position: String.replace would change every "10" in the string.
    xmlString = xmlString.substring(0, matcher.start(2)) + newValue + xmlString.substring(matcher.end(2));
}
System.out.println(xmlString);
You can find your answer here. It is like parsing JSON: you can convert the string (read from the file) into an object and then change its parameters.
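One way to treat the XML as an object is the standard DOM API (javax.xml.parsers and org.w3c.dom). The following is only a sketch under assumptions: the class and method names are hypothetical, and the XML is passed in as a string rather than read from a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlValueEditor {
    /** Parses xml and sets the value attribute of every <int> element whose name attribute matches. */
    static Document setIntValue(String xml, String name, int newValue) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList ints = doc.getElementsByTagName("int");
            for (int i = 0; i < ints.getLength(); i++) {
                Element element = (Element) ints.item(i);
                if (name.equals(element.getAttribute("name"))) {
                    element.setAttribute("value", String.valueOf(newValue));
                }
            }
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version='1.0' encoding='utf-8' standalone='yes'?>"
                + "<map><int name=\"count\" value=\"10\" /></map>";
        Document doc = setIntValue(xml, "count", 42);
        Element first = (Element) doc.getElementsByTagName("int").item(0);
        System.out.println(first.getAttribute("value")); // prints 42
    }
}
```

To write the modified document back to a file, a javax.xml.transform.Transformer with a DOMSource and a StreamResult can be used.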

String.matches fails every test?

I'm seeing really odd behaviour from String.matches:
requestString.matches(".*")
(boolean) false
while requestString is something like
"HTTP/1.1 200 OK - OK
[...]
Content-Type: text/xml; Charset=iso-8859-1
Content-Length: 1545" + more...
Of course, I want to test against "HTTP/\\d\\.\\d",
but unsurprisingly this fails too:
requestString.matches("HTTP/\\d\\.\\d")
The String in requestString comes in via a Socket connection and is sent in iso-8859-1 encoding. Here is the code:
StringBuilder result = new StringBuilder();
int ch;
while ( ! timeoutExceeded() && (ch = reader.read()) != -1) {
result.append((char)ch);
}
String requestString = result.toString();
The code is running on android sdk.
What am I missing? Is the encoding the problem?
Solution:
Thanks to the hints, I tried the DOTALL flag (again!) and it works:
requestString.matches("(?s).*HTTP/\\d\\.\\d.*")
First, see here.
Second, by default, the dot does not match newlines. As your input is multiline, this means the regex cannot match.
You have to use a Pattern and compile with Pattern.DOTALL:
final Pattern p = Pattern.compile(".*", Pattern.DOTALL);
p.matcher(anything).matches(); // always returns true
Illustration:
public static void main(final String... args)
{
final String input = "a\nb";
System.out.println(input.matches(".*"));
System.out.println(Pattern.compile(".*", Pattern.DOTALL)
.matcher(input).matches());
}
Result:
false
true
matches must match the entire string. Since you are trying to match a multi-line string and . does not match line breaks by default, your pattern does not match the complete string.
e.g.
System.out.println("HTTP/1.1 200 OK - OK".matches(".*")); //true
System.out.println("HTTP/1.1 200 OK - OK\nContent-Type: text/xml".matches(".*")); // false
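If the goal is only to check for the HTTP version token, an alternative to padding the regex with (?s).* on both sides is to use Matcher.lookingAt() or Matcher.find(), neither of which requires the whole input to match. A small sketch with hypothetical class and method names:

```java
import java.util.regex.Pattern;

class StatusLineCheck {
    private static final Pattern HTTP_VERSION = Pattern.compile("HTTP/\\d\\.\\d");

    /** True if the text starts with an HTTP version token such as HTTP/1.1. */
    static boolean startsWithHttpVersion(String response) {
        // lookingAt() anchors at the start of the input but, unlike matches(), not at the end.
        return HTTP_VERSION.matcher(response).lookingAt();
    }

    /** True if an HTTP version token occurs anywhere in the text. */
    static boolean containsHttpVersion(String response) {
        return HTTP_VERSION.matcher(response).find();
    }

    public static void main(String[] args) {
        String response = "HTTP/1.1 200 OK - OK\nContent-Type: text/xml; Charset=iso-8859-1";
        System.out.println(response.matches("HTTP/\\d\\.\\d")); // prints false
        System.out.println(startsWithHttpVersion(response));     // prints true
        System.out.println(containsHttpVersion(response));       // prints true
    }
}
```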

Java Matcher Class

I need a pattern matcher to get the page id value from the text below, which comes from an HTTP response body.
<meta name="ajs-page-id" content="262250">
What I'm after is the content value from this line, which will always be present in the response body.
Pattern pat = Pattern.compile("<meta\\sname=\"ajs-page-id\"\\scontent=\"(\\d+)\">");
That is obviously a very literal pattern... but group(1) should return the number as a string.
Haven't tested.
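A quick way to check that pattern is to run it with find() against a body that contains the sample line. This is only a sketch; the class name, method name, and surrounding HTML are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageIdExtractor {
    // The literal pattern from the answer above; group 1 captures the digits.
    private static final Pattern PAGE_ID =
            Pattern.compile("<meta\\sname=\"ajs-page-id\"\\scontent=\"(\\d+)\">");

    /** Returns the page id as a string, or null if the meta tag is not found. */
    static String pageId(String responseBody) {
        Matcher matcher = PAGE_ID.matcher(responseBody);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String body = "<html><head>\n<meta name=\"ajs-page-id\" content=\"262250\">\n</head></html>";
        System.out.println(pageId(body)); // prints 262250
    }
}
```

Because the pattern is literal, it stops matching if the attribute order, quoting, or whitespace in the tag changes.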
Use an HTML parser like jsoup to parse and search for the part. You should not be using regular expressions for this.
e.g.,
String htmlStr = "<meta name=\"ajs-page-id\" content=\"262250\">";
Document doc = Jsoup.parse(htmlStr);
Element meta = doc.select("meta[name=ajs-page-id]").first();
if (meta != null)
{
System.out.println(meta.attr("content"));
}
