Hi, I found this Apache Commons Lang method really useful:
StringUtils.substringBetween(fileContent, "<![CDATA[", "]]>")
to extract information inside
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<envelope>
<xxxx>
<yyyy>
<![CDATA[
<?xml version="1.0" encoding="UTF-8" ?>
<Document >
<eee>
<tt>
<ss>zzzzzzz</ss>
<aa>2021-09-09T10:39:29.850Z</aa>
<aaaa>
<Cd>cccc</Cd>
</aaaa>
<dd>ssss</dd>
<ff></ff>
</tt>
</eee>
</Document>
]]>
</yyyy>
</xxxx>
</envelope>
But now I'm looking for another method or regex that lets me replace a dynamic XML payload
<![CDATA["old_xml"]]>
with another XML payload
<![CDATA["new_xml"]]>
Any idea how to accomplish this?
Regards.
Instead of StringUtils, you can use String#replaceAll method:
fileContent = fileContent
.replaceAll("(?s)(<!\\[CDATA\\[).+?(]]>)", "$1foo$2");
Explanation:
(?s): Enable DOTALL mode so that . can match line breaks as well in .+?
(<!\\[CDATA\\[): Match opening <![CDATA[ substring and capture in group #1
.+?: Match 1 or more of any character, including line breaks, as few as possible (lazy)
(]]>): Match closing ]]> substring and capture in group #2
$1foo$2: Replace with foo surrounded with back-references of capture group 1 and 2 on both sides
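Since the question asks for a dynamic replacement rather than a fixed foo, here is a minimal sketch of the same idea with the new XML passed in as a variable. The class and method names (CdataReplacer, replaceCdata) are made up for illustration. The one addition is Matcher.quoteReplacement, which keeps any $ or \ in the new XML from being interpreted as a group reference. The pattern uses .*? instead of .+? so that an empty CDATA section is replaced too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CdataReplacer {
    // (?s) lets '.' span line breaks; the lazy .*? stops at the first "]]>".
    private static final Pattern CDATA =
            Pattern.compile("(?s)(<!\\[CDATA\\[).*?(]]>)");

    // Hypothetical helper: replaces the body of every CDATA section with newXml.
    static String replaceCdata(String fileContent, String newXml) {
        // quoteReplacement escapes '$' and '\' so newXml is inserted literally.
        String literal = Matcher.quoteReplacement(newXml);
        return CDATA.matcher(fileContent).replaceAll("$1" + literal + "$2");
    }

    public static void main(String[] args) {
        String envelope = "<yyyy><![CDATA[\n<Document><ss>old</ss></Document>\n]]></yyyy>";
        String newXml = "<Document><price>$5</price></Document>";
        System.out.println(replaceCdata(envelope, newXml));
    }
}
```

Without quoteReplacement, the $5 in the sample payload would be read as a reference to a capture group that does not exist, and replaceAll would throw an exception.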
You can use the regex, (\<!\[CDATA\[).*?(\]\]>).
Demo:
public class Main {
public static void main(String[] args) {
String xml = """
...
<data><![CDATA[a < b]]></data>
...
""";
String replacement = "foo";
xml = xml.replaceAll("(\\<!\\[CDATA\\[).*?(\\]\\]>)", "$1" + replacement + "$2");
System.out.println(xml);
}
}
Output:
...
<data><![CDATA[foo]]></data>
...
Explanation of the regex:
( : Start of group#1
\<!\[CDATA\[ : String <![CDATA[
) : End of group#1
.*? : Any character (except line breaks, unless DOTALL is enabled) any number of times, as few as possible
( : Start of group#2
\]\]>: String ]]>
) : End of group#2
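One caveat: the demo's CDATA section sits on a single line, while the one in the question spans several. A small sketch of the difference, assuming the same regex compiled with and without Pattern.DOTALL (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class DotallDemo {
    static final String REGEX = "(<!\\[CDATA\\[).*?(\\]\\]>)";

    // Default mode: '.' does not match line terminators.
    static String replaceDefault(String xml, String replacement) {
        return Pattern.compile(REGEX).matcher(xml)
                .replaceAll("$1" + replacement + "$2");
    }

    // DOTALL mode: '.' also matches line terminators.
    static String replaceDotall(String xml, String replacement) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(xml)
                .replaceAll("$1" + replacement + "$2");
    }

    public static void main(String[] args) {
        String multiLine = "<data><![CDATA[\na < b\n]]></data>";
        // The first call leaves the input untouched, the second replaces the body.
        System.out.println(replaceDefault(multiLine, "foo"));
        System.out.println(replaceDotall(multiLine, "foo"));
    }
}
```

The replacement here is a plain word. If it can contain $ or \, it should go through Matcher.quoteReplacement first.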
I try to get the text between a tag in JAVA.
`
<td colspan="2" style="font-weight:bold;">HELLO TOTO</td>
<td>Function :</td>
`
I would like to use a regex to extract "HELLO TOTO" but not "Function :"
I already tried something like this
`
String btwTags = "<td colspan=\"2\" style=\"font-weight:bold;\">HELLO TOTO</td>\n" + "<td>Function :</td>";
Pattern pattern = Pattern.compile("<td(.*?)>(.*?)</td>");
Matcher matcher = pattern.matcher(btwTags);
while (matcher.find()) {
String group = matcher.group();
System.out.println(group);
}
`
but the result is the same as the input.
Any ideas?
I also tried the regex (?<=<td>)(.*?)(?=</td>), but it only catches "Function :".
I don't know how to allow for attributes after the opening <td ...>.
Thanks in advance.
Don't use regex to parse HTML; it's a very bad idea.
To see why, check this link:
RegEx match open tags except XHTML self-contained tags
You can use Jsoup to achieve this:
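// Requires the jsoup library on the classpath.
// import org.jsoup.Jsoup;
// import org.jsoup.nodes.Document;
String html = "<table><tr><td colspan=\"2\" style=\"font-weight:bold;\">HELLO TOTO</td><td>Function :</td></tr></table>"; // your HTML code
Document doc = Jsoup.parse(html);
System.out.println(doc.select("td[colspan=2]").text());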
You can use a Regex for very basic HTML parsing. Here's the easiest Java regex I could find :
"(?i)<td[^>]+>([^<]+)<\\/td>"
It matches the first td tag with attributes and a value. "HELLO TOTO" is in group 1.
Here's an example.
For anything more complex, a parser like Jsoup would be better.
But even a parser could fail if the HTML isn't valid or if the structure for which you wrote the code has been changed.
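As a minimal sketch of how that regex could be used, assuming the input from the question (the class and method names are invented for illustration): the cell text is read from group 1 rather than from group(), which returns the whole match including the tags.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TdTextExtractor {
    // [^>]+ requires at least one character after "<td", so a bare <td> is skipped.
    private static final Pattern TD_WITH_ATTRIBUTES =
            Pattern.compile("(?i)<td[^>]+>([^<]+)</td>");

    // Hypothetical helper: text of the first <td> that carries attributes, if any.
    static Optional<String> firstAttributedCell(String html) {
        Matcher matcher = TD_WITH_ATTRIBUTES.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<td colspan=\"2\" style=\"font-weight:bold;\">HELLO TOTO</td>\n"
                + "<td>Function :</td>";
        System.out.println(firstAttributedCell(html).orElse("(no match)"));
    }
}
```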
Here is a solution that does not use Pattern and Matcher. I hope it is helpful.
public class Solution{
public static void main(String ...args){
String str = "<td colspan=\"2\" style=\"font-weight:bold;\">HELLO TOTO</td><td>Function :</td>";
String [] garray = str.split(">|</td>");
for(int i = 1;i < garray.length;i+=2){
System.out.println(garray[i]);
}
}
}
Output :: HELLO TOTO
Function :
I am just using the split function to cut the string at the given delimiters (note that split itself takes a regex, just a very simple one). Complex regexes can be slow and are often confusing.
Cheers, happy coding.
I have a long string which contains ~" & "~ etc.
But when I try to write it to XML, the output is: ~"
Please suggest a way to write the complete string, including the " (double quotes).
Below is my code:
for(String str:Parser.queryList){
Element query = doc.createElement("query");
view.appendChild(screen);
screen.appendChild(query);
query.setAttribute("query", str);
}
output: OP =~"=~"
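The exact characters in this question were lost, so the following is only a sketch under an assumption: if the string holds literal double quotes and ampersands, the DOM API can be handed the raw string and the serializer will escape it inside the attribute. The class and method names are hypothetical. Only standard javax.xml classes are used.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AttributeQuoteDemo {
    // Hypothetical helper: builds <query query="..."/> and serializes it.
    static String toXml(String queryText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element query = doc.createElement("query");
            doc.appendChild(query);
            // Pass the raw, unescaped text; do not pre-escape it yourself.
            query.setAttribute("query", queryText);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("OP =~\"abc\" & \"def\"~"));
    }
}
```

In the serialized file the quotes appear as entities, which is expected: an XML parser reading the file back returns the original string with the literal quotes. If the string is escaped by hand before setAttribute is called, it ends up escaped twice.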
I have an object which I write to an XML file. The values contain special characters like "&", "<" etc. Before I process this XML, I want a utility to escape these characters so that the resulting XML has "&" followed by "amp;" for "&" and "&" followed by "lt;" for "<". I tried StringUtils, XMLWriter and a few more, but they also convert the "<" in the opening and closing tags, which I don't want. I only want the "<" in the values to be replaced. Please help.
Example;
I have the input xml as this
<?xml version="1.0" encoding="UTF-8"?>
<personName><firstName>Sam & Pat </firstName>
<sal> > than 10000 </sal>
</personName>
And the expected xml should be `
<?xml version="1.0" encoding="UTF-8"?>
<personName><firstName>Sam &amp; Pat </firstName>
<sal> &lt; than 10000 </sal>
</personName>
If I am using StringUtils, it converts all the "<" characters like this
&lt;sal&gt; &lt; than 10000 &lt;/sal&gt;
EDIT: I can't actually use JAXB. I am using a FreeMarker template to do this. Here is the code:
File tempFile = File.createTempFile(fileName, ".tmp");
try (FileWriter writer = new FileWriter(tempFile)) {
freeMarkerConfig.setOutputEncoding(UTF_8);
Template template = freeMarkerConfig.getTemplate(templateName);
template.process(data, writer);
} `
The resulting file should have the special characters escaped.
You can also use Apache Commons Lang Library for escaping the characters:
Example:
String escapeString1 = "Sam & Pat ";
System.out.println("Escaped : " + StringEscapeUtils.escapeXml11(escapeString1));
String escapeString2 = " > than 10000";
System.out.println("Escaped : " + StringEscapeUtils.escapeXml11(escapeString2));
Output:
Escaped : Sam &amp; Pat 
Escaped :  &gt; than 10000
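The key point is to escape each value before it is placed between the tags, instead of escaping the finished document. A dependency-free sketch of that idea follows. The escape method is a hand-written stand-in, not the Commons implementation, and the class and method names are hypothetical.

```java
public class XmlTextEscaper {
    // Hand-written stand-in for a library escaper: replaces the five XML special characters.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Only the value is escaped; the tags themselves are written as-is.
    static String element(String name, String value) {
        return "<" + name + ">" + escape(value) + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(element("firstName", "Sam & Pat "));
        System.out.println(element("sal", " > than 10000 "));
    }
}
```

With a template engine, the same rule applies: escape the data values that are passed to the template, not the rendered output.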
You can use JAXB for the XML generation. Annotate your model class with @XmlRootElement.
Then you can use JAXB to marshal the object to XML:
try {
JAXBContext context = JAXBContext.newInstance(Person.class);
Marshaller m = context.createMarshaller();
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
Person object = new Person();
object.setPersonName("Sam & Pat");
object.setSal("> than 10000");
m.marshal(object, System.out);
} catch (JAXBException e) {
e.printStackTrace();
}
The output will be:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<person>
<personName>Sam &amp; Pat</personName>
<sal>&gt; than 10000</sal>
</person>
Using CDATA will fix your problem.
Such as <![CDATA[abc]]>
You can include XML special characters in XPL. XPL has exactly the same structure as XML, but allows the special characters in text fields. http://hll.nu/
I want to take an XML file as input which contains the following:
<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<map>
<int name="count" value="10" />
</map>
Then I want to read the value and change it from 10 to any other integer.
How can I do this in Android/Java? I'm new to Android and Java, and all the tutorials available on the internet are way too complicated.
Thank you.
You can change the value by matching a pattern and replacing the matched digits, as shown below:
String xmlString = "<int name=\"count\" value=\"10\" />";
int newValue = 100;
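// Group 2 captures the digits of the current value.
Pattern pattern = Pattern.compile("(<int name=\"count\" value=\")([0-9]*)(\" />)");
Matcher matcher = pattern.matcher(xmlString);
if (matcher.find()) {
    // Splice the new value into the exact span of group 2,
    // so any other occurrence of "10" in the string is left alone.
    xmlString = xmlString.substring(0, matcher.start(2))
            + newValue
            + xmlString.substring(matcher.end(2));
}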
System.out.println(xmlString);
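The regex depends on the exact spacing and attribute order of the element. A sketch of an alternative that uses the DOM parser from the standard javax.xml packages is shown here. The class and method names are hypothetical, and reading and writing the actual file is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CountUpdater {
    // Hypothetical helper: sets value="..." on every <int name="count"> element.
    static String setCount(String xml, int newValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList ints = doc.getElementsByTagName("int");
            for (int i = 0; i < ints.getLength(); i++) {
                Element element = (Element) ints.item(i);
                if ("count".equals(element.getAttribute("name"))) {
                    element.setAttribute("value", String.valueOf(newValue));
                }
            }
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<map>\n<int name=\"count\" value=\"10\" />\n</map>";
        System.out.println(setCount(xml, 42));
    }
}
```

The serializer may normalize details such as the XML declaration or the spacing inside the tag, so the result is equivalent XML rather than a byte-for-byte copy of the input.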
You can find your answer here. It is like parsing JSON: you can parse the string (from the file) into an object and then change any of its values.
Here's my class I'm in the middle of writing:
public class MtomParser {
private static final String HEADER_REGEX = "^\\s*Content-ID:";
public boolean isMtom(String payload) {
return payload.contains("--uuid");
}
public String parseMtom(String mtomResponse) {
while (mtomResponse.matches(HEADER_REGEX)) {
System.out.println("header found");
}
return mtomResponse;
}
}
I'm expecting my input to make this code cause an infinite loop since it should find a match and there's no way to escape the loop. But, mtomResponse.matches(HEADER_REGEX) returns false every time and I'm not sure why. Here's the mtomResponse:
--uuid:b6bd1ef2-63e2-4d8d-8bac-eabbe7588373
Content-Type: application/xop+xml; charset=UTF-8; type="application/soap+xml";
Content-Transfer-Encoding: binary
Content-ID: <root.message#cxf.apache.org>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope"><soap:Header/><soap:Body><RetrieveDocumentSetResponse xmlns="urn:ihe:iti:xds-b:2007" xmlns:ns10="http://docs.oasis-open.org/wsrf/bf-2" xmlns:ns11="http://docs.oasis-open.org/wsn/t-1" xmlns:ns12="urn:gov:hhs:fha:nhinc:common:subscriptionb2overridefordocuments" xmlns:ns13="http://nhinc.services.com/schema/auditmessage" xmlns:ns14="urn:oasis:names:tc:emergency:EDXL:DE:1.0" xmlns:ns15="http://www.hhs.gov/healthit/nhin/cdc" xmlns:ns16="urn:gov:hhs:fha:nhinc:common:subscriptionb2overrideforcdc" xmlns:ns2="urn:gov:hhs:fha:nhinc:common:nhinccommon" xmlns:ns3="urn:gov:hhs:fha:nhinc:common:nhinccommonentity" xmlns:ns4="urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0" xmlns:ns5="urn:oasis:names:tc:ebxml-regrep:xsd:rs:3.0" xmlns:ns6="urn:oasis:names:tc:ebxml-regrep:xsd:query:3.0" xmlns:ns7="urn:oasis:names:tc:ebxml-regrep:xsd:lcm:3.0" xmlns:ns8="http://docs.oasis-open.org/wsn/b-2" xmlns:ns9="http://www.w3.org/2005/08/addressing"><ns5:RegistryResponse status="urn:oasis:names:tc:ebxml-regrep:ResponseStatusType:Success"/><DocumentResponse><HomeCommunityId>urn:oid:422.422</HomeCommunityId><RepositoryUniqueId>422.422</RepositoryUniqueId><DocumentUniqueId>422.422^C4n2hv7z_5Ofa37W</DocumentUniqueId><mimeType>text/xml</mimeType><Document><xop:Include xmlns:xop="http://www.w3.org/2004/08/xop/include" href="cid:3511c0cc-5e20-46b7-8ae0-406c3b1ea95f-6#urn%3Aihe%3Aiti%3Axds-b%3A2007"/></Document></DocumentResponse></RetrieveDocumentSetResponse></soap:Body></soap:Envelope>
--uuid:b6bd1ef2-63e2-4d8d-8bac-eabbe7588373
Content-Type: text/xml
Content-Transfer-Encoding: binary
Content-ID: <3511c0cc-5e20-46b7-8ae0-406c3b1ea95f-6#urn:ihe:iti:xds-b:2007>
<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:hl7-org:v3" xsi:schemaLocation="urn:hl7-org:v3 http://hit-testing.nist.gov:11080/hitspValidation/schema/cdar2c32/infrastructure/cda/C32_CDA.xsd">
<realmCode code="US"/>
<typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040"/>
In my IDE, if I search by the regex of ^\s*Content-ID:, it finds 2 results. So why doesn't this java code find any matches?
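String#matches only returns true if the entire string matches the regex, so it can never succeed on this input. Use Matcher#find to search for occurrences instead, and enable MULTILINE mode so that ^ matches at the start of each line instead of only at the start of the entire string.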
Pattern pattern = Pattern.compile(yourRegex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}
See: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
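A compact sketch of the difference, using a shortened stand-in for the payload from the question (the class name, method name and sample payload are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderFinder {
    // MULTILINE makes ^ match after every line terminator, not only at index 0.
    private static final Pattern CONTENT_ID =
            Pattern.compile("^\\s*Content-ID:", Pattern.MULTILINE);

    // Counts the Content-ID headers by advancing with find().
    static int countHeaders(String payload) {
        Matcher matcher = CONTENT_ID.matcher(payload);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String payload = "--uuid:abc\n"
                + "Content-Type: text/xml\n"
                + "Content-ID: <root>\n"
                + "\n"
                + "<x/>\n"
                + "--uuid:abc\n"
                + "Content-ID: <part>\n";
        // matches() needs the regex to cover the whole payload, so it is false here.
        System.out.println(payload.matches("^\\s*Content-ID:"));
        System.out.println(countHeaders(payload));
    }
}
```

Because find() moves forward on each call, the loop ends once there are no more matches, unlike a while loop around matches(), which re-tests the same unchanged string on every iteration.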