How to use Java XPath with KML files and namespaces on Android - java

I'm struggling with how to use XPath on KML files that contain the new gx:Track and gx:coord tags. The problem is with how to use XPath with namespaces under Android.
I've looked at a number of examples, including these
https://www.ibm.com/developerworks/library/x-nmspccontext/index.html
https://howtodoinjava.com/xml/xpath-namespace-resolution-example/
XPath with namespace in Java
NamespaceContext and using namespaces with XPath
but I can't seem to get even those examples to work.
The following code and output illustrates my problem:
public App() {
super();
try {
test( testDoc1() );
test( testDoc2() );
} catch( Exception e ) {
e.printStackTrace();
} finally {
Log.d( "TEST-FINISHED", "test is finished" );
}
}
private String toXmlString( Document document ) throws TransformerException {
DOMSource domSource = new DOMSource( document );
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult( writer );
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform( domSource, result );
return writer.toString();
}
private Document testDoc1() throws ParserConfigurationException {
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware( true );
Document mDocument = documentBuilderFactory.newDocumentBuilder().newDocument();
String XMLNS_NAMESPACE_URI = "http://www.w3.org/2000/xmlns/";
Element mKmlElement = mDocument.createElement( "kml" );
mKmlElement.setAttributeNS( XMLNS_NAMESPACE_URI, "xmlns", "http://www.opengis.net/kml/2.2" );
mKmlElement.setAttributeNS( XMLNS_NAMESPACE_URI, "xmlns:gx", "http://www.google.com/kml/ext/2.2" );
mDocument.appendChild( mKmlElement );
Element mPlacemarkElement = mDocument.createElement( "Placemark" );
mKmlElement.appendChild( mPlacemarkElement );
Element gxTrackElement = mDocument.createElement( "gx:Track" );
mPlacemarkElement.appendChild( gxTrackElement );
Element gxCoordElement = mDocument.createElement( "gx:coord" );
gxCoordElement.setTextContent( "-122.207881 37.371915 156.000000" );
gxTrackElement.appendChild( gxCoordElement );
return mDocument;
}
private Document testDoc2() throws ParserConfigurationException, IOException, SAXException {
String kmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><kml xmlns=\"http://www.opengis.net/kml/2.2\" xmlns:gx=\"http://www.google.com/kml/ext/2.2\"><Placemark><gx:Track><gx:coord>-122.207881 37.371915 156.000000</gx:coord></gx:Track></Placemark></kml>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware( true );
Document mDocument = documentBuilderFactory.newDocumentBuilder().parse( new InputSource( new StringReader( kmlString ) ) );
return mDocument;
}
private void test( Document mDocument ) throws Exception {
String xml = toXmlString( mDocument );
Log.d( "TEST-XML", xml );
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext( new NamespaceContext() {
@Override
public String getNamespaceURI( String prefix ) {
switch( prefix ) {
case XMLConstants.DEFAULT_NS_PREFIX:
return "http://www.opengis.net/kml/2.2";
case "gx":
return "http://www.google.com/kml/ext/2.2";
}
return XMLConstants.NULL_NS_URI;
}
@Override
public String getPrefix( String namespaceURI ) {
return null;
}
@Override
public Iterator getPrefixes( String namespaceURI ) {
return null;
}
} );
NodeList result1 = (NodeList) xPath.evaluate( "/kml", mDocument, XPathConstants.NODESET );
Log.d( "TEST-RESULT1", String.valueOf( result1.getLength() ) );
NodeList result2 = (NodeList) xPath.evaluate( "/kml/Placemark", mDocument, XPathConstants.NODESET );
Log.d( "TEST-RESULT2", String.valueOf( result2.getLength() ) );
NodeList result3 = (NodeList) xPath.evaluate( "/kml/Placemark/gx:Track", mDocument, XPathConstants.NODESET );
Log.d( "TEST-RESULT3", String.valueOf( result3.getLength() ) );
}
The test() method executes 3 XPath statements/patterns and is called once for each of two test documents. The 2 documents are constructed using different methods but the contents should be identical. However, the results I get from the 3 XPath statements are different.
These are the results with document 1:
2018-11-17 17:51:28.289 22837-22837/ca.csdesigninc.offroadtracker D/TEST-XML: <?xml version="1.0" encoding="UTF-8"?><kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2"><Placemark><gx:Track><gx:coord>-122.207881 37.371915 156.000000</gx:coord></gx:Track></Placemark></kml>
2018-11-17 17:51:28.324 22837-22837/ca.csdesigninc.offroadtracker D/TEST-RESULT1: 1
2018-11-17 17:51:28.334 22837-22837/ca.csdesigninc.offroadtracker D/TEST-RESULT2: 1
2018-11-17 17:51:28.343 22837-22837/ca.csdesigninc.offroadtracker D/TEST-RESULT3: 0
and these are the results with document 2:
2018-11-17 17:51:28.348 22837-22837/ca.csdesigninc.offroadtracker D/TEST-XML: <?xml version="1.0" encoding="UTF-8"?><kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2"><Placemark><gx:Track><gx:coord>-122.207881 37.371915 156.000000</gx:coord></gx:Track></Placemark></kml>
2018-11-17 17:51:28.358 22837-22837/ca.csdesigninc.offroadtracker D/TEST-RESULT1: 0
2018-11-17 17:51:28.363 22837-22837/ca.csdesigninc.offroadtracker D/TEST-RESULT2: 0
2018-11-17 17:51:28.372 22837-22837/ca.csdesigninc.offroadtracker D/TEST-RESULT3: 0
There are at least 2 problems:
since the 2 documents are identical (I think), why are the results of the tests different? (i.e., the first 2 XPath statements succeed with document 1 but neither succeeds with document 2.)
and why does the 3rd XPath statement fail to find the gx:Track element in both document 1 and document 2?
UPDATE: This problem seems to have something to do with having
xmlns="http://www.opengis.net/kml/2.2"
included in document 2. If I remove it, the results of the first 2 XPath tests are correct (for both documents), and in fact XPath test 3 now works on document 2. Unfortunately, I still don't have a handle on this behavior.
I'm probably missing something obvious and would appreciate any help.

The differences come down to namespaces, both in how each document is built and in how the XPath expressions select content.
Unfortunately, the difference is hard to see, because the XML that toXmlString() serializes for testDoc1() does not match the state of the in-memory document.
When you construct the kml element with createElement(), it creates an element that is bound to no namespace. The namespace attributes you add afterwards are written out by toXmlString(), which makes the kml element appear to be in the http://www.opengis.net/kml/2.2 namespace.
If you parsed that XML back into a new Document object, the kml element would be bound to that namespace. The current in-memory element, however, is not.
You can observe this by printing some additional diagnostics:
NodeList result1 = (NodeList) xPath.evaluate("/kml", mDocument, XPathConstants.NODESET);
System.out.println(String.valueOf(result1.getLength()));
System.out.println("Namespace URI: " + result1.item(0).getNamespaceURI());
System.out.println("Prefix: " + result1.item(0).getPrefix());
You can round-trip your XML and observe that it behaves differently once the serialized XML is parsed back into a Document:
private void test(Document mDocument) throws Exception {
String xml = toXmlString(mDocument);
System.out.println( xml);
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
mDocument = documentBuilderFactory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
However, that's cheating. What you really want to do is ensure that the elements are created properly in the first place. When you create an element that you want to be bound to a namespace, use the createElementNS() method, as indicated in the JavaDoc comments for createElement():
To create an element with a qualified name and namespace URI, use the createElementNS method.
So, to create an element that is bound to the http://www.opengis.net/kml/2.2 namespace, you would want to use:
Element mKmlElement = mDocument.createElementNS("http://www.opengis.net/kml/2.2", "kml");
and:
Element mPlacemarkElement = mDocument.createElementNS("http://www.opengis.net/kml/2.2", "Placemark");
and the same goes for the gx:Track element:
Element gxTrackElement = mDocument.createElementNS("http://www.google.com/kml/ext/2.2","gx:Track");
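To make the difference between the two factory methods concrete, here is a minimal sketch that compares what getNamespaceURI() reports for an element built each way. The class and method names are made up for illustration, and it assumes a standard JAXP DOM implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CreateElementNsDemo {
    static final String KML_NS = "http://www.opengis.net/kml/2.2";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Builds <kml> with createElement() and then adds an xmlns attribute,
    // as testDoc1() in the question does.
    static String namespaceOfPlainElement() throws Exception {
        Document doc = newDocument();
        Element kml = doc.createElement("kml");
        // The attribute shows up when serializing, but it does not rebind the element.
        kml.setAttributeNS(XMLNS_NS, "xmlns", KML_NS);
        return kml.getNamespaceURI();
    }

    // Builds <kml> with createElementNS(), which binds the element to the namespace.
    static String namespaceOfNsElement() throws Exception {
        Document doc = newDocument();
        Element kml = doc.createElementNS(KML_NS, "kml");
        return kml.getNamespaceURI();
    }

    private static Document newDocument() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("createElement:   " + namespaceOfPlainElement());
        System.out.println("createElementNS: " + namespaceOfNsElement());
    }
}
```

The first call should report no namespace even though an xmlns attribute is present, which is the mismatch described above.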
Once you get your Document objects truly equal and correct, you then need to adjust your XPath.
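With XPath, if you don't apply a namespace prefix, it will select elements bound to the "no namespace". So, /kml will only select kml elements that are not bound to a namespace. But since your kml elements are bound to the http://www.opengis.net/kml/2.2 namespace, it won't select them. This is also why mapping the empty prefix in your NamespaceContext has no effect: XPath 1.0 never applies a default namespace to unprefixed names.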
In your override of the getNamespaceURI() function, you could reserve gx for the Google KML Extension namespace, and then default any other namespace-prefix to resolve to http://www.opengis.net/kml/2.2:
@Override
public String getNamespaceURI(String prefix) {
return "gx".equals(prefix) ? "http://www.google.com/kml/ext/2.2" : "http://www.opengis.net/kml/2.2";
}
Then, adjust your XPath statements to use a prefix for those KML elements. If you use the above code, it doesn't matter what prefix you use. Anything other than gx will return the http://www.opengis.net/kml/2.2 namespace.
NodeList result1 = (NodeList) xPath.evaluate("/k:kml", mDocument, XPathConstants.NODESET);
System.out.println(String.valueOf(result1.getLength()));
System.out.println("Namespace URI: " + result1.item(0).getNamespaceURI());
System.out.println("Prefix: " + result1.item(0).getPrefix());
NodeList result2 = (NodeList) xPath.evaluate("/k:kml/k:Placemark", mDocument, XPathConstants.NODESET);
System.out.println( String.valueOf(result2.getLength()));
NodeList result3 = (NodeList) xPath.evaluate("/k:kml/k:Placemark/gx:Track", mDocument, XPathConstants.NODESET);
System.out.println(String.valueOf(result3.getLength()));
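If you would rather not have every unknown prefix fall back to the KML namespace, one alternative is a NamespaceContext backed by an explicit prefix-to-URI map. The sketch below is only an illustration: MapNamespaceContext and its bind() method are hypothetical names, not part of any library, and the k prefix is an arbitrary choice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class KmlXPathDemo {
    static final String KML =
        "<kml xmlns=\"http://www.opengis.net/kml/2.2\" xmlns:gx=\"http://www.google.com/kml/ext/2.2\">"
        + "<Placemark><gx:Track><gx:coord>-122.207881 37.371915 156.000000</gx:coord>"
        + "</gx:Track></Placemark></kml>";

    // Hypothetical helper: resolves only the prefixes that were bound explicitly.
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> uriByPrefix = new HashMap<>();

        MapNamespaceContext bind(String prefix, String uri) {
            uriByPrefix.put(prefix, uri);
            return this;
        }

        @Override
        public String getNamespaceURI(String prefix) {
            String uri = uriByPrefix.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            Iterator<String> prefixes = getPrefixes(namespaceURI);
            return prefixes.hasNext() ? prefixes.next() : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            List<String> prefixes = new ArrayList<>();
            for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
                if (entry.getValue().equals(namespaceURI)) {
                    prefixes.add(entry.getKey());
                }
            }
            return prefixes.iterator();
        }
    }

    // Parses the sample KML and returns the string value of the given expression.
    static String evaluate(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(KML)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new MapNamespaceContext()
            .bind("k", "http://www.opengis.net/kml/2.2")
            .bind("gx", "http://www.google.com/kml/ext/2.2"));
        return xPath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluate("count(/kml)"));               // unprefixed name
        System.out.println(evaluate("count(/k:kml/k:Placemark)")); // prefixed names
        System.out.println(evaluate("/k:kml/k:Placemark/gx:Track/gx:coord"));
    }
}
```

With an explicit map, a typo in a prefix resolves to no namespace and the expression fails to compile or matches nothing, instead of silently resolving to the KML namespace.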
Putting it all together:
public App() {
super();
try {
test( testDoc1() );
test( testDoc2() );
} catch( Exception e ) {
e.printStackTrace();
} finally {
Log.d( "TEST-FINISHED", "test is finished" );
}
}
private String toXmlString(Document document) throws TransformerException {
DOMSource domSource = new DOMSource(document);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, result);
return writer.toString();
}
private Document testDoc1() throws ParserConfigurationException {
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
Document mDocument = documentBuilderFactory.newDocumentBuilder().newDocument();
String XMLNS_NAMESPACE_URI = "http://www.w3.org/2000/xmlns/";
//Element mKmlElement = mDocument.createElement("kml");
Element mKmlElement = mDocument.createElementNS("http://www.opengis.net/kml/2.2", "kml");
//mKmlElement.setAttributeNS(XMLNS_NAMESPACE_URI, "xmlns", "http://www.opengis.net/kml/2.2");
mKmlElement.setAttributeNS(XMLNS_NAMESPACE_URI, "xmlns:gx", "http://www.google.com/kml/ext/2.2");
mDocument.appendChild(mKmlElement);
//Element mPlacemarkElement = mDocument.createElement("Placemark");
Element mPlacemarkElement = mDocument.createElementNS("http://www.opengis.net/kml/2.2", "Placemark");
//mPlacemarkElement.setAttributeNS(XMLNS_NAMESPACE_URI, "xmlns", "http://www.opengis.net/kml/2.2");
mKmlElement.appendChild(mPlacemarkElement);
//Element gxTrackElement = mDocument.createElement("gx:Track");
Element gxTrackElement = mDocument.createElementNS("http://www.google.com/kml/ext/2.2","gx:Track");
mPlacemarkElement.appendChild(gxTrackElement);
//Element gxCoordElement = mDocument.createElement("gx:coord");
Element gxCoordElement = mDocument.createElementNS("http://www.google.com/kml/ext/2.2", "gx:coord");
gxCoordElement.setTextContent("-122.207881 37.371915 156.000000");
gxTrackElement.appendChild(gxCoordElement);
return mDocument;
}
private Document testDoc2() throws ParserConfigurationException, IOException, SAXException {
String kmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><kml xmlns=\"http://www.opengis.net/kml/2.2\" xmlns:gx=\"http://www.google.com/kml/ext/2.2\"><Placemark><gx:Track><gx:coord>-122.207881 37.371915 156.000000</gx:coord></gx:Track></Placemark></kml>";
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
Document mDocument = documentBuilderFactory.newDocumentBuilder().parse(new
InputSource(new StringReader(kmlString)));
return mDocument;
}
private void test(Document mDocument) throws Exception {
String xml = toXmlString(mDocument);
System.out.println( xml);
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new NamespaceContext() {
@Override
public String getNamespaceURI(String prefix) {
return "gx".equals(prefix) ? "http://www.google.com/kml/ext/2.2" : "http://www.opengis.net/kml/2.2";
}
@Override
public String getPrefix(String namespaceURI) {
if ("http://www.google.com/kml/ext/2.2".equals(namespaceURI)) {
return "gx";
}
return null;
}
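@Override
public Iterator getPrefixes(String namespaceURI) {
// Only the Google extension namespace has a fixed prefix here.
List<String> ns = new ArrayList<>();
if ("http://www.google.com/kml/ext/2.2".equals(namespaceURI)) {
ns.add("gx");
}
return ns.iterator();
}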
});
NodeList result1 = (NodeList) xPath.evaluate("/k:kml", mDocument, XPathConstants.NODESET);
System.out.println(String.valueOf(result1.getLength()));
System.out.println("Namespace URI: " + result1.item(0).getNamespaceURI());
System.out.println("Prefix: " + result1.item(0).getPrefix());
NodeList result2 = (NodeList) xPath.evaluate("/k:kml/k:Placemark", mDocument, XPathConstants.NODESET);
System.out.println( String.valueOf(result2.getLength()));
NodeList result3 = (NodeList) xPath.evaluate("/k:kml/k:Placemark/gx:Track", mDocument, XPathConstants.NODESET);
System.out.println(String.valueOf(result3.getLength()));
}

Related

how to parse xml to java in nodelist

This is my XML:
<?xml version = "1.0" encoding = "UTF-8"?>
<ns0:GetADSLProfileResponse xmlns:ns0 = "http://">
<ns0:Result>
<ns0:eCode>0</ns0:eCode>
<ns0:eDesc>Success</ns0:eDesc>
</ns0:Result>
</ns0:GetADSLProfileResponse>
This is my Java code so far. I don't know where to start.
I tried some code I found online, but it did not solve my problem.
How do I loop over the values in Result to get 0 from eCode and Success from eDesc?
CustomerProfileResult pojo = new CustomerProfileResult();
String body = readfile();
System.out.println(body);
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.parse(new InputSource(new StringReader(body)));
XPath xpath =XPathFactory.newInstance().newXPath();
XPathExpression name = xpath.compile("/xml/GetADSLProfileResponse/Result");
NodeList nodeName = (NodeList) name.evaluate(dom, XPathConstants.NODESET);
if(nodeName!=null){
}
Summary
You can try the following expression, which selects the nodes without regard to the ns0 namespace:
/*[local-name()='GetADSLProfileResponse']/*[local-name()='Result']/*
Explanation
Several parts of your expression are incorrect. Let's go through them. The XPath /xml means that the root element of the document is <xml>, but the root element is actually <ns0:GetADSLProfileResponse>. GetADSLProfileResponse is incorrect too, because the element is in a namespace and an unprefixed name only matches elements in no namespace. The same goes for Result:
/xml/GetADSLProfileResponse/Result
In my solution I ignored the namespace, because the namespace URI you provided is incomplete. Here's a full program to get started:
String XML =
"<?xml version = \"1.0\" encoding = \"UTF-8\"?>\n"
+ "<ns0:GetADSLProfileResponse xmlns:ns0 = \"http://\">\n"
+ " <ns0:Result>\n"
+ " <ns0:eCode>0</ns0:eCode>\n"
+ " <ns0:eDesc>Success</ns0:eDesc>\n"
+ " </ns0:Result>\n"
+ "</ns0:GetADSLProfileResponse> ";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document;
try (InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8))) {
document = builder.parse(in);
}
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xPath.compile("/*[local-name()='GetADSLProfileResponse']/*[local-name()='Result']/*");
NodeList nodeList = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
System.out.println(node.getNodeName() + ": " + node.getTextContent());
}
It prints:
ns0:eCode: 0
ns0:eDesc: Success
See also:
How to query XML using namespaces in Java with XPath?
Node (Java Platform SE 8)
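If you only need the two values rather than a loop over the child nodes, the same local-name() technique can return each one directly as a string. This is a rough sketch under a few assumptions: the class and method names are invented for illustration, the parser is made namespace-aware (the program above leaves the default), and the XML is the sample from the question with its namespace URI left as given.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ResultFieldReader {
    static final String XML =
        "<?xml version = \"1.0\" encoding = \"UTF-8\"?>\n"
        + "<ns0:GetADSLProfileResponse xmlns:ns0 = \"http://\">\n"
        + "  <ns0:Result>\n"
        + "    <ns0:eCode>0</ns0:eCode>\n"
        + "    <ns0:eDesc>Success</ns0:eDesc>\n"
        + "  </ns0:Result>\n"
        + "</ns0:GetADSLProfileResponse>";

    // Returns the text of the child of <Result> with the given local name,
    // or an empty string when no such element exists.
    static String field(String localName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        String expression = "/*[local-name()='GetADSLProfileResponse']"
            + "/*[local-name()='Result']"
            + "/*[local-name()='" + localName + "']";
        // The two-argument evaluate() returns the string value of the first match.
        return xPath.evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("eCode = " + field("eCode"));
        System.out.println("eDesc = " + field("eDesc"));
    }
}
```

The document is re-parsed on every call to keep the sketch short; real code would parse once and reuse the Document.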

Parsing a SOAP response using XPath Java

I am new to XPath. I have the following SOAP response:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<addParentResponse xmlns="urn:JadeWebServices/NetsuiteCustomer/">
<addParentResult>Organisation xxxxx already exists - use UpdateParent method instead</addParentResult>
</addParentResponse>
</soap:Body>
</soap:Envelope>
Can anyone kindly give me some code which will read the value of "addParentResult"?
Regards,
Anirban.
The following XPath should give the desired result:
/soap:Envelope/soap:Body/parentns:addParentResponse/parentns:addParentResult/text()
I added the parentns prefix because your XML uses namespaces and the XPath processor has to know about them. The addParentResponse element has no prefix, but it is in a default namespace. In that case, use a prefix of your choice in the XPath expression and tell the XPath processor that the parentns prefix maps to "urn:JadeWebServices/NetsuiteCustomer/". This is done through a NamespaceContext.
Also be sure to make the DocumentBuilderFactory namespace-aware by calling setNamespaceAware( true ).
The Java code would be:
try
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse( new File( "soapResponse.xml" ) );
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
javax.xml.namespace.NamespaceContext ns = new javax.xml.namespace.NamespaceContext()
{
@Override
public String getNamespaceURI(String prefix)
{
if ( "soap".equals( prefix ) )
{
return "http://schemas.xmlsoap.org/soap/envelope/";
}
else if ( "xsi".equals( prefix ) )
{
return "http://www.w3.org/2001/XMLSchema-instance";
}
else if ( "xsd".equals( prefix ) )
{
return "http://www.w3.org/2001/XMLSchema";
}
else if ( "xml".equals( prefix ) )
{
return javax.xml.XMLConstants.XML_NS_URI;
}
else if ( "parentns".equals( prefix ) )
{
return "urn:JadeWebServices/NetsuiteCustomer/";
}
return javax.xml.XMLConstants.NULL_NS_URI;
}
@Override
public String getPrefix(String namespaceURI)
{
return null;
}
@Override
public Iterator<?> getPrefixes(String namespaceURI)
{
return null;
}
};
xpath.setNamespaceContext(ns);
XPathExpression expr = xpath.compile( "/soap:Envelope/soap:Body/parentns:addParentResponse/parentns:addParentResult/text()" );
Object exprEval = expr.evaluate( doc, XPathConstants.STRING );
if ( exprEval != null )
{
System.out.println( "The text of addParentResult is : " + exprEval );
}
}
catch ( Exception e )
{
e.printStackTrace();
}
}
To test this code, put your XML in a file called soapResponse.xml in the working directory from which you run the program, since new File( "soapResponse.xml" ) is resolved relative to that directory.
The output is:
The text of addParentResult is : Organisation xxxxx already exists - use UpdateParent method instead

How to get xml attribute values using Document builder factory

How do I get attribute values? With the following code I only get ; as output for msg. I want to print the MSID, type, CHID, SPOS, type and PPOS values. Can anyone solve this issue?
String xml1="<message MSID='20' type='2635'>"
+"<che CHID='501' SPOS='2'>"
+"<pds type='S'>"
+"<position PPOS='S01'/>"
+"</pds>"
+"</che>"
+"</message>";
InputSource source = new InputSource(new StringReader(xml1));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(source);
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String msg = xpath.evaluate("/message/che/CHID", document);
String status = xpath.evaluate("/pds/position/PPOS", document);
System.out.println("msg=" + msg + ";" + "status=" + status);
You need to use @ in your XPath to select an attribute, and the path for your second expression is also wrong because it does not start at the root element:
String msg = xpath.evaluate("/message/che/@CHID", document);
String status = xpath.evaluate("/message/che/pds/position/@PPOS", document);
With those changes, I get an output of:
msg=501;status=S01
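The question asks for all six attribute values, so here is a small sketch that applies the same @ syntax to each of them. The class name and the attr() helper are made up for illustration; the XML is the string from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttributeXPathDemo {
    static final String XML =
        "<message MSID='20' type='2635'>"
        + "<che CHID='501' SPOS='2'>"
        + "<pds type='S'>"
        + "<position PPOS='S01'/>"
        + "</pds>"
        + "</che>"
        + "</message>";

    // Evaluates an attribute path against the sample message and returns its value.
    // The document is re-parsed on each call to keep the sketch short.
    static String attr(String expression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        String[] paths = {
            "/message/@MSID",
            "/message/@type",
            "/message/che/@CHID",
            "/message/che/@SPOS",
            "/message/che/pds/@type",
            "/message/che/pds/position/@PPOS"
        };
        for (String path : paths) {
            System.out.println(path + " = " + attr(path));
        }
    }
}
```

Each path starts at the root element and ends with @ followed by the attribute name; the two type attributes are told apart only by the element path that leads to them.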
You can use Document.getDocumentElement() to get the root element and Element.getElementsByTagName() to get child elements:
Document document = db.parse(source);
Element docEl = document.getDocumentElement(); // This is <message>
String msid = docEl.getAttribute("MSID");
String type = docEl.getAttribute("type");
Element position = (Element) docEl.getElementsByTagName("position").item(0);
String ppos = position.getAttribute("PPOS");
System.out.println(msid); // Prints "20"
System.out.println(type); // Prints "2635"
System.out.println(ppos); // Prints "S01"

how to get attribute of given node?

I am trying to write DOM XML parsing.
My Xml file
<?xml version="1.0"?>
<BLAH>
<AgentNm type="citi1">
<accName>accName1</accName>
<accType>accType1</accType>
<someThing>someThing1</someThing>
<amt>100000</amt>
</AgentNm>
<AgentNm type="citi2">
<accName>accName2</accName>
<accType>accType2</accType>
<someThing>someThing2</someThing>
<amt>200000</amt>
</AgentNm>
</BLAH>
And I tried the following Java code:
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse (new File("c:\\file.xml"));
// normalize text representation
doc.getDocumentElement ().normalize ();
System.out.println ("Root element of the doc is " +doc.getDocumentElement().getNodeName());
NodeList agentNm = doc.getElementsByTagName("AgentNm");
int totalAgentNm = agentNm.getLength();
System.out.println("Total no of Agents : " + totalAgentNm);
for(int s=0; s<agentNm.getLength() ; s++){
Node firstPersonNode = agentNm.item(s);
if(firstPersonNode.getNodeType() == Node.ELEMENT_NODE){
Element firstPersonElement = (Element)firstPersonNode;
PrintNodeElem(firstPersonElement,"type");
}//end of if clause
}//end of for loop with s var
static void PrintNodeElem(Element nodeElem,String elem){
NodeList someThingList = nodeElem.getElementsByTagName(elem);
Element ageElement = (Element)someThingList.item(0);
NodeList textAgeList = ageElement.getChildNodes();
System.out.println(elem+" : " +((Node)textAgeList.item(0)).getNodeValue().trim());
}
But when I execute the above method, I get a NullPointerException.
Can anyone explain how to fix this?
If you want an attribute of a given node, I would suggest XPath. It is much easier.
http://onjava.com/onjava/2005/01/12/xpath.html
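For context, the NullPointerException most likely comes from PrintNodeElem(firstPersonElement,"type"): type is an attribute, not a child element, so getElementsByTagName("type") returns an empty list and item(0) is null. A minimal XPath sketch for reading the attribute could look like this; the class and method names are invented for illustration, and the XML is a shortened copy of the file from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AgentTypeDemo {
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<BLAH>"
        + "<AgentNm type=\"citi1\"><accName>accName1</accName><amt>100000</amt></AgentNm>"
        + "<AgentNm type=\"citi2\"><accName>accName2</accName><amt>200000</amt></AgentNm>"
        + "</BLAH>";

    // Collects the value of the type attribute of every AgentNm element.
    static List<String> agentTypes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The @ selects attribute nodes; each result is an Attr whose node value is the text.
        NodeList attrs = (NodeList) xpath.evaluate("/BLAH/AgentNm/@type", doc, XPathConstants.NODESET);

        List<String> types = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            types.add(attrs.item(i).getNodeValue());
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(agentTypes(XML));
    }
}
```

Without XPath, calling getAttribute("type") on each AgentNm Element inside the existing loop would serve the same purpose.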

How to remove elements of a page in htmlunit

Normally in PHP, I would just parse the old document and write to the new document while ignoring the unwanted elements.
This was the first solution I came up with:
DocumentBuilder builder = DocumentBuilderFactory
.newInstance()
.newDocumentBuilder();
StringReader reader = new StringReader( xml );
Document document = builder.parse( new InputSource(reader) );
XPathExpression expr = XPathFactory
.newInstance()
.newXPath()
.compile( ... );
Object result = expr.evaluate(document, XPathConstants.NODESET);
Element el = document.getDocumentElement();
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
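// Remove each node from its own parent; el.removeChild() only works for direct children of the root.
nodes.item(i).getParentNode().removeChild( nodes.item(i) );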
}
As you can see it's kinda long. Being a coder who strives for simplicity, I decided to take Ahmed's advice hoping I'll find a better solution and I came up with this:
List<?> elements = page.getByXPath( ... );
DomNode node = null;
for( Object o : elements ) {
node = (DomNode)o;
node.getParentNode().removeChild( node );
}
Please note these are just snippets, I omitted the imports and the XPath expressions but you get the idea.
Have a look at the DOM methods, you can remove nodes.
http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/html/DomNode.html
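As a self-contained illustration of removing nodes through their parent, here is a sketch that uses only the standard W3C DOM and XPath APIs rather than HtmlUnit. The class name, the sample markup and the XPath expression are all assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemoveNodesDemo {
    static final String XML =
        "<html><body>"
        + "<div class='ad'>x</div>"
        + "<p>keep</p>"
        + "<div class='ad'>y</div>"
        + "</body></html>";

    // Removes every node matched by the expression and returns
    // the number of elements left in the document.
    static int removeMatching(String xml, String expression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            // Detach the node from whatever element contains it.
            node.getParentNode().removeChild(node);
        }

        Double remaining = (Double) xpath.evaluate("count(//*)", document, XPathConstants.NUMBER);
        return remaining.intValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(removeMatching(XML, "//div[@class='ad']"));
    }
}
```

Removing through getParentNode() works at any depth, whereas calling removeChild() on the document element only works for its direct children.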
