Select a node using xpath and jdom - java

I have an xform document:
<?xml version="1.0" encoding="UTF-8"?><h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jr="http://openrosa.org/javarosa">
<h:head>
<h:title>Summary</h:title>
<model>
<instance>
<data vaultType="nsp_inspection.4.1">
<metadata vaultType="metadata.1.1">
<form_start_time type="dateTime" />
<form_end_time type="dateTime" />
<device_id type="string" />
<username type="string" />
</metadata>
<date type="date" />
<monitor type="string" />
</data>
</instance>
</model>
</h:head>
I would like to select the data element from the xform using XPath and JDOM.
XPath xpath = XPath.newInstance("h:html/h:head/h:title");
seems to work fine and selects the title element, but
XPath xpath = XPath.newInstance("h:html/h:head/model");
does not select the model element.
I guess it has something to do with the namespace.

A few things. You really should be using JDOM 2.0.x (2.0.5 is the latest release). The XPath API in the 2.0.x versions is far better than the one in JDOM 1.x: see https://github.com/hunterhacker/jdom/wiki/JDOM2-Feature-XPath-Upgrade
@wds is also right that the xforms elements do not have the correct namespace, and that is why your XPath works at all: the model element has the same namespace as the XHTML elements with the 'h' prefix. Your code is still likely to be broken.
Namespaces in XPath expressions often confuse people, because every namespace used in an XPath has to have a prefix. Even if something is in the default namespace in the XML (no prefix, like your 'model' element), it has to have a prefix in the XPath. Names with no prefix in an XPath always refer to the 'no namespace' namespace (XPath specification: http://www.w3.org/TR/xpath/#node-tests):
A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.
Assuming @wds is correct, and the namespace for the model element is supposed to be "http://www.w3.org/2002/xforms", then the namespace declaration in your document should be xmlns="http://www.w3.org/2002/xforms". But that namespace is then the 'default' namespace, and the URI for the no-prefix namespace in your XPath query is still "".
To access the http://www.w3.org/2002/xforms namespace in your XPath, you have to give it a prefix in the context of the XPath; let's say xpns (for XPath namespace). In JDOM 1.x you add that namespace with:
XPath xpath = XPath.newInstance("/h:html/h:head/xpns:model");
xpath.addNamespace(Namespace.getNamespace("xpns", "http://www.w3.org/2002/xforms"));
Element model = (Element) xpath.selectSingleNode(mydoc);
Note how that adds the xpns prefix to the query. Also, note that I have 'anchored' the h:html reference to the '/' root of the document, which will improve the performance of the query evaluation.
In JDOM 2.x, the XPath API is significantly better (even though in some cases it may seem overkill). Every prefix used in the expression, including h, is passed to compile() together with the filter:
XPathFactory xpf = XPathFactory.instance();
XPathExpression<Element> xpath = xpf.compile("/h:html/h:head/xpns:model",
Filters.element(), null,
Namespace.getNamespace("h", "http://www.w3.org/1999/xhtml"),
Namespace.getNamespace("xpns", "http://www.w3.org/2002/xforms"));
Element model = xpath.evaluateFirst(mydoc);
See more about the new XPath API in the JDOM 2.x javadoc for XPathFactory.compile(...).
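The effect of the rule quoted from the XPath specification can be seen with a small, self-contained sketch. It uses the JDK's built-in javax.xml.xpath API rather than JDOM (an assumption made only so the example runs without extra libraries) on a trimmed copy of the document exactly as posted, with a closing </h:html> added. In that document the model element inherits the default XHTML namespace, so an unprefixed model step should select nothing while h:model selects it. The class name XformNamespaceCheck and the NamespaceContext that binds only h are illustrative.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XformNamespaceCheck {
    // Trimmed copy of the question's document: the default namespace is XHTML,
    // so the unprefixed <model> element is in the XHTML namespace too.
    static final String XFORM =
        "<h:html xmlns:h='http://www.w3.org/1999/xhtml' xmlns='http://www.w3.org/1999/xhtml'>"
        + "<h:head><h:title>Summary</h:title>"
        + "<model><instance><data vaultType='nsp_inspection.4.1'/></instance></model>"
        + "</h:head></h:html>";

    // Binds only the prefix "h"; any other prefix resolves to "no namespace".
    static final NamespaceContext XHTML_ONLY = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? "http://www.w3.org/1999/xhtml" : XMLConstants.NULL_NS_URI;
        }
        @Override
        public String getPrefix(String namespaceURI) { return null; } // unused here
        @Override
        public Iterator<String> getPrefixes(String namespaceURI) { return Collections.emptyIterator(); }
    };

    /** Returns how many nodes the expression selects in XFORM. */
    public static int count(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // otherwise DOM nodes carry no namespace URIs
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XFORM)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_ONLY);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed "model" means "model in no namespace": nothing matches.
        System.out.println(count("/h:html/h:head/model"));
        // Binding a prefix to the element's real namespace finds it.
        System.out.println(count("/h:html/h:head/h:model"));
    }
}
```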

Related

How to get an attribute's content by another attribute's value with XPath?

I have XML like:
<?xml version='1.0' encoding='UTF-8'?>
<ClinicalDocument xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xmlns='urn:hl7-org:v3'
xmlns:ext='urn:hl7-RU-EHR:v1'
xsi:schemaLocation='urn:hl7-org:v3'>
<author>
<time value='20160809000000+0300'/>
<assignedAuthor>
<id root='1.2.643.5.1.13.3.25.1.1.100.1.1.70' extension='1'/>
<id root='1.2.643.100.3' extension='03480134121'/>
<id nullFlavor='NI'/>
</assignedAuthor>
</author>
</ClinicalDocument>
I need to get the extension attribute of the id element whose root value is 1.2.643.100.3.
I must use XPath 2.0.
I have tried:
*[name()='ClinicalDocument']/*[name()='author']/*[name()='assignedAuthor']/*[name()='id' and @id='1.2.643.100.3']/@extension, which does not work, and
/*[name()='ClinicalDocument']/*[name()='author']/*[name()='assignedAuthor']/*[name()='id'][2]/@extension, but the order of the id elements can vary, so I need to select by the root value instead.
I need this to retrieve the value with Java's XPathExpression.
First, bind the namespace prefix u to urn:hl7-org:v3.
Then, this XPath,
//u:id[@root='1.2.643.100.3']/@extension
will return 03480134121, as requested.
If you are unable to bind a namespace prefix, you can instead use this XPath,
//*[local-name()='id' and @root='1.2.643.100.3']/@extension
which will also return 03480134121, as requested.
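A minimal sketch of both suggestions with the JDK's javax.xml.xpath API, the package Java's XPathExpression belongs to. Note that the JDK's built-in engine implements XPath 1.0; both expressions are valid in XPath 1.0 as well as 2.0. The class name, the trimmed SAMPLE document and the hand-written NamespaceContext that binds u are illustrative assumptions, not part of the answer.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Hl7ExtensionLookup {
    // Trimmed copy of the question's ClinicalDocument (default namespace urn:hl7-org:v3).
    static final String SAMPLE =
        "<ClinicalDocument xmlns='urn:hl7-org:v3'><author><assignedAuthor>"
        + "<id root='1.2.643.5.1.13.3.25.1.1.100.1.1.70' extension='1'/>"
        + "<id root='1.2.643.100.3' extension='03480134121'/>"
        + "<id nullFlavor='NI'/>"
        + "</assignedAuthor></author></ClinicalDocument>";

    // Binds the prefix "u" to the namespace the document declares as its default.
    static final NamespaceContext HL7 = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "u".equals(prefix) ? "urn:hl7-org:v3" : XMLConstants.NULL_NS_URI;
        }
        @Override
        public String getPrefix(String namespaceURI) { return null; } // unused here
        @Override
        public Iterator<String> getPrefixes(String namespaceURI) { return Collections.emptyIterator(); }
    };

    /** Returns the string value of the expression ("" when nothing matches). */
    public static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for u:id to match anything
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(HL7);
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prefix bound: selects the id by its root attribute, whatever its position.
        System.out.println(evaluate(SAMPLE, "//u:id[@root='1.2.643.100.3']/@extension"));
        // No prefix binding needed: match on local-name() instead.
        System.out.println(evaluate(SAMPLE, "//*[local-name()='id' and @root='1.2.643.100.3']/@extension"));
    }
}
```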
Correct XPath: /*[name()='ClinicalDocument']/*[name()='author']/*[name()='assignedAuthor']/*[local-name()='id' and @root='1.2.643.100.3']/@extension

JDOM2 xpath finding nodes within a different namespace

I'm attempting to use JDOM2 to extract the information I care about from an XML document. How do I get a tag within a tag?
I have been only partially successful. While I have been able to use XPath to extract <record> tags, the XPath query to extract the title, description and other data within the record tags has been returning null.
I've been using XPath successfully to extract <record> tags from the document. To do this I use the following XPath query: "//oai:record", where "oai" is a namespace prefix I made up in order to use XPath.
You can see the XML document I'm parsing here: http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&set=cwp&metadataPrefix=oai_dc. I've put a sample below:
<record>
<header>
<identifier>oai:lcoa1.loc.gov:loc.pnp/cph.3a02293</identifier>
<datestamp>2009-05-27T07:22:37Z</datestamp>
<setSpec>cwp</setSpec>
<setSpec>lcphotos</setSpec>
</header>
<metadata>
<oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Jubal A. Early</dc:title>
<dc:description>This record contains unverified, old data from caption card.</dc:description>
<dc:date>[between 1860 and 1880]</dc:date>
<dc:type>image</dc:type>
<dc:type>still image</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/cph.3a02293</dc:identifier>
<dc:language>eng</dc:language>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
</metadata>
</record>
If you look at the larger document, you will see that there is never an "xmlns" attribute listed on any of the tags. There is also the matter of there being three different namespaces in the document ("none/oai", "oai_dc", "dc").
What is happening is that the xpath is matching nothing, and evaluateFirst(parent) is returning null.
Here is some of my code to extract the title, date, description etc. out of the record element.
XPathFactory xpf = XPathFactory.instance();
XPathExpression<Element> xpath = xpf.compile("//dc:title",
Filters.element(), null,
namespaceList.toArray(new Namespace[namespaceList.size()]));
Element tag = xpath.evaluateFirst(parent);
if(tag != null)
{
return Option.fromString(tag.getText());
}
return Option.none();
Any thoughts would be appreciated! Thanks.
In your XML, the dc prefix is mapped to the namespace URI http://purl.org/dc/elements/1.1/, so make sure the prefix mapping you declare for the XPath uses exactly that URI. This is the part of your XML where the namespace prefixes are declared:
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
An XML parser only sees the namespaces explicitly declared in the XML; it won't try to open the namespace URL, since a namespace name is not necessarily a URL. For example, the following URI, which I found in a recent SO question, is also acceptable as a namespace: uuid:ebfd9-45-48-a9eb-42d
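As a sketch of the fix, here is the dc:title lookup with the prefix bound to exactly the URI declared in the XML. It uses the JDK's javax.xml.xpath API instead of JDOM2 so it runs without extra dependencies; with JDOM2 the equivalent is making sure namespaceList contains Namespace.getNamespace("dc", "http://purl.org/dc/elements/1.1/") before calling compile. The record fragment is trimmed from the sample above, and the class and helper names are hypothetical. The second call illustrates that namespace names are compared as exact strings, so a URI that differs only by its trailing slash matches nothing.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DcTitleLookup {
    // The oai_dc:dc part of one record, with the declarations shown in the answer.
    static final String RECORD =
        "<oai_dc:dc xmlns:oai_dc='http://www.openarchives.org/OAI/2.0/oai_dc/'"
        + " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "<dc:title>Jubal A. Early</dc:title>"
        + "<dc:date>[between 1860 and 1880]</dc:date>"
        + "</oai_dc:dc>";

    /** A NamespaceContext backed by a prefix-to-URI map; unknown prefixes mean no namespace. */
    static NamespaceContext fromMap(Map<String, String> prefixToUri) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                String uri = prefixToUri.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String namespaceURI) { return null; } // unused here
            @Override
            public Iterator<String> getPrefixes(String namespaceURI) { return Collections.emptyIterator(); }
        };
    }

    /** Text of the first dc:title with "dc" bound to dcUri, or "" if nothing matches. */
    public static String title(String dcUri) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(RECORD)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(fromMap(Collections.singletonMap("dc", dcUri)));
            return xpath.evaluate("//dc:title", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prefix bound to the URI declared in the XML: the title is found.
        System.out.println(title("http://purl.org/dc/elements/1.1/"));
        // Same URI without the trailing slash is a different namespace name: no match.
        System.out.println("[" + title("http://purl.org/dc/elements/1.1") + "]");
    }
}
```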

How to use xpath in camel when the outermost element has an xmlns attribute?

I am having some trouble using XPath to extract the "Payload" values below using Apache Camel. I use the XPath below in my route for both example XML documents. The first example returns SomeElement and SomeOtherElement as expected, but the second seems unable to parse the XML at all.
xpath("//Payload/*")
This example xml parses just fine.
<Message>
<Payload>
<SomeElement />
<SomeOtherElement />
</Payload>
</Message>
This example xml does not parse.
<Message xmlns="http://www.fake.com/Message/1">
<Payload>
<SomeElement />
<SomeOtherElement />
</Payload>
</Message>
I found a similar question about xml and xpath, but it deals with C# and is not a camel solution.
Any idea how to solve this using apache-camel?
Your second example XML specifies a default namespace, xmlns="http://www.fake.com/Message/1", so your XPath expression will not match, as it specifies no namespace.
See http://camel.apache.org/xpath.html#XPath-Namespaces on how to specify a namespace.
You would need something like
Namespaces ns = new Namespaces("fk", "http://www.fake.com/Message/1");
xpath("//fk:Payload/*", ns)
I'm not familiar with Apache Camel; this was just the result of some quick googling.
An alternative may be to just change your XPath to something like
xpath("//*[local-name()='Payload']/*")
Good luck.
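Camel is not needed to observe the behaviour. The sketch below uses the JDK's javax.xml.xpath API directly (not Camel's Namespaces or xpath(...) builder) to evaluate the three expressions discussed here against the second message. The fk prefix is the one from the answer; the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PayloadXPath {
    // The second example message, whose root declares a default namespace.
    static final String MESSAGE =
        "<Message xmlns='http://www.fake.com/Message/1'>"
        + "<Payload><SomeElement/><SomeOtherElement/></Payload></Message>";

    // Binds "fk" to the message namespace, as Namespaces("fk", ...) does in the answer.
    static final NamespaceContext FK = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "fk".equals(prefix) ? "http://www.fake.com/Message/1" : XMLConstants.NULL_NS_URI;
        }
        @Override
        public String getPrefix(String namespaceURI) { return null; } // unused here
        @Override
        public Iterator<String> getPrefixes(String namespaceURI) { return Collections.emptyIterator(); }
    };

    /** Returns how many nodes the expression selects in MESSAGE. */
    public static int count(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(MESSAGE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(FK);
            return ((NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET)).getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//Payload/*"));                   // looks for Payload in no namespace
        System.out.println(count("//fk:Payload/*"));                // prefix bound to the default namespace
        System.out.println(count("//*[local-name()='Payload']/*")); // ignores namespaces entirely
    }
}
```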

Prevent JDOM2 from creating xmlns=""

I try to add new <class> elements to a persistence.xml file with JDOM2.
persistenceUnitEl.addContent(new Element("class").addContent(className));
The problem is that JDOM2 always adds xmlns="" to the <class> elements.
How can I prevent this?
Neither removeAttribute("xmlns") nor removeNameSpace(el.getNameSpace()) works.
JDOM only adds the xmlns="" when you add a child element that is in no namespace underneath an element where a non-empty default namespace is in scope. The default namespace in XML is the one which has no prefix. In the following example:
<root>
<child />
</root>
There are no namespace prefixes, and the default namespace is "".
The above XML snippet is semantically identical to:
<root xmlns="" >
<child />
</root>
The xmlns="" means that any time you see an element that has no prefix, you should put it in the 'empty' namespace "".
Now, if you want to put things in a namespace, and have a prefix, you would do:
<ns:root xmlns:ns="http://mynamespace">
<ns:child />
</ns:root>
Note that the root and child elements in the above example are in the namespace http://mynamespace, and that namespace has the prefix ns. The above code would be semantically identical to (has the same meaning as):
<root xmlns="http://mynamespace">
<child />
</root>
In the above example, the default namespace is changed from "" to be http://mynamespace, so now elements that have no prefix are in that default namespace http://mynamespace. To reiterate, the following two documents are identical:
<ns:root xmlns:ns="http://mynamespace">
<ns:child />
</ns:root>
and
<root xmlns="http://mynamespace">
<child />
</root>
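To check that the two documents really carry the same names, a namespace-aware parser can list each element's expanded name (namespace URI plus local name). This is a minimal sketch with the JDK's DOM API rather than JDOM; the {uri}localName notation and the class name are illustrative choices.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceEquivalence {
    /** Returns each element's expanded name as "{namespaceURI}localName", in document order. */
    public static List<String> expandedNames(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName() to be set
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Depth-first walk that records every element node.
    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add("{" + node.getNamespaceURI() + "}" + node.getLocalName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        List<String> prefixed = expandedNames("<ns:root xmlns:ns='http://mynamespace'><ns:child/></ns:root>");
        List<String> defaulted = expandedNames("<root xmlns='http://mynamespace'><child/></root>");
        System.out.println(prefixed);
        System.out.println(defaulted);
        // The prefix is only shorthand for the URI, so the expanded names are equal.
        System.out.println(prefixed.equals(defaulted));
    }
}
```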
Now, what does all of this have to do with your problem?
Well, your element persistenceUnitEl must be in a default namespace that is not "". Somewhere on that element, or on one of its parents, you have something like:
<tagname xmlns="...something....">
<PersistenceUnit>
</PersistenceUnit>
</tagname>
In the above, the PersistenceUnit is in the namespace ...something..... Now, you are asking JDOM to add the element new Element("class") to the document, so you are getting:
<tagname xmlns="...something....">
<PersistenceUnit>
<class xmlns="" />
</PersistenceUnit>
</tagname>
The reason is because you are telling JDOM to put it in the "" namespace (Namespace.NO_NAMESPACE). See the documentation for JDOM here: new Element(String name).
Instead, what you want to do is put it in the same namespace as the parent:
Namespace parentNamespace = persistenceUnitEl.getNamespace();
persistenceUnitEl.addContent(new Element("class", parentNamespace).addContent(className));
Now, the real question is whether the "class" element actually belongs in the same namespace as the parent, or not. But that is a question only you can answer.
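The same idea can be expressed with the JDK's DOM API, used here instead of JDOM only so the sketch runs with no extra libraries: createElementNS with the parent's namespace URI plays the role of new Element("class", parentNamespace). The namespace urn:example:persistence and the class name com.example.MyEntity are hypothetical stand-ins for the real persistence.xml values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SameNamespaceChild {
    // Hypothetical namespace URI standing in for whatever the real file declares.
    static final String NS = "urn:example:persistence";

    /** Appends <class>className</class> using the parent's own namespace URI and prefix. */
    public static Element addClass(Element parent, String className) {
        String prefix = parent.getPrefix(); // null when the parent uses a default namespace
        String qualifiedName = (prefix == null) ? "class" : prefix + ":class";
        Element child = parent.getOwnerDocument().createElementNS(parent.getNamespaceURI(), qualifiedName);
        child.setTextContent(className);
        parent.appendChild(child);
        return child;
    }

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<persistence xmlns='" + NS + "'><persistence-unit/></persistence>");
        Element unit = (Element) doc.getDocumentElement().getFirstChild();
        Element cls = addClass(unit, "com.example.MyEntity");
        System.out.println(cls.getNamespaceURI());
        // <class> shares the inherited default namespace, so no xmlns="" undeclaration is needed.
        System.out.println(serialize(doc));
    }
}
```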
Resources:
Namespace specification
Decent introduction
A tutorial (quite advanced)
JDOM's NamespaceAware documentation
JDOM's FAQ
From my understanding, I think this is what you want.
<RootTagname xmlns="...some namespace....">
<SubTag>
<NewElement yourAttrib="1"/>
</SubTag>
</RootTagname >
This is what you get.
<RootTagname xmlns="...some namespace....">
<SubTag>
<NewElement xmlns="" yourAttrib="1"/>
</SubTag>
</RootTagname >
Use the snippet below to create the new element:
Element newElement = new Element("NewElement", subElement.getNamespace());
Here is the full code.
Namespace namespace = Namespace.getNamespace("prefix", ".....some namespace....");
XPathBuilder<Element> subTagXpathelementBuilder = new XPathBuilder<Element>("//prefix:SubTag", Filters.element());
subTagXpathelementBuilder.setNamespace(namespace);
XPathFactory xpathFactory = XPathFactory.instance();
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(xmlFile);
XPathExpression<Element> xpath = subTagXpathelementBuilder.compileWith(xpathFactory);
List<Element> subElementsList = xpath.evaluate(doc);
for (Element subElement : subElementsList) {
    // Create the child in the parent's namespace so JDOM does not emit xmlns=""
    Element newElement = new Element("NewElement", subElement.getNamespace());
    newElement.setAttribute("yourAttrib", "1");
    subElement.addContent(newElement);
}

Applying xpath on xml with default namespace with XOM

I have the XML below, which contains a default namespace:
<?xml version="1.0"?>
<catalog xmlns="http://www.edankert.com/examples/">
<cd>
<artist>Stoat</artist>
<title>Future come and get me</title>
</cd>
<cd>
<artist>Sufjan Stevens</artist>
<title>Illinois</title>
</cd>
<cd>
<artist>The White Stripes</artist>
<title>Get behind me satan</title>
</cd>
</catalog>
And I'm running the following code, expecting some results in return:
Element rootElem = new Builder().build(xml).getRootElement();
XPathContext xc = XPathContext.makeNamespaceContext(rootElem);
xc.addNamespace("", "http://www.edankert.com/examples/");
Nodes matchedNodes = rootElem.query("cd/artist", xc);
System.out.println(matchedNodes.size());
But the size is always 0.
I have gone through
https://stackoverflow.com/a/9674145/1160106 [I really didn't get the weird XPath syntax]
http://www.edankert.com/defaultnamespaces.html#Jaxen_and_XOM [I can see some hope there, but it requires a major change in my current implementation]
Looking forward to any help.
Unprefixed names in XPath always mean "no namespace"; they don't respect the default namespace declaration. You need to use a prefix:
Element rootElem = new Builder().build(xml).getRootElement();
XPathContext xc = XPathContext.makeNamespaceContext(rootElem);
xc.addNamespace("ex", "http://www.edankert.com/examples/");
Nodes matchedNodes = rootElem.query("ex:cd/ex:artist", xc);
System.out.println(matchedNodes.size());
It doesn't matter that the XPath expression uses a prefix where the original document didn't, as long as the namespace URI that is bound to the prefix in the XPath namespace context is the same as the URI that is bound by xmlns in the document.
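That point can be checked with a minimal sketch using the JDK's javax.xml.xpath API in place of XOM (an assumption made for self-containment). Each expression is evaluated relative to the root element, as rootElem.query does; the prefix passed in is arbitrary, and only the URI it is bound to matters. The class name is hypothetical.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CatalogArtists {
    // The question's catalog, with its default namespace.
    static final String CATALOG = "<catalog xmlns='http://www.edankert.com/examples/'>"
        + "<cd><artist>Stoat</artist><title>Future come and get me</title></cd>"
        + "<cd><artist>Sufjan Stevens</artist><title>Illinois</title></cd>"
        + "<cd><artist>The White Stripes</artist><title>Get behind me satan</title></cd>"
        + "</catalog>";

    /** Counts matches relative to the root element, with `prefix` bound to the catalog namespace. */
    public static int count(String prefix, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(CATALOG)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String p) {
                    return prefix.equals(p) ? "http://www.edankert.com/examples/" : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String namespaceURI) { return null; } // unused here
                @Override
                public Iterator<String> getPrefixes(String namespaceURI) { return Collections.emptyIterator(); }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc.getDocumentElement(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("ex", "cd/artist"));        // unprefixed steps: no namespace, no match
        System.out.println(count("ex", "ex:cd/ex:artist"));  // prefix bound to the document's URI
        System.out.println(count("zz", "zz:cd/zz:artist"));  // different prefix, same URI, same result
    }
}
```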
