I am trying to get the value of the "fax" tag (see the sample XML below) using XPath in Java.
I decided to select the "business" nodes and step through the debugger to see whether I could find the tags, but that does not seem to work. The code fragment I am using is:
String path =
"/locationDetailResponse/locationInfo/locationBusinessList/business";
XPath xPath = XPathFactory.newInstance().newXPath();
Element userElement = (Element) xPath.evaluate(path, documentObject,
XPathConstants.NODE);
documentObject contains an org.w3c.dom.Document object
<location>
<locationInfo>
<warehouseId>99</warehouseId>
<nearByLocations>
<location>
<name>Morganton, NC</name>
<url>morganton-nc-hvac</url>
</location>
<location>
<name>Statesville, NC</name>
<url>statesville-nc-plumbing</url>
</location>
</nearByLocations>
<locationBusinessList>
<business>
<id>123</id>
<fax>(800) 555-1212</fax>
</business>
<business>
<id>456</id>
<fax>(800) 666-2323</fax>
</business>
</locationBusinessList>
</locationInfo>
</location>
Any ideas on the proper XPath expression I should be using?
You can try changing the leading / to // at the beginning of the path,
or use local-name():
//*[local-name()='location']/*[local-name()='locationInfo']/*[local-name()='locationBusinessList']/*[local-name()='business']
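As a minimal sketch of the suggestion above: if the sample shown in the question is the complete document and carries no namespaces (both are assumptions), its root element is location rather than locationDetailResponse, so an absolute path has to start there. The class name FaxLookup and the trimmed SAMPLE string are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FaxLookup {
    // Trimmed copy of the sample document from the question (assumed complete).
    static final String SAMPLE =
        "<location><locationInfo><locationBusinessList>"
      + "<business><id>123</id><fax>(800) 555-1212</fax></business>"
      + "<business><id>456</id><fax>(800) 666-2323</fax></business>"
      + "</locationBusinessList></locationInfo></location>";

    // Returns the text of every fax element under a business element.
    static List<String> faxNumbers(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // The path starts at the real root element of the sample.
            NodeList nodes = (NodeList) xPath.evaluate(
                "/location/locationInfo/locationBusinessList/business/fax",
                doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(faxNumbers(SAMPLE));
    }
}
```

If the real document is wrapped in a locationDetailResponse element, the //business/fax form avoids having to spell out the root at all.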
I'm attempting to use JDOM2 to extract the information I care about out of an XML document. How do I get a tag within a tag?
I have been only partially successful. While I have been able to use XPath to extract <record> tags, the XPath query to extract the title, description and other data within the record tags has been returning null.
I've been using XPath successfully to extract <record> tags out of the document. To do this I use the following XPath query: "//oai:record", where "oai" is a namespace prefix I made up in order to use XPath.
You can see the XML document I'm parsing here, and I've put a sample below: http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&set=cwp&metadataPrefix=oai_dc
<record>
<header>
<identifier>oai:lcoa1.loc.gov:loc.pnp/cph.3a02293</identifier>
<datestamp>2009-05-27T07:22:37Z</datestamp>
<setSpec>cwp</setSpec>
<setSpec>lcphotos</setSpec>
</header>
<metadata>
<oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Jubal A. Early</dc:title>
<dc:description>This record contains unverified, old data from caption card.</dc:description>
<dc:date>[between 1860 and 1880]</dc:date>
<dc:type>image</dc:type>
<dc:type>still image</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/cph.3a02293</dc:identifier>
<dc:language>eng</dc:language>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
</metadata>
</record>
If you look in the larger document you will see that there is never an "xmlns" attribute listed on any of the tags. There is also the matter of there being three different namespaces in the document ("none/oai", "oai_dc", "dc").
What is happening is that the xpath is matching nothing, and evaluateFirst(parent) is returning null.
Here is some of my code to extract the title, date, description etc. out of the record element.
XPathFactory xpf = XPathFactory.instance();
XPathExpression<Element> xpath = xpf.compile("//dc:title",
Filters.element(), null,
namespaceList.toArray(new Namespace[namespaceList.size()]));
Element tag = xpath.evaluateFirst(parent);
if(tag != null)
{
return Option.fromString(tag.getText());
}
return Option.none();
Any thoughts would be appreciated! Thanks.
In your XML, the dc prefix is mapped to the namespace URI http://purl.org/dc/elements/1.1/, so make sure the prefix mapping you declare for use in the XPath points to the same URI. This is the part of your XML where the namespace prefixes are declared:
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
An XML parser only sees the namespaces explicitly declared in the XML; it won't try to open the namespace URL, since a namespace is not necessarily a URL. For example, the following URI, which I found in a recent SO question, is also acceptable as a namespace: uuid:ebfd9-45-48-a9eb-42d
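The same idea can be sketched with the standard JAXP XPath API instead of JDOM2 (a deliberate swap, so the example needs no extra library): the dc prefix used in the expression is bound to the URI declared in the document through a NamespaceContext. The class name DcTitleLookup and the shortened SAMPLE are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DcTitleLookup {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Shortened record; the namespace declarations match the answer above.
    static final String SAMPLE =
        "<metadata><oai_dc:dc"
      + " xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
      + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
      + "<dc:title>Jubal A. Early</dc:title>"
      + "</oai_dc:dc></metadata>";

    // Returns the text of the first dc:title element, or "" if none matches.
    static String firstTitle(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed names to match
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // Bind the prefix used in the expression to the document's URI.
            xPath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "dc".equals(prefix) ? DC_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return DC_NS.equals(uri) ? "dc" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    String p = getPrefix(uri);
                    return (p == null
                        ? Collections.<String>emptyList()
                        : Collections.singletonList(p)).iterator();
                }
            });
            return xPath.evaluate("//dc:title", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstTitle(SAMPLE));
    }
}
```

Only the URI has to agree with the document; the prefix in the XPath could be any name as long as the context maps it to http://purl.org/dc/elements/1.1/.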
I have an XML document that looks like this:
<xmlList>
<Phone>
<Prefix>04</Prefix>
</Phone>
<Phone>
<Prefix>04</Prefix>
</Phone>
<Phone>
<Prefix>03</Prefix>
</Phone>
</xmlList>
I would like to retrieve the Prefix node content only when it is 04.
String xml = "<xmlList><Phone><Prefix>04</Prefix></Phone><Phone><Prefix>04</Prefix></Phone><Phone><Prefix>03</Prefix></Phone></xmlList>";
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
InputSource source = new InputSource(new StringReader(xml));
// only one string is returned
String prefix = xpath.evaluate("/xmlList/Phone/Prefix", source);
Only one string is retrieved from xpath.evaluate.
I would like to get a list with all of the 04 occurrences in the given XML.
Possible?
As you can see in the documentation https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/XPath.html#evaluate%28java.lang.String,%20org.xml.sax.InputSource%29, that overload of the evaluate method evaluates the XPath and returns the result as a string. In XPath 1.0 the string value of a node-set is the string value of its first node in document order, so you get a string with the contents of the first selected node.
So you will need to use a different overload where you can specify the result type as NODESET and then you can iterate over the returned NodeList to collect the values.
Or consider switching to an XPath 2.0 or 3.0 or XQuery 1.0 or 3.0 implementation like Saxon 9, which has APIs that return a sequence of strings for e.g. /xmlList/Phone/Prefix/string(). You will, however, need to use a different API than the JAXP XPath API, which is centered around XPath 1.0.
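A minimal sketch of the NODESET approach described above, staying with the JAXP API from the question. The predicate [. = '04'] does the filtering inside the expression; the class and method names are hypothetical, and the wanted value is spliced into the expression without escaping, which is only acceptable for a demo.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrefixLookup {
    static final String SAMPLE =
        "<xmlList><Phone><Prefix>04</Prefix></Phone>"
      + "<Phone><Prefix>04</Prefix></Phone>"
      + "<Phone><Prefix>03</Prefix></Phone></xmlList>";

    // Collects the text of every Prefix element whose value equals 'wanted'.
    static List<String> prefixesEqualTo(String xml, String wanted) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Demo only: 'wanted' must not contain a single quote.
            String expression = "/xmlList/Phone/Prefix[. = '" + wanted + "']";
            // Asking for NODESET returns all matches, not just the first.
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(prefixesEqualTo(SAMPLE, "04"));
    }
}
```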
Dipping my toe in a little Java at the minute and have a question about XPath.
I have a large Xml and I want to use XPath to be able to grab a specific node and then fire further XPath calls against this small chunk of Xml.
Here's a rough outline of my XML:
<Page>
<ComponentPresentations>
<ComponentPresentation>
<Component>
<Title>
<ComponentTemplate>
<ComponentPresentation>
<Component>
<Title>
<ComponentTemplate>
My first XPath selects the <Component> node based upon the value of a <ComponentTemplate> Id value:
String componentExpFormat = "/Page/ComponentPresentations/ComponentPresentation/ComponentTemplate/Id[text()='%1$s']/ancestor::ComponentPresentation";
String componentExp = String.format(componentExpFormat, template);
XPathExpression expComponent = xPath.compile(componentExp);
Node componentXml = (Node) expComponent.evaluate(xmldoc, XPathConstants.NODE);
This gives me the <Component> I want, but I can't seem to be able to then run XPath against the Node:
String componentExpTitle = "/Component/Fields/item/value/Field/Name[text()='title']/parent::node()/Values/string";
XPathExpression expTitle = xPath.compile(componentExpTitle);
String eventName = expTitle.evaluate(componentXml, XPathConstants.STRING).toString();
Without this I'll have to include the full XPath each time:
/Page/ComponentPresentations/ComponentPresentation/ComponentTemplate/Id[text()='%1$s']/ancestor::ComponentPresentation/Component/Fields/item/value/Field/Name[text()='title']/parent::node()/Values/string
Is that the only way?
Cheers
An XPath expression with a leading slash
/Component/Fields/item
is absolute, and when you evaluate it with a particular context node it will start looking from the root of the document that the context node belongs to. If you remove the leading slash
Component/Fields/item
it will look for Component children of the context node.
As an aside, you can simplify those XPaths quite a bit: you don't need all the up-and-down-the-tree stuff with ancestor::, and you also don't need to use text():
componentExpFormat = "/Page/ComponentPresentations/ComponentPresentation[ComponentTemplate/Id='%1$s']";
componentExpTitle = "Component/Fields/item/value/Field[Name='title']/Values/string";
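To illustrate the difference between absolute and relative paths, here is a small sketch. The real document's Fields/item/value structure is not shown in the question, so this uses a simplified layout where Title holds text directly and ComponentTemplate contains an Id element; those details, the class name and the ids tpl-1 and tpl-2 are all assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // Simplified stand-in for the document outlined in the question.
    static final String SAMPLE =
        "<Page><ComponentPresentations>"
      + "<ComponentPresentation>"
      + "<Component><Title>First</Title></Component>"
      + "<ComponentTemplate><Id>tpl-1</Id></ComponentTemplate>"
      + "</ComponentPresentation>"
      + "<ComponentPresentation>"
      + "<Component><Title>Second</Title></Component>"
      + "<ComponentTemplate><Id>tpl-2</Id></ComponentTemplate>"
      + "</ComponentPresentation>"
      + "</ComponentPresentations></Page>";

    // Selects one ComponentPresentation, then evaluates 'expression'
    // with that node as the context node.
    static String evaluateInPresentation(String xml, String templateId,
                                         String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            String select = String.format(
                "/Page/ComponentPresentations/"
              + "ComponentPresentation[ComponentTemplate/Id='%1$s']",
                templateId);
            Node presentation =
                (Node) xPath.evaluate(select, doc, XPathConstants.NODE);
            if (presentation == null) {
                return "";
            }
            return xPath.evaluate(expression, presentation);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Relative path: resolved against the selected node.
        System.out.println(evaluateInPresentation(SAMPLE, "tpl-2", "Component/Title"));
        // Absolute path: resolved against the document root, so no match.
        System.out.println(evaluateInPresentation(SAMPLE, "tpl-2", "/Component/Title"));
    }
}
```

The second call yields an empty string because /Component/Title restarts from the document root, where the top-level element is Page.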
I have an xform document
<?xml version="1.0" encoding="UTF-8"?><h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jr="http://openrosa.org/javarosa">
<h:head>
<h:title>Summary</h:title>
<model>
<instance>
<data vaultType="nsp_inspection.4.1">
<metadata vaultType="metadata.1.1">
<form_start_time type="dateTime" />
<form_end_time type="dateTime" />
<device_id type="string" />
<username type="string" />
</metadata>
<date type="date" />
<monitor type="string" />
</data>
</instance>
</model>
</h:head>
I would like to select the data element from the xform using xpath and jdom
XPath xpath = XPath.newInstance("h:html/h:head/h:title/");
seems to work fine and selects the title element but
XPath xpath = XPath.newInstance("h:html/h:head/model");
does not select the model element.
I guess it has something to do with the namespace.
A few things. You really should be using JDOM 2.0.x ... (2.0.5 is latest release). The XPath API in the 2.0.x versions is far better than the one in JDOM 1.x: see https://github.com/hunterhacker/jdom/wiki/JDOM2-Feature-XPath-Upgrade
@wds is right about not having the correct namespace for the xforms elements too, and that is why your XPath is working: it has the same namespace as the xhtml elements with the 'h' prefix. Your code is likely to be broken still.
Namespaces in XPaths often confuse people, because every namespace in an XPath has to have a prefix. Even if something is in the default namespace in the XML (no prefix, like your 'model' element), it has to have a prefix in the XPath. Names with no prefix in the XPath always reference the 'no namespace' namespace (XPath specification: http://www.w3.org/TR/xpath/#node-tests ):
A QName in the node test is expanded into an expanded-name using the namespace
declarations from the expression context. This is the same way expansion is done
for element type names in start and end-tags except that the default namespace
declared with xmlns is not used: if the QName does not have a prefix, then the
namespace URI is null (this is the same way attribute names are expanded). It is
an error if the QName has a prefix for which there is no namespace declaration in
the expression context
Assuming @wds is correct, and the namespace for the model element is supposed to be "http://www.w3.org/2002/xforms", then the namespace declaration in your document should be xmlns="http://www.w3.org/2002/xforms". But this namespace is the 'default' namespace, and the URI for the no-prefix namespace in your XPath query is "".
To access the http://www.w3.org/2002/xforms namespace in your XPath you have to give it a prefix for the context of the XPath, let's say xpns (for xpath namespace). In JDOM 1.x you add that namespace with:
XPath xpath = XPath.newInstance("/h:html/h:head/xpns:model");
xpath.addNamespace(Namespace.getNamespace("xpns", "http://www.w3.org/2002/xforms"));
Element model = (Element)xpath.selectSingleNode(mydoc);
Note how that adds the xpns prefix to the query. Also, note that I have 'anchored' the h:html reference to the '/' root of the document, which will improve the performance of the query evaluation.
In JDOM 2.x, the XPath API is significantly better (even though in some cases it may seem overkill).
XPathFactory xpf = XPathFactory.instance();
XPathExpression<Element> xpath = xpf.compile("/h:html/h:head/xpns:model",
Filters.element(), null,
Namespace.getNamespace("xpns", "http://www.w3.org/2002/xforms"));
Element model = xpath.evaluateFirst(mydoc);
See the XPathFactory.compile(...) javadoc in JDOM 2.x for more about the new XPath API.
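The default-namespace rule quoted from the specification can also be checked with the JDK's own JAXP XPath API (used here in place of JDOM so the sketch has no dependencies). It assumes, as the answer does, that the document's default namespace is http://www.w3.org/2002/xforms; the class name and the shortened SAMPLE are illustrative.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DefaultNamespaceDemo {
    // 'model' has no prefix but sits in the (assumed) xforms default namespace.
    static final String SAMPLE =
        "<h:html xmlns:h=\"http://www.w3.org/1999/xhtml\""
      + " xmlns=\"http://www.w3.org/2002/xforms\">"
      + "<h:head><h:title>Summary</h:title><model/></h:head></h:html>";

    // Returns how many nodes the given path expression selects.
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            final Map<String, String> prefixes = new HashMap<>();
            prefixes.put("h", "http://www.w3.org/1999/xhtml");
            prefixes.put("xpns", "http://www.w3.org/2002/xforms");
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    String uri = prefixes.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                // Reverse lookups are not needed for this demo.
                @Override public String getPrefix(String uri) {
                    return null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.<String>emptyList().iterator();
                }
            });
            Double n = (Double) xPath.evaluate(
                "count(" + expression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed 'model' means "no namespace" in XPath 1.0: no match.
        System.out.println(count(SAMPLE, "/h:html/h:head/model"));
        // The prefixed form is bound to the xforms URI and matches.
        System.out.println(count(SAMPLE, "/h:html/h:head/xpns:model"));
    }
}
```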
I have the following XML file:
<diagnostic version="1.0">
<!-- diagnostic panel 1 -->
<panel xml:id="0">
<!-- list controls -->
<control xml:id="0_0">
<settings description="text 1"/>
</control>
<control xml:id="0_1">
<settings description="text 2"/>
</control>
</panel>
<panel xml:id="1">
<!-- list controls -->
<control xml:id="1_0">
<settings description="text 3"/>
</control>
<control xml:id="1_1">
<settings description="text 4"/>
</control>
</panel>
</diagnostic>
and this XPath expression:
//*[not(@description='-')]/@description
and Java code:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("diagnostic.xml");
XPath xpath = XPathFactory.newInstance().newXPath();
// XPath Query for showing all nodes value
XPathExpression expr = xpath.compile("//*[not(@description='-')]/@description");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(i + ": " + nodes.item(i).getParentNode() + nodes.item(i).getNodeValue());
}
This XPath expression returns every description attribute value that is not '-'.
Output:
text 1
text 2
text 3
text 4
But along with each description attribute I also need the xml:id attribute of the enclosing control element.
Output:
0_0 text 1
0_1 text 2
1_0 text 3
1_1 text 4
How can I also get the xml:id of the control element together with each description? I need to know which control element a given description belongs to.
Someone correct me if I'm wrong, but I don't think this can be done with a single XPath expression. The concat function returns a single text result, not a list. I suggest you run multiple XPath expressions and construct your results from that, or run a single XPath expression to get the settings elements you need, then take the description attribute from it and concatenate it with the xml:id attribute from the parent element if that's a control one.
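Nodes keep references to their parents. Use the getParentNode() method to obtain them. For an attribute node, use Attr.getOwnerElement() instead, because getParentNode() returns null for attributes in the W3C DOM.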
Here's an alternative: run this XPath expression...
//control[settings[@description!='-']]/@xml:id | //control/settings[@description!='-']/@description
... and then concatenate the text of the alternating results in the returned node list. In other words, text from item 0 + item 1, text from item 2 + item 3 etc.
The above XPath expression will return this node list:
0_0
text 1
0_1
text 2
1_0
text 3
1_1
text 4
You can then parse through that list and construct your results.
Be careful. This will only work if there's at most 1 settings element per control element. Also, you may find that on evaluation the XPath engine throws an error for that xml: prefix. It may say that it's unknown. You might have to bind that prefix to the correct namespace first. Since the xml prefix is reserved and bound by default to a specific namespace, this might not be needed. I'm not certain as I haven't used it before.
I've tested the expression in XMLSpy. It's not entirely impossible that the XPath engine used in Java (or the one you set for use) returns the nodes in another order. It might evaluate both parts of the "or" (the pipe symbol) separately and then dump the results into a single node list. I don't know what the XPath spec mandates regarding result ordering.
I may be just as wrong, but the nodes you traverse in the result are the XML nodes themselves. Your code sample is almost there:
- nodes.item(i) points to the attribute "description".
- ((Attr) nodes.item(i)).getOwnerElement() points to the tag "settings" (getParentNode() returns null for attribute nodes in the W3C DOM, so it cannot be used for this step).
- Calling getParentNode() on that "settings" element would point to the tag "control" (class Element). You could then use getAttribute() or getAttributeNS() on that node to get the attribute you need.
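Walking up from each matched attribute can be sketched as follows. Because DOM attribute nodes report a null parent, the sketch goes through Attr.getOwnerElement() first and only then uses getParentNode(). It assumes every settings element sits directly inside a control element, as in the sample; the class name is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ControlDescriptions {
    // The sample file from the question, without comments and whitespace.
    static final String SAMPLE =
        "<diagnostic version=\"1.0\">"
      + "<panel xml:id=\"0\">"
      + "<control xml:id=\"0_0\"><settings description=\"text 1\"/></control>"
      + "<control xml:id=\"0_1\"><settings description=\"text 2\"/></control>"
      + "</panel>"
      + "<panel xml:id=\"1\">"
      + "<control xml:id=\"1_0\"><settings description=\"text 3\"/></control>"
      + "<control xml:id=\"1_1\"><settings description=\"text 4\"/></control>"
      + "</panel>"
      + "</diagnostic>";

    // Returns "<control xml:id> <description>" for each description != '-'.
    static List<String> describe(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "//control/settings/@description[. != '-']",
                doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Attr description = (Attr) nodes.item(i);
                // Attributes have no parent node; ask for the owner instead.
                Element settings = description.getOwnerElement();
                // Assumption: the parent of settings is the control element.
                Element control = (Element) settings.getParentNode();
                // xml:id lives in the reserved XML namespace.
                String id = control.getAttributeNS(XMLConstants.XML_NS_URI, "id");
                result.add(id + " " + description.getValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        for (String line : describe(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Reading xml:id through the DOM rather than inside the XPath expression sidesteps the question of whether the XPath engine resolves the xml: prefix without extra configuration.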