I'm attempting to use JDOM2 to extract the information I care about from an XML document. How do I get a tag within a tag?
I have been only partially successful. While I have been able to use XPath to extract <record> tags, the XPath query to extract the title, description and other data within the record tags has been returning null.
I've been using XPath successfully to extract <record> tags out of the document. To do this I use the following XPath query: "//oai:record", where "oai" is a namespace prefix I made up in order to use XPath.
You can see the XML document I'm parsing here, and I've put a sample below: http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&set=cwp&metadataPrefix=oai_dc
<record>
<header>
<identifier>oai:lcoa1.loc.gov:loc.pnp/cph.3a02293</identifier>
<datestamp>2009-05-27T07:22:37Z</datestamp>
<setSpec>cwp</setSpec>
<setSpec>lcphotos</setSpec>
</header>
<metadata>
<oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Jubal A. Early</dc:title>
<dc:description>This record contains unverified, old data from caption card.</dc:description>
<dc:date>[between 1860 and 1880]</dc:date>
<dc:type>image</dc:type>
<dc:type>still image</dc:type>
<dc:identifier>http://hdl.loc.gov/loc.pnp/cph.3a02293</dc:identifier>
<dc:language>eng</dc:language>
<dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
</metadata>
</record>
If you look in the larger document you will see that there is never an "xmlns" attribute listed on any of the tags. There is also the matter of there being three different namespaces in the document ("none/oai", "oai_dc", "dc").
What is happening is that the xpath is matching nothing, and evaluateFirst(parent) is returning null.
Here is some of my code to extract the title, date, description etc. out of the record element.
XPathFactory xpf = XPathFactory.instance();
XPathExpression<Element> xpath = xpf.compile("//dc:title",
Filters.element(), null,
namespaceList.toArray(new Namespace[namespaceList.size()]));
Element tag = xpath.evaluateFirst(parent);
if(tag != null)
{
return Option.fromString(tag.getText());
}
return Option.none();
Any thoughts would be appreciated! Thanks.
In your XML, the dc prefix is mapped to the namespace URI http://purl.org/dc/elements/1.1/, so make sure the prefix you register for use in the XPath is mapped to that same URI. This is the part of your XML where the namespace prefixes are declared:
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
An XML parser only sees the namespaces explicitly declared in the XML. It won't try to open the namespace URL, since a namespace is not necessarily a URL. For example, the following URI, which I found in a recent SO question, is also acceptable as a namespace: uuid:ebfd9-45-48-a9eb-42d
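To make the prefix-to-URI mapping concrete, here is a minimal sketch. It uses the JDK's built-in javax.xml.xpath API rather than JDOM2 so that it runs without extra dependencies; the class name DcTitleLookup and its method are made up for illustration. The point it tries to show is that the match depends on the URI registered for the prefix, not on the prefix text itself.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JDOM2 or of the original question.
class DcTitleLookup {
    static final String DC_URI = "http://purl.org/dc/elements/1.1/";

    // Binds the XPath prefix "dc" to the Dublin Core URI; every other prefix is unbound.
    static final NamespaceContext DC_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "dc".equals(prefix) ? DC_URI : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return DC_URI.equals(uri) ? "dc" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return DC_URI.equals(uri)
                    ? Collections.singletonList("dc").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Returns the text of the first dc:title element, or null if none matches.
    static String firstTitle(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, namespace URIs are not recorded
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(DC_CONTEXT);
            Node title = (Node) xpath.evaluate("//dc:title", doc, XPathConstants.NODE);
            return title == null ? null : title.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

If the document binds dc to some other URI, the same expression finds nothing, which is the kind of null result described in the question.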
I am creating an XML request using Java.
I am new to creating XML with Java.
Here is the code:
Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("UserRequest");
rootElement.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:ns0", "https://com.user.req");
rootElement.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance");
doc.appendChild(rootElement);
// user element
Element user = doc.createElement("User");
rootElement.appendChild(user);
// userAttributes element
Element userAttr = doc.createElement("UserAttributes");
rootElement.appendChild(userAttr);
// name elements
Element name = doc.createElement("Name");
name.appendChild(doc.createTextNode("hello"));
userAttr.appendChild(name);
// value elements
Element value = doc.createElement("Value");
value.appendChild(doc.createTextNode("dude"));
userAttr.appendChild(value);
Expected output:
<?xml version="1.0" encoding="UTF-8"?>
<UserRequest
xmlns:ns0="https://com.user.req"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="ns0:UserRequest">
<User/>
<UserAttributes>
<Name>hello</Name>
<Value>dude</Value>
</UserAttributes>
</UserRequest>
Generated output:
<?xml version="1.0" encoding="UTF-8"?>
<UserRequest
xmlns:ns0="https://com.user.req"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<User/>
<UserAttributes>
<Name>hello</Name>
<Value>dude</Value>
</UserAttributes>
</UserRequest>
How do I get the correct namespace output (as shown in the expected section)?
There's nothing wrong with the namespaces in your generated output. However this is an accident ... you're using setAttributeNS() to do something it's not intended for.
Read up on XML namespace declarations and namespace prefixes. That will be a lot easier than trying to explain point-by-point why you're not getting what you expected. For example, xmlns is not a namespace prefix, and xsi:type is not a namespace.
Instead of trying to create the desired namespace declarations as if they were normal attributes, delete these two lines
rootElement.setAttributeNS("http://www.w3.org/2000/xmlns/",
"xmlns:ns0", "https://com.user.req");
rootElement.setAttributeNS("http://www.w3.org/2000/xmlns/",
"xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance");
and instead use
rootElement.setAttributeNS("http://www.w3.org/2001/XMLSchema-instance",
"xsi:type", "ns0:UserRequest");
This should give you most of your expected output, except for the ns0 namespace prefix declaration. It won't generate that because you're not using ns0 on any element or attribute. Did you mean to have
<ns0:UserRequest ...
in your expected output?
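For reference, here is a sketch of one way to build the document from the question with the JDK DOM API. It is a variant, not the exact change described above: it assumes you keep the explicit xmlns declarations (ns0 only occurs inside an attribute value, so nothing else would cause it to be declared) and add xsi:type as an ordinary attribute in the XSI namespace. The class name UserRequestBuilder is made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that builds the UserRequest document from the question.
class UserRequestBuilder {
    static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";
    static final String XSI_URI = "http://www.w3.org/2001/XMLSchema-instance";
    static final String NS0_URI = "https://com.user.req";

    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("UserRequest");
            // Namespace declarations, kept explicit in this variant.
            root.setAttributeNS(XMLNS_URI, "xmlns:ns0", NS0_URI);
            root.setAttributeNS(XMLNS_URI, "xmlns:xsi", XSI_URI);
            // xsi:type is a regular attribute whose name lives in the XSI namespace.
            root.setAttributeNS(XSI_URI, "xsi:type", "ns0:UserRequest");
            doc.appendChild(root);

            root.appendChild(doc.createElement("User"));

            Element userAttr = doc.createElement("UserAttributes");
            root.appendChild(userAttr);

            Element name = doc.createElement("Name");
            name.appendChild(doc.createTextNode("hello"));
            userAttr.appendChild(name);

            Element value = doc.createElement("Value");
            value.appendChild(doc.createTextNode("dude")); // text goes on value, not name
            userAttr.appendChild(value);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }
}
```

Serializing the result (for example with a javax.xml.transform.Transformer) should then show xsi:type on the root element; the exact attribute order and whitespace depend on the serializer.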
I have an xml file, I use the xpath /content which returns the following line:
<content xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
How can I get the type 'nil'? I am trying to write a test which will check if the content is empty. If there is no content it will say nil="true"; otherwise it will give the type, e.g. xsi:type="String">true
I've tried //content[@xsi] and //content/xsi but still can't narrow it down to just the part I want.
I could just get a substring but I think there must be a way to do it with xpath.
In your code snippet:
<content xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
there is a namespace. Although it does not apply to the element content, it does apply to the attribute nil. It is an attribute from the XML Schema specification, by the way.
To find an attribute that is in a namespace you can either
cleanly register this namespace in your application and prefix the attribute in any XPath expression, or
ignore all namespaces in path expressions.
You did not show any code, so I find it hard to comment on the first option. The second option means a path expression like
//content[@*[local-name() = 'nil'] = 'true']
will select content elements where the xsi:nil attribute value is "true".
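Since no code was shown, here is a hedged sketch of both options using the JDK's javax.xml.xpath API. The class NilCheck and its method names are made up for illustration, and the sketch assumes content is the root element of the document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper showing both ways of testing xsi:nil.
class NilCheck {
    static final String XSI_URI = "http://www.w3.org/2001/XMLSchema-instance";

    // Option 1: register the xsi prefix and use it in the expression.
    static boolean isNilByPrefix(String xml) {
        return matches(xml, "/content[@xsi:nil = 'true']");
    }

    // Option 2: ignore namespaces and compare only the local name.
    static boolean isNilByLocalName(String xml) {
        return matches(xml, "/content[@*[local-name() = 'nil'] = 'true']");
    }

    private static boolean matches(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "xsi".equals(prefix) ? XSI_URI : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return XSI_URI.equals(uri) ? "xsi" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return XSI_URI.equals(uri)
                            ? Collections.singletonList("xsi").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            // A node-set converts to true when it is non-empty.
            return (Boolean) xpath.evaluate(expression, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

Option 1 is stricter: it will not match a nil attribute from some unrelated namespace, whereas the local-name() version will.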
I have an xform document
<?xml version="1.0" encoding="UTF-8"?><h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jr="http://openrosa.org/javarosa">
<h:head>
<h:title>Summary</h:title>
<model>
<instance>
<data vaultType="nsp_inspection.4.1">
<metadata vaultType="metadata.1.1">
<form_start_time type="dateTime" />
<form_end_time type="dateTime" />
<device_id type="string" />
<username type="string" />
</metadata>
<date type="date" />
<monitor type="string" />
</data>
</instance>
</model>
</h:head>
I would like to select the data element from the xform using xpath and jdom
XPath xpath = XPath.newInstance("h:html/h:head/h:title/");
seems to work fine and selects the title element but
XPath xpath = XPath.newInstance("h:html/h:head/model");
does not select the model element.
I guess it has something to do with the namespace.
A few things. You really should be using JDOM 2.0.x ... (2.0.5 is latest release). The XPath API in the 2.0.x versions is far better than the one in JDOM 1.x: see https://github.com/hunterhacker/jdom/wiki/JDOM2-Feature-XPath-Upgrade
@wds is right about not having the correct namespace for the xforms elements too... and that is why your first XPath is working, because it has the same namespace as the xhtml elements with the 'h' prefix. Your code is likely to be broken still.
Namespaces in XPaths often confuse people, because every namespace in an XPath has to have a prefix. Even if something is in the default namespace in the XML (no prefix, like your 'model' element), it has to have a prefix in the XPath. Names with no prefix in the XPath always reference the 'no namespace' namespace (XPath specification: http://www.w3.org/TR/xpath/#node-tests ):
A QName in the node test is expanded into an expanded-name using the namespace
declarations from the expression context. This is the same way expansion is done
for element type names in start and end-tags except that the default namespace
declared with xmlns is not used: if the QName does not have a prefix, then the
namespace URI is null (this is the same way attribute names are expanded). It is
an error if the QName has a prefix for which there is no namespace declaration in
the expression context
Assuming @wds is correct, and the namespace for the model element is supposed to be "http://www.w3.org/2002/xforms", then the namespace declaration in your document should be xmlns="http://www.w3.org/2002/xforms". But this namespace is the 'default' namespace in the document, and the URI for the no-prefix namespace in your XPath query is "".
To access the http://www.w3.org/2002/xforms namespace in your XPath you have to give it a prefix for the context of the XPath, let's say xpns (for xpath namespace). In JDOM 1.x you add that namespace with:
XPath xpath = XPath.newInstance("/h:html/h:head/xpns:model");
xpath.addNamespace(Namespace.getNamespace("xpns", "http://www.w3.org/2002/xforms"));
Element model = (Element)xpath.selectSingleNode(mydoc);
Note how that adds the xpns prefix to the query. Also, note that I have 'anchored' the /h:html reference to the '/' root of the document, which will improve the performance of the query evaluation.
In JDOM 2.x, the XPath API is significantly better (even though in some cases it may seem overkill).
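XPathFactory xpf = XPathFactory.instance();
// Every prefix used in the expression has to be declared here, including h.
XPathExpression<Element> xpath = xpf.compile("/h:html/h:head/xpns:model",
Filters.element(), null,
Namespace.getNamespace("h", "http://www.w3.org/1999/xhtml"),
Namespace.getNamespace("xpns", "http://www.w3.org/2002/xforms"));
Element model = xpath.evaluateFirst(mydoc);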
See more about the new XPath API in the JDOM 2.x javadoc: XPathFactory.compile(...) javadoc
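To see the rule about unprefixed names in action without any JDOM dependency, here is a small sketch using the JDK's javax.xml.xpath API. It assumes a document whose default namespace is the xforms URI, as discussed above; the class name DefaultNamespaceDemo and the count helper are made up for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: counts the nodes an expression selects.
class DefaultNamespaceDemo {
    // Prefixes as seen by the XPath engine; they need not match the document's prefixes.
    private static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        PREFIXES.put("h", "http://www.w3.org/1999/xhtml");
        PREFIXES.put("xpns", "http://www.w3.org/2002/xforms");
    }

    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    String uri = PREFIXES.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                // Reverse lookups are not needed for this sketch.
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.<String>emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

With a document that declares xmlns="http://www.w3.org/2002/xforms", the expression /h:html/h:head/model selects nothing, while /h:html/h:head/xpns:model selects the model element.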
I'm using SAX to read/parse XML documents and I have it working fine, except for this particular site, where Eclipse tells me "junk after document element" and I get no data returned:
http://www.zachblume.com/apis/rhyme.php?format=xml&word=example
The site is not mine..just trying to get some data from it.
Yes, that's not an XML document. It's trying to include more than one root element:
<?xml version="1.0"?>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
The parser regards everything after <word>ampal</word> as junk, because by that point it has already read a complete document; hence the complaint about "junk after document element".
An XML document can only have one root, but several children within the root. For example:
<?xml version="1.0"?>
<words>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
</words>
The page does not contain XML. It contains an XML snippet at best:
<?xml version="1.0"?>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
This is incorrect since there is no document element. SAX interprets the first <word> as the document element, and correctly reports "junk after document element" since for all it knows, the document element ends on line 1.
To get around the error, do not treat this document as XML. Download it as text, remove the XML declaration (<?xml version="1.0"?>) and then wrap it in a fake document element before you try to process it.
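A minimal sketch of that workaround, assuming the response has already been downloaded into a String. The class name RhymeFragmentParser and the wrapper element name words are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: turns a multi-root fragment into a parseable document.
class RhymeFragmentParser {
    static List<String> parseWords(String raw) {
        // Remove a leading XML declaration, since it may only appear at the very start.
        String body = raw.trim().replaceFirst("^<\\?xml[^>]*\\?>", "");
        // Wrap the fragment in a synthetic document element.
        String wrapped = "<words>" + body + "</words>";

        final List<String> words = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <word> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("word".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text run in several chunks, so accumulate.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("word".equals(qName)) {
                    words.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(wrapped)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Fragment is still not well-formed", e);
        }
        return words;
    }
}
```

This only repairs the missing root element; if the fragment has other well-formedness problems the parser will still reject it.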
I am developing an XML editor using JSP and servlets, and I am using a DOM parser.
I have one problem in the XML editor:
how do I edit the following XML file without losing elements?
e.g.:
<book id="b1">
<bookbegin id="bb1">
<para id="p1">This is<b>first</b>line</para>
<para id="p2">This is<b>second</b>line</para>
<para id="p3">This is<b>third</b>line</para>
</bookbegin>
</book>
I am trying to edit the above XML file, using a DTD, with JSP and servlets, but when I read the text value from the XML it returns only first, second, third. How do I read the 'This is' and 'line' parts? And how do I then store the result back to the XML file using XPath?
Thanks in advance.
The <b> tag inside the <para> tag is another element, not a formatting tag (in XML). Therefore, you need to traverse down to it.
Like @JRL says, the <b> tags are treated as ordinary XML elements and, as a consequence, split out into separate nodes by your DOM processor.
I think you fail to read the other text nodes because you only read text when an XML node has no child elements, which is not the case here.
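To illustrate, here is a small sketch with the JDK DOM API that walks the children of a para element and also reads its full text. The class name MixedContentReader and its methods are made up for illustration, and the sketch assumes the para element is passed in as a standalone XML string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for reading mixed content (text interleaved with elements).
class MixedContentReader {
    private static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // All text under the root element, including text inside child elements such as <b>.
    static String fullText(String xml) {
        return parseRoot(xml).getTextContent();
    }

    // One entry per direct child: text nodes and element nodes are reported separately.
    static List<String> describeChildren(String xml) {
        List<String> parts = new ArrayList<>();
        NodeList children = parseRoot(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                parts.add("text:" + child.getNodeValue());
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                parts.add("element:" + child.getNodeName() + "=" + child.getTextContent());
            }
        }
        return parts;
    }
}
```

For an editor, the child-by-child walk is probably the more useful of the two, because it preserves where the <b> element sits between the text nodes, so the structure can be written back unchanged.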