I have an XML document that looks like this:
<xmlList>
<Phone>
<Prefix>04</Prefix>
</Phone>
<Phone>
<Prefix>04</Prefix>
</Phone>
<Phone>
<Prefix>03</Prefix>
</Phone>
</xmlList>
I would like to retrieve the Prefix node content only when it is 04.
String xml = "<xmlList><Phone><Prefix>04</Prefix></Phone><Phone><Prefix>04</Prefix></Phone><Phone><Prefix>03</Prefix></Phone></xmlList>";
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
InputSource source = new InputSource(new StringReader(xml));
// only one string is returned
String prefix = xpath.evaluate("/xmlList/Phone/Prefix", source);
Only one string is returned by xpath.evaluate.
I would like to get a list of all the 04 occurrences in the given XML.
Is that possible?
As you can see in the documentation https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/XPath.html#evaluate%28java.lang.String,%20org.xml.sax.InputSource%29, that overload of the evaluate method evaluates the XPath expression and returns the result as a string. In XPath 1.0, the string value of a node set is the string value of its first node in document order, so you get a string with the contents of the first selected node.
So you need a different overload, one where you can specify the result type as XPathConstants.NODESET; you can then iterate over the returned NodeList to collect the values. To keep only the 04 entries, add a predicate to the path, e.g. /xmlList/Phone/Prefix[. = '04'].
Alternatively, consider switching to an XPath 2.0 or 3.0 or XQuery 1.0 or 3.0 implementation such as Saxon 9, which has APIs that return a sequence of strings for an expression like /xmlList/Phone/Prefix/string(). You will, however, need to use a different API than the JAXP XPath API, which is centered around XPath 1.0.
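The NODESET approach described above can be sketched roughly as follows. This is a minimal sketch, not the only way to do it: the class name PrefixFinder and the method findPrefixes are made up for illustration, and the wanted value is spliced into the expression without escaping, which is only acceptable for trusted input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class PrefixFinder {

    // Returns the text of every Prefix element whose value equals `wanted`.
    static List<String> findPrefixes(String xml, String wanted) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The predicate [. = '...'] keeps only the matching Prefix elements.
        // Assumption: `wanted` contains no quote characters (no escaping is done).
        String expression = "/xmlList/Phone/Prefix[. = '" + wanted + "']";
        InputSource source = new InputSource(new StringReader(xml));
        // Asking for NODESET returns all selected nodes, not just the first one.
        NodeList nodes = (NodeList) xpath.evaluate(expression, source, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xmlList><Phone><Prefix>04</Prefix></Phone>"
                + "<Phone><Prefix>04</Prefix></Phone>"
                + "<Phone><Prefix>03</Prefix></Phone></xmlList>";
        System.out.println(findPrefixes(xml, "04")); // prints [04, 04]
    }
}
```

Dropping the predicate and filtering in the loop would work just as well; the predicate simply keeps the filtering inside the XPath expression.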
Related
I am trying to get the value of the tag "fax" (see sample XML below) using XPath in Java ...
I decided to try to get the nodes for "business" and step through the debugger to see if I could see the tags ... it does not seem to work ... the code fragment I am using is:
String path =
"/locationDetailResponse/locationInfo/locationBusinessList/business";
XPath xPath = XPathFactory.newInstance().newXPath();
Element userElement = (Element) xPath.evaluate(path, documentObject,
XPathConstants.NODE);
documentObject contains an org.w3c.dom.Document object
<location>
<locationInfo>
<warehouseId>99</warehouseId>
<nearByLocations>
<location>
<name>Morganton, NC</name>
<url>morganton-nc-hvac</url>
</location>
<location>
<name>Statesville, NC</name>
<url>statesville-nc-plumbing</url>
</location>
</nearByLocations>
<locationBusinessList>
<business>
<id>123</id>
<fax>(800) 555-1212</fax>
</business>
<business>
<id>456</id>
<fax>(800) 666-2323</fax>
</business>
</locationBusinessList>
</locationInfo>
</location>
Any ideas on the proper XPath expression I should be using?
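You can try changing the leading / to // at the beginning of the expression, so that the first step matches at any depth rather than only at the document root,
or use local-name(), which matches elements regardless of namespace (note that this version starts at location, the root element of your sample):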
//*[local-name()='location']/*[local-name()='locationInfo']/*[local-name()='locationBusinessList']/*[local-name()='business']
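As a rough illustration of the // suggestion, the sketch below collects every fax value from a trimmed copy of the sample. The class name FaxFinder and the method findFaxNumbers are hypothetical, and starting the path at locationBusinessList is an assumption about what the asker wants, not something stated in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class FaxFinder {

    // Returns the text of every fax element found under locationBusinessList/business.
    static List<String> findFaxNumbers(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // The leading // lets locationBusinessList match at any depth,
        // so the expression does not depend on the name of the root element.
        NodeList faxes = (NodeList) xPath.evaluate(
                "//locationBusinessList/business/fax", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < faxes.getLength(); i++) {
            result.add(faxes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed version of the sample XML from the question.
        String xml = "<location><locationInfo><locationBusinessList>"
                + "<business><id>123</id><fax>(800) 555-1212</fax></business>"
                + "<business><id>456</id><fax>(800) 666-2323</fax></business>"
                + "</locationBusinessList></locationInfo></location>";
        System.out.println(findFaxNumbers(xml));
    }
}
```

If the real document declares a default namespace, a plain name test such as business will not match, and the local-name() form (or a NamespaceContext) is needed instead.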
I have an XML file like this:
<Parent>
<child1 key= "">
<sub children>
</child1>
<child2 key="">
<sub children>
</child2>
</Parent>
In this XML file I would like to get all nodes that have the attribute 'key'.
How can I achieve this with the best-suited Java XML parser?
I tried a StAX parser, but it has to check every element to see whether it has the attribute 'key'. So it takes a long time to produce output for large files.
XPath for nodes with a key attribute (empty or not):
expression="//*[@key]";
or, for didactic purposes, spelling out both cases: empty (@key='') or not empty (string(@key))
expression="//*[(@key='')or(string(@key))]";
To parse with DOM, there are many examples available.
Standard code:
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
XPath xpath = XPathFactory.newInstance().newXPath();
String expression = "//*[(@key='')or(string(@key))]";
Set<String> towns = new HashSet<String>();
XPathExpression expr = xpath.compile(expression);
NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
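To show what can be done with the resulting NodeList, here is a small self-contained sketch that lists the names of all elements carrying a key attribute. The class name KeyAttributeFinder, the method elementsWithKey, and the sample document (including child3 and the value k2) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class KeyAttributeFinder {

    // Returns the tag names of all elements that have a key attribute, empty or not.
    static List<String> elementsWithKey(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // [@key] is true whenever the attribute exists, even with an empty value.
        XPathExpression expr = xpath.compile("//*[@key]");
        NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: child1 has an empty key, child2 a non-empty one, child3 none.
        String xml = "<Parent><child1 key=\"\"><sub/></child1>"
                + "<child2 key=\"k2\"><sub/></child2><child3/></Parent>";
        System.out.println(elementsWithKey(xml));
    }
}
```

Note that this still builds the whole DOM in memory, so for very large files it trades the per-element check of StAX for memory use rather than avoiding the scan.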
I have read some articles on parsing an XML document like the one below:
<inventory>
<book year="2000">
<title>Snow Crash</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553380958</isbn>
<price>14.95</price>
</book>
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
<!-- more books... -->
</inventory>
using DOM parsing:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(<uri_as_string>);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
However, their purpose is mostly to get the VALUE of some node(s) by tag or by attribute from the document.
My purpose is to get the entire XML STRING of the node(s) back. For example, using the XPath /inventory/book[@year='2005'], I want to get the following XML back in a single string, i.e.
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
What API is used for this purpose? And do I even need DOM parsing in this case? Thanks,
COMMENT:
Maybe I should emphasize that I am asking this as an XML-related question, not a text file processing question. Concepts like 'tag', 'attribute' and 'XPath' still apply. The DOM model is not totally irrelevant. It's just that instead of getting the 'element' or value of a node, I want to get the whole node.
The given answers cannot solve problems like: how to get a node in XML string format, given the node's XPath representation, such as //book or /inventory/book[1]?
DOM parsers are designed to get values out of the document, not the actual file content.
You can use a simple file reader instead of an XML parser.
Read the file line by line with a FileReader (wrapped in a BufferedReader) and check each line against your condition; once the condition is met, start concatenating the lines you read until you reach the end of the node. Note that this only works when the file's line layout is predictable.
You can do it like this:
if (lineReadFromFile.equals("Your String Condition")) {
// collect the desired file content here until the end of the node is found
}
That is, you can simply read the XML from the file as if it were a normal text file. Simply apply the condition, for example:
if (line.equals("<book year=\"2005\"><title>Burning Tower</title>")) {
// retrieve/save the required content
}
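If staying within the XML APIs is preferred over line-by-line reading, one possible sketch is to select the node with XPath and serialize it back to a string with the standard javax.xml.transform API. The class name NodeSerializer and the method nodeToString are hypothetical names chosen for this example, and the inventory is a trimmed copy of the one in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class NodeSerializer {

    // Selects the first node matching `expression` and returns its XML markup,
    // or null if nothing matches.
    static String nodeToString(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
        if (node == null) {
            return null;
        }
        // An identity transform copies the DOM subtree into a character stream.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<inventory>"
                + "<book year=\"2000\"><title>Snow Crash</title></book>"
                + "<book year=\"2005\"><title>Burning Tower</title><price>5.99</price></book>"
                + "</inventory>";
        System.out.println(nodeToString(xml, "/inventory/book[@year='2005']"));
    }
}
```

The output is a re-serialization of the parsed tree, so details such as whitespace or quoting style may differ from the original file even though the content is the same.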
I need to read the output of the 'search' tag from the following URL using Java.
First I need to read the XML into a string from the following URL:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srlimit=1&srsearch=big+brother
I should end up having this:
<api>
<query-continue>
<search sroffset="1"/>
</query-continue>
<query>
<searchinfo totalhits="55180"/>
<search>
<p ns="0" title="Big Brothers Big Sisters of America" snippet="&lt;span class='searchmatch'&gt;Big&lt;/span&gt; &lt;span class='searchmatch'&gt;Brothers&lt;/span&gt; &lt;span class='searchmatch'&gt;Big&lt;/span&gt; Sisters of America is a 501(c)(3) non-profit organization whose goal is to help all children reach their potential through &lt;b&gt;...&lt;/b&gt; " size="13008" wordcount="1906" timestamp="2014-04-15T06:46:01Z"/>
</search>
</query>
</api>
Then once I have the XML, I need to get content of the search tag:
Output of 'search' tag looks like this and I need to get two parts from the code in the middle:
<search>
<p ns="0" title="Big Brothers Big Sisters of America" snippet="&lt;span class='searchmatch'&gt;Big&lt;/span&gt; &lt;span class='searchmatch'&gt;Brothers&lt;/span&gt; &lt;span class='searchmatch'&gt;Big&lt;/span&gt; Sisters of America is a 501(c)(3) non-profit organization whose goal is to help all children reach their potential through &lt;b&gt;...&lt;/b&gt; " size="13008" wordcount="1906" timestamp="2014-04-15T06:46:01Z"/>
</search>
In the end, all I need is two strings, equal to these:
String title = "Big Brothers Big Sisters of America";
String snippet = "<span class='searchmatch'>Big...";
Can someone please help me amend this code? I am not sure what I am doing wrong. I don't think it's even retrieving the XML from the URL, much less the tags inside the XML.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srlimit=1&srsearch=big+brother");
doc.getDocumentElement().normalize();
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
XPathExpression expr = xpath.compile("//query/search/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i=0; i<nodes.getLength();i++){
System.out.println(nodes.item(i).getNodeValue());
}
Sorry, I am a newbie and can't find the answer to this anywhere.
The main problem here is that you're asking for text nodes that are children of <search>, but in fact the <p ..> that you want is not a text node: it's an element. (In fact, the <search> element has no text node children, as you can tell when you view the response from that URL using "View Source".)
So what you want to do is change your XPath expression to
//query/search/p
which will give you the p element node. Then ask for the value of this node's two attributes title and snippet in your Java code:
Element e = (Element)(nodes.item(i));
String title = e.getAttribute("title");
String snippet = e.getAttribute("snippet");
Or, you could do two XPath queries, one for each attribute:
//query/search/p/@title
and
//query/search/p/@snippet
assuming there will only be one <p> element. If you were doing this over multiple <p> elements, you'd probably want to keep each pair of attributes together instead of having two separate lists of results.
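Keeping each title/snippet pair together could look roughly like the sketch below. It parses a shortened, made-up copy of the response from a string instead of fetching the URL, so it runs without network access; the class name SearchResultReader and the method readResults are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class SearchResultReader {

    // Returns one {title, snippet} pair per <p> element under query/search.
    static List<String[]> readResults(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select the <p> elements themselves, not text nodes.
        NodeList nodes = (NodeList) xpath.evaluate(
                "//query/search/p", doc, XPathConstants.NODESET);
        List<String[]> results = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            // The parser has already decoded entities such as &lt; in attribute values.
            results.add(new String[] { p.getAttribute("title"), p.getAttribute("snippet") });
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the API response; the snippet markup is entity-escaped.
        String xml = "<api><query><searchinfo totalhits=\"55180\"/><search>"
                + "<p ns=\"0\" title=\"Big Brothers Big Sisters of America\" "
                + "snippet=\"&lt;span class='searchmatch'&gt;Big&lt;/span&gt; Brothers\"/>"
                + "</search></query></api>";
        for (String[] pair : readResults(xml)) {
            System.out.println("title = " + pair[0]);
            System.out.println("snippet = " + pair[1]);
        }
    }
}
```

To read from the live URL instead, the same XPath code can be applied to the Document returned by builder.parse(url), as in the question.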
Dipping my toe in a little Java at the minute and have a question about XPath.
I have a large Xml and I want to use XPath to be able to grab a specific node and then fire further XPath calls against this small chunk of Xml.
Here's a rough outline of my XML:
<Page>
<ComponentPresentations>
<ComponentPresentation>
<Component>
<Title>
<ComponentTemplate>
<ComponentPresentation>
<Component>
<Title>
<ComponentTemplate>
My first XPath selects the <Component> node based upon the value of a <ComponentTemplate> Id value:
String componentExpFormat = "/Page/ComponentPresentations/ComponentPresentation/ComponentTemplate/Id[text()='%1$s']/ancestor::ComponentPresentation";
String componentExp = String.format(componentExpFormat, template);
XPathExpression expComponent = xPath.compile(componentExp);
Node componentXml = (Node) expComponent.evaluate(xmldoc, XPathConstants.NODE);
This gives me the <Component> I want, but I can't seem to be able to then run an XPath against the Node:
String componentExpTitle = "/Component/Fields/item/value/Field/Name[text()='title']/parent::node()/Values/string";
XPathExpression expTitle = xPath.compile(componentExpTitle);
String eventName = expTitle.evaluate(componentXml, XPathConstants.STRING).toString();
Without this I'll have to include the full XPath each time:
/Page/ComponentPresentations/ComponentPresentation/ComponentTemplate/Id[text()='%1$s']/ancestor::ComponentPresentation/Component/Fields/item/value/Field/Name[text()='title']/parent::node()/Values/string
Is that the only way?
Cheers
An XPath expression with a leading slash
/Component/Fields/item
is absolute, and when you evaluate it with a particular context node it will start looking from the root of the document that the context node belongs to. If you remove the leading slash
Component/Fields/item
it will look for Component children of the context node.
As an aside, you can simplify those XPaths quite a bit: you don't need all the up-and-down-the-tree steps with ancestor::, and you also don't need to use text():
componentExpFormat = "/Page/ComponentPresentations/ComponentPresentation[ComponentTemplate/Id='%1$s']";
componentExpTitle = "Component/Fields/item/value/Field[Name='title']/Values/string";
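Put together, the two-step lookup might be sketched as follows. The question only outlines the XML, so the document below is an assumed structure reconstructed from the XPath expressions, and the class name RelativeXPathDemo, the method titleFor, and the ids t1/t2 are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
class RelativeXPathDemo {

    // Finds the ComponentPresentation for a template id, then reads its title
    // with a second, relative XPath evaluated against that node.
    static String titleFor(String xml, String templateId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Assumption: templateId contains no quote characters (no escaping is done).
        String presentationExp = String.format(
                "/Page/ComponentPresentations/ComponentPresentation[ComponentTemplate/Id='%1$s']",
                templateId);
        Node presentation = (Node) xPath.evaluate(presentationExp, doc, XPathConstants.NODE);
        if (presentation == null) {
            return null;
        }
        // No leading slash: the path is resolved relative to `presentation`.
        return xPath.evaluate(
                "Component/Fields/item/value/Field[Name='title']/Values/string", presentation);
    }

    // Builds one ComponentPresentation in the assumed structure.
    static String presentation(String id, String title) {
        return "<ComponentPresentation><Component><Fields><item><value><Field>"
                + "<Name>title</Name><Values><string>" + title + "</string></Values>"
                + "</Field></value></item></Fields></Component>"
                + "<ComponentTemplate><Id>" + id + "</Id></ComponentTemplate>"
                + "</ComponentPresentation>";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Page><ComponentPresentations>"
                + presentation("t1", "First event")
                + presentation("t2", "Second event")
                + "</ComponentPresentations></Page>";
        System.out.println(titleFor(xml, "t2")); // prints Second event
    }
}
```

With a leading slash on the second expression, the lookup would start from the document root again and return an empty string here, because Component is not the root element.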