I have an XML file like this:
<Parent>
<child1 key= "">
<sub children>
</child1>
<child2 key="">
<sub children>
</child2>
</Parent>
In this XML file I would like to get all nodes which have attribute 'key'.
How can I achieve this, and which Java XML parser is best suited for it?
I tried the StAX parser, but it has to visit every element to check whether it has the attribute 'key'. So it takes a long time to produce output for large files.
XPath for nodes with a key attribute (empty or not):
expression="//*[@key]";
or, for didactic purposes, spelled out as empty (@key='') or not empty (string(@key)):
expression="//*[(@key='')or(string(@key))]";
There are many examples of parsing with DOM available.
Standard code:
DocumentBuilderFactory builderFactory =DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
XPath xpath = XPathFactory.newInstance().newXPath();
String expression="//*[(@key='')or(string(@key))]";
Set<String> towns=new HashSet<String>();
XPathExpression expr = xpath.compile(expression) ;
NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
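For completeness, here is a minimal self-contained sketch of the same idea that also iterates over the result. The class name KeyNodes, the method namesWithKey and the sample XML in main are made up for illustration; only the JAXP calls come from the snippet above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class KeyNodes {

    // Returns the tag names of all elements that carry a 'key' attribute.
    static List<String> namesWithKey(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // //*[@key] matches any element with a key attribute, empty or not.
            NodeList nodes = (NodeList) xpath.evaluate("//*[@key]", document, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Sample input assumed for this sketch.
        String xml = "<Parent><child1 key=\"\"><sub/></child1>"
                + "<child2 key=\"a\"><sub/></child2><child3/></Parent>";
        System.out.println(namesWithKey(xml));
    }
}
```

Note that DOM loads the whole document into memory, so for very large files this trades memory for convenience compared with StAX.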
I am creating an XML file using Java.
I created the root node with createElementNS() so that it has a namespace.
Element root = doc.createElementNS("http://www.myorc.com/schemas", "InvConf");
Then I created a node using createElement() and added it to the root node. This node is automatically given an xmlns="" declaration, as shown below.
Element invList = doc.createElement("InvList");
root.appendChild(invList);
<InvConf xmlns="http://www.myorc.com/schemas">
<InvList xmlns="">
...
</InvList>
<InvList xmlns="">
...
</InvList>
<InvList xmlns="">
...
</InvList>
</InvConf>
How can I avoid adding the namespace declaration to the child nodes?
I want the final XML to look like the following:
<InvConf xmlns="http://www.myorc.com/schemas">
<InvList>
...
</InvList>
<InvList>
...
</InvList>
<InvList>
...
</InvList>
</InvConf>
I found that the issue occurs only when xmlparserv2.jar is on the CLASSPATH. That jar is required by some parts of the application. How can I resolve this?
The xmlns="" declarations were added because your child is not in a namespace and your parent is. To change that, put the child in the namespace at the time you create the element.
Change the
createElement("InvList");
call to a createElementNS call that uses the same namespace URI as the root element.
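A minimal sketch of that change, assuming the namespace URI from the question. The class name SameNamespaceChild and the build method are made up for illustration, and the identity Transformer is used here only to show the result as a string. Serializer output can differ between XML implementations, so the exact text is not guaranteed.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of any library.
class SameNamespaceChild {
    static final String NS = "http://www.myorc.com/schemas";

    // Builds the document with root and child in the same namespace and serializes it.
    static String build() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            Element root = doc.createElementNS(NS, "InvConf");
            doc.appendChild(root);

            // createElementNS with the root's URI instead of createElement:
            // the child inherits the default namespace, so no xmlns="" is needed.
            Element invList = doc.createElementNS(NS, "InvList");
            root.appendChild(invList);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```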
As pointed out in the comments, xmlns="" means that the element doesn't have any namespace. For example, from an XML parser's point of view the following two documents are identical:
<ns:root xmlns:ns="http://namespace.com">
<child/>
</ns:root>
and
<root xmlns="http://namespace.com">
<child xmlns=""/>
</root>
To avoid the creation of xmlns="" on elements that do not belong to any namespace, you can declare a prefix on the upper-level element:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.newDocument();
Element root = doc.createElementNS("http://namespace.com", "root");
root.setPrefix("ns");
Element child = doc.createElementNS("", "child");
root.appendChild(child);
doc.appendChild(root);
This code will create the following XML:
<ns:root xmlns:ns="http://namespace.com">
<child/>
</ns:root>
Alternatively, you can use the following syntax to achieve the same result:
Document doc = db.newDocument();
Element root = doc.createElementNS("http://namespace.com", "ns:root");
Element child = doc.createElementNS("", "child");
root.appendChild(child);
doc.appendChild(root);
If you comment out the line root.setPrefix("ns"); or create the element without a prefix (doc.createElementNS("http://namespace.com", "root");), the following XML will be generated:
<root xmlns="http://namespace.com">
<child xmlns=""/>
</root>
I have read some articles on parsing an XML document like the one below:
<inventory>
<book year="2000">
<title>Snow Crash</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553380958</isbn>
<price>14.95</price>
</book>
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
<!-- more books... -->
</inventory>
using DOM parsing:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(<uri_as_string>);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
However, their purpose is mostly to get the VALUE of some node(s) by tag or by attribute from the document.
My purpose is to get the entire XML STRING of the node(s) back. For example, using the XPath /inventory/book[@year='2005'], I want to get the following XML back in a single string, i.e.
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
What API is used for this purpose? And do I even need DOM parsing in this case? Thanks,
COMMENT:
Maybe I should emphasize that I am asking this as an XML-related question, not a text file processing question. Concepts like 'tag', 'attribute' and 'XPath' still apply. The DOM model is not totally irrelevant. It's just that instead of getting the 'element' or value of a node, I want to get the whole node.
The given answers cannot solve problems like: how to get a node in XML string format, given the node's XPath representation, such as //book or /inventory/book[1]?
DOM parsers are designed for extracting values, not for returning the actual file content.
You can use a simple file reader instead of an XML parser.
Read the file line by line using a simple FileReader and check each line against your condition. Once the condition is met, start concatenating the lines you read until you reach the end of the node.
You can do it like this:
if (lineReadFromFile.equals("Your String Condition")) {
    // collect the desired file content here until the end of the node is found
}
You can simply read the XML from the file (treating it as a normal text file) using FileReader. Simply apply the condition, for example:
if (line.equals("<book year=\"2005\"><title>Burning Tower</title>")) {
    // retrieve/save the required content
}
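If you would rather stay within the XML APIs, as the question asks, one possible approach is to select the node with XPath and then serialize it with an identity Transformer from javax.xml.transform. The following is only a sketch: the class name NodeToString and the method firstMatchAsXml are made up for illustration, and whitespace in the output depends on the input and on the serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class NodeToString {

    // Returns the first node matched by the XPath expression as an XML string,
    // or null if nothing matches.
    static String firstMatchAsXml(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
            if (node == null) {
                return null;
            }

            // An identity transformer copies the DOM subtree to the output unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize node", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the inventory document from the question.
        String xml = "<inventory>"
                + "<book year=\"2000\"><title>Snow Crash</title></book>"
                + "<book year=\"2005\"><title>Burning Tower</title></book>"
                + "</inventory>";
        System.out.println(firstMatchAsXml(xml, "/inventory/book[@year='2005']"));
    }
}
```

To serialize every match of an expression such as //book, the same Transformer could be applied to each item of a NODESET result.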
I have an XML document that looks like this:
<xmlList>
<Phone>
<Prefix>04</Prefix>
</Phone>
<Phone>
<Prefix>04</Prefix>
</Phone>
<Phone>
<Prefix>03</Prefix>
</Phone>
</xmlList>
I would like to retrieve the Prefix node content only when it is 04.
String xml = "<xmlList><Phone><Prefix>04</Prefix></Phone><Phone><Prefix>04</Prefix></Phone><Phone><Prefix>03</Prefix></Phone></xmlList>";
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
InputSource source = new InputSource(new StringReader(xml));
// only one string is returned
String prefix = xpath.evaluate("/xmlList/Phone/Prefix", source);
Only one string is retrieved from xpath.evaluate.
I would like to get a list of all the 04 occurrences in the given XML.
Is that possible?
As you can see in the documentation https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/XPath.html#evaluate%28java.lang.String,%20org.xml.sax.InputSource%29, that overload of the evaluate method evaluates the XPath and returns the result as a string. Because in XPath 1.0 the string value of a node set is the string value of its first node, you get a string with the contents of the first selected node.
So you will need to use a different overload where you can specify the result type as NODESET and then you can iterate over the returned NodeList to collect the values.
Or consider to switch to an XPath 2.0 or 3.0 or XQuery 1.0 or 3.0 implementation like Saxon 9 where there are then APIs to return a sequence of strings for e.g. /xmlList/Phone/Prefix/string(). You will need to use a different API however than the JAXP XPath API which is centered around XPath 1.0.
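A minimal sketch of the NODESET variant with the JAXP XPath 1.0 API follows. The class name PrefixList and the method values are made up for illustration, and the predicate [. = '04'] is one way to keep only the matching prefixes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class PrefixList {

    // Evaluates the expression as a node set and returns the text of every match.
    static List<String> values(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));

            // The three-argument overload lets the caller ask for a NODESET
            // instead of the string value of the first node.
            NodeList nodes = (NodeList) xpath.evaluate(expression, source, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xmlList><Phone><Prefix>04</Prefix></Phone>"
                + "<Phone><Prefix>04</Prefix></Phone>"
                + "<Phone><Prefix>03</Prefix></Phone></xmlList>";
        // Only Prefix elements whose text is exactly 04 are selected.
        System.out.println(values(xml, "/xmlList/Phone/Prefix[. = '04']"));
    }
}
```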
I am trying to create an org.w3c.dom.Document object from an XML string. I have followed what many have suggested in other questions, but the document ends up empty. What is wrong with the following code?
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(response.getResponseText())));
The XML text in the string looks like the following (it comes from response.getResponseText()):
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">http://www.blah.com/ns/2006/05/01/webservices/123/TokenManagement_1/CreateServiceToken_1_Reply</a:Action>
<CacheResponse xsi:type="DoNotStoreCacheResponse" xmlns="http://www.blah.com/ns/2008/03/01/webservices/123/Cache_1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Date>2012-09-04T15:35:06.8116593Z</Date>
<DoNotStore />
</CacheResponse>
<a:RelatesTo>ba04425d-d93e-4a70-a134-ab8e29d5345c}</a:RelatesTo>
</s:Header>
<s:Body>
<CreateServiceToken_Response_1 xmlns="http://www.blah.com/ns/2006/05/01/webservices/123/TokenManagement_1" xmlns:global="http://www.blah.com/ns/2006/05/01/webservices/123/Common_1">
<Expiration>2012-09-04T17:04:19.1834228Z</Expiration>
<global:Token>3DEC2723A01047D1590544CBA5BA1E30326535E609DC1E6FAC5C659BC3B8A693BB054834A58B235037ED830CD05784DB176A62309AEB4B608C6F0B5B3F13ADE0EC56BE9F822ACFA3B549D4427D89BF030BFF48BA671DCAEB49940EFEBDEBFB71</global:Token>
</CreateServiceToken_Response_1>
</s:Body>
Can anyone see what is wrong with my code? I ultimately just want to run a couple of XPath queries on the document...
I would suggest starting by setting docFactory.setNamespaceAware(true);. Otherwise the parser, the DOM it builds and the XPath implementation will not be able to work with namespaced XML such as the document you have posted.
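As a rough sketch of what that looks like, the following parses a namespaced string with a namespace-aware factory and queries it. The class name NamespaceAwareQuery and the method textOf are made up for illustration. Matching on local-name() is a shortcut assumed here to avoid writing a NamespaceContext; it ignores which namespace the element is in, so it is only suitable when local names are unambiguous.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class NamespaceAwareQuery {

    // Returns the text of the first element with the given local name,
    // regardless of its namespace, or an empty string if there is none.
    static String textOf(String xml, String localName) {
        try {
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            docFactory.setNamespaceAware(true); // required for namespaced documents
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
            Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // localName is assumed to be a plain element name without quotes.
            return xpath.evaluate("//*[local-name()='" + localName + "']", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not query document", e);
        }
    }

    public static void main(String[] args) {
        // Reduced stand-in for the SOAP response in the question.
        String xml = "<s:Envelope xmlns:s=\"http://www.w3.org/2003/05/soap-envelope\">"
                + "<s:Body><Reply xmlns=\"urn:example\">"
                + "<Expiration>2012-09-04T17:04:19Z</Expiration>"
                + "</Reply></s:Body></s:Envelope>";
        System.out.println(textOf(xml, "Expiration"));
    }
}
```

For queries that must distinguish namespaces, registering a NamespaceContext on the XPath object and using prefixed names is the more precise route.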
I have to update a strictly defined XML document (i.e. I can't alter the format). I am using the DOM parser to load the file and update it where appropriate. Unfortunately, the document does not supply ids for anything, so I am forced to use getElementsByTagName to find the nodes/elements I need to update.
I haven't had any issues yet, but have just come across a section of text like:
<types>
<type type_def_id="1" type_value="008" />
<type type_def_id="6" type_value="uhl" />
<type type_def_id="9" type_value="xpm" />
<type type_def_id="11" type_value="4100" />
</types>
Using getElementsByTagName, I would need to iterate through the NodeList to find the type_def_id I need to update, which doesn't seem to be the best approach.
Any suggestions using Java 1.4?
As @biziclop suggested, XPath would be an efficient method in terms of programmer time (and also probably in terms of CPU time).
Here is a primer on using the javax.xml.xpath package in Java 5.
A code sample based on the above article would be:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse("myInput.xml");
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("//type[@type_def_id = '9']"); // type_def_id is an attribute, hence the @
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
// do what you need to do...
}
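To make the loop body concrete, here is a small sketch that updates the type_value attribute of the matching elements. The class name TypeUpdater and its methods are made up for illustration, and the id is concatenated into the expression on the assumption that it is a trusted value without quotes. The javax.xml.xpath package needs Java 5 or later, so this would not run on Java 1.4 as is.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class TypeUpdater {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            domFactory.setNamespaceAware(true);
            return domFactory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Sets type_value on every <type> with the given type_def_id.
    // Returns the number of elements that were updated.
    static int setTypeValue(Document doc, String typeDefId, String newValue) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // typeDefId is assumed to be trusted input that contains no quotes.
            String expression = "//type[@type_def_id = '" + typeDefId + "']";
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                ((Element) nodes.item(i)).setAttribute("type_value", newValue);
            }
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<types>"
                + "<type type_def_id=\"1\" type_value=\"008\" />"
                + "<type type_def_id=\"9\" type_value=\"xpm\" />"
                + "</types>");
        System.out.println(setTypeValue(doc, "9", "png") + " element(s) updated");
    }
}
```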