I am trying to get a specific value from an xml. When I iterate over the nodes, the value is never returned. Here is the xml sample
<Fields>
<Field FieldName="NUMBER">
<String>1234</String>
</Field>
<Field FieldName="TYPE">
<String>JAVA</String>
</Field>
<Field FieldName="ATYPE">
<String>BB</String>
</Field>
</Fields>
Here is what I have attempted based on this online resource that looks like my sample xml file
private static void updateElementValue(Document doc) {
NodeList employees = doc.getElementsByTagName("Field");
Element emp = null;
//loop for each
for(int i=0; i<employees.getLength();i++){
emp = (Element) employees.item(i);
System.out.println("here is the emp " + emp);
Node name = emp.getElementsByTagName("NUMBER").item(0).getFirstChild();
name.setNodeValue(name.getNodeValue().toUpperCase());
}
}
This is the online resource guiding my attempts
https://www.journaldev.com/901/modify-xml-file-in-java-dom-parser
Please assist
If you want to get a specific value from XML, XPath API may be more convenient in compare to DOM parser API. Here an example for retrieving value of a "String" elements, which are children of "Field" elements, having attribute "FieldName" with value "NUMBER":
// parse XML document from file
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = db.parse(new FileInputStream(fileName));
// prepare an XPath expression
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("/Fields/Field[#FieldName='NUMBER']/String");
// retrieve from XML nodes using XPath
NodeList list = (NodeList)xpath.evaluate(doc, XPathConstants.NODESET);
// iterate over resulting nodes and retrieve their values
for(int i = 0; i < list.getLength(); i ++) {
Node node = list.item(i);
// udate node content
node.setTextContent("New text");
}
// output edited XML document
StringWriter writer = new StringWriter(); // Use FileWriter to output to the file
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(new DOMSource(doc), new StreamResult(writer));
System.out.println(writer.toString());
Related
I am trying to display all text within text nodes only, within an XFA XML document while ignoring namespaces.
I came up with an Xpath that returns the desired results within XMLSpy with xpath 1.0 but the same Xpath in Java returns null for some reason.
Xpath = //*[local-name()='text'][string-length(normalize-space(.))>0]
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
ArrayList<String> list = new ArrayList<>();
XPathExpression expr = xpath.compile("//*[local-name()='text'][string-length(normalize-space(.))>0]");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println("This prints null = " + nodes.item(i).getNodeValue());
}
XML file wouldn't post here so it can be viewed at the link below:
https://drive.google.com/file/d/1n-v3gzT-3GgxNnYKFUvMPjRQmtnkqcpY/view?usp=sharing
The problem is that it's not the <text> elements that contain the values, but their child text nodes.
Replace the line
System.out.println("This prints null = " + nodes.item(i).getNodeValue());
with
System.out.println("This does not print null = " + nodes.item(i).getFirstChild().getNodeValue());
I have the bellow xml:
<modelingOutput>
<listOfTopics>
<topic id="1">
<token id="354">wish</token>
</topic>
</listOfTopics>
<rankedDocs>
<topic id="1">
<documents>
<document id="1" numWords="0"/>
<document id="2" numWords="1"/>
<document id="3" numWords="2"/>
</documents>
</topic>
</rankedDocs>
<listOfDocs>
<documents>
<document id="1">
<topic id="1" percentage="4.790644689978203%"/>
<topic id="2" percentage="11.427632949428334%"/>
<topic id="3" percentage="17.86913349249596%"/>
</document>
</documents>
</listOfDocs>
</modelingOutput>
Ι Want to parse this xml file and get the topic id and percentage from ListofDocs
The first way is to get all document element from xml and then I check if grandfather node is ListofDocs.
But the element document exist in rankedDocs and in listOfDocs, so I have a very large list.
So I wonder if exist better solution to parse this xml avoiding if statement?
My code:
public void parse(){
Document dom = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xml));
dom = db.parse(is);
Element doc = dom.getDocumentElement();
NodeList documentnl = doc.getElementsByTagName("document");
for (int i = 1; i <= documentnl.getLength(); i++) {
Node item = documentnl.item(i);
Node parentNode = item.getParentNode();
Node grandpNode = parentNode.getParentNode();
if(grandpNode.getNodeName() == "listOfDocs"{
//get value
}
}
}
First, when checking the node name you shouldn't compare Strings using ==. Always use the equals method instead.
You can use XPath to evaluate only the document topic elements under listOfDocs:
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//listOfDocs//document/topic");
NodeList topicnl = (NodeList) xPathExpression.evaluate(dom, XPathConstants.NODESET);
for(int i = 0; i < topicnl.getLength(); i++) {
...
If you do not want to use the if statement you can use XPath to get the element you need directly.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("source.xml");
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/*/listOfDocs/documents/document/topic");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
System.out.println(nodes.item(i).getAttributes().getNamedItem("id"));
System.out.println(nodes.item(i).getAttributes().getNamedItem("percentage"));
}
Please check GitHub project here.
Hope this helps.
I like to use XMLBeam for such tasks:
public class Answer {
#XBDocURL("resource://data.xml")
public interface DataProjection {
public interface Topic {
#XBRead("./#id")
int getID();
#XBRead("./#percentage")
String getPercentage();
}
#XBRead("/modelingOutput/listOfDocs//document/topic")
List<Topic> getTopics();
}
public static void main(final String[] args) throws IOException {
final DataProjection dataProjection = new XBProjector().io().fromURLAnnotation(DataProjection.class);
for (Topic topic : dataProjection.getTopics()) {
System.out.println(topic.getID() + ": " + topic.getPercentage());
}
}
}
There is even a convenient way to convert the percentage to float or double. Tell me if you like to have an example.
How do I list the element names at a given level in an xml schema hierarchy? The code I have below is listing all element names at every level of the hierarchy, with no concept of nesting.
Here is my xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><?xml-stylesheet type="text/xsl" href="CDA.xsl"?>
<SomeDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:something">
<title>some title</title>
<languageCode code="en-US"/>
<versionNumber value="1"/>
<recordTarget>
<someRole>
<id extension="998991"/>
<addr use="HP">
<streetAddressLine>1357 Amber Drive</streetAddressLine>
<city>Beaverton</city>
<state>OR</state>
<postalCode>97867</postalCode>
<country>US</country>
</addr>
<telecom value="tel:(816)276-6909" use="HP"/>
</someRole>
</recordTarget>
</SomeDocument>
Here is my java method for importing and iterating the xml file:
public static void parseFile() {
//get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
//Using factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
//parse using builder to get DOM representation of the XML file
Document dom = db.parse("D:\\mypath\\somefile.xml");
//get the root element
Element docEle = dom.getDocumentElement();
//get a nodelist of elements
NodeList nl = docEle.getElementsByTagName("*");
if (nl != null && nl.getLength() > 0) {
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
System.out.println("node.getNodeName() is: "+node.getNodeName());
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
The output of the above program is:
title
languageCode
versionNumber
recordTarget
someRole
id
addr
streetAddressLine
city
state
postalCode
country
telecom
Instead, I would like to output the following:
title
languageCode
versionNumber
recordTarget
It would be nice to then be able to list the children of recordTarget as someRole, and then to list the children of someRole as id, addr, and telecom. And so on, but at my discretion in the code. How can I change my code to get the output that I want?
You're getting all nodes with this line:
NodeList nl = docEle.getElementsByTagName("*");
Change it to
NodeList nl = docEle.getChildNodes();
to get all of its children. Your print statement will then give you the output you're looking for.
Then, when you iterate through your NodeList, you can choose to call the same method on each Node you create:
NodeList children = node.getChildNodes();
If you want to print an XML-like structure, perhaps a recursive method that prints all child nodes is what you are looking for.
You could re-write the parseFile (I'd rather call it parseChildrenElementNames) method to take an input String that specifies the element name for which you want to print out its children element names:
public static void parseChildrenElementNames(String parentElementName) {
// get the factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
// Using factory get an instance of document builder
DocumentBuilder db = dbf.newDocumentBuilder();
// parse using builder to get DOM representation of the XML file
Document dom = db
.parse("D:\\mypath\\somefile.xml");
// get the root element
NodeList elementsByTagName = dom.getElementsByTagName(parentElementName);
if(elementsByTagName != null) {
Node parentElement = elementsByTagName.item(0);
// get a nodelist of elements
NodeList nl = parentElement.getChildNodes();
if (nl != null) {
for (int i = 0; i < nl.getLength(); i++) {
Node node = nl.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE) {
System.out.println("node.getNodeName() is: "
+ node.getNodeName());
}
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
However, this will only consider the first element that matches the specified name.
For example, to get the list of elements under the first node named someRole, you would call parseChildrenElementNames("someRole"); which would print out:
node.getNodeName() is: id
node.getNodeName() is: addr
node.getNodeName() is: telecom
I have the following xml:
<version>
<name>2.0.2</name>
<description>
-Stop hsql database after close fist <br />
-Check for null category name before adding it to the categories list <br />
-Fix NPE bug if there is no updates <br />
-add default value for variable, change read bytes filter, and description of propertyFile <br />
-Change HTTP web Proxy (the “qcProxy” field ) to http://web-proxy.isr.hp.com:8080 <br />
</description>
<fromversion>>=2.0</fromversion>
</version>
I want to return description tag string content using Java?
This is pretty standard Java XML parsing, you can find it anywhere on the internet, but it goes like this using XPath in standard JDK.
String xml = "your XML";
// load the XML as String into a DOM Document object
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
Document doc = docBuilder.parse(bis);
// XPath to retrieve the content of the <version>/<description> tag
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/version/description");
Node description = (Node)expr.evaluate(doc, XPathConstants.NODE);
System.out.println("description: " + description.getTextContent());
Edit
Since you are having XML <br/> in your text content, it cannot be retrieved from Node.getTextContent(). One solution is to transform that Node to XML String equivalent, stripping the root node <description>.
This is a complete example:
String xml = "<version>\r\n" + //
" <name>2.0.2</name>\r\n" + //
" <description>\r\n" + //
"-Stop hsql database after close fist <br />\r\n" + //
"-Check for null category name before adding it to the categories list <br />\r\n" + //
"-Fix NPE bug if there is no updates <br />\r\n" + //
"-add default value for variable, change read bytes filter, and description of propertyFile <br />\r\n" + //
"-Change HTTP web Proxy (the “qcProxy” field ) to http://web-proxy.isr.hp.com:8080 <br />\r\n" + //
"</description>\r\n" + //
" <fromversion>>=2.0</fromversion>\r\n" + //
"</version>";
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
Document doc = docBuilder.parse(bis);
// XPath to retrieve the <version>/<description> tag
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/version/description");
Node descriptionNode = (Node) expr.evaluate(doc, XPathConstants.NODE);
// Transformer to convert the XML Node to String equivalent
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StringWriter sw = new StringWriter();
transformer.transform(new DOMSource(descriptionNode), new StreamResult(sw));
String description = sw.getBuffer().toString().replaceAll("</?description>", "");
System.out.println(description);
prints:
-Stop hsql database after close fist <br/>
-Check for null category name before adding it to the categories list <br/>
-Fix NPE bug if there is no updates <br/>
-add default value for variable, change read bytes filter, and description of propertyFile <br/>
-Change HTTP web Proxy (the “qcProxy” field ) to http://web-proxy.isr.hp.com:8080 <br/>
Edit 2
In order to have them all you need to get a NODESET of the different nodes and iterate over it to do the exact same operation as above.
// XPath to retrieve the content of the <version>/<description> tag
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//description");
NodeList descriptionNode = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
List<String> descriptions = new ArrayList<String>(); // hold all the descriptions as String
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
for (int i = 0; i < descriptionNode.getLength(); ++i) {
Node descr = descriptionNode.item(i);
StringWriter sw = new StringWriter();
transformer.transform(new DOMSource(descr), new StreamResult(sw));
String description = sw.getBuffer().toString().replaceAll("</?description>", "");
descriptions.add(description);
}
// here you can do what you want with the List of Strings `description`
Normally in PHP, I would just parse the old document and write to the new document while ignoring the unwanted elements.
This was the first solution I came up with:
DocumentBuilder builder = DocumentBuilderFactory
.newInstance()
.newDocumentBuilder();
StringReader reader = new StringReader( xml );
Document document = builder.parse( new InputSource(reader) );
XPathExpression expr = XPathFactory
.newInstance()
.newXPath()
.compile( ... );
Object result = expr.evaluate(document, XPathConstants.NODESET);
Element el = document.getDocumentElement();
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
el.removeChild( nodes.item(i) );
}
As you can see it's kinda long. Being a coder who strives for simplicity, I decided to take Ahmed's advice hoping I'll find a better solution and I came up with this:
List<?> elements = page.getByXPath( ... );
DomNode node = null;
for( Object o : elements ) {
node = (DomNode)o;
node.getParentNode().removeChild( node );
}
Please note these are just snippets, I omitted the imports and the XPath expressions but you get the idea.
Have a look at the DOM methods, you can remove nodes.
http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/html/DomNode.html