Getting html string from xml document

I have the following xml:
<version>
<name>2.0.2</name>
<description>
-Stop hsql database after close fist <br />
-Check for null category name before adding it to the categories list <br />
-Fix NPE bug if there is no updates <br />
-add default value for variable, change read bytes filter, and description of propertyFile <br />
-Change HTTP web Proxy (the “qcProxy” field ) to http://web-proxy.isr.hp.com:8080 <br />
</description>
<fromversion>>=2.0</fromversion>
</version>
How can I get the content of the description tag as a string using Java?

This is standard Java XML parsing. Using XPath from the standard JDK, it goes like this:
String xml = "your XML";
// load the XML as String into a DOM Document object
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
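// encode explicitly as UTF-8 so the result does not depend on the platform default charset
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes(java.nio.charset.StandardCharsets.UTF_8));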
Document doc = docBuilder.parse(bis);
// XPath to retrieve the content of the <version>/<description> tag
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/version/description");
Node description = (Node)expr.evaluate(doc, XPathConstants.NODE);
System.out.println("description: " + description.getTextContent());
Edit
Since your text content contains XML <br/> elements, it cannot be retrieved with Node.getTextContent(), which returns only the text and drops the markup. One solution is to transform the Node into its XML String equivalent and strip the enclosing <description> tags.
This is a complete example:
String xml = "<version>\r\n" + //
" <name>2.0.2</name>\r\n" + //
" <description>\r\n" + //
"-Stop hsql database after close fist <br />\r\n" + //
"-Check for null category name before adding it to the categories list <br />\r\n" + //
"-Fix NPE bug if there is no updates <br />\r\n" + //
"-add default value for variable, change read bytes filter, and description of propertyFile <br />\r\n" + //
"-Change HTTP web Proxy (the “qcProxy” field ) to http://web-proxy.isr.hp.com:8080 <br />\r\n" + //
"</description>\r\n" + //
" <fromversion>>=2.0</fromversion>\r\n" + //
"</version>";
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes(java.nio.charset.StandardCharsets.UTF_8));
Document doc = docBuilder.parse(bis);
// XPath to retrieve the <version>/<description> tag
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("/version/description");
Node descriptionNode = (Node) expr.evaluate(doc, XPathConstants.NODE);
// Transformer to convert the XML Node to String equivalent
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StringWriter sw = new StringWriter();
transformer.transform(new DOMSource(descriptionNode), new StreamResult(sw));
String description = sw.getBuffer().toString().replaceAll("</?description>", "");
System.out.println(description);
prints:
-Stop hsql database after close fist <br/>
-Check for null category name before adding it to the categories list <br/>
-Fix NPE bug if there is no updates <br/>
-add default value for variable, change read bytes filter, and description of propertyFile <br/>
-Change HTTP web Proxy (the “qcProxy” field ) to http://web-proxy.isr.hp.com:8080 <br/>
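As an alternative to stripping the <description> tags with a regular expression, the child nodes of the element can be serialized one by one. This is only a minimal sketch of that idea, not part of the original answer; the class and method names (InnerXmlExample, innerXml) are hypothetical, and the exact spelling of empty elements in the output depends on the serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class InnerXmlExample {

    // Serializes the children of the first element named tagName,
    // leaving out the enclosing element itself.
    static String innerXml(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node parent = doc.getElementsByTagName(tagName).item(0);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            // Every child (text or element) is appended to the same writer.
            StringWriter sw = new StringWriter();
            for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
                transformer.transform(new DOMSource(child), new StreamResult(sw));
            }
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract inner XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<version><description>-Fix NPE bug <br />-Add default value <br /></description></version>";
        System.out.println(innerXml(xml, "description"));
    }
}
```

Because the enclosing element is never serialized, a literal "<description>" that happened to appear in the content would not be touched.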
Edit 2
To get all of them, request a NODESET of the matching nodes and iterate over it, applying the same operation as above to each node.
// XPath to retrieve every <description> element in the document
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//description");
NodeList descriptionNode = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
List<String> descriptions = new ArrayList<String>(); // hold all the descriptions as String
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
for (int i = 0; i < descriptionNode.getLength(); ++i) {
Node descr = descriptionNode.item(i);
StringWriter sw = new StringWriter();
transformer.transform(new DOMSource(descr), new StreamResult(sw));
String description = sw.getBuffer().toString().replaceAll("</?description>", "");
descriptions.add(description);
}
// here you can do what you want with the List of Strings `descriptions`

Related

Split XML into smaller chunks based on the id of the grandchild

I have an XML file that should be split into smaller chunks, one per unique BookId node. Basically, I need to filter each book out into a separate XML file that has the same structure as the initial XML.
The purpose is to validate each smaller XML file against an XSD to determine which Book/PendingBook is not valid.
Note that Books node can contain both Book and PendingBook nodes.
Initial XML:
<Main xmlns="http://some/url/name">
<Books>
<Book>
<IdentifyingInformation>
<ID>
<Year>2021</Year>
<BookId>001</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</Book>
<Book>
<IdentifyingInformation>
<ID>
<Year>2020</Year>
<BookId>002</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</Book>
<PendingBook>
<IdentifyingInformation>
<ID>
<Year>2020</Year>
<BookId>003</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</PendingBook>
<OtherInfo>...</OtherInfo>
</Books>
</Main>
The result should be the following XML files:
Book_001.xml (BookId = 001):
<Main xmlns="http://some/url/name">
<Books>
<Book>
<IdentifyingInformation>
<ID>
<Year>2021</Year>
<BookId>001</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</Book>
<OtherInfo>...</OtherInfo>
</Books>
</Main>
Book_002.xml (BookId = 002):
<Main xmlns="http://some/url/name">
<Books>
<Book>
<IdentifyingInformation>
<ID>
<Year>2020</Year>
<BookId>002</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</Book>
<OtherInfo>...</OtherInfo>
</Books>
</Main>
PendingBook_003.xml (BookId = 003):
<Main xmlns="http://some/url/name">
<Books>
<PendingBook>
<IdentifyingInformation>
<ID>
<Year>2020</Year>
<BookId>003</BookId>
<BookDateTime>2021-05-10T12:35:00</BookDateTime>
</ID>
</IdentifyingInformation>
</PendingBook>
<OtherInfo>...</OtherInfo>
</Books>
</Main>
So far I have only managed to extract each ID node into a smaller XML file, creating the root element manually.
Ideally, I want to copy all elements from the initial XML and put a single Book/PendingBook node into the Books node.
My java sample:
package com.main;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ExtractXmls {
/**
* @param args
*/
public static void main(String[] args) throws Exception
{
String inputFile = "C:/pathToXML/Main.xml";
File xmlFile = new File(inputFile);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(xmlFile);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
XPathFactory xfactory = XPathFactory.newInstance();
XPath xpath = xfactory.newXPath();
XPathExpression allBookIdsExpression = xpath.compile("//Books/*/IdentifyingInformation/ID/BookId/text()");
NodeList bookIdNodes = (NodeList) allBookIdsExpression.evaluate(doc, XPathConstants.NODESET);
//Save all the products
List<String> bookIds = new ArrayList<>();
for (int i = 0; i < bookIdNodes.getLength(); ++i) {
Node bookId = bookIdNodes.item(i);
System.out.println(bookId.getTextContent());
bookIds.add(bookId.getTextContent());
}
//Now we create and save split XMLs
for (String bookId : bookIds)
{
//With such query I can find node based on bookId
String xpathQuery = "//ID[BookId='" + bookId + "']";
xpath = xfactory.newXPath();
XPathExpression query = xpath.compile(xpathQuery);
NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
//We store the new XML file in bookId.xml e.g. 001.xml
Document aamcIdXml = dBuilder.newDocument();
Element root = aamcIdXml.createElement("Main"); //Here I'm recreating root element (don't know if I can avoid it and copy somehow structure of initial xml)
aamcIdXml.appendChild(root);
for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
Node node = bookIdNodesFiltered.item(i);
Node copyNode = aamcIdXml.importNode(node, true);
root.appendChild(copyNode);
}
//At the end, we save the file XML on disk
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(aamcIdXml);
StreamResult result = new StreamResult(new File("C:/pathToXML/" + bookId.trim() + ".xml"));
transformer.transform(source, result);
System.out.println("Done for " + bookId);
}
}
}

Consider XSLT, a special-purpose language designed to transform XML files, including extracting the nodes you need. You can also pass parameters from an application layer such as Java into XSLT (much like parameters in SQL).
Specifically, iterate in Java over the BookIds retrieved with XPath and pass each one into a named XSLT param. No extensive refactoring is needed, since you already have the transformer set up to run XSLT.
XSLT (save as .xsl, a special .xml)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<!-- INITIALIZE PARAMETER -->
<xsl:param name="param_bookId"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Books">
<xsl:copy>
<xsl:apply-templates select="Book[descendant::BookId = $param_bookId] |
PendingBook[descendant::BookId = $param_bookId]"/>
<xsl:apply-templates select="OtherInfo"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Java (no rebuild of trees)
// ... same code as reading XML input ...
// ... same code as creating bookIdNodes ...
String curr_bookId = null;
String outputXML = null;
String xslFile = "C:/Path/To/XSL/Style.xsl";
Source xslt = new StreamSource(new File(xslFile));
// ITERATE THROUGH EACH BOOK ID
for (int i = 0; i < bookIdNodes.getLength(); ++i) {
Node bookId = bookIdNodes.item(i);
System.out.println(bookId.getTextContent());
curr_bookId = bookId.getTextContent();
// CONFIGURE TRANSFORMER
TransformerFactory prettyPrint = TransformerFactory.newInstance();
Transformer transformer = prettyPrint.newTransformer(xslt);
transformer.setParameter("param_bookId", curr_bookId); // PASS PARAM
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
// TRANSFORM AND OUTPUT FILE TO DISK
outputXML = "C:/Path/To/XML/BookId_" + curr_bookId + ".xml";
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(outputXML));
transformer.transform(source, result);
}
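One point worth checking with this approach is the default namespace on Main. When the input reaches the transformer namespace-aware (for example when it is read directly from a stream), XSLT 1.0 patterns only match those elements through a prefix bound to the same URI. The following self-contained sketch illustrates that variant under stated assumptions: the prefix b, the inline stylesheet string and the class name XsltParamExample are all made up for illustration and are not from the original answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class for illustration only.
class XsltParamExample {

    // Identity transform plus a template that keeps only the matching book.
    // The prefix "b" is an arbitrary choice bound to the document's namespace.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:b='http://some/url/name'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        + "<xsl:param name='param_bookId'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='b:Books'>"
        + "<xsl:copy>"
        + "<xsl:apply-templates select='b:*[descendant::b:BookId = $param_bookId] | b:OtherInfo'/>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet on the given XML, passing bookId as the named param.
    static String extract(String xml, String bookId) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setParameter("param_bookId", bookId);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Main xmlns='http://some/url/name'><Books>"
                + "<Book><ID><BookId>001</BookId></ID></Book>"
                + "<PendingBook><ID><BookId>003</BookId></ID></PendingBook>"
                + "<OtherInfo>...</OtherInfo></Books></Main>";
        System.out.println(extract(xml, "003"));
    }
}
```

Selecting b:* with the BookId predicate covers both Book and PendingBook without naming them separately.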

You almost got it to work. In the loop over the book IDs, change the XPath so that it returns the Book or PendingBook element, and use that element. You also need to create a Books element in addition to Main, and append the Book or PendingBook to the newly created Books element.
The XPath is: //ancestor::*[IdentifyingInformation/ID/BookId=bookId]
It selects the element whose IdentifyingInformation/ID/BookId matches the ID of the current iteration, i.e. the Book or PendingBook element.
//Now we create and save split XMLs
for (String bookId : bookIds)
{
//With such query I can find node based on bookId
String xpathQuery = "//ancestor::*[IdentifyingInformation/ID/BookId=" + bookId + "]";
xpath = xfactory.newXPath();
XPathExpression query = xpath.compile(xpathQuery);
NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
//We store the new XML file in bookId.xml e.g. 001.xml
Document aamcIdXml = dBuilder.newDocument();
Element root = aamcIdXml.createElement("Main");
Element booksNode = aamcIdXml.createElement("Books");
root.appendChild(booksNode);
//Here I'm recreating root element (don't know if I can avoid it and copy somehow structure of initial xml)
aamcIdXml.appendChild(root);
String bookName = "";
for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
Node node = bookIdNodesFiltered.item(i);
Node copyNode = aamcIdXml.importNode(node, true);
bookName = copyNode.getNodeName();
booksNode.appendChild(copyNode);
}
//At the end, we save the file XML on disk
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(aamcIdXml);
StreamResult result = new StreamResult(new File(bookName + "_" + bookId.trim() + ".xml"));
transformer.transform(source, result);
System.out.println("Done for " + bookId);
}
I also modified the code to name the file the way you wanted, e.g. Book_001.xml.
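Another way to keep the structure of the initial XML (including the namespace on Main and the OtherInfo element) is to copy the whole tree and remove the books that do not match. This is a different technique from the one above, shown only as a sketch: it assumes namespace-aware parsing, and the names BookSplitExample, parse and extractBook are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration only.
class BookSplitExample {

    static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns a copy of source in which Books keeps only the child with the
    // given BookId, plus any children that carry no BookId (e.g. OtherInfo).
    static Document extractBook(Document source, String bookId) {
        try {
            Document copy = newBuilder().newDocument();
            copy.appendChild(copy.importNode(source.getDocumentElement(), true));

            Element books = (Element) copy.getElementsByTagNameNS("*", "Books").item(0);
            // Snapshot the children first, since the loop removes nodes.
            List<Node> children = new ArrayList<>();
            for (Node n = books.getFirstChild(); n != null; n = n.getNextSibling()) {
                children.add(n);
            }
            for (Node child : children) {
                if (child instanceof Element) {
                    NodeList ids = ((Element) child).getElementsByTagNameNS("*", "BookId");
                    if (ids.getLength() > 0
                            && !bookId.equals(ids.item(0).getTextContent().trim())) {
                        books.removeChild(child);
                    }
                }
            }
            return copy;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract book " + bookId, e);
        }
    }
}
```

The returned Document can be written to disk with the same Transformer code as above; the source document is left unchanged, so the method can be called once per book ID.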

get a specific value from an xml

I am trying to get a specific value from an xml. When I iterate over the nodes, the value is never returned. Here is the xml sample
<Fields>
<Field FieldName="NUMBER">
<String>1234</String>
</Field>
<Field FieldName="TYPE">
<String>JAVA</String>
</Field>
<Field FieldName="ATYPE">
<String>BB</String>
</Field>
</Fields>
Here is what I have attempted based on this online resource that looks like my sample xml file
private static void updateElementValue(Document doc) {
NodeList employees = doc.getElementsByTagName("Field");
Element emp = null;
//loop for each
for(int i=0; i<employees.getLength();i++){
emp = (Element) employees.item(i);
System.out.println("here is the emp " + emp);
Node name = emp.getElementsByTagName("NUMBER").item(0).getFirstChild();
name.setNodeValue(name.getNodeValue().toUpperCase());
}
}
This is the online resource guiding my attempts
https://www.journaldev.com/901/modify-xml-file-in-java-dom-parser
Please assist

If you want to get a specific value from XML, the XPath API may be more convenient than the DOM parser API. Here is an example that retrieves the "String" elements that are children of "Field" elements whose "FieldName" attribute has the value "NUMBER":
// parse XML document from file
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = db.parse(new FileInputStream(fileName));
// prepare an XPath expression
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("/Fields/Field[@FieldName='NUMBER']/String");
// retrieve the matching nodes from the XML using XPath
NodeList list = (NodeList)xpath.evaluate(doc, XPathConstants.NODESET);
// iterate over the resulting nodes and update their values
for(int i = 0; i < list.getLength(); i ++) {
Node node = list.item(i);
// update node content
node.setTextContent("New text");
}
// output edited XML document
StringWriter writer = new StringWriter(); // Use FileWriter to output to the file
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(new DOMSource(doc), new StreamResult(writer));
System.out.println(writer.toString());
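If the goal is only to read the value rather than update it, the same expression can be evaluated directly to a String. A minimal sketch, assuming the XML is available as a string; the class and method names (FieldValueExample, readString) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration only.
class FieldValueExample {

    // Evaluates the XPath expression and returns the string value of the
    // first matching node (an empty string if nothing matches).
    static String readString(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Fields>"
                + "<Field FieldName=\"NUMBER\"><String>1234</String></Field>"
                + "<Field FieldName=\"TYPE\"><String>JAVA</String></Field>"
                + "</Fields>";
        // prints 1234
        System.out.println(readString(xml, "/Fields/Field[@FieldName='NUMBER']/String"));
    }
}
```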

how to parse xml to java in nodelist

This is my XML:
<?xml version = "1.0" encoding = "UTF-8"?>
<ns0:GetADSLProfileResponse xmlns:ns0 = "http://">
<ns0:Result>
<ns0:eCode>0</ns0:eCode>
<ns0:eDesc>Success</ns0:eDesc>
</ns0:Result>
</ns0:GetADSLProfileResponse>
This is my Java code so far; I need to know how to get started with this.
I tried some code I found online, but it did not solve my problem.
How do I get the values inside Result so that I can loop over them and read 0 from eCode and Success from eDesc?
CustomerProfileResult pojo = new CustomerProfileResult();
String body = readfile();
System.out.println(body);
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.parse(new InputSource(new StringReader(body)));
XPath xpath =XPathFactory.newInstance().newXPath();
XPathExpression name = xpath.compile("/xml/GetADSLProfileResponse/Result");
NodeList nodeName = (NodeList) name.evaluate(dom, XPathConstants.NODESET);
if(nodeName!=null){
}

Summary
You can try the following expression, which lets you select nodes without caring about the namespace prefix ns0:
/*[local-name()='GetADSLProfileResponse']/*[local-name()='Result']/*
Explanation
In your syntax, several parts were incorrect. Let's take a look together. XPath syntax /xml means that the root node of the document is <xml>, but the root element is <ns0:GetADSLProfileResponse>; GetADSLProfileResponse is incorrect too, because your XML file contains a namespace. Same for Result:
/xml/GetADSLProfileResponse/Result
In my solution, I ignored the namespace, because the namespace you provided is incomplete. Here's a full program to get started:
String XML =
"<?xml version = \"1.0\" encoding = \"UTF-8\"?>\n"
+ "<ns0:GetADSLProfileResponse xmlns:ns0 = \"http://\">\n"
+ " <ns0:Result>\n"
+ " <ns0:eCode>0</ns0:eCode>\n"
+ " <ns0:eDesc>Success</ns0:eDesc>\n"
+ " </ns0:Result>\n"
+ "</ns0:GetADSLProfileResponse> ";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document;
try (InputStream in = new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8))) {
document = builder.parse(in);
}
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xPath.compile("/*[local-name()='GetADSLProfileResponse']/*[local-name()='Result']/*");
NodeList nodeList = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
System.out.println(node.getNodeName() + ": " + node.getTextContent());
}
It prints:
ns0:eCode: 0
ns0:eDesc: Success
See also:
How to query XML using namespaces in Java with XPath?
Node (Java Platform SE 8)

How to get xml attribute values using Document builder factory

How do I get attribute values? With the following code I get only ; as the output for msg. I want to print the MSID, type, CHID, SPOS, type and PPOS values. Can anyone solve this issue?
String xml1="<message MSID='20' type='2635'>"
+"<che CHID='501' SPOS='2'>"
+"<pds type='S'>"
+"<position PPOS='S01'/>"
+"</pds>"
+"</che>"
+"</message>";
InputSource source = new InputSource(new StringReader(xml1));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(source);
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String msg = xpath.evaluate("/message/che/CHID", document);
String status = xpath.evaluate("/pds/position/PPOS", document);
System.out.println("msg=" + msg + ";" + "status=" + status);

You need to use @ in your XPath for an attribute, and also your path specifier for the second element is wrong:
String msg = xpath.evaluate("/message/che/@CHID", document);
String status = xpath.evaluate("/message/che/pds/position/@PPOS", document);
With those changes, I get an output of:
msg=501;status=S01
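The same @ syntax reaches the other attributes the question asks for. As a sketch of one way to collect them all at once, the expression //@* selects every attribute node in the document; the class name AttributeDumpExample and the "element/@attribute" key format are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration only.
class AttributeDumpExample {

    // Returns every attribute in the document, keyed as "element/@attribute".
    // Attributes with the same element and attribute name overwrite each other.
    static Map<String, String> attributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList attrs = (NodeList) xpath.evaluate("//@*", doc, XPathConstants.NODESET);

            Map<String, String> result = new TreeMap<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                String key = attr.getOwnerElement().getNodeName() + "/@" + attr.getName();
                result.put(key, attr.getValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read attributes", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<message MSID='20' type='2635'>"
                + "<che CHID='501' SPOS='2'>"
                + "<pds type='S'><position PPOS='S01'/></pds>"
                + "</che></message>";
        attributes(xml).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```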

You can use Document.getDocumentElement() to get the root element and Element.getElementsByTagName() to get child elements:
Document document = db.parse(source);
Element docEl = document.getDocumentElement(); // This is <message>
String msid = docEl.getAttribute("MSID");
String type = docEl.getAttribute("type");
Element position = (Element) docEl.getElementsByTagName("position").item(0);
String ppos = position.getAttribute("PPOS");
System.out.println(msid); // Prints "20"
System.out.println(type); // Prints "2635"
System.out.println(ppos); // Prints "S01"

Parse XML with XPath & namespaces in Java

Can you help me adjust this code so it manages to parse the XML? If I drop the XML namespace it works:
String webXmlContent = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
"<foo xmlns=\"http://foo.bar/boo\"><bar>baz</bar></foo>";
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
org.w3c.dom.Document doc = builder.parse(new StringInputStream(webXmlContent));
NamespaceContextImpl namespaceContext = new NamespaceContextImpl();
namespaceContext.startPrefixMapping("foo", "http://www.w3.org/2001/XMLSchema-instance");
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setNamespaceContext(namespaceContext);
XPathExpression expr = xpath.compile("/foo/bar");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
System.out.println("Got " + nodes.getLength() + " nodes");

You must use a prefix in your XPath, e.g. "/my:foo/my:bar". You can choose any prefix you like; it has nothing to do with the prefixes you use (or don't use) in the XML file, but you must choose one. This is a limitation of XPath 1.0.
You must also map the prefix "my" to "http://foo.bar/boo" (not to "http://www.w3.org/2001/XMLSchema-instance").
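A minimal sketch of that mapping using only the JDK, with a small anonymous javax.xml.namespace.NamespaceContext instead of the NamespaceContextImpl class from the question. The class name NamespacePrefixExample is hypothetical, and the context is deliberately simplified: it knows a single prefix and skips the argument checks a full implementation would perform.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for illustration only.
class NamespacePrefixExample {

    static final String NS = "http://foo.bar/boo";

    // Maps the arbitrary prefix "my" to the namespace URI of the document.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "my".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return NS.equals(namespaceURI) ? "my" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    // Counts the nodes the expression selects in a namespace-aware parse of xml.
    static int countMatches(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo xmlns=\"http://foo.bar/boo\"><bar>baz</bar></foo>";
        System.out.println("Got " + countMatches(xml, "/my:foo/my:bar") + " nodes"); // prints "Got 1 nodes"
        System.out.println("Got " + countMatches(xml, "/foo/bar") + " nodes");       // prints "Got 0 nodes"
    }
}
```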
