How to Parse Complete XML into a string using DOM PARSER?

How to Parse Complete XML into a string using DOM PARSER? - java

For example:
If I pass tagValue=1 then it should return complete xml1 as a string to me String input is in some function.
String input = "<1><xml1></xml1></1><2><xml2></xml2><2>.......<10000><xml10000></xml10000></10000‌>";
String output = "<xml1></xml1>"; // for tagValue=1;

If the xml has a root element then it is doable
1. Parse the XML using dom parser.
2. Iterate through each node
3. Find the desired node.
4. Write the node in a different xml using transform
Sample code
Step 1: I used XML as a string, you can read from file.
DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(uri));
Document doc = dBuilder.parse(is);
Iterate through each node
if (doc.hasChildNodes()) {
printNote(doc.getChildNodes(), doc);
}
Please put in your logic to iterate thorugh the nodes and find the right child node which you want to process.
Write back as xml. Here assumption is that tempNode is the one you want to write as XML.
TransformerFactory tFactory =
TransformerFactory.newInstance();
Transformer transformer;
try {
transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(tempNode);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);

You are not parsing XML, you are parsing a String, a replace should be enough in your particular case
String input = "<xml1><note><to>Tove</to><from>Jani</from<heading>Reminder</heading><body>Don't forget me this weekend</body></note><xml1>";
input = input.replace("<xml1>", "");

Related

Import and parse an xml file without FileOutputStream

Consider the code fragment that I have at the moment which works and the right elements are found and placed into my map:
public void importXml(InputSource emailAttach)throws Exception {
Map<String, String> hWL = new HashMap<String, String>();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(emailAttach);
FileOutputStream fos=new FileOutputStream("temp.xml");
OutputStreamWriter os = new OutputStreamWriter(fos,"UTF-8");
// Transform to XML UTF-8 format
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
t.transform(new DOMSource(doc), new StreamResult(os));
os.close();
fos.close();
doc = db.parse(new File("temp.xml"));
NodeList nl = doc.getElementsByTagName("Email");
Element eE=(Element)nl.item(0);
int ctr=eE.getChildNodes().getLength();
String sNName;
String sNValue;
Node nTemp;
for (int i=0;i<ctr;i++){
nTemp=eE.getChildNodes().item(i);
sNName=nTemp.getNodeName().toUpperCase().trim();
if (nTemp.getChildNodes().item(0)!=null) {
sNValue=nTemp.getChildNodes().item(0).getNodeValue().trim();
hWL.put(sNName,sNValue);
}
}
}
However I prefer not to create a temp file first after converting the data to UTF-8 and parsing from the temp file. Is there anyway I can do this?
I've tried using a ByteArrayOutputStream in place of OutputStreamWriter, and calling toString() on the ByteArrayOutputStream as such:
doc = db.parse(bos.toString("UTF-8");
But then my Map ends up being empty.

From the API docs (the ability of its meticulous studying is a valuable asset for any programmer) - the parse method with the String argument seems to take something different from what you feed to it:
Document parse(String uri)
Parse the content of the given URI as an XML document and return a new DOM >Document object.
This might be your friend:
db.parse ( new ByteArrayInputStream( bos.toByteArray()));

Update
#user2496748 sorry I should have searched for the API but instead I was looking at the source code through a decompiler which tells me the parameter is arg0 instead of uri. Big difference.
I think I understand stream readers/writers and byte to char or vice versa a little more now.
After some review I was able to simply my code to this and achieve what I wanted to do. Since I am able to get the email attachment as a InputSource:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
emailAttach.setEncoding("UTF-8");
Document doc = db.parse(emailAttach);
Works as well and tested with non-english characters.

You don't need to write and re-read and re-parse the transformed document. Just change this:
t.transform(new DOMSource(doc), new StreamResult(os));
to this:
DOMResult result = new DOMResult();
t.transform(new DOMSource(doc), result);
doc = (Document)result.getNode();
and then continue from after your present doc = db.parse(new File("temp.xml"));.

Work with raw text in javax.xml.transform.Transformer

While working with an XML document, I use strings that already contain XML entities and wish them to be inserted as-is. However, this happens instead:
String s = "This — That";
....
document.appendChild(document.createTextNode(s));
....
transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
System.out.println(stringWriter.toString()); // outputs "This &mdash; That" at the relevant Node.
I have no control over the input string and I need exactly the output "This — That".
If I use StringEscapeUtils.unescapeHtml, the output is "This — That" which is not what I need.
I also tried several versions of transformer.setOutputProperty(OutputKeys.ENCODING, "encoding") but haven't found an encoding that converts "—" to "—".
What can I do to prevent javax.xml.transform.Transformer from re-escaping already correctly escaped text or how can I transform the input to get entities in the output?
Please explain how this is a duplicate.
The question referenced had the problem that "
" was being converted into CRLF because the entities were being resolved. The solution was to escape the entities.
My problem is the reverse. The text is already escaped and the transformer is re-escaping the text. "—" is outputting "&mdash;".
I cannot use the solution to post-convert all "&" -> "&" because not all nodes represent html.
More complete code:
TransformerFactory factory = TransformerFactory.newInstance();
Transformer t = factory.newTransformer();
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbFactory.newDocumentBuilder();
Document document = builder.newDocument();
Element rootElement = document.createElement("Test");
rootElement.appendChild(document.createTextNode("This — That");
document.appendChild(rootElement);
DOMImplementation domImpl = bgDoc.getImplementation();
DocumentType docType = domImpl.createDocumentType("Test",
"-//Company//program//language",
"test.dtd");
t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, docType.getPublicId());
t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, docType.getSystemId());
StringWriter writer = new StringWriter();
StreamResult rslt = new StreamResult(writer);
Source src = new DOMSource(document);
t.transform(src, rslt);
System.out.println(writer.toString());
// outputs xml header, then "<Test>This &mdash; That</Test>"

The fact is, once you have a DOM tree, there's no longer a string with —: it's instead represented internally as a Unicode string.
So, to input the raw string, you need to parse it to a Node, and to output, serialize a Node.
Regarding serialization, there are a few other questions including Change the com.sun.org.apache.xml.internal.serialize.XMLSerializer & com.sun.org.apache.xml.internal.serialize.OutputFormat .
To parse a single node, there is LSParser.parseWithContext.

Update XML attribute value in document tree

I am not too sure why i cannot modify an attribute to my xml. The code below i used to get the read the attributes from the XML. Pulls the attributes without any issues.
document = documentBuilder.parse(file);
NodeList sessionNodelist = document.getElementsByTagName("session");
if (sessionNodelist.getLength() > 0)
{
Element sessionElement = (Element) sessionNodelist.item(0);
String timeout = sessionElement.getAttribute("timeout");
String warning = sessionElement.getAttribute("warning");
}
Now when i go to set them, it doesn't work and I am not too sure why. The code is below. it's the exact same code i used to pull the atribles, but instead of the getAttribute i used setAttribute which takes two parameters. setAttribute(String name, String Value).
document = documentBuilder.parse(file);
NodeList sessionNodelist = document.getElementsByTagName("session");
if (sessionNodelist.getLength() > 0)
{
Element sessionElement = (Element) sessionNodelist.item(0);
sessionElement.setAttribute("timeout","12");
sessionElement.setAttribute("warning", "10");
}
Any ideas?

You need to write the document tree back to the XML file. See this page for how to write a DOM tree to a file.
You would use a javax.xml.transform.Transformer to write the object into the file as follows:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(file);
transformer.transform(source, result);

How to avoid encoding of <,>,& with Document.createTextNode

class XMLencode
{
public static void main(String[] args)
{
try{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Text elmnt=doc.createTextNode("<data>sun</data><abcdefg/><end/>");
root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
}catch(Exception e){
System.out.println(e.getMessage());
}
}
}
Here is my above piece of code.
The output generated is like this
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
I dont want the tags to be encoded. I need the output in this fashion.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
Please help me on this.
Thanks,
Mohan

Short Answer
You could leverage the CDATA mechanism in XML to prevent characters from being escaped. Below is an example of the DOM code:
doc.createCDATASection("<foo/>");
The content will be:
<![CDATA[<foo/>]]>
LONG ANSWER
Below is a complete example of leveraging a CDATA section using the DOM APIs.
package forum12525152;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.newDocument();
Element rootElement = document.createElement("root");
document.appendChild(rootElement);
// Create Element with a Text Node
Element fooElement = document.createElement("foo");
fooElement.setTextContent("<foo/>");
rootElement.appendChild(fooElement);
// Create Element with a CDATA Section
Element barElement = document.createElement("bar");
CDATASection cdata = document.createCDATASection("<bar/>");
barElement.appendChild(cdata);
rootElement.appendChild(barElement);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(System.out);
t.transform(source, result);
}
}
Output
Note the difference in the foo and bar elements even though they have similar content. I have formatted the result of running the demo code to make it more readable:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<foo><foo/></foo>
<bar><![CDATA[<bar/>]]></bar>
</root>

Instead of writing like this doc.createTextNode("<data>sun</data><abcdefg/><end/>");
You should create each element.
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
class XMLencode {
public static void main(String[] args) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Element data = doc.createElement("data");
root.appendChild(data);
Text elemnt = doc.createTextNode("sun");
data.appendChild(elemnt);
Element data1 = doc.createElement("abcdefg");
root.appendChild(data1);
//Text elmnt = doc.createTextNode("<data>sun</data><abcdefg/><end/>");
//root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}

You can use the doc.createTextNode and use a workaround (long) for the escaped characters.
SOAPMessage msg = messageContext.getMessage();
header.setTextContent(seched);
Then use
Source src = msg.getSOAPPart().getContent();
To get the content, the transform it to string
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer. setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result1 = new StreamResult(new StringWriter());
transformer.transform(src, result1);
Replace the string special characters
String xmlString = result1.getWriter().toString()
.replaceAll("<", "<").
replaceAll(">", ">");
System.out.print(xmlString);
the oposite string to dom with the fixed escaped characters
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlString));
Document doc = db.parse(is);
Source src123 = new DOMSource(doc);
Then set it back to the soap message
msg.getSOAPPart().setContent(src123);

Don't use createTextNode - the whole point of it is to insert some text (as data) into the document, not a fragment of raw XML.
Use a combination of createTextNode for the text and createElement for the elements.

I dont want the tags to be encoded. I need the output in this fashion.
Then you don't want a text node at all - which is why createTextNode isn't working for you. (Or rather, it's working fine - it's just not doing what you want). You should probably just parse your XML string, then import the document node from the result into your new document.
Of course, if you know the elements beforehand, don't express them as text in the first place - use a mixture of createElement, createAttribute, createTextNode and appendChild to create the structure.
It's entirely possible that something like JDOM will make this simpler, but that's the basic approach.

Mohan,
You can't use Document.createTextNode(). That methos transforms (or escapes) the charactes in your XML.
Instead, you need to build two separate Documents from the 2 XML's and use importNode.
I use Document.importNode() like this to solve my problem:
Build your builders:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
Document oldDoc = builder.parse(isOrigXml); //this is XML as InputSource
Document newDoc = builder.parse(isInsertXml); //this is XML as InputSource
Next, build a NodeList of the Element/Node you want to import. Create a Node from the NodeList. Create another Node of what you are going to import using importNode. Build the last Node of the final XML as such:
NodeList nl = newDoc.getElementByTagName("roseindia"); //or whatever the element name is
Node xmlToInsert = nl.item(0);
Node importNode = oldDoc.importNode(xmlToImport, true);
Node target = ((NodeList) oldDoc.getElementsByTagName("ELEMENT_NAME_OF_LOCATION")).item(0);
target.appendChild(importNode);
Source source = new DOMSource(target);
....
The rest is standard Transformer - StringWriter to StreamResult stuff to get the results.

Adding namespace to an already created XML document

I am creating a W3C Document object using a String value. Once I created the Document object, I want to add a namespace to the root element of this document. Here's my current code:
Document document = builder.parse(new InputSource(new StringReader(xmlString)));
document.getDocumentElement().setAttributeNS("http://com", "xmlns:ns2", "Test");
document.setPrefix("ns2");
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(document);
Result dest = new StreamResult(new File("c:\\xmlFileName.xml"));
aTransformer.transform(src, dest);
What I use as input:
<product>
<arg0>DDDDDD</arg0>
<arg1>DDDD</arg1>
</product>
What the output should look like:
<ns2:product xmlns:ns2="http://com">
<arg0>DDDDDD</arg0>
<arg1>DDDD</arg1>
</ns2:product>
I need to add the prefix value and namespace also to the input xml string. If I try the above code I am getting this exception:
NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
Appreciate your help!

Since there is not an easy way to rename the root element, we'll have to replace it with an element that has the correct namespace and attribute, and then copy all the original children into it. Forcing the namespace declaration is not needed because by giving the element the correct namespace (URI) and setting the prefix, the declaration will be automatic.
Replace the setAttribute and setPrefix with this (line 2,3)
String namespace = "http://com";
String prefix = "ns2";
// Upgrade the DOM level 1 to level 2 with the correct namespace
Element originalDocumentElement = document.getDocumentElement();
Element newDocumentElement = document.createElementNS(namespace, originalDocumentElement.getNodeName());
// Set the desired namespace and prefix
newDocumentElement.setPrefix(prefix);
// Copy all children
NodeList list = originalDocumentElement.getChildNodes();
while(list.getLength()!=0) {
newDocumentElement.appendChild(list.item(0));
}
// Replace the original element
document.replaceChild(newDocumentElement, originalDocumentElement);
In the original code the author tried to declare an element namespace like this:
.setAttributeNS("http://com", "xmlns:ns2", "Test");
The first parameter is the namespace of the attribute, and since it's a namespace attribute it need to have the http://www.w3.org/2000/xmlns/ URI. The declared namespace should come into the 3rd parameter
.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:ns2", "http://com");

Bellow approach also works for me, but probably should not use in performance critical case.
Add name space to document root element as attribute.
Transform the document to XML string. The purpose of this step is to make the child element in the XML string inherit parent element namespace.
Now the xml string have name space.
You can use the XML string to build a document again or used for JAXB unmarshal, etc.
private static String addNamespaceToXml(InputStream in)
throws ParserConfigurationException, SAXException, IOException,
TransformerException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
/*
* Must not namespace aware, otherwise the generated XML string will
* have wrong namespace
*/
// dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(in);
Element documentElement = document.getDocumentElement();
// Add name space to root element as attribute
documentElement.setAttribute("xmlns", "http://you_name_space");
String xml = transformXmlNodeToXmlString(documentElement);
return xml;
}
private static String transformXmlNodeToXmlString(Node node)
throws TransformerException {
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
StringWriter buffer = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(node), new StreamResult(buffer));
String xml = buffer.toString();
return xml;
}

Partially gleaned from here, and also from a comment above, I was able to get it to work (transforming an arbitrary DOM Node and adding a prefix to it and all its children) thus:
private String addNamespacePrefix(Document doc, Node node) throws TransformerException {
Element mainRootElement = doc.createElementNS(
"http://abc.de/x/y/z", // namespace
"my-prefix:fake-header-element" // prefix to "register" it with the DOM so we don't get exceptions later...
);
List<Element> descendants = nodeListToArrayRecurse(node.getChildNodes()); // for some reason we have to grab all these before doing the first "renameNode" ... no idea why ...
mainRootElement.appendChild(node);
doc.renameNode(node, "http://abc.de/x/y/z", "my-prefix:" + node.getNodeName());
descendants.stream().forEach(c -> doc.renameNode(c, "http://abc.de/x/y/z", "my-prefix:" + c.getNodeName()));
}
private List<Element> nodeListToArrayRecurse(NodeList entryNodes) {
List<Element> allEntries = new ArrayList<>();
for (int i = 0; i < entryNodes.getLength(); i++) {
Node child = entryNodes.item(i);
if (child.getNodeType() == Node.ELEMENT_NODE) {
allEntries.add((Element) child);
allEntries.addAll(nodeListToArray(child.getChildNodes())); // recurse
} // ignore other [i.e. text] nodes https://stackoverflow.com/questions/14566596/loop-through-all-elements-in-xml-using-nodelist
}
return allEntries;
}
If it helps anybody. I then convert it to string, then manually remove the extra header and closing lines. What a pain, I must be doing something wrong...

This seems to be working for me, and it's much simpler than those answers provided:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(filename));
document.getDocumentElement().setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:yourNamespace", "http://whatever/else");

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to Parse Complete XML into a string using DOM PARSER? - java

For example: If I pass tagValue=1 then it should return complete xml1 as a string to me String input is in some function. String input = "<1><xml1></xml1></1><2><xml2></xml2><2>.......<10000><xml10000></xml10000></10000‌>"; String output = "<xml1></xml1>"; // for tagValue=1;

You are not parsing XML, you are parsing a String, a replace should be enough in your particular case String input = "<xml1><note><to>Tove</to><from>Jani</from<heading>Reminder</heading><body>Don't forget me this weekend</body></note><xml1>"; input = input.replace("<xml1>", "");

Related

Import and parse an xml file without FileOutputStream

Work with raw text in javax.xml.transform.Transformer

Update XML attribute value in document tree

How to avoid encoding of <,>,& with Document.createTextNode

Adding namespace to an already created XML document

Categories

Resources

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to Parse Complete XML into a string using DOM PARSER? - java

For example: If I pass tagValue=1 then it should return complete xml1 as a string to me String input is in some function. String input = "<1><xml1></xml1></1><2><xml2></xml2><2>.......<10000><xml10000></xml10000></10000‌​>"; String output = "<xml1></xml1>"; // for tagValue=1;

You are not parsing XML, you are parsing a String, a replace should be enough in your particular case String input = "<xml1><note><to>Tove</to><from>Jani</from<heading>Reminder</heading><body>Don't forget me this weekend</body></note><xml1>"; input = input.replace("<xml1>", "");

Related

Import and parse an xml file without FileOutputStream

Work with raw text in javax.xml.transform.Transformer

Update XML attribute value in document tree

How to avoid encoding of <,>,& with Document.createTextNode

Adding namespace to an already created XML document

Categories

Resources

For example: If I pass tagValue=1 then it should return complete xml1 as a string to me String input is in some function. String input = "<1><xml1></xml1></1><2><xml2></xml2><2>.......<10000><xml10000></xml10000></10000‌>"; String output = "<xml1></xml1>"; // for tagValue=1;