Cloning dom.Document object

Cloning dom.Document object - java

My purpose is to read xml file into Dom object, edit the dom object, which involves removing some nodes.
After this is done i wish to restore the Dom to its original state without actually parsing the XML file.
Is there anyway i can clone the dom object i obtained after parsing the xml file for the first time. the idea is to avoid reading and parsing xml all the time, just keep a copy of original dom tree.

You could use importNode API on org.w3c.dom.Document:
Node copy = document.importNode(node, true);
Full Example
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document originalDocument = db.parse(new File("input.xml"));
Node originalRoot = originalDocument.getDocumentElement();
Document copiedDocument = db.newDocument();
Node copiedRoot = copiedDocument.importNode(originalRoot, true);
copiedDocument.appendChild(copiedRoot);
}
}

TransformerFactory tfactory = TransformerFactory.newInstance();
Transformer tx = tfactory.newTransformer();
DOMSource source = new DOMSource(doc);
DOMResult result = new DOMResult();
tx.transform(source,result);
return (Document)result.getNode();
This would be the Java 1.5 solution for making a copy of the DOM document. Take a look at Transformer Factory and Transformer

you could clone a tree or only the node with DOMs cloneNode(boolean isDeepCopy) API.
Document originalDoc = parseDoc();
Document clonedDoc = originalDoc.cloneNode(true);
unfortunately, since cloneNode() on Document is (according to API) implementation specific, we have to go for a bullet-proof way, that is, create a new Document and import cloned node's from the original document:
...
Document clonedDoc = documentFactory.newDocument();
cloneDoc.appendChild(
cloneDoc.importNode(originalDoc.getDocumentElement(), true)
);
note that none of operations are thread-safe, so either use them only locally, or Thread-Local or synchronize them.

I would stick with the second suggestion with TransformerFactory.
With importNode you don't get a full copy of the document.
The header isn't copied.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?aid style="50" type="snippet" readerVersion="6.0" featureSet="257" product="8.0(370)" ?>
<?aid SnippetType="PageItem"?><Document DOMVersion="8.0" Self="d">
This would not return the above because this isn't copied. It's will be using what ever your new document contain.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>

Related

Link XML and XSD using java

i'm trying to write the header for an xml file so it would be something like this:
<file xmlns="http://my_namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://my_namespace file.xsd">
however, I can't seem to find how to do it using the Document class in java. This is what I have:
public void exportToXML() {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder;
try {
dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
doc.setXmlStandalone(true);
doc.createTextNode("<file xmlns=\"http://my_namespace"\n" +
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" +
"xsi:schemaLocation=\"http://my_namespace file.xsd\">");
Element mainRootElement = doc.createElement("MainRootElement");
doc.appendChild(mainRootElement);
for(int i = 0; i < tipoDadosParaExportar.length; i++) {
mainRootElement.appendChild(criarFilhos(doc, tipoDadosParaExportar[i]));
}
Transformer tr = TransformerFactory.newInstance().newTransformer();
tr.transform(new DOMSource(doc),
new StreamResult(new FileOutputStream(filename)));
} catch (Exception e) {
e.printStackTrace();
}
}
I tried writing it on the file using the createTextNode but it didn't work either, it only writes the version before showing the elements.
PrintStartXMLFile
Would appreciate if you could help me. Have a nice day

Your createTextNode() method is only suitable for creating text nodes, it's not suitable for creating elements. You need to use createElement() for this. If you're doing this by building a tree, then you need to build nodes, you can't write lexical markup.
I'm not sure what MainRootElement is supposed to be; you've only given a fragment of your desired output so it's hard to tell.
Creating a DOM tree and then serializing it is a pretty laborious way of constructing an XML file. Using something like an XMLEventWriter is easier. But to be honest, I got frustrated by all the existing approaches and wrote a new library for the purpose as part of Saxon 10. It's called simply "Push", and looks something like this:
Processor proc = new Processor();
Serializer serializer = proc.newSerializer(new File(fileName));
Push push = proc.newPush(serializer);
Document doc = push.document(true);
doc.setDefaultNamespace("http://my_namespace");
Element root = doc.element("root")
.attribute(new QName("xsi", "http://www.w3.org/2001/XMLSchema-instance", "schemaLocation"),
"http://my_namespace file.xsd");
doc.close();

xquery transformation creates empty namespace in element

I'm sorry but I guess I just don't see the mistake I'm making here.
I have a camel route which returns an XML and to be able to test the output I wrote a JUnit Test which runs with SpringRunner. There I get the XML Stream from the exchange which I validate against an XSD. This works great because the XSD throws an exception because the output XML is not valid, but I don't understand why the following xquery generates an element with EMPTY NAMESPACE?
See the xquery snippet (I'm sorry again I cannot provide more code):
declare default element namespace "http://www.dppgroup.com/XXXPMS";
let $cmmdoc := $doc/*:cmmdoc
, $partner := $doc/*:cmmdoc/*:information/*:partner_gruppe/*:partner
, $sequence:= fn:substring($cmmdoc/#unifier,3)
return <ClientMMS xmlns:infra="http://www.dppgroup.com/InfraNS">
{
for $x in $partner
where $x[#partnerStatusCode = " "]
return
element {"DataGroup" } {
<Client sequenceNumber="{$sequence}" />
}
}
My problem is, that with this code the resulting XML contains the DataGroup-element with the following namespace definition:
<?xml version="1.0" encoding="UTF-8"?>
<ClientMMS xmlns="http://www.dppgroup.com/XXXPMS"
xmlns:infra="http://www.dppgroup.com/InfraNS">
<DataGroup xmlns="">
<Client sequenceNumber="170908065609671475"/>
</DataGroup>
</ClientMMS>
The snippet from the Unit-Test: I'm using jdk1.8_102
String xml = TestDataReader.readXML("/input/info/info_in.xml", PROJECT_ENCODING);
quelle.sendBody(xml);
boolean valid = false;
try {
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream((byte[]) archiv.getExchanges().get(1).getIn().getBody());
Document document = documentBuilder.parse(byteArrayInputStream);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(document);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);
In no XQuery introduction/tutorial/explanation I can find a reason why this happens. Can you guys please explain why the DataGroup element is not in the default namespace?

The XQuery you posted should create the result fine without the namespace undeclaration you show.
In your Java code if you want to work with XML with namespaces make sure you use a namespace aware DocumentBuilder, as the default DocumentBuilderFactory is not namespace aware make sure you set setNamespaceAware(true) on the factory before creating a DocumentBuilder with it.

Import and parse an xml file without FileOutputStream

Consider the code fragment that I have at the moment which works and the right elements are found and placed into my map:
public void importXml(InputSource emailAttach)throws Exception {
Map<String, String> hWL = new HashMap<String, String>();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(emailAttach);
FileOutputStream fos=new FileOutputStream("temp.xml");
OutputStreamWriter os = new OutputStreamWriter(fos,"UTF-8");
// Transform to XML UTF-8 format
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
t.transform(new DOMSource(doc), new StreamResult(os));
os.close();
fos.close();
doc = db.parse(new File("temp.xml"));
NodeList nl = doc.getElementsByTagName("Email");
Element eE=(Element)nl.item(0);
int ctr=eE.getChildNodes().getLength();
String sNName;
String sNValue;
Node nTemp;
for (int i=0;i<ctr;i++){
nTemp=eE.getChildNodes().item(i);
sNName=nTemp.getNodeName().toUpperCase().trim();
if (nTemp.getChildNodes().item(0)!=null) {
sNValue=nTemp.getChildNodes().item(0).getNodeValue().trim();
hWL.put(sNName,sNValue);
}
}
}
However I prefer not to create a temp file first after converting the data to UTF-8 and parsing from the temp file. Is there anyway I can do this?
I've tried using a ByteArrayOutputStream in place of OutputStreamWriter, and calling toString() on the ByteArrayOutputStream as such:
doc = db.parse(bos.toString("UTF-8");
But then my Map ends up being empty.

From the API docs (the ability of its meticulous studying is a valuable asset for any programmer) - the parse method with the String argument seems to take something different from what you feed to it:
Document parse(String uri)
Parse the content of the given URI as an XML document and return a new DOM >Document object.
This might be your friend:
db.parse ( new ByteArrayInputStream( bos.toByteArray()));

Update
#user2496748 sorry I should have searched for the API but instead I was looking at the source code through a decompiler which tells me the parameter is arg0 instead of uri. Big difference.
I think I understand stream readers/writers and byte to char or vice versa a little more now.
After some review I was able to simply my code to this and achieve what I wanted to do. Since I am able to get the email attachment as a InputSource:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
emailAttach.setEncoding("UTF-8");
Document doc = db.parse(emailAttach);
Works as well and tested with non-english characters.

You don't need to write and re-read and re-parse the transformed document. Just change this:
t.transform(new DOMSource(doc), new StreamResult(os));
to this:
DOMResult result = new DOMResult();
t.transform(new DOMSource(doc), result);
doc = (Document)result.getNode();
and then continue from after your present doc = db.parse(new File("temp.xml"));.

How to avoid encoding of <,>,& with Document.createTextNode

class XMLencode
{
public static void main(String[] args)
{
try{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Text elmnt=doc.createTextNode("<data>sun</data><abcdefg/><end/>");
root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
}catch(Exception e){
System.out.println(e.getMessage());
}
}
}
Here is my above piece of code.
The output generated is like this
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
I dont want the tags to be encoded. I need the output in this fashion.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
Please help me on this.
Thanks,
Mohan

Short Answer
You could leverage the CDATA mechanism in XML to prevent characters from being escaped. Below is an example of the DOM code:
doc.createCDATASection("<foo/>");
The content will be:
<![CDATA[<foo/>]]>
LONG ANSWER
Below is a complete example of leveraging a CDATA section using the DOM APIs.
package forum12525152;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.newDocument();
Element rootElement = document.createElement("root");
document.appendChild(rootElement);
// Create Element with a Text Node
Element fooElement = document.createElement("foo");
fooElement.setTextContent("<foo/>");
rootElement.appendChild(fooElement);
// Create Element with a CDATA Section
Element barElement = document.createElement("bar");
CDATASection cdata = document.createCDATASection("<bar/>");
barElement.appendChild(cdata);
rootElement.appendChild(barElement);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(System.out);
t.transform(source, result);
}
}
Output
Note the difference in the foo and bar elements even though they have similar content. I have formatted the result of running the demo code to make it more readable:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<foo><foo/></foo>
<bar><![CDATA[<bar/>]]></bar>
</root>

Instead of writing like this doc.createTextNode("<data>sun</data><abcdefg/><end/>");
You should create each element.
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
class XMLencode {
public static void main(String[] args) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Element data = doc.createElement("data");
root.appendChild(data);
Text elemnt = doc.createTextNode("sun");
data.appendChild(elemnt);
Element data1 = doc.createElement("abcdefg");
root.appendChild(data1);
//Text elmnt = doc.createTextNode("<data>sun</data><abcdefg/><end/>");
//root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}

You can use the doc.createTextNode and use a workaround (long) for the escaped characters.
SOAPMessage msg = messageContext.getMessage();
header.setTextContent(seched);
Then use
Source src = msg.getSOAPPart().getContent();
To get the content, the transform it to string
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer. setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result1 = new StreamResult(new StringWriter());
transformer.transform(src, result1);
Replace the string special characters
String xmlString = result1.getWriter().toString()
.replaceAll("<", "<").
replaceAll(">", ">");
System.out.print(xmlString);
the oposite string to dom with the fixed escaped characters
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlString));
Document doc = db.parse(is);
Source src123 = new DOMSource(doc);
Then set it back to the soap message
msg.getSOAPPart().setContent(src123);

Don't use createTextNode - the whole point of it is to insert some text (as data) into the document, not a fragment of raw XML.
Use a combination of createTextNode for the text and createElement for the elements.

I dont want the tags to be encoded. I need the output in this fashion.
Then you don't want a text node at all - which is why createTextNode isn't working for you. (Or rather, it's working fine - it's just not doing what you want). You should probably just parse your XML string, then import the document node from the result into your new document.
Of course, if you know the elements beforehand, don't express them as text in the first place - use a mixture of createElement, createAttribute, createTextNode and appendChild to create the structure.
It's entirely possible that something like JDOM will make this simpler, but that's the basic approach.

Mohan,
You can't use Document.createTextNode(). That methos transforms (or escapes) the charactes in your XML.
Instead, you need to build two separate Documents from the 2 XML's and use importNode.
I use Document.importNode() like this to solve my problem:
Build your builders:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
Document oldDoc = builder.parse(isOrigXml); //this is XML as InputSource
Document newDoc = builder.parse(isInsertXml); //this is XML as InputSource
Next, build a NodeList of the Element/Node you want to import. Create a Node from the NodeList. Create another Node of what you are going to import using importNode. Build the last Node of the final XML as such:
NodeList nl = newDoc.getElementByTagName("roseindia"); //or whatever the element name is
Node xmlToInsert = nl.item(0);
Node importNode = oldDoc.importNode(xmlToImport, true);
Node target = ((NodeList) oldDoc.getElementsByTagName("ELEMENT_NAME_OF_LOCATION")).item(0);
target.appendChild(importNode);
Source source = new DOMSource(target);
....
The rest is standard Transformer - StringWriter to StreamResult stuff to get the results.

How do I extract child element from XML to a string in Java?

If I have an XML document like
<root>
<element1>
<child attr1="blah">
<child2>blahblah</child2>
<child>
</element1>
</root>
I want to get an XML string with the first child element. My output string would be
<element1>
<child attr1="blah">
<child2>blahblah</child2>
<child>
</element1>
There are many approaches, would like to see some ideas. I've been trying to use Java XML APIs for it, but it's not clear that there is a good way to do this.
thanks

You're right, with the standard XML API, there's not a good way - here's one example (may be bug ridden; it runs, but I wrote it a long time ago).
import javax.xml.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
import java.io.*;
public class Proc
{
public static void main(String[] args) throws Exception
{
//Parse the input document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("in.xml"));
//Set up the transformer to write the output string
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty("indent", "yes");
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
//Find the first child node - this could be done with xpath as well
NodeList nl = doc.getDocumentElement().getChildNodes();
DOMSource source = null;
for(int x = 0;x < nl.getLength();x++)
{
Node e = nl.item(x);
if(e instanceof Element)
{
source = new DOMSource(e);
break;
}
}
//Do the transformation and output
transformer.transform(source, result);
System.out.println(sw.toString());
}
}
It would seem like you could get the first child just by using doc.getDocumentElement().getFirstChild(), but the problem with that is if there is any whitespace between the root and the child element, that will create a Text node in the tree, and you'll get that node instead of the actual element node. The output from this program is:
D:\home\tmp\xml>java Proc
<?xml version="1.0" encoding="UTF-8"?>
<element1>
<child attr1="blah">
<child2>blahblah</child2>
</child>
</element1>
I think you can suppress the xml version string if you don't need it, but I'm not sure on that. I would probably try to use a third party XML library if at all possible.

Since this is the top google answer and For those of you who just want the basic:
public static String serializeXml(Element element) throws Exception
{
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
StreamResult result = new StreamResult(buffer);
DOMSource source = new DOMSource(element);
TransformerFactory.newInstance().newTransformer().transform(source, result);
return new String(buffer.toByteArray());
}
I use this for debug, which most likely is what you need this for

I would recommend JDOM. It's a Java XML library that makes dealing with XML much easier than the standard W3C approach.

public String getXML(String xmlContent, String tagName){
String startTag = "<"+ tagName + ">";
String endTag = "</"+ tagName + ">";
int startposition = xmlContent.indexOf(startTag);
int endposition = xmlContent.indexOf(endTag, startposition);
if (startposition == -1){
return "ddd";
}
startposition += startTag.length();
if(endposition == -1){
return "eee";
}
return xmlContent.substring(startposition, endposition);
}
Pass your xml as string to this method,and in your case pass 'element' as parameter tagname.

XMLBeans is an easy to use (once you get the hang of it) tool to deal with XML without having to deal with the annoyances of parsing.
It requires that you have a schema for the XML file, but it also provides a tool to generate a schema from an exisint XML file (depending on your needs the generated on is probably fine).

If your xml has schema backing it, you could use xmlbeans or JAXB to generate pojo objects that help you marshal/unmarshal xml.
http://xmlbeans.apache.org/
https://jaxb.dev.java.net/

As question is actually about first occurrence of string inside another string, I would use String class methods, instead of XML parsers:
public static String getElementAsString(String xml, String tagName){
int beginIndex = xml.indexOf("<" + tagName);
int endIndex = xml.indexOf("</" + tagName, beginIndex) + tagName.length() + 3;
return xml.substring(beginIndex, endIndex);
}

You can use following function to extract xml block as string by passing proper xpath expression,
private static String nodeToString(Node node) throws TransformerException
{
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.transform(new DOMSource(node), new StreamResult(buf));
return(buf.toString());
}
public static void main(String[] args) throws Exception
{
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
XPath xPath = XPathFactory.newInstance().newXPath();
Node result = (Node)xPath.evaluate("A/B/C", doc, XPathConstants.NODE); //"A/B[id = '1']" //"//*[#type='t1']"
System.out.println(nodeToString(result));
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Cloning dom.Document object - java

Related

Link XML and XSD using java

xquery transformation creates empty namespace in element

Import and parse an xml file without FileOutputStream

How to avoid encoding of <,>,& with Document.createTextNode

How do I extract child element from XML to a string in Java?

Categories

Resources