Setting namespaces and prefixes in a Java DOM document - java

I'm trying to convert a ResultSet to an XML file.
I've first used this example for the serialization.
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
...
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl =
(DOMImplementationLS)registry.getDOMImplementation("LS");
...
LSSerializer writer = impl.createLSSerializer();
String str = writer.writeToString(document);
After I got this working, I tried to validate my XML file and got a couple of warnings.
One was about not having a doctype, so I tried another way to implement this. I came across the Transformer class, which lets me set the encoding, doctype, etc.
The previous implementation supports automatic namespace fix-up. The following does not.
private static Document toDocument(ResultSet rs) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();
URL namespaceURL = new URL("http://www.w3.org/2001/XMLSchema-instance");
String namespace = "xmlns:xsi="+namespaceURL.toString();
Element messages = doc.createElementNS(namespace, "messages");
doc.appendChild(messages);
ResultSetMetaData rsmd = rs.getMetaData();
int colCount = rsmd.getColumnCount();
String attributeValue = "true";
String attribute = "xsi:nil";
rs.beforeFirst();
while(rs.next()) {
amountOfRecords = 0;
Element message = doc.createElement("message");
messages.appendChild(message);
for(int i = 1; i <= colCount; i++) {
Object value = rs.getObject(i);
String columnName = rsmd.getColumnName(i);
Element messageNode = doc.createElement(columnName);
if(value != null) {
messageNode.appendChild(doc.createTextNode(value.toString()));
} else {
messageNode.setAttribute(attribute, attributeValue);
}
message.appendChild(messageNode);
}
amountOfRecords++;
}
logger.info("Amount of records archived: " + amountOfRecords);
TransformerFactory tff = TransformerFactory.newInstance();
Transformer tf = tff.newTransformer();
tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
tf.setOutputProperty(OutputKeys.INDENT, "yes");
BufferedWriter bf = createFile();
StreamResult sr = new StreamResult(bf);
DOMSource source = new DOMSource(doc);
tf.transform(source, sr);
return doc;
}
While testing the previous implementation I got a TransformerException: Namespace for prefix 'xsi' has not been declared. As you can see, I've tried to add a namespace with the xsi prefix to the root element of my document. After testing this I still got the exception. What is the correct way to set namespaces and their prefixes?
Edit: Another problem I have with the first implementation is that the last element in the XML document doesn't have the last three closing tags.

The correct way to create a node in a namespace-aware document is to call createElementNS on the Document:
doc.createElementNS("http://example/namespace", "PREFIX:aNodeName");
So you can replace "PREFIX" with your own custom prefix and replace "aNodeName" with the name of your node. To avoid having each node having its own namespace declaration you can define the namespaces as attributes on your root node like so:
rootNode.setAttribute("xmlns:PREFIX", "http://example/namespace");
Please be sure to set:
documentBuilderFactory.setNamespaceAware(true);
Otherwise the document is not namespace-aware.

Please note that setting an xmlns prefix with setAttribute is wrong.
If you ever want to, e.g., sign your DOM, you have to use setAttributeNS:
element.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:PREFIX", "http://example/namespace");

You haven't added the namespace declaration in the root node; you just declared the root node in the namespace, two entirely different things. When building a DOM, you need to reference the namespace on every relevant Node. In other words, when you add your attribute, you need to define its namespace (e.g., setAttributeNS).
Side note: Although XML namespaces look like URLs, they really aren't. There's no need to use the URL class here.
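Putting the answers above together, here is a minimal sketch of building the question's structure with the xsi prefix declared on the root and xsi:nil set through setAttributeNS. The class name XsiNilSketch and the column name "body" are made up for illustration, and the ResultSet loop is left out.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name; a sketch, not the asker's final code.
class XsiNilSketch {
    // "http://www.w3.org/2001/XMLSchema-instance"
    static final String XSI_NS = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;

    static String render() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        // The root element itself is in no namespace.
        Element messages = doc.createElementNS(null, "messages");
        doc.appendChild(messages);
        // Declare the xsi prefix once, on the root. Namespace declarations
        // live in the reserved "http://www.w3.org/2000/xmlns/" namespace.
        messages.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:xsi", XSI_NS);

        Element message = doc.createElementNS(null, "message");
        messages.appendChild(message);

        // "body" stands in for a column whose value was null.
        Element body = doc.createElementNS(null, "body");
        // The attribute is created in the xsi namespace, not just named "xsi:nil".
        body.setAttributeNS(XSI_NS, "xsi:nil", "true");
        message.appendChild(body);

        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());
    }
}
```

The key difference from the question's code is that the namespace URI is passed as the first argument of setAttributeNS, rather than being baked into a string such as "xmlns:xsi=...".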

Related

extract an element out of an XML document while preserving namespace prefix definitions

I am trying to extract an element (as a String) out of an XML document. I have tried both approaches suggested in this SO answer (a similar method is also suggested here) and they both fail to properly account for namespace prefixes that may be defined in some outer-level document.
Using the following code:
// entry point method; see examples of values for the String `s` in the question
public static String stripPayload(String s) throws Exception {
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
final Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(s)));
final XPath xPath = XPathFactory.newInstance().newXPath();
final String xPathToGetToTheNodeWeWishToExtract = "/*[local-name()='envelope']/*[local-name()='payload']";
final Node result = (Node) xPath.evaluate(xPathToGetToTheNodeWeWishToExtract, doc, XPathConstants.NODE);
return nodeToString_A(result); // or: nodeToString_B(result)
}
public static String nodeToString_A(Node node) throws Exception {
final StringWriter buf = new StringWriter();
final Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.setOutputProperty(OutputKeys.STANDALONE, "yes");
xform.transform(new DOMSource(node), new StreamResult(buf));
return(buf.toString());
}
public static String nodeToString_B(Node node) throws Exception {
final Document document = node.getOwnerDocument();
final DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
final LSSerializer serializer = domImplLS.createLSSerializer();
final String str = serializer.writeToString(node);
return str;
}
If the stripPayload method is passed the following strings:
<envelope><payload><a></a><b></b></payload></envelope>
or
<envelope><p:payload xmlns:p='foo'><a></a><b></b></p:payload></envelope>
… both nodeToString_A and nodeToString_B methods work. However, if I pass the following equally valid XML document where the namespace prefix is defined in an outer element:
<envelope xmlns:p='foo'><p:payload><a></a><b></b></p:payload></envelope>
… then both methods fail as they simply emit:
<p:payload><a/><b/></p:payload>
Thus, they are already producing an invalid document as the namespace prefix definition is left out.
The more complicated example below (which uses namespace prefixes in attributes):
<envelope xmlns:p='foo' xmlns:a='alpha'><p:payload a:attr='dummy'><a></a><b></b></p:payload></envelope>
… actually causes nodeToString_A to fail with an exception whereas at least nodeToString_B produces the invalid:
<p:payload a:attr="dummy"><a/><b/></p:payload>
(where again, the prefixes are not defined).
So my question is:
What is a robust way to extract and stringify an inner XML element in a way that takes care of namespace prefixes that may be defined in some outer element?
You just need to enable namespace awareness.
public static String stripPayload(String s) throws Exception {
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
...
}
The output will be ...
<p:payload xmlns:p="foo"><a/><b/></p:payload>
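For reference, a self-contained sketch of the namespace-aware version, combining the question's stripPayload and nodeToString_A into one method. The class name PayloadExtractSketch is made up for illustration; everything else comes from the code above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name; a sketch of the fix described above.
class PayloadExtractSketch {
    static String stripPayload(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Without this, elements carry no namespace URI and the serializer
        // has nothing to re-declare on the extracted subtree.
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        Node payload = (Node) xPath.evaluate(
                "/*[local-name()='envelope']/*[local-name()='payload']",
                doc, XPathConstants.NODE);

        Transformer xform = TransformerFactory.newInstance().newTransformer();
        xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter buf = new StringWriter();
        xform.transform(new DOMSource(payload), new StreamResult(buf));
        return buf.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stripPayload(
                "<envelope xmlns:p='foo'><p:payload><a></a><b></b></p:payload></envelope>"));
    }
}
```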

Update XML attribute value in document tree

I am not sure why I cannot modify an attribute in my XML. I used the code below to read the attributes from the XML; it pulls the attributes without any issues.
document = documentBuilder.parse(file);
NodeList sessionNodelist = document.getElementsByTagName("session");
if (sessionNodelist.getLength() > 0)
{
Element sessionElement = (Element) sessionNodelist.item(0);
String timeout = sessionElement.getAttribute("timeout");
String warning = sessionElement.getAttribute("warning");
}
Now when I go to set them, it doesn't work and I am not sure why. The code is below. It's the exact same code I used to pull the attributes, but instead of getAttribute I used setAttribute, which takes two parameters: setAttribute(String name, String value).
document = documentBuilder.parse(file);
NodeList sessionNodelist = document.getElementsByTagName("session");
if (sessionNodelist.getLength() > 0)
{
Element sessionElement = (Element) sessionNodelist.item(0);
sessionElement.setAttribute("timeout","12");
sessionElement.setAttribute("warning", "10");
}
Any ideas?
You need to write the document tree back to the XML file. See this page for how to write a DOM tree to a file.
You would use a javax.xml.transform.Transformer to write the object into the file as follows:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(file);
transformer.transform(source, result);
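A minimal end-to-end sketch of the parse, modify, and write-back cycle. The class name SessionTimeoutSketch, the helper setTimeout, and the sample XML in main are assumptions for illustration. This sketch writes through an OutputStream instead of passing the File to StreamResult directly; that is a choice made here, not something the answer requires.

```java
import java.io.File;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical class and method names; a sketch only.
class SessionTimeoutSketch {
    static void setTimeout(File file, String timeout) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(file);

        NodeList sessions = document.getElementsByTagName("session");
        if (sessions.getLength() == 0) {
            return; // nothing to update
        }
        // This changes only the in-memory tree...
        ((Element) sessions.item(0)).setAttribute("timeout", timeout);

        // ...so the tree has to be serialized back to the file.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        try (OutputStream out = Files.newOutputStream(file.toPath())) {
            transformer.transform(new DOMSource(document), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample input invented for the demo.
        Path tmp = Files.createTempFile("session", ".xml");
        Files.write(tmp, "<config><session timeout=\"30\" warning=\"5\"/></config>"
                .getBytes(StandardCharsets.UTF_8));
        setTimeout(tmp.toFile(), "12");
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
    }
}
```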

How to avoid encoding of <,>,& with Document.createTextNode

class XMLencode
{
public static void main(String[] args)
{
try{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Text elmnt=doc.createTextNode("<data>sun</data><abcdefg/><end/>");
root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
}catch(Exception e){
System.out.println(e.getMessage());
}
}
}
Here is my above piece of code.
The output generated is like this
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia>&lt;data&gt;sun&lt;/data&gt;&lt;abcdefg/&gt;&lt;end/&gt;</roseindia>
I don't want the tags to be encoded. I need the output in this fashion.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
Please help me on this.
Thanks,
Mohan
Short Answer
You could leverage the CDATA mechanism in XML to prevent characters from being escaped. Below is an example of the DOM code:
doc.createCDATASection("<foo/>");
The content will be:
<![CDATA[<foo/>]]>
LONG ANSWER
Below is a complete example of leveraging a CDATA section using the DOM APIs.
package forum12525152;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.newDocument();
Element rootElement = document.createElement("root");
document.appendChild(rootElement);
// Create Element with a Text Node
Element fooElement = document.createElement("foo");
fooElement.setTextContent("<foo/>");
rootElement.appendChild(fooElement);
// Create Element with a CDATA Section
Element barElement = document.createElement("bar");
CDATASection cdata = document.createCDATASection("<bar/>");
barElement.appendChild(cdata);
rootElement.appendChild(barElement);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(System.out);
t.transform(source, result);
}
}
Output
Note the difference in the foo and bar elements even though they have similar content. I have formatted the result of running the demo code to make it more readable:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<foo>&lt;foo/&gt;</foo>
<bar><![CDATA[<bar/>]]></bar>
</root>
Instead of writing like this doc.createTextNode("<data>sun</data><abcdefg/><end/>");
You should create each element.
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
class XMLencode {
public static void main(String[] args) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Element data = doc.createElement("data");
root.appendChild(data);
Text elemnt = doc.createTextNode("sun");
data.appendChild(elemnt);
Element data1 = doc.createElement("abcdefg");
root.appendChild(data1);
//Text elmnt = doc.createTextNode("<data>sun</data><abcdefg/><end/>");
//root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}
You can use the doc.createTextNode and use a workaround (long) for the escaped characters.
SOAPMessage msg = messageContext.getMessage();
header.setTextContent(seched);
Then use
Source src = msg.getSOAPPart().getContent();
To get the content, then transform it to a string
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result1 = new StreamResult(new StringWriter());
transformer.transform(src, result1);
Replace the string special characters
String xmlString = result1.getWriter().toString()
.replaceAll("&lt;", "<")
.replaceAll("&gt;", ">");
System.out.print(xmlString);
Then go the opposite way, from the string back to a DOM, with the escaped characters fixed
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlString));
Document doc = db.parse(is);
Source src123 = new DOMSource(doc);
Then set it back to the soap message
msg.getSOAPPart().setContent(src123);
Don't use createTextNode - the whole point of it is to insert some text (as data) into the document, not a fragment of raw XML.
Use a combination of createTextNode for the text and createElement for the elements.
I don't want the tags to be encoded. I need the output in this fashion.
Then you don't want a text node at all - which is why createTextNode isn't working for you. (Or rather, it's working fine - it's just not doing what you want). You should probably just parse your XML string, then import the document node from the result into your new document.
Of course, if you know the elements beforehand, don't express them as text in the first place - use a mixture of createElement, createAttribute, createTextNode and appendChild to create the structure.
It's entirely possible that something like JDOM will make this simpler, but that's the basic approach.
Mohan,
You can't use Document.createTextNode(). That method transforms (or escapes) the characters in your XML.
Instead, you need to build two separate Documents from the two XML sources and use importNode.
I use Document.importNode() like this to solve my problem:
Build your builders:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
Document oldDoc = builder.parse(isOrigXml); //this is XML as InputSource
Document newDoc = builder.parse(isInsertXml); //this is XML as InputSource
Next, build a NodeList of the Element/Node you want to import. Create a Node from the NodeList. Create another Node of what you are going to import using importNode. Build the last Node of the final XML as such:
NodeList nl = newDoc.getElementsByTagName("roseindia"); //or whatever the element name is
Node xmlToInsert = nl.item(0);
Node importNode = oldDoc.importNode(xmlToInsert, true);
Node target = oldDoc.getElementsByTagName("ELEMENT_NAME_OF_LOCATION").item(0);
target.appendChild(importNode);
Source source = new DOMSource(target);
....
The rest is standard Transformer - StringWriter to StreamResult stuff to get the results.
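As a compact illustration of the parse-and-import approach applied to the question's string, here is a sketch. The class name ImportFragmentSketch and the temporary "wrapper" element are assumptions of this example: the question's fragment has several top-level elements, so it is not a well-formed document by itself and is wrapped before parsing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name; a sketch of parsing a fragment and importing it.
class ImportFragmentSketch {
    static String build() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        Document doc = builder.newDocument();
        Element root = doc.createElement("roseindia");
        doc.appendChild(root);

        // Wrap the fragment in a throwaway element so it parses as one document.
        String fragment = "<data>sun</data><abcdefg/><end/>";
        Document parsed = builder.parse(
                new InputSource(new StringReader("<wrapper>" + fragment + "</wrapper>")));

        // importNode copies each child into the target document; the parsed
        // document is left untouched, so a plain index loop is safe here.
        NodeList children = parsed.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            root.appendChild(doc.importNode(children.item(i), true));
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```

Because the children are real element nodes rather than text, the serializer has no markup characters to escape.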

Adding namespace to an already created XML document

I am creating a W3C Document object using a String value. Once I created the Document object, I want to add a namespace to the root element of this document. Here's my current code:
Document document = builder.parse(new InputSource(new StringReader(xmlString)));
document.getDocumentElement().setAttributeNS("http://com", "xmlns:ns2", "Test");
document.setPrefix("ns2");
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(document);
Result dest = new StreamResult(new File("c:\\xmlFileName.xml"));
aTransformer.transform(src, dest);
What I use as input:
<product>
<arg0>DDDDDD</arg0>
<arg1>DDDD</arg1>
</product>
What the output should look like:
<ns2:product xmlns:ns2="http://com">
<arg0>DDDDDD</arg0>
<arg1>DDDD</arg1>
</ns2:product>
I need to add the prefix value and namespace also to the input xml string. If I try the above code I am getting this exception:
NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
Appreciate your help!
Since there is not an easy way to rename the root element, we'll have to replace it with an element that has the correct namespace and attribute, and then copy all the original children into it. Forcing the namespace declaration is not needed because by giving the element the correct namespace (URI) and setting the prefix, the declaration will be automatic.
Replace the setAttributeNS and setPrefix calls (lines 2 and 3) with this:
String namespace = "http://com";
String prefix = "ns2";
// Upgrade the DOM level 1 to level 2 with the correct namespace
Element originalDocumentElement = document.getDocumentElement();
Element newDocumentElement = document.createElementNS(namespace, originalDocumentElement.getNodeName());
// Set the desired namespace and prefix
newDocumentElement.setPrefix(prefix);
// Copy all children
NodeList list = originalDocumentElement.getChildNodes();
while(list.getLength()!=0) {
newDocumentElement.appendChild(list.item(0));
}
// Replace the original element
document.replaceChild(newDocumentElement, originalDocumentElement);
In the original code the author tried to declare an element namespace like this:
.setAttributeNS("http://com", "xmlns:ns2", "Test");
The first parameter is the namespace of the attribute, and since it's a namespace declaration attribute it needs to have the http://www.w3.org/2000/xmlns/ URI. The declared namespace should go in the third parameter:
.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:ns2", "http://com");
The approach below also works for me, but it probably should not be used in performance-critical code.
Add the namespace to the document root element as an attribute.
Transform the document to an XML string. The purpose of this step is to make the child elements in the XML string inherit the parent element's namespace.
Now the XML string has the namespace.
You can use the XML string to build a document again, or use it for JAXB unmarshalling, etc.
private static String addNamespaceToXml(InputStream in)
throws ParserConfigurationException, SAXException, IOException,
TransformerException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
/*
* Must not be namespace aware, otherwise the generated XML string will
* have wrong namespace
*/
// dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(in);
Element documentElement = document.getDocumentElement();
// Add name space to root element as attribute
documentElement.setAttribute("xmlns", "http://you_name_space");
String xml = transformXmlNodeToXmlString(documentElement);
return xml;
}
private static String transformXmlNodeToXmlString(Node node)
throws TransformerException {
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
StringWriter buffer = new StringWriter();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(node), new StreamResult(buffer));
String xml = buffer.toString();
return xml;
}
Partially gleaned from here, and also from a comment above, I was able to get it to work (transforming an arbitrary DOM Node and adding a prefix to it and all its children) thus:
private void addNamespacePrefix(Document doc, Node node) {
Element mainRootElement = doc.createElementNS(
"http://abc.de/x/y/z", // namespace
"my-prefix:fake-header-element" // prefix to "register" it with the DOM so we don't get exceptions later...
);
List<Element> descendants = nodeListToArrayRecurse(node.getChildNodes()); // for some reason we have to grab all these before doing the first "renameNode" ... no idea why ...
mainRootElement.appendChild(node);
doc.renameNode(node, "http://abc.de/x/y/z", "my-prefix:" + node.getNodeName());
descendants.stream().forEach(c -> doc.renameNode(c, "http://abc.de/x/y/z", "my-prefix:" + c.getNodeName()));
}
private List<Element> nodeListToArrayRecurse(NodeList entryNodes) {
List<Element> allEntries = new ArrayList<>();
for (int i = 0; i < entryNodes.getLength(); i++) {
Node child = entryNodes.item(i);
if (child.getNodeType() == Node.ELEMENT_NODE) {
allEntries.add((Element) child);
allEntries.addAll(nodeListToArrayRecurse(child.getChildNodes())); // recurse
} // ignore other [i.e. text] nodes https://stackoverflow.com/questions/14566596/loop-through-all-elements-in-xml-using-nodelist
}
return allEntries;
}
If it helps anybody. I then convert it to string, then manually remove the extra header and closing lines. What a pain, I must be doing something wrong...
This seems to be working for me, and it's much simpler than those answers provided:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(filename));
document.getDocumentElement().setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:yourNamespace", "http://whatever/else");
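Note that adding the declaration alone leaves the root element's own name unprefixed. If the goal is the question's expected output, one possible alternative to copying children into a new element is Document.renameNode (DOM Level 3), which one of the answers above already uses for descendants. The following is only a sketch under that assumption; the class name RenameRootSketch is made up, and the namespace and prefix are taken from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical class name; a sketch, not a drop-in replacement.
class RenameRootSketch {
    static String addPrefix(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document document = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Move only the root element into the namespace, keeping its local name.
        Element root = document.getDocumentElement();
        document.renameNode(root, "http://com", "ns2:" + root.getNodeName());

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addPrefix(
                "<product><arg0>DDDDDD</arg0><arg1>DDDD</arg1></product>"));
    }
}
```

The children stay in no namespace, which matches the unprefixed arg0 and arg1 in the question's expected output.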

How do I extract child element from XML to a string in Java?

If I have an XML document like
<root>
<element1>
<child attr1="blah">
<child2>blahblah</child2>
</child>
</element1>
</root>
I want to get an XML string with the first child element. My output string would be
<element1>
<child attr1="blah">
<child2>blahblah</child2>
</child>
</element1>
There are many approaches, would like to see some ideas. I've been trying to use Java XML APIs for it, but it's not clear that there is a good way to do this.
thanks
You're right, with the standard XML API, there's not a good way - here's one example (may be bug ridden; it runs, but I wrote it a long time ago).
import javax.xml.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
import java.io.*;
public class Proc
{
public static void main(String[] args) throws Exception
{
//Parse the input document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("in.xml"));
//Set up the transformer to write the output string
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty("indent", "yes");
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
//Find the first child node - this could be done with xpath as well
NodeList nl = doc.getDocumentElement().getChildNodes();
DOMSource source = null;
for(int x = 0;x < nl.getLength();x++)
{
Node e = nl.item(x);
if(e instanceof Element)
{
source = new DOMSource(e);
break;
}
}
//Do the transformation and output
transformer.transform(source, result);
System.out.println(sw.toString());
}
}
It would seem like you could get the first child just by using doc.getDocumentElement().getFirstChild(), but the problem with that is if there is any whitespace between the root and the child element, that will create a Text node in the tree, and you'll get that node instead of the actual element node. The output from this program is:
D:\home\tmp\xml>java Proc
<?xml version="1.0" encoding="UTF-8"?>
<element1>
<child attr1="blah">
<child2>blahblah</child2>
</child>
</element1>
I think you can suppress the xml version string if you don't need it, but I'm not sure on that. I would probably try to use a third party XML library if at all possible.
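The XML declaration can indeed be suppressed by setting OutputKeys.OMIT_XML_DECLARATION to "yes" on the Transformer. Below is a shorter sketch of the same idea that takes the XML as a string and skips whitespace text nodes by walking siblings. The class name FirstChildSketch and the choice to return an empty string when no child element exists are assumptions of this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name; a sketch only.
class FirstChildSketch {
    static String firstChildElement(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // getFirstChild() may be a whitespace Text node, so advance until
        // the first real element is found.
        Node node = doc.getDocumentElement().getFirstChild();
        while (node != null && node.getNodeType() != Node.ELEMENT_NODE) {
            node = node.getNextSibling();
        }
        if (node == null) {
            return ""; // assumption: empty string when the root has no child element
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Drops the leading <?xml ...?> line from the output.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstChildElement(
                "<root>\n  <element1><child attr1=\"blah\"><child2>blahblah</child2></child></element1>\n</root>"));
    }
}
```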
Since this is the top Google answer, for those of you who just want the basics:
public static String serializeXml(Element element) throws Exception
{
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
StreamResult result = new StreamResult(buffer);
DOMSource source = new DOMSource(element);
TransformerFactory.newInstance().newTransformer().transform(source, result);
return buffer.toString("UTF-8"); // the transformer writes UTF-8 by default
}
I use this for debugging, which is most likely what you need it for.
I would recommend JDOM. It's a Java XML library that makes dealing with XML much easier than the standard W3C approach.
public String getXML(String xmlContent, String tagName){
String startTag = "<"+ tagName + ">";
String endTag = "</"+ tagName + ">";
int startposition = xmlContent.indexOf(startTag);
int endposition = xmlContent.indexOf(endTag, startposition);
if (startposition == -1){
return "ddd";
}
startposition += startTag.length();
if(endposition == -1){
return "eee";
}
return xmlContent.substring(startposition, endposition);
}
Pass your XML as a string to this method and, in your case, pass 'element1' as the tagName parameter.
XMLBeans is an easy to use (once you get the hang of it) tool to deal with XML without having to deal with the annoyances of parsing.
It requires that you have a schema for the XML file, but it also provides a tool to generate a schema from an existing XML file (depending on your needs, the generated one is probably fine).
If your xml has schema backing it, you could use xmlbeans or JAXB to generate pojo objects that help you marshal/unmarshal xml.
http://xmlbeans.apache.org/
https://jaxb.dev.java.net/
As the question is actually about the first occurrence of a string inside another string, I would use String class methods instead of XML parsers:
public static String getElementAsString(String xml, String tagName){
int beginIndex = xml.indexOf("<" + tagName);
int endIndex = xml.indexOf("</" + tagName, beginIndex) + tagName.length() + 3;
return xml.substring(beginIndex, endIndex);
}
You can use the following function to extract an XML block as a string by passing a suitable XPath expression:
private static String nodeToString(Node node) throws TransformerException
{
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.transform(new DOMSource(node), new StreamResult(buf));
return(buf.toString());
}
public static void main(String[] args) throws Exception
{
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
XPath xPath = XPathFactory.newInstance().newXPath();
Node result = (Node)xPath.evaluate("A/B/C", doc, XPathConstants.NODE); //"A/B[id = '1']" //"//*[@type='t1']"
System.out.println(nodeToString(result));
}
