XML Node to String in Java - java

I came across this Java function that converts an XML node to its String representation:
private String nodeToString(Node node) {
StringWriter sw = new StringWriter();
try {
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException te) {
System.out.println("nodeToString Transformer Exception");
}
return sw.toString();
}
It looks straightforward: the output string should have no XML declaration and should be indented.
But I wonder what the actual output looks like. Suppose I have this XML node:
<p><media type="audio" id="au008093" rights="wbowned">
<title>Bee buzz</title>
</media>Most other kinds of bees live alone instead of in a colony. These bees make
tunnels in wood or in the ground. The queen makes her own nest.</p>
Can I assume that the resulting String after applying the above transformation is:
"media type="audio" id="au008093" rights="wbowned" title Bee buzz title /media"
I want to test it myself, but I have no idea how to build this XML node in the form the function expects.
I am a bit confused. Thanks in advance for any help.

Everything important has already been said. I compiled and ran the following code.
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
public class Test {
public static void main(String[] args) throws Exception {
String s =
"<p>" +
" <media type=\"audio\" id=\"au008093\" rights=\"wbowned\">" +
" <title>Bee buzz</title>" +
" </media>" +
" Most other kinds of bees live alone instead of in a colony." +
" These bees make tunnels in wood or in the ground." +
" The queen makes her own nest." +
"</p>";
InputStream is = new ByteArrayInputStream(s.getBytes());
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document d = db.parse(is);
Node rootElement = d.getDocumentElement();
System.out.println(nodeToString(rootElement));
}
private static String nodeToString(Node node) {
StringWriter sw = new StringWriter();
try {
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException te) {
System.out.println("nodeToString Transformer Exception");
}
return sw.toString();
}
}
And it produced the following output:
<p> <media id="au008093" rights="wbowned" type="audio"> <title>Bee buzz</title> </media> Most other kinds of bees live alone instead of in a colony. These bees make tunnels in wood or in the ground. The queen makes her own nest.</p>
You can further tweak it by yourself. Good luck!

You have an XML representation in a DOM tree.
For example, you have opened an XML file and passed it to the DOM parser.
As a result, a DOM tree holding your XML is created in memory.
From then on you access the XML info by traversing the DOM tree.
If you need a String representation of the XML held in the DOM tree, you use a transformation.
This is needed because a DOM node does not hand you its markup as a String directly.
So if, for example, the Node you pass to nodeToString is the root element of the XML document, the result is a String containing the original XML data.
The tags will still be there, i.e. you will have a valid XML representation, only this time in a String variable.
For example:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
Document xmlDoc = parser.parse(file);//file has the xml
String xml = nodeToString(xmlDoc.getDocumentElement());//pass in the root
//xml has the xml info. E.g no xml declaration. Add it
xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>" + xml;//bad to append this way...
System.out.println("XML is:"+xml);
DISCLAIMER: Did not even attempt to compile code. Hopefully you understand what you have to do

Related

Transformer escapes CR

Consider the following program:
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class CrDemo {
public static void main(String[] args) throws Exception {
final String xml = "<a>foo&#13;\nbar&#13;\n</a>";
final TransformerFactory tf = TransformerFactory.newInstance();
final Transformer t = tf.newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "no");
t.setOutputProperty(OutputKeys.STANDALONE, "yes");
t.transform(new StreamSource(new StringReader(xml)), new StreamResult(System.out));
}
}
The output looks like this:
<a>foo&#13;
bar&#13;
</a>
Is it possible to prevent the Transformer from escaping CR?
If the input XML contained literal CR characters, they would be removed during parsing. XML parsers normalize line endings to a single NL character; but this doesn't apply if the CR is escaped as &#13;.
So if a text node contains a CR character, the XSLT processor assumes you have worked hard to put it there and that you really want it, and it therefore outputs it in such a way that it will survive round-tripping where the resulting serialized output is re-processed by an XML parser.
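The difference between a literal CR and an escaped one can be checked with a small sketch. The class and method names here (LineEndingDemo, rootText) are made up for illustration; only standard JAXP calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LineEndingDemo {
    // Hypothetical helper: parses the XML string and returns the root element's text.
    static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Literal CR LF in the input: the parser normalizes it to a single LF.
        String literal = rootText("<a>foo\r\nbar</a>");
        // CR written as a character reference: it survives parsing.
        String escaped = rootText("<a>foo&#13;\nbar</a>");
        System.out.println(literal.contains("\r")); // false
        System.out.println(escaped.contains("\r")); // true
    }
}
```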
Of course, you can get rid of CR characters in your XSLT code, just as you can get rid of any other characters. But it won't happen automatically.
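One way to do that, sketched under the assumption that an identity stylesheet with a text() override is acceptable for your case: the stylesheet below copies everything and uses the XPath translate() function to drop CR characters from text nodes. The class name StripCr and the stylesheet itself are illustrative, not part of the original program.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StripCr {
    // Identity transform plus a text() template that removes CR characters.
    // The &#13; inside the stylesheet is a character reference, so the XPath
    // string literal passed to translate() contains a real CR.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='text()'>"
        + "<xsl:value-of select=\"translate(., '&#13;', '')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String stripCr(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The serialized result no longer contains the &#13; escapes.
        System.out.println(stripCr("<a>foo&#13;\nbar&#13;\n</a>"));
    }
}
```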

Having trouble formatting multiple nodes from a text to XML conversion in Java

I have a Java program which converts text files to XML. I need the following format:
<app:defaults>
<app:schedules>
<app:run>
<app:schedule>schedule frequency value</app:schedule>
</app:run>
</app:schedules>
<app:rununit>
<app:agent>agent hostname value</app:agent>
</app:rununit>
</app:defaults>
The closing "/app:schedules" tag is not being placed where it belongs, right after the "/app:run" tag. The program instead generates the following (which is not correct):
<app:defaults>
<app:schedules>
<app:run>
<app:schedule>schedule frequency value</app:schedule>
</app:run>
<app:rununit>
<app:agent>agent hostname value</app:agent>
</app:rununit>
</app:schedules>
</app:defaults>
The method in the Java program is shown below. For this example I explicitly added the text to each node to show what the data should be; otherwise the method takes its String arguments from the input text file.
public static void main(String[] args) {
String infile = args[0];
String outxml = args[1];
BufferedReader in;
StreamResult out;
DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder icBuilder;
try {
in = new BufferedReader(new FileReader(infile));
out = new StreamResult(outxml);
icBuilder = icFactory.newDocumentBuilder();
Document doc = icBuilder.newDocument();
Element mainRootElement = doc.createElementNS ("http://dto.cybermation.com/application", "app:appl");
mainRootElement.setAttribute("name", "TESTSHEDULE");
doc.appendChild(mainRootElement);
...
private static Node processTagElements3(Document doc, String "app:defaults") {
Element node1 = doc.createElement("app:schedules");
Element node2 = doc.createElement("app:run");
Element node3 = doc.createElement("app:schedule");
Element node4 = doc.createElement("app:rununit");
Element node5 = doc.createElement("app:agent");
node1.appendChild(node2);
node2.appendChild(node3);
node3.appendChild(doc.createTextNode("schedule frequency value"));
node1.appendChild(node4);
node4.appendChild(node5);
node5.appendChild(doc.createTextNode("agent hostname value"));
return node1;
}
I've tested this using different appendChild calls between these nodes but ran into a brick wall with formatting this output. Any suggestions or advice on the best way to organize the node insertions would be really appreciated. There could be something simple I am missing.
Note: I'm not an expert in XML parsing in Java.
I'm just stitching together some example code I had on my machine to see if it solves your problem. So here it is.
Example code:
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;
public class test {
public static void main(String[] args) throws Exception {
String xml = "<app:defaults>\n" +
" <app:schedules>\n" +
" <app:run>\n" +
" <app:schedule>schedule frequency value</app:schedule>\n" +
" </app:run>\n" +
" </app:schedules>\n" +
" <app:rununit>\n" +
" <app:agent>agent hostname value</app:agent>\n" +
" </app:rununit> \n" +
" </app:defaults>";
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
.parse(new InputSource(new StringReader(xml)));
NodeList errNodes = doc.getElementsByTagName("error");
if (errNodes.getLength() > 0) {
Element err = (Element)errNodes.item(0);
System.out.println(err.getElementsByTagName("errorMessage")
.item(0).getTextContent());
} else {
// success
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, result);
System.out.println(writer.toString());
}
}
}
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<app:defaults>
<app:schedules>
<app:run>
<app:schedule>schedule frequency value</app:schedule>
</app:run>
</app:schedules>
<app:rununit>
<app:agent>agent hostname value</app:agent>
</app:rununit>
</app:defaults>
This code seems to work the way you expect. Give it a try and let me know whether the solution is okay.
I think the idea here is to use the ready-made Java APIs rather than writing our own parser, because these APIs are generally more reliable since many others use them daily.
Things would be much easier if you had given your nodes meaningful names (say runNode, etc.), don't you think?
That being said, this is probably what you want:
Element defaultNode = doc.createElement("app:defaults");
Element schedulesNode = doc.createElement("app:schedules");
Element runNode = doc.createElement("app:run");
Element scheduleNode = doc.createElement("app:schedule");
Element rununitNode = doc.createElement("app:rununit");
Element agentNode = doc.createElement("app:agent");
defaultNode.appendChild(schedulesNode);
schedulesNode.appendChild(runNode);
runNode.appendChild(scheduleNode);
scheduleNode.appendChild(doc.createTextNode("schedule frequency value"));
defaultNode.appendChild(rununitNode);
rununitNode.appendChild(agentNode);
agentNode.appendChild(doc.createTextNode("agent hostname value"));
Note the defaultNode used.
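To check the nesting, the same tree can be built and serialized in one go. This is only a sketch: the class DefaultsTree and the helper child() are names invented here, and the elements are created without a namespace, just as in the snippet above.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DefaultsTree {
    // Hypothetical helper: creates an element, appends it to parent, returns it.
    static Element child(Document doc, Element parent, String name) {
        Element e = doc.createElement(name);
        parent.appendChild(e);
        return e;
    }

    static String build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element defaults = doc.createElement("app:defaults");
        doc.appendChild(defaults);

        // schedules and rununit are siblings: both are appended to defaults.
        Element schedules = child(doc, defaults, "app:schedules");
        Element run = child(doc, schedules, "app:run");
        child(doc, run, "app:schedule").setTextContent("schedule frequency value");
        Element rununit = child(doc, defaults, "app:rununit");
        child(doc, rununit, "app:agent").setTextContent("agent hostname value");

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```

The point is the parent each element is appended to: app:rununit goes under the defaults element, not under app:schedules.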
Thanks all!
I decided to modify the script to accept file input as my arg - this works fine now and is a simpler solution:
public class test2 {
public static void main(String[] args) throws Exception {
File file = new File(args[0]);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try{
DocumentBuilder builder = factory.newDocumentBuilder();
FileInputStream fis = new FileInputStream(file);
InputSource is = new InputSource(fis);
Document doc = builder.parse(is);
NodeList errNodes = doc.getElementsByTagName("error");
if (errNodes.getLength() > 0) {
Element err = (Element)errNodes.item(0);
System.out.println(err.getElementsByTagName("errorMessage").item(0).getTextContent());
} else {
// success
DOMSource domSource = new DOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.transform(domSource, result);
System.out.println(writer.toString());
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

Adding linebreak in xml file before root node

I am trying to add line break after my comments above the root node in XML document.
I need something like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--DO NOT EDIT THIS FILE-->
<projects>
</projects>
But what I was able to get is this (a line break inside the root, whereas I need the line break after the comment):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--DO NOT EDIT THIS FILE--><projects>
</projects>
I need to add the line break just after my comment. Is there a way to do this?
My code:
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
public class XMLNewLine {
/**
* @param args
*/
public static void main(String[] args) {
System.out.println("Adding comment..");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db;
try {
Document doc;
StreamResult result;
result = new StreamResult(new File("abc.xml"));
db = dbf.newDocumentBuilder();
doc = db.parse(new FileInputStream(new File("abc.xml")));
Element element = doc.getDocumentElement();
Text lineBreak = doc.createTextNode("\n");
element.appendChild(lineBreak);
Comment comment = doc
.createComment("DO NOT EDIT THIS FILE");
element.getParentNode().insertBefore(comment, element);
doc.getDocumentElement().normalize();
TransformerFactory transformerFactory = TransformerFactory
.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
} catch (Exception e) {
// TODO Auto-generated catch block
}
}
}
You basically want a text node containing a line break after the comment node.
Element docElem = doc.getDocumentElement();
doc.insertBefore(doc.createComment("DO NOT EDIT THIS FILE"), docElem);
doc.insertBefore(doc.createTextNode("\n"), docElem);
EDIT: It seems that appending even whitespace-only text nodes is not allowed at the root node of an org.w3c.dom.Document. This is 100% formally correct, but also unhelpful.
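The restriction can be observed directly. The sketch below assumes the DOM implementation bundled with the JDK; the class name RootTextNodeDemo and its method are invented for illustration. It tries to insert a whitespace-only Text node before the root element and reports whether the attempt is rejected with HIERARCHY_REQUEST_ERR.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class RootTextNodeDemo {
    // Returns true if the DOM implementation refuses a Text node as a
    // direct child of the Document with HIERARCHY_REQUEST_ERR.
    static boolean rejectsTopLevelText() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("projects"));
        try {
            doc.insertBefore(doc.createTextNode("\n"), doc.getDocumentElement());
            return false; // the insert was accepted
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Text node rejected at document level: " + rejectsTopLevelText());
    }
}
```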
The way comments are rendered in the output of the Transformer is determined by the serializer it uses (there are different serializers for HTML, XML and plain text outputs). In the built-in XML serializer the end of a comment is defined as --> - without a newline.
Since the internals of javax.xml.transform.Transformer are hard-wired, the serializers are not public API and the built-in implementation class is marked as final, overriding that behavior or setting a custom serializer is impossible.
In other words, you are out of luck adding your line break in a clean way.
You can, however, safely add it in a slightly unclean way:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
FileInputStream inputXml = new FileInputStream(new File("input.xml"));
Document doc = db.parse(inputXml);
// add the comment node
doc.insertBefore(doc.createComment("THIS IS A COMMENT"), doc.getDocumentElement());
StringWriter outputXmlStringWriter = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
// "xml" + "UTF-8" "include XML declaration" is the default anyway, but let's be explicit
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(doc), new StreamResult(outputXmlStringWriter));
// now insert our newline into the string & write an UTF-8 file
String outputXmlString = outputXmlStringWriter.toString()
.replaceFirst("<!--", "\n<!--").replaceFirst("-->", "-->\n");
FileOutputStream outputXml = new FileOutputStream(new File("output.xml"));
outputXml.write(outputXmlString.getBytes("UTF-8"));
Doing search-and-replace operations on XML strings is highly discouraged in general, but in this case there is little that can go wrong.
Revisiting this after some time because I had the same issue. I found another solution that does not need to buffer the output in a String:
1. Write only the XML declaration by passing an empty document. This will also append a line break.
2. Write the document content without the XML declaration.
Code:
StreamResult streamResult = new StreamResult(writer);
// output XML declaration with an empty document
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.transform(new DOMSource(), streamResult);
// output the document without XML declaration
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(doc), streamResult);
You can achieve this by not adding the comment node to your document, but instead partially transforming your document. First transform your own XML processing instruction and comment separately, and then the rest of document:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File("abc.xml")));
Result output = new StreamResult(new File("abc.xml"));
Source input = new DOMSource(doc);
// xml processing instruction and comment node
ProcessingInstruction xmlpi = doc.createProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"");
Comment comment = doc.createComment("DO NOT EDIT THIS FILE");
// first transform the processing instruction and comment
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(xmlpi), output);
transformer.transform(new DOMSource(comment), output);
// then the document
transformer.transform(input, output);
There is a JDK bug concerning this. It was not fixed (as you would expect) because that would likely cause many problems to users' existing applications.
Adding the following output property fixes this:
transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");
Had the same issue.
I solved it by putting the comment inside the root element.
Not exactly the same, but I think acceptable.
This is my solution. I take the writer and write the declaration and the header comment to it myself. After that I disable the declaration in the transformer this way:
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
All code:
public static String xmlToTree(String xml, String headerComment) {
try (StringReader reader = new StringReader(xml)) {
StreamResult result = new StreamResult(new StringWriter());
result.getWriter().write("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n");
result.getWriter().write(headerComment + "\n");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
StreamSource source = new StreamSource(reader);
transformer.transform(source, result);
String xmlTree = result.getWriter().toString();
return xmlTree;
} catch (Exception ex) {
ex.printStackTrace();
return null;
}
}

DOM parser in java not encoding quotes in UTF-8

I am trying to use the code available from this tutorial: http://www.mkyong.com/java/how-to-create-xml-file-in-java-dom/
I've pasted the code below as well. The problem is that it seems to encode all the predefined characters (<, > and & etc.) but not single or double quotes (" and '). I'd really appreciate a fix. Also, the code below has an edit to make the resultant XML appear properly formatted.
More specifically:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class WriteXMLFile {
public static void main(String argv[]) {
try {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
// root elements
Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("company");
doc.appendChild(rootElement);
// staff elements
Element staff = doc.createElement("Staff");
rootElement.appendChild(staff);
// set attribute to staff element
Attr attr = doc.createAttribute("id");
attr.setValue("1");
staff.setAttributeNode(attr);
// shorter way
// staff.setAttribute("id", "1");
// firstname elements
Element firstname = doc.createElement("firstname");
firstname.appendChild(doc.createTextNode("yong"));
staff.appendChild(firstname);
// lastname elements
Element lastname = doc.createElement("lastname");
lastname.appendChild(doc.createTextNode("mook kim"));
staff.appendChild(lastname);
// nickname elements
Element nickname = doc.createElement("nickname");
nickname.appendChild(doc.createTextNode("mkyong"));
staff.appendChild(nickname);
// salary elements
Element salary = doc.createElement("salary");
salary.appendChild(doc.createTextNode("100000"));
staff.appendChild(salary);
// write the content into xml file
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File("C:\\file.xml"));
// Output to console for testing
// StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
System.out.println("File saved!");
} catch (ParserConfigurationException pce) {
pce.printStackTrace();
} catch (TransformerException tfe) {
tfe.printStackTrace();
}
}
}
I think your code works fine. Put a double quote in an attribute value and see what happens.
Read section 2.4 of the XML specification. Production 14 of the grammar
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
tells you that character data can be any (valid XML) character except '<' and '&' (or the ']]>' sequence). It is not strictly necessary to escape '>', although recommended.
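To see this in practice, here is a small sketch (the class name QuoteEscapingDemo and the attribute name motto are made up for this example). It puts quotes into both an attribute value and a text node and serializes the result; with the serializer bundled in the JDK, the double quotes in the attribute come out as &quot;, while the quotes in the text node are left as they are. Other serializers may choose differently, as long as the output stays well-formed.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class QuoteEscapingDemo {
    static String serialize() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("company");
        doc.appendChild(root);

        // Double quotes inside an attribute value must be escaped on output.
        root.setAttribute("motto", "say \"hi\"");
        // In character data only < and & have to be escaped.
        root.appendChild(doc.createTextNode("\"double\" and 'single' & <angle>"));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```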

How do I extract child element from XML to a string in Java?

If I have an XML document like
<root>
<element1>
<child attr1="blah">
<child2>blahblah</child2>
</child>
</element1>
</root>
I want to get an XML string with the first child element. My output string would be
<element1>
<child attr1="blah">
<child2>blahblah</child2>
</child>
</element1>
There are many approaches, would like to see some ideas. I've been trying to use Java XML APIs for it, but it's not clear that there is a good way to do this.
thanks
You're right, with the standard XML API, there's not a good way - here's one example (may be bug ridden; it runs, but I wrote it a long time ago).
import javax.xml.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
import java.io.*;
public class Proc
{
public static void main(String[] args) throws Exception
{
//Parse the input document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("in.xml"));
//Set up the transformer to write the output string
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
transformer.setOutputProperty("indent", "yes");
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
//Find the first child node - this could be done with xpath as well
NodeList nl = doc.getDocumentElement().getChildNodes();
DOMSource source = null;
for(int x = 0;x < nl.getLength();x++)
{
Node e = nl.item(x);
if(e instanceof Element)
{
source = new DOMSource(e);
break;
}
}
//Do the transformation and output
transformer.transform(source, result);
System.out.println(sw.toString());
}
}
It would seem like you could get the first child just by using doc.getDocumentElement().getFirstChild(), but the problem with that is if there is any whitespace between the root and the child element, that will create a Text node in the tree, and you'll get that node instead of the actual element node. The output from this program is:
D:\home\tmp\xml>java Proc
<?xml version="1.0" encoding="UTF-8"?>
<element1>
<child attr1="blah">
<child2>blahblah</child2>
</child>
</element1>
I think you can suppress the xml version string if you don't need it, but I'm not sure on that. I would probably try to use a third party XML library if at all possible.
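For what it's worth, the declaration can be suppressed with the standard OutputKeys.OMIT_XML_DECLARATION output property. The sketch below combines that with the whitespace point made above; the class FirstElementDemo and its helper methods are hypothetical names, and the input is an inline string rather than in.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FirstElementDemo {
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // True when getFirstChild() on the root returns a Text node (whitespace).
    static boolean firstChildIsText(String xml) throws Exception {
        return parseRoot(xml).getFirstChild().getNodeType() == Node.TEXT_NODE;
    }

    // Serializes the first child that is an Element, without the XML declaration.
    static String firstElementXml(String xml) throws Exception {
        Node n = parseRoot(xml).getFirstChild();
        while (n != null && !(n instanceof Element)) {
            n = n.getNextSibling(); // skip whitespace text, comments, etc.
        }
        if (n == null) {
            return ""; // the root has no element children
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(n), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <element1><child attr1=\"blah\">"
                   + "<child2>blahblah</child2></child></element1>\n</root>";
        System.out.println(firstChildIsText(xml)); // true
        System.out.println(firstElementXml(xml));
    }
}
```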
Since this is the top Google result, here is a version for those of you who just want the basics:
public static String serializeXml(Element element) throws Exception
{
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
StreamResult result = new StreamResult(buffer);
DOMSource source = new DOMSource(element);
TransformerFactory.newInstance().newTransformer().transform(source, result);
return buffer.toString("UTF-8"); // decode with the transformer's default output encoding
}
I use this for debug, which most likely is what you need this for
I would recommend JDOM. It's a Java XML library that makes dealing with XML much easier than the standard W3C approach.
public String getXML(String xmlContent, String tagName){
String startTag = "<"+ tagName + ">";
String endTag = "</"+ tagName + ">";
int startposition = xmlContent.indexOf(startTag);
int endposition = xmlContent.indexOf(endTag, startposition);
if (startposition == -1){
return "ddd";
}
startposition += startTag.length();
if(endposition == -1){
return "eee";
}
return xmlContent.substring(startposition, endposition);
}
Pass your XML as a string to this method, and in your case pass 'element' as the tagName parameter.
XMLBeans is an easy to use (once you get the hang of it) tool to deal with XML without having to deal with the annoyances of parsing.
It requires that you have a schema for the XML file, but it also provides a tool to generate a schema from an existing XML file (depending on your needs the generated one is probably fine).
If your xml has schema backing it, you could use xmlbeans or JAXB to generate pojo objects that help you marshal/unmarshal xml.
http://xmlbeans.apache.org/
https://jaxb.dev.java.net/
As the question is really about finding the first occurrence of one string inside another, I would use String class methods instead of XML parsers:
public static String getElementAsString(String xml, String tagName){
int beginIndex = xml.indexOf("<" + tagName);
int endIndex = xml.indexOf("</" + tagName, beginIndex) + tagName.length() + 3;
return xml.substring(beginIndex, endIndex);
}
You can use the following function to extract an XML block as a string by passing a suitable XPath expression:
private static String nodeToString(Node node) throws TransformerException
{
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.transform(new DOMSource(node), new StreamResult(buf));
return(buf.toString());
}
public static void main(String[] args) throws Exception
{
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputFile);
XPath xPath = XPathFactory.newInstance().newXPath();
Node result = (Node)xPath.evaluate("A/B/C", doc, XPathConstants.NODE); //"A/B[id = '1']" //"//*[@type='t1']"
System.out.println(nodeToString(result));
}
