Adding linebreak in xml file before root node - java

I am trying to add line break after my comments above the root node in XML document.
I need something like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--DO NOT EDIT THIS FILE-->
<projects>
</projects>
But What I was able to get is this(Line break inside the root but I need line break after the comment):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--DO NOT EDIT THIS FILE--><projects>
</projects>
I need to add the line break just after my comment. Is there a way to do this?
My code:
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
public class XMLNewLine {
/**
* #param args
*/
public static void main(String[] args) {
System.out.println("Adding comment..");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db;
try {
Document doc;
StreamResult result;
result = new StreamResult(new File("abc.xml"));
db = dbf.newDocumentBuilder();
doc = db.parse(new FileInputStream(new File("abc.xml")));
Element element = doc.getDocumentElement();
Text lineBreak = doc.createTextNode("\n");
element.appendChild(lineBreak);
Comment comment = doc
.createComment("DO NOT EDIT THIS FILE");
element.getParentNode().insertBefore(comment, element);
doc.getDocumentElement().normalize();
TransformerFactory transformerFactory = TransformerFactory
.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
} catch (Exception e) {
// TODO Auto-generated catch block
}
}
}

You basically want a text node containing a line break after the comment node.
Element docElem = doc.getDocumentElement();
doc.insertBefore(doc.createComment("DO NOT EDIT THIS FILE"), docElem);
doc.insertBefore(doc.createTextNode("\\n"), docElem);
EDIT: It seems that appending even whitespace-only text nodes is not allowed at the root node of an org.w3c.dom.Document. This is 100% formally correct, but also unhelpful.
The way comments are rendered in the output of the Transformer is determined by the serializer it uses (there are different serializers for HTML, XML and plain text outputs). In the built-in XML serializer the end of a comment is defined as --> - without a newline.
Since the internals of javax.xml.transform.Transformer are hard-wired, the serializers are not public API and the class is marked as final, overriding that behavior or setting a custom serializer is impossible.
In other words, you are out of luck adding your line break in a clean way.
You can, however, safely add it in a slightly unclean way:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
FileInputStream inputXml = new FileInputStream(new File("input.xml"));
Document doc = db.parse(inputXml);
// add the comment node
doc.insertBefore(doc.createComment("THIS IS A COMMENT"), doc.getDocumentElement());
StringWriter outputXmlStringWriter = new StringWriter();
Transformer transformer = transformerFactory.newTransformer();
// "xml" + "UTF-8" "include XML declaration" is the default anyway, but let's be explicit
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(doc), new StreamResult(outputXmlStringWriter));
// now insert our newline into the string & write an UTF-8 file
String outputXmlString = outputXmlStringWriter.toString()
.replaceFirst("<!--", "\n<!--").replaceFirst("-->", "-->\n");
FileOutputStream outputXml = new FileOutputStream(new File("output.xml"));
outputXml.write(outputXmlString.getBytes("UTF-8"));
Doing search-and-replace operations on XML strings is highly discouraged in general, but in this case there is little that can go wrong.

Revisiting this after some time because I had the same issue. I found another solution that does not need to buffer the output in a String:
Write only the XML-declaration by passing an empty document. This will also append a linebreak.
Write the document content without XML-declaration
Code:
StreamResult streamResult = new StreamResult(writer);
// output XML declaration with an empty document
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.transform(new DOMSource(), streamResult);
// output the document without XML declaration
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(doc), streamResult);

You can achieve this by not adding the comment node to your document, but instead partially transforming your document. First transform your own XML processing instruction and comment separately, and then the rest of document:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File("abc.xml")));
Result output = new StreamResult(new File("abc.xml"));
Source input = new DOMSource(doc);
// xml processing instruction and comment node
ProcessingInstruction xmlpi = doc.createProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"");
Comment comment = doc.createComment("DO NOT EDIT THIS FILE");
// first transform the processing instruction and comment
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(xmlpi), output);
transformer.transform(new DOMSource(comment), output);
// then the document
transformer.transform(input, output);

There is a JDK bug concerning this. It was not fixed (as you would expect) because that would likely cause many problems to users' existing applications.
Adding the following output property fixes this:
transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");

Had the same issue.
I solved it by putting the comment inside the root element.
Not exactly the same, but I think acceptable.

This is my solution. I just take writer and write to it declaration and the header comment. After that I disable declaration in transformer this way
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
All code:
public static String xmlToTree(String xml, String headerComment) {
try (StringReader reader = new StringReader(xml)) {
StreamResult result = new StreamResult(new StringWriter());
result.getWriter().write("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n");
result.getWriter().write(headerComment + "\n");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
StreamSource source = new StreamSource(reader);
transformer.transform(source, result);
String xmlTree = result.getWriter().toString();
return xmlTree;
} catch (Exception ex) {
ex.printStackTrace();
return null;
}
}

Related

Java XML api removes whitespace before self closing tag

I've XML file which contains only one element
<Message>
<Location URI ="XXX:XXX:XXX" />
</Message>
I want to read and print same XML using Java, but after print it loses white space before />
<Message>
<Location URI ="XXX:XXX:XXX"/>
</Message>
I have tried different configuration of DocumentBuilderFactory and Transformer but the result is same.
Any Idea?
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document requestDocument = builder.parse(this.getClass().getResourceAsStream("/message-template.xml"));
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
DOMSource domSource = new DOMSource(requestDocument);
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
transformer.transform(domSource, result);
System.out.println(writer.toString());
Here :
DOMSource domSource = new DOMSource(requestDocument);
...
transformer.transform(domSource, result);
You transform a DOMSource into a StreamResult . A DOMSource is not not a textual representation of the XML file but a Document Object Model (DOM) tree.
So whitespaces that are not considered as relevant to represent the content of the tree are not kept in the DOMSource :
URI ="XXX:XXX:XXX" />
|-------> not preserved
Most of APIs to represent and manipulate XML work in this way.
If you need to keep not significant whitespace in your result, you should probably do yourself the parsing of the XML file.

Converting emoji to HTML Decimal Code or Unicode Hexadecimal Code in java

I am trying to convert text file with emoji content to the file with emoji's html code or Hex code using Java.
example :
I/p : <div id="thread" style="white-space: pre-wrap;"><div>😀😀😃🍎🍏⚽️🏀
Expected o/p :<div id="thread" style="white-space: pre-wrap;"><div>😀😀😃🍎🍏⚽️🏀
In above out put '😃' should get changed to the corresponding html entity code'& # 128512;'
Detail of Html entity code and hex code is given here :
http://character-code.com/emoticons-html-codes.php
Sample code that i tried is below :
try {
File file = new File("/inFile.txt");
str = FileUtils.readFileToString(file, "ISO-8859-1");
System.out.println(new String(str.getBytes(), "UTF-8"));
String results = StringEscapeUtils.escapeHtml4(str);
System.out.println(results);
} catch (IOException e) {
e.printStackTrace();
}
I got the work around :
public static void htmlDecimalCodeGenerator () {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setValidating(false);
// File inputFile = new File("/inputFile.xml");
File inputFile = new File("/inputFile.xml");
try {
FileOutputStream fop = null;
File OutFile = new File("/outputFile.xml");
fop = new FileOutputStream(OutFile);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(inputFile);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
/*
no value of OMIT_XML_DECLARATION will add following xml declaration in the beginning of the file.
<?xml version='1.0' encoding='UTF-32'?>
*/
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
/*
When the output method is "xml", the version value specifies the
version of XML to be used for outputting the result tree. The default
value for the xml output method is 1.0. When the output method is
"html", the version value indicates the version of the HTML.
The default value for the xml output method is 4.0, which specifies
that the result should be output as HTML conforming to the HTML 4.0
Recommendation [HTML]. If the output method is "text", the version
property is ignored
*/
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
/*
Indent-- specifies whether the Transformer may
add additional whitespace when outputting the result tree; the value
must be yes or no.
*/
transformer.setOutputProperty(OutputKeys.INDENT, "no");
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
// transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(new DOMSource(doc),
new StreamResult(new OutputStreamWriter(System.out, "UTF-8")));
// new StreamResult(new OutputStreamWriter(fop, "UTF-8")));
} catch (Exception e) {
e.printStackTrace();
}
}
}

Java Transformer outputs < and > instead of <>

I am editing an XML file in Java with a Transformer by adding more nodes. The old XML code is unchanged but the new XML nodes have < and > instead of <> and are on the same line. How do I get <> instead of < and > and how do I get line breaks after the new nodes. I already read several similar threads but wasn't able to get the right formatting. Here is the relevant portion of the code:
// Read the XML file
DocumentBuilderFactory dbf= DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc=db.parse(xmlFile.getAbsoluteFile());
Element root = doc.getDocumentElement();
// create a new node
Element newNode = doc.createElement("Item");
// add it to the root node
root.appendChild(newNode);
// create a new attribute
Attr attribute = doc.createAttribute("Name");
// assign the attribute a value
attribute.setValue("Test...");
// add the attribute to the new node
newNode.setAttributeNode(attribute);
// transform the XML
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
StreamResult result = new StreamResult(new FileWriter(xmlFile.getAbsoluteFile()));
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
Thanks
To replace the &gt and other tags you can use org.apache.commons.lang3:
StringEscapeUtils.unescapeXml(resp.toString());
After that you can use the following property of transformer for having line breaks in your xml:
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
based on a question posted here:
public void writeToOutputStream(Document fDoc, OutputStream out) throws Exception {
fDoc.setXmlStandalone(true);
DOMSource docSource = new DOMSource(fDoc);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.INDENT, "no");
transformer.transform(docSource, new StreamResult(out));
}
produces:
<?xml version="1.0" encoding="UTF-8"?>
The differences I see:
fDoc.setXmlStandalone(true);
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
Try passing InputStream instead of Writer to StreamResult.
StreamResult result = new StreamResult(new FileInputStream(xmlFile.getAbsoluteFile()));
The Transformer documentation also suggests that.

How to avoid encoding of <,>,& with Document.createTextNode

class XMLencode
{
public static void main(String[] args)
{
try{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Text elmnt=doc.createTextNode("<data>sun</data><abcdefg/><end/>");
root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
}catch(Exception e){
System.out.println(e.getMessage());
}
}
}
Here is my above piece of code.
The output generated is like this
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
I dont want the tags to be encoded. I need the output in this fashion.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><roseindia><data>sun</data><abcdefg/><end/></roseindia>
Please help me on this.
Thanks,
Mohan
Short Answer
You could leverage the CDATA mechanism in XML to prevent characters from being escaped. Below is an example of the DOM code:
doc.createCDATASection("<foo/>");
The content will be:
<![CDATA[<foo/>]]>
LONG ANSWER
Below is a complete example of leveraging a CDATA section using the DOM APIs.
package forum12525152;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.newDocument();
Element rootElement = document.createElement("root");
document.appendChild(rootElement);
// Create Element with a Text Node
Element fooElement = document.createElement("foo");
fooElement.setTextContent("<foo/>");
rootElement.appendChild(fooElement);
// Create Element with a CDATA Section
Element barElement = document.createElement("bar");
CDATASection cdata = document.createCDATASection("<bar/>");
barElement.appendChild(cdata);
rootElement.appendChild(barElement);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(System.out);
t.transform(source, result);
}
}
Output
Note the difference in the foo and bar elements even though they have similar content. I have formatted the result of running the demo code to make it more readable:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<foo><foo/></foo>
<bar><![CDATA[<bar/>]]></bar>
</root>
Instead of writing like this doc.createTextNode("<data>sun</data><abcdefg/><end/>");
You should create each element.
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
class XMLencode {
public static void main(String[] args) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
Element root = doc.createElement("roseindia");
doc.appendChild(root);
Element data = doc.createElement("data");
root.appendChild(data);
Text elemnt = doc.createTextNode("sun");
data.appendChild(elemnt);
Element data1 = doc.createElement("abcdefg");
root.appendChild(data1);
//Text elmnt = doc.createTextNode("<data>sun</data><abcdefg/><end/>");
//root.appendChild(elmnt);
TransformerFactory tranFactory = TransformerFactory.newInstance();
Transformer aTransformer = tranFactory.newTransformer();
Source src = new DOMSource(doc);
Result dest = new StreamResult(System.out);
aTransformer.transform(src, dest);
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}
You can use the doc.createTextNode and use a workaround (long) for the escaped characters.
SOAPMessage msg = messageContext.getMessage();
header.setTextContent(seched);
Then use
Source src = msg.getSOAPPart().getContent();
To get the content, the transform it to string
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer. setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result1 = new StreamResult(new StringWriter());
transformer.transform(src, result1);
Replace the string special characters
String xmlString = result1.getWriter().toString()
.replaceAll("<", "<").
replaceAll(">", ">");
System.out.print(xmlString);
the oposite string to dom with the fixed escaped characters
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlString));
Document doc = db.parse(is);
Source src123 = new DOMSource(doc);
Then set it back to the soap message
msg.getSOAPPart().setContent(src123);
Don't use createTextNode - the whole point of it is to insert some text (as data) into the document, not a fragment of raw XML.
Use a combination of createTextNode for the text and createElement for the elements.
I dont want the tags to be encoded. I need the output in this fashion.
Then you don't want a text node at all - which is why createTextNode isn't working for you. (Or rather, it's working fine - it's just not doing what you want). You should probably just parse your XML string, then import the document node from the result into your new document.
Of course, if you know the elements beforehand, don't express them as text in the first place - use a mixture of createElement, createAttribute, createTextNode and appendChild to create the structure.
It's entirely possible that something like JDOM will make this simpler, but that's the basic approach.
Mohan,
You can't use Document.createTextNode(). That methos transforms (or escapes) the charactes in your XML.
Instead, you need to build two separate Documents from the 2 XML's and use importNode.
I use Document.importNode() like this to solve my problem:
Build your builders:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
Document oldDoc = builder.parse(isOrigXml); //this is XML as InputSource
Document newDoc = builder.parse(isInsertXml); //this is XML as InputSource
Next, build a NodeList of the Element/Node you want to import. Create a Node from the NodeList. Create another Node of what you are going to import using importNode. Build the last Node of the final XML as such:
NodeList nl = newDoc.getElementByTagName("roseindia"); //or whatever the element name is
Node xmlToInsert = nl.item(0);
Node importNode = oldDoc.importNode(xmlToImport, true);
Node target = ((NodeList) oldDoc.getElementsByTagName("ELEMENT_NAME_OF_LOCATION")).item(0);
target.appendChild(importNode);
Source source = new DOMSource(target);
....
The rest is standard Transformer - StringWriter to StreamResult stuff to get the results.

How to remove standalone attribute declaration in xml document?

Im am currently creating an xml using Java and then I transform it into a String. The xml declaration is as follows:
DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
Document doc = docBuilder.newDocument();
doc.setXmlVersion("1.0");
For transforming the document into String, I include the following declaration:
TransformerFactory transfac = TransformerFactory.newInstance();
Transformer trans = transfac.newTransformer();
trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
trans.setOutputProperty(OutputKeys.VERSION, "1.0");
trans.setOutputProperty(OutputKeys.ENCODING,"UTF-8");
trans.setOutputProperty(OutputKeys.INDENT, "yes");
And then I do the transformation:
StringWriter sw = new StringWriter();
StreamResult result = new StreamResult(sw);
DOMSource source = new DOMSource(doc);
trans.transform(source, result);
String xmlString = sw.toString();
The problem is that in the XML Declaration attributes, the standalone attribute is included and I don't want that, but I want the version and encoding attributes to appear:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Is there any property where that could be specified?
From what I've read you can do this by calling the below method on Document before creating the DOMSource:
doc.setXmlStandalone(true); //before creating the DOMSource
If you set it false you cannot control it to appear or not. So setXmlStandalone(true) on Document. In transformer if you want an output use OutputKeys with whatever "yes" or "no" you need. If you setXmlStandalone(false) on Document your output will be always standalone="no" no matter what you set (if you set) in Transformer.
Read the thread in this forum

Categories