How to unescape string in XML using Transformer? - java

I have a function that takes an XML document as a parameter and writes it to a file. The document contains an element such as <tag>"some text & some text": <text> text</tag>, but in the output file the special characters are written in escaped form (for example, & is written as &amp; and < as &lt;). I don't want the string to be escaped when it is written to the file.
The function is:
public static void function(Document doc, String fileUri, String randomId){
DOMSource source = new DOMSource(doc,ApplicationConstants.ENC_UTF_8);
FileWriterWithEncoding writer = null;
try {
File file = new File(fileUri+File.separator+randomId+".xml");
if (!new File(fileUri).exists()){
new File(fileUri).mkdirs();
}
writer = new FileWriterWithEncoding(new File(file.toString()),ApplicationConstants.ENC_UTF_8);
StreamResult result = new StreamResult(writer);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = null;
transformer = transformerFactory.newTransformer();
transformer.setParameter(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
writer.close();
transformer.clearParameters();
}catch (IOException | TransformerException e) {
log.error("convert Exception is :"+ e);
}
}

XML has five predefined escapes, for the characters " ' < > &. According to the XML grammar, these characters must be escaped in certain places; please see this question:
What characters do I need to escape in XML documents?
So there is not much you can do to avoid escaping, for instance, & or < in text content.
You could use CDATA sections if you want to retain "unescaped" content. Please see this question:
Add CDATA to an xml file
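One way to get a CDATA section out of a Transformer is the standard OutputKeys.CDATA_SECTION_ELEMENTS output property, which asks the serializer to write the text of the listed elements as CDATA. The following is a minimal sketch, assuming the element is named tag as in the question; the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of the question's code.
final class CdataOutputSketch {

    // Builds <tag>text</tag> and serializes it with the text wrapped in a CDATA section.
    static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element tag = doc.createElement("tag"); // element name taken from the question
            tag.setTextContent(text);
            doc.appendChild(tag);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Text children of <tag> are written as CDATA instead of being escaped.
            transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "tag");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("\"some text & some text\": <text> text"));
    }
}
```

Note that the file then contains a CDATA wrapper around the text rather than bare unescaped characters; a bare & or < in element content would not be well-formed XML.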

Related

&quot; is auto converting to " through Document & Transformer API

I am loading an XML file (pom.xml) through org.w3c.dom.Document and editing the value of some nodes (basically changing the version of some dependency) using javax.xml.transform.Transformer, javax.xml.transform.TransformerFactory
and javax.xml.transform.dom.DOMSource.
The problem is that this also converts every occurrence of &quot; to the " character, which I don't want. See the sample below:
<Export-Package>!${bundle.namespace}.internal.*,${bundle.namespace}.*;version=&quot;${project.version}&quot;</Export-Package>
converted to:
<Export-Package>!${bundle.namespace}.internal.*,${bundle.namespace}.*;version="${project.version}"</Export-Package>
How can I prevent this automatic conversion with the API I am currently using?
Code sample:
public void writeDocument(File filePath)
{
TransformerFactory transformerFactory = TransformerFactory.newInstance();
this.thisDoc.getDocumentElement().normalize();
Transformer transformer;
try
{
transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(thisDoc);
StreamResult result = new StreamResult(filePath);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
}
catch (TransformerException e)
{
VersionUpdateExceptions.throwException(e, LOG);
}
}
This behavior is required by the Document Object Model (DOM) Level 3 Load and Save Specification:
Within the character data of a document (outside of markup), any
characters that cannot be represented directly are replaced with
character references. Occurrences of '<' and '&' are replaced by the
predefined entities &lt; and &amp;. The other predefined entities
(&gt;, &apos;, and &quot;) might not be used, except where needed
(e.g. using &gt; in cases such as ']]>').
For example, if you use &quot; inside an attribute:
<Export-Package id="&quot;test&quot;">
&quot; will be preserved. Otherwise, it won't.
If absolutely necessary, you could preserve "&quot;" with an ugly hack:
1) Read the pom.xml as a String and replace occurrences of &quot; with some "marker" string.
2) To parse the document, use a StringReader to create an InputSource.
3) Execute your method, but create the StreamResult with a StringWriter.
4) Get the content from the StringWriter as a String and replace your marker string with &quot;.
5) Save the content to the file.
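The marker hack above could look roughly like the following sketch. The marker text, class name, and method name are hypothetical, and reading and writing the actual file is left to the caller. The marker must not occur anywhere in the real document and must not contain characters that the serializer would escape.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the marker round trip.
final class QuotMarkerSketch {

    // Assumed marker: plain ASCII, no XML-special characters, unlikely to appear in a pom.xml.
    private static final String MARKER = "__QUOT_MARKER__";

    static String roundTrip(String xml) {
        try {
            // Step 1: hide every &quot; behind the marker before parsing.
            String marked = xml.replace("&quot;", MARKER);

            // Step 2: parse from a String through StringReader and InputSource.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(marked)));

            // (Edit the document here, e.g. change a dependency version.)

            // Step 3: serialize into a StringWriter instead of a file.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));

            // Step 4: turn the marker back into &quot;.
            return out.toString().replace(MARKER, "&quot;");
        } catch (Exception e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(
                "<Export-Package>version=&quot;${project.version}&quot;</Export-Package>"));
    }
}
```

Literal " characters that were already unescaped in the input are left alone, because only the marker is converted back.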

DOM XML Public Doctype not appearing in result xml file

I have written code to generate XML files. I am stuck on defining the doctype for the XML, as it should be PUBLIC. I can get a SYSTEM doctype written successfully, but not a PUBLIC one. The code below for the SYSTEM doctype works, but the same snippet for the PUBLIC doctype does not:
String xmldestpath = "C:/failed/tester.xml";
doctype2 = CreateDoctypeString();
StreamResult result = new StreamResult(new File(xmldestpath ));
try {
transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,"TEST");
transformer.transform(source, result);
// logger.debug("COMPLETED Copying xml files /....!!");
System.out.println("COMPLETED Copying xml files to bulk import....!!");
The snippet that does not work. It gives no error, but no doctype appears in the resulting XML:
String xmldestpath = "C:/failed/tester.xml";
doctype2 = CreateDoctypeString();
StreamResult result = new StreamResult(new File(xmldestpath ));
try {
transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,"TEST");
transformer.transform(source, result);
// //logger.debug("COMPLETED Copying xml files /....!!");
System.out.println("COMPLETED Copying xml files to bulk import....!!");
If you know you need/want PUBLIC, perhaps you should know that a public literal cannot exist without a system literal.
The XML specification shows:
ExternalID ::= 'SYSTEM' S SystemLiteral
| 'PUBLIC' S PubidLiteral S SystemLiteral
So it should be easy to conclude that you need to specify both in order to get it to work, as demonstrated by this MCVE:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "TEST1");
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "TEST2");
transformer.transform(new StreamSource(new StringReader("<Root></Root>")),
new StreamResult(System.out));
Output
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Root PUBLIC "TEST1" "TEST2">
<Root/>

Converting emoji to HTML Decimal Code or Unicode Hexadecimal Code in java

I am trying to convert a text file with emoji content into a file containing the emojis' HTML decimal codes or hex codes, using Java.
Example:
I/p : <div id="thread" style="white-space: pre-wrap;"><div>😀😀😃🍎🍏⚽️🏀
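Expected o/p :<div id="thread" style="white-space: pre-wrap;"><div>&#128512;&#128512;&#128515;&#127822;&#127823;&#9917;&#65039;&#127936;
In the output above, each emoji should be replaced by its corresponding HTML entity code; for example, '😀' should become '&#128512;'.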
Details of the HTML entity codes and hex codes are given here:
http://character-code.com/emoticons-html-codes.php
The sample code that I tried is below:
try {
File file = new File("/inFile.txt");
str = FileUtils.readFileToString(file, "ISO-8859-1");
System.out.println(new String(str.getBytes(), "UTF-8"));
String results = StringEscapeUtils.escapeHtml4(str);
System.out.println(results);
} catch (IOException e) {
e.printStackTrace();
}
I found a workaround:
public static void htmlDecimalCodeGenerator () {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setValidating(false);
// File inputFile = new File("/inputFile.xml");
File inputFile = new File("/inputFile.xml");
try {
FileOutputStream fop = null;
File OutFile = new File("/outputFile.xml");
fop = new FileOutputStream(OutFile);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(inputFile);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
/*
If OMIT_XML_DECLARATION is not set to "yes", an XML declaration such as the
following is added at the beginning of the output:
<?xml version='1.0' encoding='UTF-32'?>
*/
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
/*
When the output method is "xml", the version value specifies the
version of XML to be used for outputting the result tree. The default
value for the xml output method is 1.0. When the output method is
"html", the version value indicates the version of the HTML.
The default value for the html output method is 4.0, which specifies
that the result should be output as HTML conforming to the HTML 4.0
Recommendation [HTML]. If the output method is "text", the version
property is ignored.
*/
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
/*
Indent-- specifies whether the Transformer may
add additional whitespace when outputting the result tree; the value
must be yes or no.
*/
transformer.setOutputProperty(OutputKeys.INDENT, "no");
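// Characters that cannot be represented in the output encoding (such as emoji in ISO-8859-1)
// are expected to be written as numeric character references.
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");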
// transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
transformer.transform(new DOMSource(doc),
new StreamResult(new OutputStreamWriter(System.out, "UTF-8")));
// new StreamResult(new OutputStreamWriter(fop, "UTF-8")));
} catch (Exception e) {
e.printStackTrace();
}
}
}
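As an alternative to the Transformer-based workaround, the conversion can also be done by hand by walking over the Unicode code points of the text. This is not the code from the answer above, only a sketch; the class and method names are hypothetical, and it assumes that every character above 127 should become a decimal character reference. It does not depend on how a particular serializer handles surrogate pairs.

```java
// Hypothetical helper: converts non-ASCII characters to decimal HTML character references.
final class NumericEntitySketch {

    static String toDecimalEntities(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        // codePoints() yields full code points, so an emoji stored as a
        // surrogate pair is seen as one value rather than two chars.
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\uD83D\uDE00" is the grinning face emoji, U+1F600.
        System.out.println(toDecimalEntities("<div>\uD83D\uDE00</div>"));
    }
}
```

The input file has to be read with its real encoding (presumably UTF-8) for this to work; reading it as ISO-8859-1, as in the question, would split each emoji into several unrelated characters.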

How to unformat xml file

I have a method which returns a String with formatted XML. The method reads the XML from a file on the server and parses it into the string:
Essentially, what the method currently does is:
private ServletConfig config;
InputStream xmlIn = null ;
xmlIn = config.getServletContext().getResourceAsStream(filename + ".xml") ;
String xml = IOUtils.toString(xmlIn);
IOUtils.closeQuietly(xmlIn);
return xml;
What I need to do is add a new input argument, and based on that value, continue returning the formatted xml, or return unformatted xml.
What I mean with formatted xml is something like:
<xml>
    <root>
        <elements>
            <elem1/>
            <elem2/>
        </elements>
    </root>
</xml>
And what I mean with unformatted xml is something like:
<xml><root><elements><elem1/><elem2/></elements></root></xml>
or:
<xml>
<root>
<elements>
<elem1/>
<elem2/>
</elements>
</root>
</xml>
Is there a simple way to do this?
Strip all newline characters with String xml = IOUtils.toString(xmlIn).replace("\n", ""). Or replace "\t" instead to keep the separate lines but drop the indentation.
If you are sure that the formatted XML looks like:
<xml>
    <root>
        <elements>
            <elem1/>
            <elem2/>
        </elements>
    </root>
</xml>
you can replace group 1 of every match of ^(\s*)< with "" (with the regex in multiline mode). That way, the text content of the XML won't be changed.
An identity (empty) transformer with the indent output property set, like so:
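In Java, the regex idea above might be sketched as follows. The class and method names are hypothetical. The sketch uses [ \t]+ instead of \s* so that line breaks are never swallowed, and a lookahead for < so that only whitespace directly in front of a tag is removed.

```java
// Hypothetical helper: removes leading indentation in front of tags, keeps line breaks.
final class IndentStripSketch {

    static String stripIndentation(String xml) {
        // (?m) makes ^ match at the start of every line;
        // (?=<) only matches indentation that is directly followed by a tag.
        return xml.replaceAll("(?m)^[ \\t]+(?=<)", "");
    }

    public static void main(String[] args) {
        String formatted = "<xml>\n    <root>\n        <elem1/>\n    </root>\n</xml>";
        System.out.println(stripIndentation(formatted));
    }
}
```

Lines that start with indented text rather than a tag are left untouched, which is the property the answer relies on.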
public static String getStringFromDocument(Document dom, boolean indented) {
String signedContent = null;
try {
StringWriter sw = new StringWriter();
DOMSource domSource = new DOMSource(dom);
// Use the standard factory lookup instead of an implementation-specific class
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");
trans.transform(domSource, new StreamResult(sw));
sw.flush();
signedContent = sw.toString();
} catch (TransformerException e) {
e.printStackTrace();
}
return signedContent;
}
This works for me.
The key lies in this line:
trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");
Try something like the following:
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(
new StreamSource(new StringReader(
"<xsl:stylesheet version=\"1.0\"" +
" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
"<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>" +
" <xsl:strip-space elements=\"*\"/>" +
" <xsl:template match=\"@*|node()\">" +
" <xsl:copy>" +
" <xsl:apply-templates select=\"@*|node()\"/>" +
" </xsl:copy>" +
" </xsl:template>" +
"</xsl:stylesheet>"
))
);
Source source = new StreamSource(new StringReader("xml string here"));
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
Instead of a StreamSource, the source passed to transform() can also be a DOMSource if you have an in-memory Document, for example when you want to modify the DOM before saving.
DOMSource source = new DOMSource(document);
To read an XML file into a Document object:
File file = new File("c:\\MyXMLFile.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
Enjoy :)
If you fancy trying your hand with JAXB then the marshaller has a handy property for setting whether to format (use new lines and indent) the output or not.
JAXBContext jc = JAXBContext.newInstance(packageName);
Marshaller m = jc.createMarshaller();
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
m.marshal(element, outputStream);
Quite an overhead to get to that stage though... perhaps a good option if you already have a solid xsd
You can:
1) Remove all consecutive whitespace (but not single whitespace characters) and then replace every >(whitespace)< with ><.
This is applicable only if the useful content does not contain multiple consecutive significant whitespace characters.
2) Read it into some DOM tree and serialize it using a non-pretty serialization:
SAXReader reader = new SAXReader();
Reader r = new StringReader(data);
Document document = reader.read(r);
OutputFormat format = OutputFormat.createCompactFormat();
StringWriter sw = new StringWriter();
XMLWriter writer = new XMLWriter(sw, format);
writer.write(document);
// The serialized XML is in the StringWriter, not in the XMLWriter
String string = sw.toString();
3) use Canonicalization (but you must somehow explain to it that those whitespaces you want to remove are insignificant)
In Kotlin:
Indentation usually follows a newline and consists of one or more spaces. Hence, to put everything on the same line, we replace every newline that is followed by one or more spaces with a single space:
xmlTag = xmlTag.replace("(\n +)".toRegex(), " ")

How to 'transform' a String object (containing XML) to an element on an existing JSP page

Currently, I have a String object that contains XML elements:
String carsInGarage = garage.getCars();
I now want to pass this String as an input/stream source (or some kind of source), but am unsure which one to choose and how to implement it.
Most of the solutions I have looked at import the package javax.xml.transform, accept an XML file (stylerXML.xml) and output to an HTML file (outputFile.html); see the code below.
try
{
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(new StreamSource("styler.xsl"));
transformer.transform(new StreamSource("stylerXML.xml"), new StreamResult(new FileOutputStream("outputFile.html")));
}
catch (Exception e)
{
e.printStackTrace();
}
I want to accept a String object and output (using XSL) to an element within an existing JSP page. I just don't know how to implement this, even having looked at the code above.
Can someone please advise or assist? I have searched high and low for a solution, but I can't find anything.
Use a StringReader and a StringWriter:
try {
StringReader reader = new StringReader("<xml>blabla</xml>");
StringWriter writer = new StringWriter();
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(
new javax.xml.transform.stream.StreamSource("styler.xsl"));
transformer.transform(
new javax.xml.transform.stream.StreamSource(reader),
new javax.xml.transform.stream.StreamResult(writer));
String result = writer.toString();
} catch (Exception e) {
e.printStackTrace();
}
If at some point you want the source to contain more than just a single string, or you don't want to generate the XML wrapper element manually, create a DOM document that contains your source and pass it to the transformer using a DOMSource.
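The DOMSource variant might look like the sketch below. Everything specific in it is an assumption made for illustration: the garage and car element names, the inline stylesheet standing in for styler.xsl, and the class and method names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds the source as a DOM tree and transforms it to a String.
final class DomSourceSketch {

    // Assumed stand-in for styler.xsl: renders each <car> as a list item.
    private static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" indent=\"no\"/>"
          + "<xsl:template match=\"/garage\">"
          + "<ul><xsl:for-each select=\"car\"><li><xsl:value-of select=\".\"/></li></xsl:for-each></ul>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String render(String... cars) {
        try {
            // Build the wrapper element and its children in memory instead of concatenating strings.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element garage = doc.createElementNS(null, "garage"); // element in no namespace
            doc.appendChild(garage);
            for (String car : cars) {
                Element e = doc.createElementNS(null, "car");
                e.setTextContent(car);
                garage.appendChild(e);
            }

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("Audi", "BMW"));
    }
}
```

The returned String can then be placed in a request attribute and written into the JSP page, in the same way as the StringWriter result in the answer above.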
This worked for me.
String str = "<my>xml</my>";
StreamSource src = new StreamSource(new StringReader(str));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Result res = new StreamResult(baos);
transformer.transform(src, res);
