Can JAXP be used to create HTML5 documents? - java

Are there elements in the HTML5 specification which cannot be created with an XML library such as JAXP? One example is named HTML entities, which are not defined in XML. Are there other areas which are incompatible?

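JAXP apparently only works on well-formed XML, so you would need to convert HTML to XHTML before handing it to JAXP's standard parser. To produce HTML output, one option is an XSLT transformation over a JAXB source, as in the article linked below: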
// Create Transformer
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource xslt = new StreamSource(
"src/blog/jaxbsource/xslt/stylesheet.xsl");
Transformer transformer = tf.newTransformer(xslt);
// Source
JAXBContext jc = JAXBContext.newInstance(Library.class);
JAXBSource source = new JAXBSource(jc, catalog);
// Result
StreamResult result = new StreamResult(System.out);
// Transform
transformer.transform(source, result);
URL: https://dzone.com/articles/using-jaxb-xslt-produce-html
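
To see the well-formedness restriction concretely, here is a minimal sketch that feeds three small fragments to the standard JAXP DOM parser. The class name EntityCheck and the helper parsesAsXml are made up for this illustration; only the javax.xml.parsers and org.xml.sax calls are real API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EntityCheck {

    /** Returns true if the JAXP DOM parser accepts the markup as well-formed XML. */
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Not well-formed XML (for example an undeclared entity or an unclosed tag).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Named HTML entity: not declared in XML without a DTD, so parsing fails.
        System.out.println(parsesAsXml("<p>a&nbsp;b</p>"));   // false
        // Numeric character reference for the same character: accepted.
        System.out.println(parsesAsXml("<p>a&#160;b</p>"));   // true
        // HTML-style void element without a closing slash: rejected.
        System.out.println(parsesAsXml("<p>line<br></p>"));   // false
    }
}
```

Numeric character references are one way to keep such content acceptable to an XML parser; whether that is enough for a given HTML5 document depends on what else the document uses.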

Related

Large XSLT (~10000 lines) failed to transform input message (Xalan)

I am facing a classic problem: I have an XSLT stylesheet of about 10,000 lines and am trying to run the transformation from Java:
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("convertor.xslt"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("input.xml"));
transformer.transform(text, new StreamResult(new File("output.xml")));
Unfortunately, it fails with the error below:
com.sun.org.apache.bcel.internal.generic.ClassGenException: Not targeting 45269: nop[0](1), but null
at java.xml/com.sun.org.apache.bcel.internal.generic.BranchInstruction.updateTarget(BranchInstruction.java:217)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.compiler.util.MethodGenerator.outline(MethodGenerator.java:1721)
at java.xml/com.sun.org.apache.xalan.internal.xsltc.compiler.util.MethodGenerator.outlineChunks(MethodGenerator.java:1168)
Any idea where I am going wrong?

How to prevent self-closing <tags/> in XML?

I modify an XML file using the Transformer class and its transform method. It modifies my parameters correctly, but it also changes the XML style (empty elements are written differently):
Original:
<a struct="b"></a>
<c></c>
After edit:
<a struct="b"/>
<c/>
I know that I can set properties with transformer.setOutputProperty(OutputKeys.KEY, value), but I could not find a suitable setting.
How can I keep the transformer from changing the output format?
XMLReader xr = new XMLFilterImpl(XMLReaderFactory.createXMLReader());
Source src = new SAXSource(xr, new InputSource(new StringReader(xmlArray[i])));
<<modify xml>>
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,"yes");
StringWriter buffer = new StringWriter();
transformer.transform(src, new StreamResult(buffer));
xmlArray[i] = buffer.toString();
Those forms are semantically equivalent. No conforming XML parser will care, and neither should you.
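
A quick way to convince yourself of the equivalence is to parse both spellings and compare the resulting DOM trees. This is only a sketch: the class EmptyElementEquivalence and its helpers are invented for the example, and the <r> wrapper element is added so that each input is a single well-formed document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EmptyElementEquivalence {

    /** Parses a string into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    /** True if both documents have structurally equal root elements. */
    static boolean sameTree(String a, String b) {
        return parse(a).getDocumentElement()
                .isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) {
        String expanded = "<r><a struct=\"b\"></a><c></c></r>";
        String collapsed = "<r><a struct=\"b\"/><c/></r>";
        // Start/end tag pairs and self-closing tags yield the same tree.
        System.out.println(sameTree(expanded, collapsed));            // true
        // An element containing whitespace is NOT the same as an empty one.
        System.out.println(sameTree("<r><c></c></r>", "<r><c> </c></r>")); // false
    }
}
```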

Disable caching in Javax xml transformer

This article (https://www.ahoi-it.de/ahoi/news/java-xslt-memory-leak/4830) explains that the javax.xml Transformer caches XML content in an internal HashMap for later use.
My issue: I read XML messages from ActiveMQ, and if something fails, I retry converting them with the javax.xml Transformer and send them to a certain endpoint. Eventually my Docker container restarts because it runs out of memory.
I would like to disable this caching, but after 3 hours of research I still have no idea how.
I have a utility class with static methods, and this is what my transformation method looks like:
public static String getTransformedXml(Object input, String transformerFileName)
throws IOException, TransformerException {
ClassPathResource classPathResource = new ClassPathResource(transformerFileName);
InputStream xsltStream = classPathResource.getInputStream();
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(xsltStream);
Transformer transformer = factory.newTransformer(xslt);
transformer.setErrorListener(new XsltTransformerErrorListener(transformerFileName));
Source text = new StreamSource(new StringReader(XmlUtils.encode(input, input.getClass())));
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
transformer.transform(text, result);
return result.getWriter().toString();
}
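Try switching to a different TransformerFactory implementation, for example Xalan's interpretive processor (the Xalan JAR must be on the classpath):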
TransformerFactory tFactory = TransformerFactory.newInstance("org.apache.xalan.processor.TransformerFactoryImpl",null);
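
Before switching, it can help to check which implementation is currently in use and whether the alternative is actually loadable. The sketch below does only that; the class FactoryProbe and its method names are hypothetical, and it makes no claim about whether a different implementation resolves the memory growth.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

class FactoryProbe {

    /** Class name of the factory that TransformerFactory.newInstance() resolves to. */
    static String defaultFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /** True if the named factory class can be loaded and instantiated. */
    static boolean isAvailable(String factoryClassName) {
        try {
            TransformerFactory.newInstance(factoryClassName, null);
            return true;
        } catch (TransformerFactoryConfigurationError e) {
            // Thrown when the class is missing from the classpath or cannot be created.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("default: " + defaultFactoryClass());
        System.out.println("xalan interpretive available: "
                + isAvailable("org.apache.xalan.processor.TransformerFactoryImpl"));
    }
}
```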

DOM XML Public Doctype not appearing in result xml file

I have written code to generate XML files. I am stuck on defining the doctype, which should be PUBLIC. I can get a SYSTEM doctype written successfully, but not a PUBLIC one. The code below for the SYSTEM doctype works, but the same snippet for the PUBLIC doctype does not:
String xmldestpath = "C:/failed/tester.xml";
doctype2 = CreateDoctypeString();
StreamResult result = new StreamResult(new File(xmldestpath ));
try {
transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,"TEST");
transformer.transform(source, result);
// logger.debug("COMPLETED Copying xml files /....!!");
System.out.println("COMPLETED Copying xml files to bulk import....!!");
The snippet that does not work. It does not raise an error, but no doctype appears in the resulting XML:
String xmldestpath = "C:/failed/tester.xml";
doctype2 = CreateDoctypeString();
StreamResult result = new StreamResult(new File(xmldestpath ));
try {
transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,"TEST");
transformer.transform(source, result);
// //logger.debug("COMPLETED Copying xml files /....!!");
System.out.println("COMPLETED Copying xml files to bulk import....!!");
If you know you need/want PUBLIC, perhaps you should know that a public literal cannot exist without a system literal.
The XML specification shows:
ExternalID ::= 'SYSTEM' S SystemLiteral
| 'PUBLIC' S PubidLiteral S SystemLiteral
So it should be easy to conclude that you need to specify both in order to get it to work, as demonstrated by this MCVE:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "TEST1");
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "TEST2");
transformer.transform(new StreamSource(new StringReader("<Root></Root>")),
new StreamResult(System.out));
Output
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Root PUBLIC "TEST1" "TEST2">
<Root/>

Remove the XML header from an XML in Java

StringWriter writer = new StringWriter();
XmlSerializer serializer = new KXmlSerializer();
serializer.setOutput(writer);
serializer.startDocument(null, null);
serializer.setFeature("http://xmlpull.org/v1/doc/features.html#indent-output", true);
// Creating XML
serializer.endDocument();
String xmlString = writer.toString();
In this setup, is there a standard API for removing the XML header <?xml version='1.0' ?>, or do you suggest string manipulation such as:
if (s.startsWith("<?xml ")) {
s = s.substring(s.indexOf("?>") + 2);
}
I want xmlString to contain the output without the XML header <?xml version='1.0' ?>.
Ideally you would make an API call to exclude the XML header. From skimming through its code, KXmlSerializer doesn't appear to support this, though. If you had an org.w3c.dom.Document (wrapped in a DOMSource, or any other javax.xml.transform.Source), you could accomplish what you want this way:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(doc), new StreamResult(writer));
Otherwise if you have to use KXmlSerializer it looks like you'll have to manipulate the output.
If you use a JAXP serializer, you get access to all the output properties defined in XSLT, for example omit-xml-declaration="yes". You get one in the form of an "identity transformer" by calling transformerFactory.newTransformer() with no parameters, and then call setOutputProperty() on it. Another example:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
t.setOutputProperty("omit-xml-declaration", "yes");
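
For completeness, the identity-transformer approach can be run end to end on a string. This is a sketch under the assumption that the XML is already available as a String (for instance the output of another serializer); the class OmitDeclaration and the method serialize are names made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OmitDeclaration {

    /** Re-serializes the XML through an identity transformer, with or without the declaration. */
    static String serialize(String xml, boolean omitDeclaration) {
        try {
            // newTransformer() with no stylesheet gives an identity transformer.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                    omitDeclaration ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("could not transform input", e);
        }
    }

    public static void main(String[] args) {
        String input = "<?xml version='1.0' ?><note>hi</note>";
        System.out.println(serialize(input, true));   // no <?xml ...?> header
        System.out.println(serialize(input, false));  // header is written
    }
}
```

Compared with cutting the header off via substring, this reparses the document, which costs more but does not depend on how the original serializer formatted the declaration.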
Don't call:
serializer.startDocument(null, null);
It adds the XML header. You still need to call:
serializer.endDocument();
otherwise your XML will come out as an empty String.
