String manipulations for javax.xml.transform.Source - java

I am using Java and XSL stylesheets to retrieve values from an XML file and write them to a text file.
Below is the program used:
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("transform.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("inputXML.txt"));
transformer.transform(text, new StreamResult(new File("output.txt"))) ;
But recently I found that the XML files I will be reading have two root nodes, not one. So I am thinking of doing string manipulation to add a root node of my own programmatically, so that I can avoid the error below:
ERROR: 'The markup in the document following the root element must
be well-formed.' ERROR:
'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: The
markup in the document following the root element must be
well-formed.'
But I am unable to do any String manipulation on javax.xml.transform.Source (casting does not work).
I do not want to use intermediate files to add my root node, as I fear that will prove costly: I need to process close to 50k XML records.

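StreamSource has several constructors, including one that takes a Reader. So you can read the file into a String, wrap it in a root element, and pass a StringReader:
Path inputPath = Paths.get("inputXML.txt");
// The charset belongs to the String constructor, not to readAllBytes
String input = new String(Files.readAllBytes(inputPath),
StandardCharsets.UTF_8);
// "<quasiroot" stands for the start tag of the first top-level element
input = input.replaceFirst("<quasiroot", "<root>$0")
+ "</root>";
Source text = new StreamSource(new StringReader(input));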
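If holding each file in a String is a concern, a variation on the same idea is to wrap the raw stream instead. The following is only a sketch: the class and method names (WrapRoots, wrap, transformWrapped) are made up for illustration, and it assumes the input carries no XML declaration or DOCTYPE of its own, because those would end up inside the synthetic root and break parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class WrapRoots {
    // Surrounds a stream with <root>...</root> without buffering its content.
    // Assumption: the input has no XML declaration or DOCTYPE of its own.
    static InputStream wrap(InputStream in) {
        InputStream open = new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(Collections.enumeration(Arrays.asList(open, in, close)));
    }

    // Runs the identity transformation over the wrapped stream.
    // A real program would pass its own stylesheet to newTransformer(...).
    static String transformWrapped(InputStream in) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(wrap(in)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two top-level elements, which would not parse on their own
        byte[] twoRoots = "<a>1</a><b>2</b>".getBytes(StandardCharsets.UTF_8);
        System.out.println(transformWrapped(new ByteArrayInputStream(twoRoots)));
    }
}
```

For a file on disk, the argument to wrap would be something like Files.newInputStream(inputPath).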

Note that in the Java world you have XML parsers like Xerces with support for external entities, so you can simply construct a file that references your other file, e.g.
<!DOCTYPE root [
<!ENTITY input SYSTEM "inputXML.txt">
]>
<root>&input;</root>
Then all you need to do is load that file as the source for your XSLT. There is no need for string manipulation, at least not on the whole XML. If you want, you can construct the above directly as a string and pass it to a StreamSource over a StringReader, setting the system id to the directory of your input XML so that the relative entity reference resolves:
String input = "inputXML.txt";
File dir = new File(".");
String baseUri = dir.toURI().toASCIIString();
String inputXml = "<!DOCTYPE root [ <!ENTITY input SYSTEM \"" + input + "\">]><root>&input;</root>";
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("transform.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new StringReader(inputXml), baseUri);
transformer.transform(text, new StreamResult(new File("output.txt")));

Related

How to prevent self-closing <tags/> in XML?

I modify an XML file using the Transformer class and its transform method. It correctly modifies my parameters but changes the XML style (it writes empty elements in a different way):
Original:
<a struct="b"></a>
<c></c>
After edit:
<a struct="b"/>
<c/>
I know that I can set properties with transformer.setOutputProperty(OutputKeys.KEY, value), but I did not find a suitable setting.
How can I keep the transformer from changing the output format?
XMLReader xr = new XMLFilterImpl(XMLReaderFactory.createXMLReader());
Source src = new SAXSource(xr, new InputSource(new
StringReader(xmlArray[i])));
// <<modify xml>>
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,"yes");
StringWriter buffer = new StringWriter();
transformer.transform(src, new StreamResult(buffer));
xmlArray[i] = buffer.toString();
Those forms are semantically equivalent. No conforming XML parser will care, and neither should you.
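To make the equivalence concrete, here is a small sketch that parses both spellings with the standard JAXP DocumentBuilder and compares the resulting DOM trees. The class name EmptyElementForms and the wrapper element r are made up for the example (the fragments in the question need a single root to be parsed at all).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EmptyElementForms {
    // Parses an XML string into a DOM document
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Compares the two documents structurally (names, attributes, children)
    static boolean sameTree(String left, String right) {
        return parse(left).getDocumentElement()
                .isEqualNode(parse(right).getDocumentElement());
    }

    public static void main(String[] args) {
        String expanded = "<r><a struct=\"b\"></a><c></c></r>";
        String collapsed = "<r><a struct=\"b\"/><c/></r>";
        // Reports whether a parser sees any difference between the two forms
        System.out.println(sameTree(expanded, collapsed));
    }
}
```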

The element type "META" must be terminated by the matching end-tag "</META>". while generating PDF from XML file using XSL

I am trying to convert an XML document to PDF. I transform the XML with XSL to generate HTML, which is then used to create the PDF. The generated HTML doesn't contain a closing </meta> tag, and because of that I am getting the error below:
Exception in thread "main" org.xhtmlrenderer.util.XRRuntimeException: Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException; lineNumber: 22; columnNumber: 3; The element type "META" must be terminated by the matching end-tag "</META>".
How can I include the closing </meta> tag in the HTML?
Here is my Java code to generate the PDF from XML:
public class XMLtoPDF {
public static void main(String[] args)
throws IOException, DocumentException, TransformerException,TransformerConfigurationException,FileNotFoundException {
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(new StreamSource("xsl_html_pagebreak_a.xslt"));
transformer.transform(new StreamSource("xsl_html_pagebreak_input.xml"),new StreamResult(new FileOutputStream("sample3.html")));
String File_To_Convert = "sample3.html";
String url = new File(File_To_Convert).toURI().toURL().toString();
System.out.println(""+url);
String HTML_TO_PDF = "ConvertedFile3.pdf";
OutputStream os = new FileOutputStream(HTML_TO_PDF);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
os.close();
}
}
The name org.xhtmlrenderer.util suggests to me that the library you are using (ITextRenderer) expects XHTML. You can get XHTML output from your XSLT transformation by
(a) changing it to use the XHTML output method instead of HTML
(b) changing it to use an XSLT 2.0 processor such as Saxon, because to use the XHTML output method, you need XSLT 2.0.
More specifically, change
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(new StreamSource("xsl_html_pagebreak_a.xslt"));
to
TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
Transformer transformer = tFactory.newTransformer(new StreamSource("xsl_html_pagebreak_a.xslt"));
transformer.setOutputProperty("method", "xhtml");
Alternatively you might find that changing the serialization method to "xml" will also work; in that case you don't need to change to XSLT 2.0.
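As a rough illustration of that last suggestion, the sketch below runs the same template twice with whatever XSLT processor JAXP picks up, once with method="html" and once with method="xml", and checks whether an XML parser accepts the result. The stylesheet content and the names XmlMethodDemo, render and isWellFormed are invented for the example; it is not the stylesheet from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

final class XmlMethodDemo {
    // Hypothetical stylesheet; only the method attribute of xsl:output varies
    static String stylesheet(String method) {
        return "<xsl:stylesheet version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"" + method + "\"/>"
             + "<xsl:template match=\"/\">"
             + "<html><head><meta name=\"author\" content=\"demo\"/></head>"
             + "<body><br/></body></html>"
             + "</xsl:template></xsl:stylesheet>";
    }

    // Applies the stylesheet to a dummy input and returns the serialized output
    static String render(String method) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet(method))));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if an XML parser accepts the markup, which is what an XHTML consumer needs
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("xml method well-formed:  " + isWellFormed(render("xml")));
        System.out.println("html method well-formed: " + isWellFormed(render("html")));
    }
}
```

The html method writes void elements such as meta and br without an end tag, which is exactly what an XML-based consumer rejects; the xml method always closes them.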

Can JAXP be used to create HTML5 documents?

Are there elements in the HTML5 specification which cannot be created with an XML library such as JAXP? One example is named HTML entities, which are not defined in XML. Are there other areas which are incompatible?
JAXP apparently only works on well-formed XML. You'd need to convert the HTML to XHTML before passing it to JAXP's standard parser. Producing HTML from a JAXP transformation looks like this:
// Create Transformer
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource xslt = new StreamSource(
"src/blog/jaxbsource/xslt/stylesheet.xsl");
Transformer transformer = tf.newTransformer(xslt);
// Source
JAXBContext jc = JAXBContext.newInstance(Library.class);
JAXBSource source = new JAXBSource(jc, catalog);
// Result
StreamResult result = new StreamResult(System.out);
// Transform
transformer.transform(source, result);
Source: https://dzone.com/articles/using-jaxb-xslt-produce-html
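The named-entity incompatibility raised in the question is easy to reproduce. The sketch below (class and method names are made up) feeds two fragments to the default JAXP parser: one using the HTML named entity &nbsp; with no DTD declaring it, and one using the equivalent numeric character reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class NamedEntityDemo {
    // True if the default JAXP parser accepts the markup
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // &nbsp; is predefined in HTML but not in XML, and no DTD declares it here
        System.out.println("named entity:      " + parses("<p>a&nbsp;b</p>"));
        // A numeric character reference needs no declaration
        System.out.println("numeric reference: " + parses("<p>a&#160;b</p>"));
    }
}
```

So when generating HTML5 through an XML toolchain, numeric references (or the literal characters) are the safe way to express such characters.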

Replace text in XML before XSLT

I need to replace a certain text in a XML file before giving it to the XSL-Transformer.
It's the DTD-URL in the DOCTYPE tag. It points to a webserver, but I want it to be usable offline, so I want to change it to a URL pointing to a local file.
However I mustn't edit the original XML directly. I thought of reading the file into a string, use String.replaceAll() on the text and save it into another file, which I pass to the Transformer. I already tried it, but it's really slow; the file I'm using has a size of ca. 500kiB.
Is there any better (=faster) way to accomplish this?
EDIT: The code used for the transformation:
public String getPlaylist(String playlist) {
Source source = new StreamSource(library);
StreamSource xsl = new StreamSource(getClass().getResourceAsStream("M3Utransformation.xml"));
StringWriter w = new StringWriter();
Result result = new StreamResult(w);
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
transformer.setParameter("playlist", playlist);
transformer.transform(source, result);
return w.getBuffer().toString();
} catch (Throwable t) {
t.printStackTrace();
return null;
}
}
You can create an entity resolver, and make use of it.
The following example uses the JAXP DocumentBuilder, and a CatalogResolver
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException, TransformerConfigurationException, TransformerException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
// Resolve DTD references through the catalog instead of the network
db.setEntityResolver(new CatalogResolver());
File src = new File("src/article.xml");
Document doc = db.parse(src);
// Here, we execute the transformation
// Use a Transformer for output
File stylesheet = new File("src/article.xsl");
TransformerFactory tFactory = TransformerFactory.newInstance();
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = tFactory.newTransformer(stylesource);
// Transform the already parsed document
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
}
Create a catalog properties file and place it on your classpath. It has to be named CatalogManager.properties; see the CatalogManager API documentation.
Define a catalog XML file and point your properties file at it. From
http://www.xml.com/pub/a/2004/03/03/catalogs.html you can find a very simple catalog XML file:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//OYRM/foo" uri="src/bar.dtd"/>
</catalog>
With the above catalog.xml and CatalogManager.properties, you'll end up resolving references to the publicId "-//OYRM/foo" to the uri src/bar.dtd
xml-commons contains the resolver:
http://xerces.apache.org/mirrors.cgi#binary
For a more complete treatment of the topic of resolvers, read Tom White's article from XML.com.
The transformer application was cribbed from the Java trail for Extensible StyleSheet Language Transformations > Transforming Data with XSLT
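If pulling in the catalog machinery is more than you need, a hand-written EntityResolver can do the same redirection for a single DTD. This is a minimal sketch under stated assumptions: the remote URL, the class name LocalDtdResolver and the inline DTD text are all invented for the example, and a real resolver would return an InputSource for the local DTD file rather than a string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class LocalDtdResolver {
    // Hypothetical remote DTD location, as it would appear in the DOCTYPE
    static final String REMOTE_DTD = "http://example.invalid/dtd/library.dtd";
    // Stand-in for the local copy of the DTD (a real one would live in a file)
    static final String LOCAL_DTD = "<!ENTITY greeting \"hello\">";

    // Redirects the remote DTD to the local copy; returning null for anything
    // else lets the parser fall back to its default resolution
    static EntityResolver resolver() {
        return (publicId, systemId) -> REMOTE_DTD.equals(systemId)
                ? new InputSource(new StringReader(LOCAL_DTD))
                : null;
    }

    // Parses the document with the resolver installed and returns the root's text
    static String rootText(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setEntityResolver(resolver());
            Document doc = db.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE library SYSTEM \"" + REMOTE_DTD + "\">"
                   + "<library>&greeting;</library>";
        // The entity is defined only in the local DTD, so seeing its value
        // shows the resolver was used and no network access was needed
        System.out.println(rootText(xml));
    }
}
```

The parsed Document can then be handed to the Transformer as a DOMSource, as in the answer above, so the original XML file is never modified.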

Using Saxon and XSLT to transform JDOM XML documents

I'm trying to convert some XML so that iso8879 entity strings will appear in place of characters. For example the string 1234-5678 would become 1234&hyphen;5678. I've done this using character maps and the stylesheets found at http://www.w3.org/2003/entities/iso8879doc/overview.html.
The first part of my xslt looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="iso8879map.xsl"/>
<xsl:output omit-xml-declaration = "yes" use-character-maps="iso8879"/>
When I run this stylesheet in Eclipse with the Saxon XSLT engine, it works fine and outputs an XML file with the hyphen entity string in place of the hyphen character. However, I need to automate this process, so I am using the JDOM package. Unfortunately, the characters are not being replaced during the transformation. The code that does the conversion looks a little like this:
System.setProperty("javax.xml.transform.TransformerFactory",
"net.sf.saxon.TransformerFactoryImpl"); // use saxon for xslt 2.0 support
SAXBuilder builder = new SAXBuilder();
builder.setExpandEntities(false);
XSLTransformer transformer = new XSLTransformer(styleSheet);
Document toTransform = builder.build(Fileref); // transform
Document transformed = transformer.transform(toTransform);
I then write the document out to a file using the following method:
public static void writeXMLDoc(File xmlDoc, Document jdomDoc){
try {
Format format = Format.getPrettyFormat();
format.setOmitDeclaration(true);
format.setEncoding("ISO-8859-1");
XMLOutputter outputter = new XMLOutputter(format);
//outputter.output((org.jdom.Document) allChapters, System.out);
FileWriter writer = new FileWriter(xmlDoc.getAbsolutePath());
outputter.output((org.jdom.Document) jdomDoc, writer);
writer.close();
}
catch (java.io.IOException exp) {
exp.printStackTrace();
}
}
I've started debugging in Eclipse and it looks like the hyphen character isn't being replaced during the XSLT transformation. I've tested this using the Saxon XSLT engine on its own and it does work, so it's likely something to do with using it from Java and JDOM. Can anybody help?
Many thanks.
Jim
The problem did turn out to be with not using the JDOM wrapper class provided by Saxon. Here's the working code for reference that shows a JDOM document being transformed and being returned as a new JDOM document:
System.setProperty("javax.xml.transform.TransformerFactory", "net.sf.saxon.TransformerFactoryImpl"); // use saxon for xslt 2.0 support
File styleSheet = new File("filePath");
// Get a TransformerFactory
System.setProperty("javax.xml.transform.TransformerFactory",
"com.saxonica.config.ProfessionalTransformerFactory");
TransformerFactory tfactory = TransformerFactory.newInstance();
ProfessionalConfiguration config = (ProfessionalConfiguration)((TransformerFactoryImpl)tfactory).getConfiguration();
// Get a SAXBuilder
SAXBuilder builder = new SAXBuilder();
//Build JDOM Document
Document toTransform = builder.build(inputFileHandle);
//Give it a Saxon wrapper
DocumentWrapper docw = new DocumentWrapper(toTransform, inputFileHandle.getAbsolutePath(), config);
// Compile the stylesheet
Templates templates = tfactory.newTemplates(new StreamSource(styleSheet));
Transformer transformer = templates.newTransformer();
// Now do a transformation
ByteArrayOutputStream outStream = new ByteArrayOutputStream(1024);
transformer.transform(docw, new StreamResult(outStream));
ByteArrayInputStream inStream = new ByteArrayInputStream(outStream.toByteArray());
Document transformed = builder.build(inStream);
