Using Saxon and XSLT to transform JDOM XML documents

I'm trying to convert some XML so that ISO 8879 entity references appear in place of certain characters. For example, the string 1234-5678 would become 1234&hyphen;5678. I've done this using character maps and the stylesheets found at http://www.w3.org/2003/entities/iso8879doc/overview.html.
The first part of my XSLT looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="iso8879map.xsl"/>
<xsl:output omit-xml-declaration = "yes" use-character-maps="iso8879"/>
When I run this stylesheet in Eclipse with the Saxon XSLT engine it works fine and outputs an XML file with the hyphen entity string in place of the hyphen character. However, I need to automate this process, so I am using the JDOM package. Unfortunately, the characters are not replaced during the transformation. The code that does the conversion looks a little like this:
System.setProperty("javax.xml.transform.TransformerFactory",
"net.sf.saxon.TransformerFactoryImpl"); // use saxon for xslt 2.0 support
SAXBuilder builder = new SAXBuilder();
builder.setExpandEntities(false);
XSLTransformer transformer = new XSLTransformer(styleSheet);
Document toTransform = builder.build(Fileref); // transform
Document transformed = transformer.transform(toTransform);
I then write the document out to a file using the following method:
public static void writeXMLDoc(File xmlDoc, Document jdomDoc){
try {
Format format = Format.getPrettyFormat();
format.setOmitDeclaration(true);
format.setEncoding("ISO-8859-1");
XMLOutputter outputter = new XMLOutputter(format);
//outputter.output((org.jdom.Document) allChapters, System.out);
FileWriter writer = new FileWriter(xmlDoc.getAbsolutePath());
outputter.output((org.jdom.Document) jdomDoc, writer);
writer.close();
}
catch (java.io.IOException exp) {
exp.printStackTrace();
}
}
I've started debugging in Eclipse and it looks like the hyphen character isn't being replaced during the XSLT transformation. I've tested this using the Saxon XSLT engine on its own and it does work, so it's likely something to do with using it from Java and JDOM. Can anybody help?
Many thanks.
Jim

The problem did turn out to be with not using the JDOM wrapper class provided by Saxon. Here's the working code for reference that shows a JDOM document being transformed and being returned as a new JDOM document:
File styleSheet = new File("filePath");
// Get a TransformerFactory (Saxon-PE, for XSLT 2.0 support)
System.setProperty("javax.xml.transform.TransformerFactory",
"com.saxonica.config.ProfessionalTransformerFactory");
TransformerFactory tfactory = TransformerFactory.newInstance();
ProfessionalConfiguration config = (ProfessionalConfiguration)((TransformerFactoryImpl)tfactory).getConfiguration();
// Get a SAXBuilder
SAXBuilder builder = new SAXBuilder();
//Build JDOM Document
Document toTransform = builder.build(inputFileHandle);
//Give it a Saxon wrapper
DocumentWrapper docw = new DocumentWrapper(toTransform, inputFileHandle.getAbsolutePath(), config);
// Compile the stylesheet
Templates templates = tfactory.newTemplates(new StreamSource(styleSheet));
Transformer transformer = templates.newTransformer();
// Now do a transformation
ByteArrayOutputStream outStream = new ByteArrayOutputStream(1024);
transformer.transform(docw, new StreamResult(outStream));
ByteArrayInputStream inStream = new ByteArrayInputStream(outStream.toByteArray());
Document transformed = builder.build(inStream);
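
A likely reason this version behaves differently: character maps, like everything else declared on xsl:output, are serialization instructions, so they take effect only when the XSLT processor itself writes the result as text (here, to the StreamResult). When the result is built as a tree first, as JDOM's XSLTransformer does, those settings are not applied, and a later XMLOutputter knows nothing about them. The sketch below illustrates the same effect with the JDK's built-in XSLT 1.0 processor, with omit-xml-declaration standing in for the character map. The class name, stylesheet, and input are made up for the illustration, and it does not exercise Saxon itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: xsl:output settings apply only at serialization time.
class SerializationDemo {
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><out><xsl:value-of select='/in'/></out></xsl:template>"
        + "</xsl:stylesheet>";
    static final String XML = "<in>1234-5678</in>";

    // Runs the stylesheet against XML, sending the output to the given Result.
    static void run(Result result) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer t = factory.newTransformer(new StreamSource(new StringReader(XSL)));
            t.transform(new StreamSource(new StringReader(XML)), result);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // The XSLT processor serializes the result itself, so xsl:output is honoured.
    static String toStream() {
        StringWriter out = new StringWriter();
        run(new StreamResult(out));
        return out.toString();
    }

    // The result is built as a tree first; a separate identity transformer then
    // serializes it with its own defaults and never sees the stylesheet's xsl:output.
    static String viaTree() {
        DOMResult tree = new DOMResult();
        run(tree);
        StringWriter out = new StringWriter();
        try {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(tree.getNode()), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toStream());
        System.out.println(viaTree());
    }
}
```

The first result has no XML declaration because the stylesheet's xsl:output was applied; the second gets one back because the tree was serialized by something else. A character map is lost in the same way, which is why writing the Saxon result to a stream (and only then re-parsing it into JDOM, if needed) keeps the entity strings.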

Related

The element type "META" must be terminated by the matching end-tag "</META>". while generating PDF from XML file using XSL

I am trying to convert an XML document to PDF. I first transform the XML with XSL to generate the HTML used to create the PDF. The generated HTML doesn't contain a closing </meta> tag, and because of that I get the error below:
Exception in thread "main" org.xhtmlrenderer.util.XRRuntimeException: Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException; lineNumber: 22; columnNumber: 3; The element type "META" must be terminated by the matching end-tag "</META>".
How can I include the closing </meta> tag in the HTML?
Here is my Java code to generate the PDF from XML:
public class XMLtoPDF {
public static void main(String[] args)
throws IOException, DocumentException, TransformerException,TransformerConfigurationException,FileNotFoundException {
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(new StreamSource("xsl_html_pagebreak_a.xslt"));
transformer.transform(new StreamSource("xsl_html_pagebreak_input.xml"),new StreamResult(new FileOutputStream("sample3.html")));
String File_To_Convert = "sample3.html";
String url = new File(File_To_Convert).toURI().toURL().toString();
System.out.println(""+url);
String HTML_TO_PDF = "ConvertedFile3.pdf";
OutputStream os = new FileOutputStream(HTML_TO_PDF);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
os.close();
}
}

The name org.xhtmlrenderer.util suggests to me that the library you are using (ITextRenderer) expects XHTML. You can get XHTML output from your XSLT transformation by
(a) changing it to use the XHTML output method instead of HTML
(b) changing it to use an XSLT 2.0 processor such as Saxon, because to use the XHTML output method, you need XSLT 2.0.
More specifically, change
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer(new StreamSource("xsl_html_pagebreak_a.xslt"));
to
TransformerFactory tFactory = new net.sf.saxon.TransformerFactoryImpl();
Transformer transformer = tFactory.newTransformer(new StreamSource("xsl_html_pagebreak_a.xslt"));
transformer.setOutputProperty("method", "xhtml");
Alternatively you might find that changing the serialization method to "xml" will also work; in that case you don't need to change to XSLT 2.0.
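
As a rough illustration of the "xml" alternative, the sketch below overrides the stylesheet's HTML output method from Java and checks whether the result is well-formed XML, which is what the renderer's XML parser needs. It uses the JDK's built-in XSLT processor rather than Saxon; the class name, the inline stylesheet, and the well-formedness check are assumptions made for the demo, not part of the original code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class comparing the "html" and "xml" output methods.
class OutputMethodDemo {
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<html><head><meta name='generator' content='demo'/><title>t</title></head>"
        + "<body><p>x<br/>y</p></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet, overriding its output method from Java.
    static String render(String method) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty(OutputKeys.METHOD, method);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if an XML parser accepts the markup.
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. an unclosed <meta> or <br>
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("html well-formed: " + isWellFormedXml(render("html")));
        System.out.println("xml well-formed:  " + isWellFormedXml(render("xml")));
    }
}
```

With the "html" method, empty elements such as meta and br are written without end tags, so an XML parser rejects the file; with "xml" they are written as empty-element tags.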

xquery transformation creates empty namespace in element

I'm sorry but I guess I just don't see the mistake I'm making here.
I have a Camel route which returns XML, and to be able to test the output I wrote a JUnit test which runs with SpringRunner. There I get the XML stream from the exchange and validate it against an XSD. The validation itself works: it throws an exception because the output XML is not valid. What I don't understand is why the following XQuery generates an element with an empty namespace.
See the xquery snippet (I'm sorry again I cannot provide more code):
declare default element namespace "http://www.dppgroup.com/XXXPMS";
let $cmmdoc := $doc/*:cmmdoc
, $partner := $doc/*:cmmdoc/*:information/*:partner_gruppe/*:partner
, $sequence:= fn:substring($cmmdoc/@unifier,3)
return <ClientMMS xmlns:infra="http://www.dppgroup.com/InfraNS">
{
for $x in $partner
where $x[@partnerStatusCode = " "]
return
element {"DataGroup" } {
<Client sequenceNumber="{$sequence}" />
}
}
My problem is that with this code the resulting XML contains the DataGroup element with the following namespace declaration:
<?xml version="1.0" encoding="UTF-8"?>
<ClientMMS xmlns="http://www.dppgroup.com/XXXPMS"
xmlns:infra="http://www.dppgroup.com/InfraNS">
<DataGroup xmlns="">
<Client sequenceNumber="170908065609671475"/>
</DataGroup>
</ClientMMS>
The snippet from the Unit-Test: I'm using jdk1.8_102
String xml = TestDataReader.readXML("/input/info/info_in.xml", PROJECT_ENCODING);
quelle.sendBody(xml);
boolean valid = false;
try {
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream((byte[]) archiv.getExchanges().get(1).getIn().getBody());
Document document = documentBuilder.parse(byteArrayInputStream);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(document);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
System.out.println(xmlString);
I can't find a reason for this in any XQuery introduction, tutorial, or explanation. Can you please explain why the DataGroup element is not in the default namespace?

The XQuery you posted should create the result correctly, without the namespace undeclaration (xmlns="") you show.
In your Java code, if you want to work with XML that uses namespaces, make sure you use a namespace-aware DocumentBuilder. The default DocumentBuilderFactory is not namespace aware, so call setNamespaceAware(true) on the factory before creating a DocumentBuilder with it.
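
A minimal sketch of the difference, assuming a cut-down version of the result document from the question (the class and method names are made up for the illustration): the same markup is parsed once with the factory defaults and once with setNamespaceAware(true), and the namespace of the DataGroup element is read back from the DOM.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: effect of setNamespaceAware on the parsed DOM.
class NamespaceAwareDemo {
    static final String XML =
        "<ClientMMS xmlns=\"http://www.dppgroup.com/XXXPMS\"><DataGroup/></ClientMMS>";

    // Parses XML and returns the namespace URI the DOM reports for DataGroup.
    static String dataGroupNamespace(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Node dataGroup = doc.getDocumentElement().getFirstChild();
            return dataGroup.getNamespaceURI();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default factory:  " + dataGroupNamespace(false));
        System.out.println("namespace aware:  " + dataGroupNamespace(true));
    }
}
```

With the default factory the DataGroup node carries no namespace at all, so anything that later serializes or validates that DOM sees an element in no namespace, even though the XQuery put it in the default one.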

String manipulations for javax.xml.transform.Source

I am using Java and XSL stylesheets to retrieve values from an XML file and output them to a text file.
Below is the program used:
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("transform.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("inputXML.txt"));
transformer.transform(text, new StreamResult(new File("output.txt"))) ;
But recently I found that the XML files I will be reading will have two root nodes instead of one. So I am thinking of using string manipulation to add a root node of my own programmatically, so that I can avoid the error below:
ERROR: 'The markup in the document following the root element must
be well-formed.' ERROR:
'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: The
markup in the document following the root element must be
well-formed.'
But I am unable to do any string manipulation on javax.xml.transform.Source (casting does not work).
I do not want to use intermediate files to add my root node, as I fear that would prove costly: I need to process close to 50k XML records.

StreamSource has several constructors, including one that takes a Reader, so you can build the wrapped XML in memory:
Path inputPath = Paths.get("inputXML.txt");
String input = new String(Files.readAllBytes(inputPath),
StandardCharsets.UTF_8);
input = input.replaceFirst("<quasiroot", "<root>$0")
+ "</root>";
Source text = new StreamSource(new StringReader(input));
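
If reading each file into a String is a concern, a variation on the same idea is to wrap the input stream instead, so the content is never copied or scanned with a regex. The sketch below is one way to do that with java.io.SequenceInputStream; the class and method names are hypothetical, and it assumes the input has no XML declaration or byte order mark of its own (either would end up after the added root tag and make the document ill-formed).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical demo class: add a synthetic root element around a stream.
class WrapRootsDemo {
    // Returns a stream that yields "<root>", then the original bytes, then "</root>".
    static InputStream wrap(InputStream fragments) {
        InputStream open = new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, fragments, close)));
    }

    // Parses the wrapped input and counts the element children of the added root.
    static int countTopLevelElements(String fragments) {
        byte[] bytes = fragments.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = wrap(new ByteArrayInputStream(bytes))) {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countTopLevelElements("<rec id='1'/><rec id='2'/>"));
    }
}
```

For the transformation itself, the wrapped stream can be handed to the StreamSource constructor that takes an InputStream, for example new StreamSource(wrap(Files.newInputStream(inputPath))).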

Note that in the Java world you have XML parsers like Xerces with support for external entities, so you can simply construct a file that references your other file, e.g.
<!DOCTYPE root [
<!ENTITY input SYSTEM "inputXML.txt">
]>
<root>&input;</root>
Then all you need to do is load that file as the source for your XSLT. There is no need for string manipulation, at least not on the whole XML. If you want, you can construct the above directly as a string and pass it to a StreamSource over a StringReader, setting the system ID to the directory of your input XML:
String input = "inputXML.txt";
File dir = new File(".");
String baseUri = dir.toURI().toASCIIString();
String inputXml = "<!DOCTYPE root [ <!ENTITY input SYSTEM \"" + input + "\">]><root>&input;</root>";
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("transform.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new StringReader(inputXml), baseUri);
transformer.transform(text, new StreamResult(new File("output.txt")));

Replace text in XML before XSLT

I need to replace certain text in an XML file before giving it to the XSL Transformer.
It's the DTD URL in the DOCTYPE declaration. It points to a web server, but I want it to be usable offline, so I want to change it to a URL pointing to a local file.
However, I mustn't edit the original XML directly. I thought of reading the file into a string, using String.replaceAll() on the text, and saving it into another file, which I pass to the Transformer. I already tried it, but it's really slow; the file I'm using is about 500 KiB.
Is there any better (=faster) way to accomplish this?
EDIT: The code used for the transformation:
public String getPlaylist(String playlist) {
Source source = new StreamSource(library);
StreamSource xsl = new StreamSource(getClass().getResourceAsStream("M3Utransformation.xml"));
StringWriter w = new StringWriter();
Result result = new StreamResult(w);
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
transformer.setParameter("playlist", playlist);
transformer.transform(source, result);
return w.getBuffer().toString();
} catch (Throwable t) {
t.printStackTrace();
return null;
}
}

You can create an entity resolver and make use of it.
The following example uses the JAXP DocumentBuilder and a CatalogResolver:
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException, TransformerConfigurationException, TransformerException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new CatalogResolver());
File src = new File("src/article.xml");
Document doc = db.parse(src);
// Here, we execute the transformation
// Use a Transformer for output
File stylesheet = new File("src/article.xsl");
TransformerFactory tFactory = TransformerFactory.newInstance();
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = tFactory.newTransformer(stylesource);
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
}
Create a catalog properties file and place it on your classpath. CatalogManager.properties has to be the name; see the CatalogManager API documentation.
Define a catalog XML file and point the properties file above to it. You can find a very simple catalog XML file at http://www.xml.com/pub/a/2004/03/03/catalogs.html:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//OYRM/foo" uri="src/bar.dtd"/>
</catalog>
With the above catalog.xml and CatalogManager.properties, references to the publicId "-//OYRM/foo" end up resolving to the URI src/bar.dtd.
xml-commons contains the resolver:
http://xerces.apache.org/mirrors.cgi#binary
For a more complete treatment of the topic of resolvers, read Tom White's article from XML.com.
The transformer application was cribbed from the Java trail for Extensible Stylesheet Language Transformations > Transforming Data with XSLT.
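
If pulling in the catalog resolver is more than you need, a hand-written org.xml.sax.EntityResolver can do the redirection for a single DTD. The sketch below is only an illustration under stated assumptions: the remote URL, the one-line DTD, and the class name are invented, and the replacement DTD is served from a string so the example is self-contained. In real use the resolver would more likely return an InputSource pointing at the local DTD file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: redirect a remote DTD to local content while parsing.
class LocalDtdDemo {
    // Made-up remote DTD location and stand-in local DTD content.
    static final String REMOTE_DTD = "http://example.invalid/dtd/library.dtd";
    static final String LOCAL_DTD = "<!ELEMENT library ANY>";
    static final String SAMPLE =
        "<!DOCTYPE library SYSTEM \"" + REMOTE_DTD + "\"><library/>";

    // Counts how often the resolver substituted the local DTD.
    static final AtomicInteger redirected = new AtomicInteger();

    static Document parseOffline(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setEntityResolver((publicId, systemId) -> {
                if (REMOTE_DTD.equals(systemId)) {
                    redirected.incrementAndGet();
                    return new InputSource(new StringReader(LOCAL_DTD));
                }
                return null; // anything else: let the parser resolve it as usual
            });
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseOffline(SAMPLE);
        System.out.println(doc.getDocumentElement().getNodeName()
                + ", DTD redirected " + redirected.get() + " time(s)");
    }
}
```

The original XML is never modified, and the parsed Document can be passed to the Transformer as a DOMSource, as in the example above.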

Java:XML Parser

I have a response XML something like this -
<Response> <aa> <Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> </aa> </Response>
I want to extract the whole content from <Fromhere> to </Fromhere> in a string. Is it possible to do that through any string function or through XML parser?
Please advise.

You could try an XPath approach for simplicity in XML parsing:
InputStream response = new ByteArrayInputStream(("<Response> <aa> "
+ "<Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> "
+ "</aa> </Response>").getBytes(StandardCharsets.UTF_8)); /* Or whatever. */
DocumentBuilder builder = DocumentBuilderFactory
.newInstance().newDocumentBuilder();
Document doc = builder.parse(response);
XPath xpath = XPathFactory.newInstance().newXPath();
// Element names are case-sensitive: the element is Fromhere, not FromHere
XPathExpression expr = xpath.compile("string(/Response/aa/Fromhere)");
// string() yields only the concatenated text content, not the markup
String result = (String) expr.evaluate(doc, XPathConstants.STRING);
Note that I haven't tried this code. It may need tweaking.
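
Since the question asks for the whole content from <Fromhere> to </Fromhere>, markup included, one possible tweak is to select the element as a node and serialize it with an identity Transformer instead of taking its string value. This is a sketch along those lines rather than a tested drop-in; the class and method names are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: pull one element out of a document as markup.
class ExtractSubtreeDemo {
    // Returns the first node matched by xpathExpr, serialized as XML, or null if none.
    static String extract(String xml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODE);
            if (node == null) {
                return null;
            }
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (SAXException | IOException | ParserConfigurationException
                | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Response> <aa> <Fromhere> <a1>Content</a1> <a2>Content</a2> "
                + "</Fromhere> </aa> </Response>";
        System.out.println(extract(xml, "/Response/aa/Fromhere"));
    }
}
```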

Through an XML parser. Using string functions to parse XML is a bad idea...
Besides the Sun tutorials pointed out above, you can check the DZone Refcardz on Java and XML; I found it a good, terse explanation of how to do it.
But there are probably plenty of web resources on the topic, including on this very site.

You can apply an XSLT stylesheet to extract the desired content.
This stylesheet should fit your example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Response/aa/Fromhere/*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Apply it with something like the following (exception handling not included):
String xml = "<Response> <aa> <Fromhere> <a1>Content</a1> <a2>Content</a2> </Fromhere> </aa> </Response>";
Source xsl = new StreamSource(new FileReader("/path/to/file.xsl"));
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(xsl);
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StringWriter out = new StringWriter();
transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
System.out.println(out.toString());
This should work with any version of Java starting with 1.4.

This should work
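import java.util.regex.*;
// DOTALL lets '.' match line breaks, in case the response spans several lines
Pattern p = Pattern.compile("<Fromhere>.*</Fromhere>", Pattern.DOTALL);
Matcher m = p.matcher(responseString);
// group() is only valid after a successful find()
String whatYouWant = m.find() ? m.group() : null;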
It would be a little more verbose to use Scanner, but that could work too.
Whether this is a good idea is for someone more experienced than I am to say.

One option is to use a StreamFilter:
class MyFilter implements StreamFilter {
private boolean on;
@Override
public boolean accept(XMLStreamReader reader) {
final String element = "Fromhere";
if (reader.isStartElement() && element.equals(reader.getLocalName())) {
on = true;
} else if (reader.isEndElement()
&& element.equals(reader.getLocalName())) {
on = false;
return true;
}
return on;
}
}
Combined with a Transformer, you can use this to safely parse logically-equivalent markup like this:
<Response>
<!-- <Fromhere></Fromhere> -->
<aa>
<Fromhere>
<a1>Content</a1> <a2>Content</a2>
</Fromhere>
</aa>
</Response>
Demo:
StringWriter writer = new StringWriter();
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLStreamReader reader = inputFactory
.createXMLStreamReader(new StringReader(xmlString));
reader = inputFactory.createFilteredReader(reader, new MyFilter());
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new StAXSource(reader), new StreamResult(writer));
System.out.println(writer.toString());
This is a programmatic variation on Massimiliano Fliri's approach.
