Replace text in XML before XSLT - java

I need to replace some text in an XML file before passing it to the XSLT Transformer.
It's the DTD URL in the DOCTYPE declaration. It points to a web server, but I want this to work offline, so I want to change it to a URL that points to a local file.
However, I mustn't edit the original XML directly. I thought of reading the file into a string, using String.replaceAll() on the text and saving the result into another file, which I pass to the Transformer. I already tried it, but it's really slow; the file I'm using is about 500 KiB.
Is there any better (=faster) way to accomplish this?
EDIT: The code used for the transformation:
public String getPlaylist(String playlist) {
Source source = new StreamSource(library);
StreamSource xsl = new StreamSource(getClass().getResourceAsStream("M3Utransformation.xml"));
StringWriter w = new StringWriter();
Result result = new StreamResult(w);
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
transformer.setParameter("playlist", playlist);
transformer.transform(source, result);
return w.getBuffer().toString();
} catch (Throwable t) {
t.printStackTrace();
return null;
}
}

You can create an entity resolver and use it.
The following example uses the JAXP DocumentBuilder and a CatalogResolver:
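import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.xml.resolver.tools.CatalogResolver; // Apache xml-commons resolver
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class ArticleTransform {
    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, IOException, TransformerException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // CatalogResolver picks up CatalogManager.properties from the classpath
        db.setEntityResolver(new CatalogResolver());
        File src = new File("src/article.xml");
        Document doc = db.parse(src);
        // Here, we execute the transformation
        // Use a Transformer for output
        File stylesheet = new File("src/article.xsl");
        TransformerFactory tFactory = TransformerFactory.newInstance();
        StreamSource stylesource = new StreamSource(stylesheet);
        Transformer transformer = tFactory.newTransformer(stylesource);
        DOMSource source = new DOMSource(doc);
        StreamResult result = new StreamResult(System.out);
        transformer.transform(source, result);
    }
}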
Create a catalog properties file and place it on your classpath. It has to be named CatalogManager.properties; see the CatalogManager API documentation.
Define a catalog XML file and point the properties file above to it (its catalogs property takes the catalog location, e.g. catalogs=catalog.xml). At
http://www.xml.com/pub/a/2004/03/03/catalogs.html you can find a very simple catalog XML file:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<public publicId="-//OYRM/foo" uri="src/bar.dtd"/>
</catalog>
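With the above catalog.xml and CatalogManager.properties, references to the public ID "-//OYRM/foo" will be resolved to the URI src/bar.dtd. If your DOCTYPE only has a system identifier (just the DTD URL), a <system systemId="..." uri="..."/> entry does the same for that URL.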
The resolver is part of Apache xml-commons:
http://xerces.apache.org/mirrors.cgi#binary
For a more complete treatment of resolvers, read Tom White's article on XML.com.
The transformation code was adapted from the Java tutorial trail "Extensible Stylesheet Language Transformations > Transforming Data with XSLT".
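If you'd rather not depend on the xml-commons resolver, the same redirection can be sketched with a plain SAX EntityResolver handed to the Transformer through a SAXSource. This is a variation, not the answer's own code: OfflineDtdTransform and its parameters are hypothetical names, and it assumes the DOCTYPE's system ID equals the remote URL string exactly. The document is parsed as it streams, so nothing is copied or rewritten.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public final class OfflineDtdTransform {
    private OfflineDtdTransform() {}

    /**
     * Transforms xmlFile with the given stylesheet, serving the DTD from
     * localDtd whenever the parser asks for remoteDtdUrl (hypothetical names).
     * The XML file itself is never modified.
     */
    public static String transform(File xmlFile, Source xsl,
                                   String remoteDtdUrl, File localDtd) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT expects namespace-aware input
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Intercept the DTD lookup instead of rewriting the DOCTYPE text
        reader.setEntityResolver((publicId, systemId) -> {
            if (remoteDtdUrl.equals(systemId)) {
                return new InputSource(localDtd.toURI().toString());
            }
            return null; // null = let the parser resolve everything else normally
        });

        Source source = new SAXSource(reader, new InputSource(xmlFile.toURI().toString()));
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
        StringWriter w = new StringWriter();
        transformer.transform(source, new StreamResult(w));
        return w.toString();
    }
}
```

In the question's getPlaylist, this would take the place of new StreamSource(library), assuming library is a File.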

Related

How do I write a DOM Document to File?

How do I write this document to the local filesystem?
public void docToFile(org.w3c.dom.Document document, URI path) throws Exception {
File file = new File(path);
}
Do I need to iterate over the document, or is there a "to XML/HTML/string" method? I was looking at:
document.getXmlEncoding();
That's not quite what I'm after, but something like it. I'm looking for the String representation, which I would then write to a file like this:
Path file = ...;
byte[] buf = ...;
Files.write(file, buf);
https://docs.oracle.com/javase/tutorial/essential/io/file.html
I would use a Transformer to serialize the DOM content to an XML file, something like below:
Document doc =...
// An identity Transformer (no stylesheet) writes the DOM out unchanged
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
// Write bytes rather than chars, so the file's real encoding matches the
// XML declaration the Transformer emits; try-with-resources closes the stream
try (OutputStream out = new FileOutputStream("/tmp/output.xml")) {
    transformer.transform(source, new StreamResult(out));
}
I hope this ends up working for you!
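Fitted to the docToFile signature from the question, the Transformer approach could be wrapped up roughly as below. It's a sketch: DomWriter and docToString are names made up here. docToString covers the "String representation" part of the question, although writing through an OutputStream as docToFile does avoids a second encoding step.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public final class DomWriter {
    private DomWriter() {}

    // Identity Transformer: no stylesheet, it only serializes the DOM
    private static Transformer identity() throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        return t;
    }

    /** Writes the document as UTF-8 bytes to a file: URI. */
    public static void docToFile(Document document, URI path)
            throws IOException, TransformerException {
        try (OutputStream out = Files.newOutputStream(Paths.get(path))) {
            identity().transform(new DOMSource(document), new StreamResult(out));
        }
    }

    /** Returns the serialized XML as a String (e.g. to pass to Files.write). */
    public static String docToString(Document document) throws TransformerException {
        StringWriter w = new StringWriter();
        identity().transform(new DOMSource(document), new StreamResult(w));
        return w.toString();
    }
}
```

Usage would be DomWriter.docToFile(document, path) with the same Document and URI as in the question.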

Disable caching in Javax xml transformer

This article (https://www.ahoi-it.de/ahoi/news/java-xslt-memory-leak/4830) explains that the javax.xml Transformer caches XML content in an internal HashMap for later use.
This is my issue: I am reading XML messages from ActiveMQ, and if something fails, I retry converting them with the javax.xml Transformer and sending them to a certain endpoint. The problem is that my Docker container eventually restarts because it runs out of memory.
What I would like to do is disable this caching, but after 3 hours of research I still have no idea how to do so.
I have a utils class with static methods, and this is what my javax.xml Transformer code looks like:
public static String getTransformedXml(Object input, String transformerFileName)
throws IOException, TransformerException {
ClassPathResource classPathResource = new ClassPathResource(transformerFileName);
InputStream xsltStream = classPathResource.getInputStream();
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(xsltStream);
Transformer transformer = factory.newTransformer(xslt);
transformer.setErrorListener(new XsltTransformerErrorListener(transformerFileName));
Source text = new StreamSource(new StringReader(XmlUtils.encode(input, input.getClass())));
StringWriter writer = new StringWriter();
StreamResult result = new StreamResult(writer);
transformer.transform(text, result);
return result.getWriter().toString();
}
Try switching to a different TransformerFactory implementation, such as Apache Xalan (it has to be on the classpath):
TransformerFactory tFactory = TransformerFactory.newInstance("org.apache.xalan.processor.TransformerFactoryImpl",null);
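A different, commonly used pattern (not a switch that disables the cache the article describes, and not part of the answer above) is to compile each stylesheet once into a thread-safe Templates object and reuse it, instead of creating a new TransformerFactory and recompiling the XSLT on every call as getTransformedXml does. Whether that cures the out-of-memory restarts depends on what is actually holding the memory, so treat it as something to measure. XsltCache is a hypothetical name, and this sketch reads stylesheets from file paths rather than Spring's ClassPathResource.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public final class XsltCache {
    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();
    // One compiled stylesheet per file; bounded by the number of stylesheets
    private static final ConcurrentMap<Path, Templates> TEMPLATES = new ConcurrentHashMap<>();

    private XsltCache() {}

    private static Templates templatesFor(Path xslt) {
        return TEMPLATES.computeIfAbsent(xslt, p -> {
            try {
                synchronized (FACTORY) { // TransformerFactory is not guaranteed thread-safe
                    return FACTORY.newTemplates(new StreamSource(p.toFile()));
                }
            } catch (TransformerConfigurationException e) {
                throw new IllegalStateException("Cannot compile " + p, e);
            }
        });
    }

    /** Transforms an XML string; Templates is thread-safe, a Transformer is per call. */
    public static String transform(String xml, Path xslt) throws TransformerException {
        Transformer transformer = templatesFor(xslt).newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```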

String manipulations for javax.xml.transform.Source

I am using Java and XSL style sheets to retrieve values from an XML file and output them to a text file.
Below is the program used:
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("transform.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("inputXML.txt"));
transformer.transform(text, new StreamResult(new File("output.txt"))) ;
But I recently found that the XML files I will be reading will have two root elements, not one. So I am thinking of doing string manipulation to add a root element of my own programmatically, so that I can avoid the following error:
ERROR: 'The markup in the document following the root element must
be well-formed.' ERROR:
'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: The
markup in the document following the root element must be
well-formed.'
But I am unable to do any string manipulation on a javax.xml.transform.Source (casting does not work).
I do not want to use intermediate files to add my root element, as I fear that will prove costly: I need to process close to 50k XML records.
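StreamSource has several constructors, including one that takes a Reader, so you can read the file into a String, wrap it in your own root element, and pass it in through a StringReader:
Path inputPath = Paths.get("inputXML.txt");
// Files.readAllBytes takes only the path; the charset belongs to the String constructor
String input = new String(Files.readAllBytes(inputPath), StandardCharsets.UTF_8);
// "quasiroot" stands for the name of the first top-level element in the file;
// $0 re-inserts the matched text right after the new <root> start tag
input = input.replaceFirst("<quasiroot", "<root>$0") + "</root>";
Source text = new StreamSource(new StringReader(input));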
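A streaming variation on this idea (a sketch, not part of the answer) skips the in-memory String and the regex: it splices a <root> start tag and end tag around the file's bytes as the parser reads them. It assumes the file is UTF-8 and has no XML declaration, since a declaration in the middle of the wrapped stream would be a parse error. WrapRoots and transformWrapped are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public final class WrapRoots {
    private WrapRoots() {}

    private static InputStream ascii(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.US_ASCII));
    }

    /**
     * Transforms a file holding several top-level elements by reading
     * "<root>" + file bytes + "</root>" as one stream.
     * Assumes UTF-8 content and no XML declaration in the file.
     */
    public static void transformWrapped(Path input, Source xsl, Result result)
            throws IOException, TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
        // Closing the SequenceInputStream also closes the file stream
        try (InputStream wrapped = new SequenceInputStream(Collections.enumeration(
                Arrays.<InputStream>asList(ascii("<root>"), Files.newInputStream(input), ascii("</root>"))))) {
            transformer.transform(new StreamSource(wrapped), result);
        }
    }
}
```

With the question's files, a call might look like transformWrapped(Paths.get("inputXML.txt"), new StreamSource(new File("transform.xsl")), new StreamResult(new File("output.txt"))); the stylesheet then has to match from the added root element.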
Note that Java XML parsers such as Xerces support external entities, so you can simply construct a wrapper file that references your input file, e.g.:
<!DOCTYPE root [
<!ENTITY input SYSTEM "inputXML.txt">
]>
<root>&input;</root>
Then all you need to do is load that file as the source for your XSLT. There is no need for string manipulation, at least not of the whole XML. If you want, you can construct the above wrapper directly as a string and pass it to a StreamSource over a StringReader, setting the system ID to the directory of your input XML so that the relative entity reference can be resolved:
String input = "inputXML.txt";
File dir = new File(".");
String baseUri = dir.toURI().toASCIIString();
String inputXml = "<!DOCTYPE root [ <!ENTITY input SYSTEM \"" + input + "\">]><root>&input;</root>";
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("transform.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new StringReader(inputXml), baseUri);
transformer.transform(text, new StreamResult(new File("output.txt")));

Using Saxon and XSLT to transform JDOM XML documents

I'm trying to convert some XML so that iso8879 entity references appear in place of characters. For example, the string 1234-5678 would become 1234&hyphen;5678. I've done this using character maps and the stylesheets found at http://www.w3.org/2003/entities/iso8879doc/overview.html.
The first part of my xslt looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="iso8879map.xsl"/>
<xsl:output omit-xml-declaration = "yes" use-character-maps="iso8879"/>
When I run this stylesheet in Eclipse with the Saxon XSLT engine, it works fine and outputs an XML file with the hyphen entity reference in place of the hyphen character. However, I need to automate this process, so I am using the JDOM package. Unfortunately, the characters are not being replaced during the transformation. The code that does the conversion looks roughly like this:
System.setProperty("javax.xml.transform.TransformerFactory",
"net.sf.saxon.TransformerFactoryImpl"); // use saxon for xslt 2.0 support
SAXBuilder builder = new SAXBuilder();
builder.setExpandEntities(false);
XSLTransformer transformer = new XSLTransformer(styleSheet);
Document toTransform = builder.build(Fileref); // transform
Document transformed = transformer.transform(toTransform);
I then write the document out to a file using the following method:
public static void writeXMLDoc(File xmlDoc, Document jdomDoc){
try {
Format format = Format.getPrettyFormat();
format.setOmitDeclaration(true);
format.setEncoding("ISO-8859-1");
XMLOutputter outputter = new XMLOutputter(format);
//outputter.output((org.jdom.Document) allChapters, System.out);
FileWriter writer = new FileWriter(xmlDoc.getAbsolutePath());
outputter.output((org.jdom.Document) jdomDoc, writer);
writer.close();
}
catch (java.io.IOException exp) {
exp.printStackTrace();
}
}
I've started debugging in Eclipse, and it looks like the hyphen character isn't being replaced during the XSLT transformation. I've tested this with the Saxon XSLT engine on its own and it does work, so it's likely something to do with using it from Java and JDOM. Can anybody help?
Many thanks.
Jim
The problem turned out to be that I wasn't using the JDOM wrapper class provided by Saxon. Here's the working code for reference; it transforms a JDOM document and returns the result as a new JDOM document:
// Use Saxon-PE's factory: XSLT 2.0 support, and needed for the ProfessionalConfiguration cast below
System.setProperty("javax.xml.transform.TransformerFactory",
"com.saxonica.config.ProfessionalTransformerFactory");
File styleSheet = new File("filePath");
// Get a TransformerFactory and its Saxon configuration
TransformerFactory tfactory = TransformerFactory.newInstance();
ProfessionalConfiguration config = (ProfessionalConfiguration)((TransformerFactoryImpl)tfactory).getConfiguration();
// Get a SAXBuilder
SAXBuilder builder = new SAXBuilder();
// Build the JDOM Document
Document toTransform = builder.build(inputFileHandle);
// Give it a Saxon wrapper so Saxon can read the JDOM tree directly
DocumentWrapper docw = new DocumentWrapper(toTransform, inputFileHandle.getAbsolutePath(), config);
// Compile the stylesheet
Templates templates = tfactory.newTemplates(new StreamSource(styleSheet));
Transformer transformer = templates.newTransformer();
// Now do the transformation; Saxon serializes the result into the byte stream
ByteArrayOutputStream outStream = new ByteArrayOutputStream(1024);
transformer.transform(docw, new StreamResult(outStream));
// Parse the serialized output back into a new JDOM Document
ByteArrayInputStream inStream = new ByteArrayInputStream(outStream.toByteArray());
Document transformed = builder.build(inStream);

Bad Characters when parsing GML in Java

I'm using the org.w3c.dom package to parse the gml schemas (http://schemas.opengis.net/gml/3.1.0/base/).
When I parse the gmlBase.xsd schema and then save it back out, the quote characters around GeometryCollections in the BagType complex type come out as bad characters (see code below).
Is there something wrong with how I'm parsing or saving the xml, or is there something in the schema that is off?
Thanks,
Curtis
public static void main(String[] args) throws IOException
{
File schemaFile = File.createTempFile("gml_", ".xsd");
FileUtils.writeStringToFile(schemaFile, getSchema(new URL("http://schemas.opengis.net/gml/3.1.0/base/gmlBase.xsd")));
System.out.println("wrote file: " + schemaFile.getAbsolutePath());
}
public static String getSchema(URL schemaURL)
{
try
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(IOUtils.toString(schemaURL.openStream()))));
Element rootElem = doc.getDocumentElement();
rootElem.normalize();
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(doc);
ByteArrayOutputStream xmlOutStream = new ByteArrayOutputStream();
StreamResult result = new StreamResult(xmlOutStream);
transformer.transform(source, result);
return xmlOutStream.toString();
}
catch (Exception e)
{
e.printStackTrace();
}
return "";
}
I'm suspicious of this line:
Document doc = db.parse(new InputSource(
new StringReader(IOUtils.toString(schemaURL.openStream()))));
I don't know what IOUtils.toString does here, but presumably it's assuming a particular encoding without taking the XML declaration into account.
Why not just use:
Document doc = db.parse(schemaURL.openStream());
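Likewise, your FileUtils.writeStringToFile call doesn't appear to specify a character encoding... which encoding does it use? And the Transformer writes its own encoding (UTF-8 by default) into the StreamResult, while xmlOutStream.toString() decodes those bytes with the platform default charset.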
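Following that suggestion, a byte-oriented version of getSchema could look roughly like this: the parser reads raw bytes, so it can honour the document's own encoding declaration, and the Transformer writes bytes in an explicitly chosen encoding straight to the file, so no String conversion with the platform default charset happens in between. It's a sketch; SchemaCopy and copySchema are names chosen here, and UTF-8 is an assumed output encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public final class SchemaCopy {
    private SchemaCopy() {}

    /** Parses the schema from its bytes and writes it to target as UTF-8 bytes. */
    public static void copySchema(URL schemaURL, Path target)
            throws IOException, ParserConfigurationException, SAXException, TransformerException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc;
        try (InputStream in = schemaURL.openStream()) {
            InputSource src = new InputSource(in);  // bytes: the parser detects the encoding
            src.setSystemId(schemaURL.toString()); // keeps relative references resolvable
            doc = dbf.newDocumentBuilder().parse(src);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); // declared and actual encoding agree
        try (OutputStream out = Files.newOutputStream(target)) {
            t.transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```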
