TransformerFactory - avoiding network lookups to verify DTDs - java

I am needing to program for offline transformation of XML documents.
I have been able to stop DTD network lookups when loading the original XML file with the following :
DocumentBuilderFactory factory;
factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
factory.setFeature("http://xml.org/sax/features/namespaces", false);
factory.setFeature("http://xml.org/sax/features/validation", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// open up the xml document
docbuilder = factory.newDocumentBuilder();
doc = docbuilder.parse(new FileInputStream(m_strFilePath));
However, I am unable to apply this to the TransformerFactory object.
The DTDs are available locally, but I do not know how to direct the transformer to look at the local files as opposed to trying to do a network lookup.
From what I can see, the transformer needs these documents to correctly do the transformation.
For information, I am transforming MusicXML documents from Partwise to Timewise.
As you have probably guessed, XSLT is not my strong point (far from it).
Do I need to modify the XSLT files to reference local files, or can this be done differently ?
Further to the comments below, here is an excerpt of the xsl file. It is the only place that I see which refers to an external file :
<!--
XML output, with a DOCTYPE refering the timewise DTD.
Here we use the full Internet URL.
-->
<xsl:output method="xml" indent="yes" encoding="UTF-8"
omit-xml-declaration="no" standalone="no"
doctype-system="http://www.musicxml.org/dtds/timewise.dtd"
doctype-public="-//Recordare//DTD MusicXML 2.0 Timewise//EN" />
Is the mentioned technique valid for this also ?
The DTD file contains references to a number of MOD files like this :
<!ENTITY % layout PUBLIC
"-//Recordare//ELEMENTS MusicXML 2.0 Layout//EN"
"layout.mod">
I presume that these files will also be imported in turn also.

Ok, here is the answer which works for me.
1st step : load the original document, turning off validation and dtd loading within the factory.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// stop the network loading of DTD files
factory.setValidating(false);
factory.setNamespaceAware(true);
factory.setFeature("http://xml.org/sax/features/namespaces", false);
factory.setFeature("http://xml.org/sax/features/validation", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// open up the xml document
DocumentBuilder docbuilder = factory.newDocumentBuilder();
Document doc = docbuilder.parse(new FileInputStream(m_strFilePath));
2nd step : Now that I have got the document in memory ... and after having detected that I need to transform it -
TransformerFactory transformfactory = TransformerFactory.newInstance();
Templates xsl = transformfactory.newTemplates(new StreamSource(new FileInputStream((String)m_XslFile)));
Transformer transformer = xsl.newTransformer();
Document newdoc = docbuilder.newDocument();
Result XmlResult = new DOMResult(newdoc);
// now transform
transformer.transform(
new DOMSource(doc.getDocumentElement()),
XmlResult);
I needed to do this as I have further processing going on afterwards and did not want the overhead of outputting to file and reloading.
Little explanation :
The trick is to use the original DOM object which has had all the validation features turned off. You can see this here :
transformer.transform(
new DOMSource(doc.getDocumentElement()), // <<-----
XmlResult);
This has been tested with network access TURNED OFF.
So I know that there are no more network lookups.
However, if the DTDs, MODs, etc are available locally, then, as per the suggestions, the use of an EntityResolver is the answer. This to be applied, again, to the original docbuilder object.
I now have a transformed document stored in newdoc, ready to play with.
I hope this will help others.

You can use a library like Apache xml-commons-resolver and write a catalog file to map web URLs to your local copy of the relevant files. To wire this catalog up to the transformer mechanism you would need to use a SAXSource instead of a StreamSource as the source of your stylesheet:
SAXSource styleSource = new SAXSource(new InputSource("file:/path/to/stylesheet.xsl"));
CatalogResolver resolver = new CatalogResolver();
styleSource.getXMLReader().setEntityResolver(resolver);
TransformerFactory tf = TransformerFactory.newInstance();
tf.setURIResolver(resolver);
Transformer transformer = tf.newTransformer(styleSource);

The usual way to do this in Java is to use an LSResourceResolver to resolve the system ID (and/or public ID) to your local file. This is documented at http://docs.oracle.com/javase/7/docs/api/org/w3c/dom/ls/LSResourceResolver.html. You shouldn't need anything outside of standard Java XML parser features to get this working.

Related

Java - How does an xml document loads a DTD using XML Catalogs?

I want to know this so I can apply xsl transformations to the xml document without losing some entities like –
How do I tell the parser (any parser I dont care) which catalog to use and then execute the xsl transformations?, How do I connect the new configured parser to the transformation factory.
The code below represents the transformations I want to execute on the xml file (it works fine). I just want to know how can I add the XML Catalog approach so the xml-document loads correctly its DTD and continue with the xsl transformations steps.
try {
SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
Templates step1Template = stf.newTemplates(new StreamSource(
this.getClass().getResourceAsStream("xsltransformation_step1.xsl")
));
Templates step2Template = stf.newTemplates(new StreamSource(
this.getClass().getResourceAsStream("xsltransformation_step2.xsl")
));
Templates step3Template = stf.newTemplates(new StreamSource(
this.getClass().getResourceAsStream("xsltransformation_step3.xsl")
));
TransformerHandler th1 = stf.newTransformerHandler(step1Template);
TransformerHandler th2 = stf.newTransformerHandler(step2Template);
TransformerHandler th3 = stf.newTransformerHandler(step3Template);
StreamSource xmlStreamSource = new StreamSource(new File(xmlInputFile));
StreamResult outputStreamSource1 = new StreamResult(new File (outputNewFile1));
StreamResult outputStreamSource2 = new StreamResult(new File (outputNewFile2));
th1.setResult(new SAXResult(th2));
th2.setResult(new SAXResult(th3));
th3.setResult(outputStreamSource1);
Transformer t = stf.newTransformer();
t.transform(xmlStreamSource, new SAXResult(th1));
}catch (TransformerException e){
e.printStackTrace();
return false;
}
This is an example of the xmlInputFile containing entities
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE manual PUBLIC '-//docufy//Docufy Standard DTD 20080125//EN' '/system/cosimago/dtd/manual.dtd'>
<chapter>
<title>LEDs „5 – 8“ am CPU-Board prüfen</title>
<body>
<!-- just content -->
</body>
</chapter>
Please I would be really thankful if some good soul help me out with this.
Thank you in advance.
Andres
It's simplest to create your own XML parser (XMLReader) using SAXTransformerFactory.newInstance(). Then set the CatalogResolver on the parser using XMLReader.setEntityResolver(). Then wrap the XMLReader in a SAXSource, and supply this as the Source object to Transformer.transform().
With Saxon it's also possible to supply the entity resolver indirectly via a configuration property, but this is much more convoluted and is only needed if you aren't able to control the creation and configuration of the XMLReader yourself.

XML parser configured does not prevent nor limit external entities resolution. This can expose the parser to an XML External Entities attack

We had a security audit on our code, and they mentioned that our code is vulnerable to EXternal Entity (XXE) attack.
Explanation- XML External Entities attacks benefit from an XML feature to build documents dynamically at the time of processing. An XML entity allows inclusion of data dynamically from a given resource. External entities allow an XML document to include data from an external URI. Unless configured to do otherwise, external entities force the XML parser to access the resource specified by the URI, e.g., a file on the local machine or on a remote system. This behavior exposes the application to XML External Entity (XXE) attacks, which can be used to perform denial of service of the local system, gain unauthorized access to files on the local machine, scan remote machines, and perform denial of service of remote systems. The following XML document shows an example of an XXE attack.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///dev/random" >]><foo>&xxe;</foo>
This example could crash the server (on a UNIX system), if the XML parser attempts to substitute the entity with the contents of the /dev/random file.
Recommendation- The XML unmarshaller should be configured securely so that it does not allow external entities as part of an incoming XML document. To avoid XXE injection do not use unmarshal methods that process an XML source directly as java.io.File, java.io.Reader or java.io.InputStream. Parse the document with a securely configured parser and use an unmarshal method that takes the secure
parser as the XML source as shown in the following example:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setExpandEntityReferences(false);
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(<XML Source>);
Model model = (Model) u.unmarshal(document);
And written code is below where found the XXE attack-
1- DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
2- DocumentBuilder db = dbf.newDocumentBuilder();
3- InputSource is = new InputSource();
4- is.setCharacterStream(new StringReader(xml));
5-
6- Document doc = db.parse(is);
7- NodeList nodes = doc.getElementsByTagName(elementsByTagName);
8-
9- return nodes;
I am getting XXE attack on the line no 6.
Please help how can I resolve the above issue. Anyone help is appreciated !
For a detailed explanation and options for remediation I suggest you look at OWASP XEE Cheat Sheet
We had a similar issue raised and resolved it by disabling DOCTYPES (the first suggestion on the link above) as we didn't need them:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
For javax.xml.parsers.DocumentBuilderFactory,the following setting would be enough to prevent XXE attack
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Disallow the DTDs (doctypes) entirely.
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Or do the following:
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

Java replacing / removing / writing text in a certain spot in a txt file

So they the file I wanna edit is this,
<?xml version="1.0" encoding="utf-16"?>
<UserSettingsXml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<AuthType>Google</AuthType> <!-- Google/Ptc -->
<DefaultLatitude>425</DefaultLatitude>
<DefaultLongitude>5555</DefaultLongitude>
The places I wanna edit, and remove text from is
<DefaultLatitude>425</DefaultLatitude>
<DefaultLongitude>5555</DefaultLongitude>
I've googled a bit but couldn't find what I was looking for.
If you want to alter an XML file you should be using something like DOM.
That's easier and safer to do than just altering the text in the file.
Reading the xml-file into a DOM-Document is quite easy:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new File(filename));
After that you'll have to find the right spot in the XML-Document and alter it:
Element root = doc.getDocumentElement();
NodeList found = root.getElementsByTagName("DefaultLatitude");
Node element = found.item(0);
String textContent = element.getTextContent(); // contains 425 now
element.setTextContent("987"); // set new text
After that you'll have to write out the changed document to a file:
TransformerFactory tranformerFactory = TransformerFactory.newInstance();
Transformer transformer = tranformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(outputFile));
transformer.transform(source, result);
Please note that this is just an example on how to read, alter and save a XML-File. To properly work with DOM you'll have to read some tutorials!
The file you are trying to edit is in xml format. The best way to do what you want is to:
1. parse the file in xml
2. manipulate the xml
3. build the file again
A good page where to learn how to convert from text to xml is this:
http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
The rest is easier. I hope this helps.

Java DOM appendChild: namespaces and signed documents

I'm having a problem generating a document with signed nodes. I have a bunch of signed XMLs, in text format. They signature is valid, I've tested it with xmlsec1. I have to load all the XMLs and put them in another XML document, for sending it to another service.
So, first of all I create the container document ("sobre" is a local variable, a JAXB root element):
JAXBContext context = JAXBContext.newInstance("org.importe.test");
StringWriter writer = new StringWriter();
Marshaller marshaller = context.createMarshaller();
marshaller.marshal(sobre, writer);
String xml = writer.toString();
Document doc = loadXMLFromString(xml);
and then I add the XMLs to the container:
for (String cfexml : cfexmls) {
Document cfe = loadXMLFromString(cfexml);
Node newNode = doc.importNode(cfe.getDocumentElement(), true);
doc.getElementsByTagName("EnvioCFE").item(0).appendChild(newNode);
}
finally I get the xml from the container document:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
StringWriter outputWriter = new StringWriter();
trans.transform(new DOMSource(doc), new StreamResult(outputWriter));
String signedxml = outputWriter.toString();
The point is that I'm getting the child node modified, without namespaces, and so the signature validation of the extract node fails. Here is an excerpt of the XML that I've to add as a child node:
<?xml version="1.0" encoding="UTF-8"?>
<CFE xmlns="http://org.importe.test" xmlns:ns2="http://www.w3.org/2000/09/xmldsig#" xmlns:ns3="http://www.w3.org/2001/04/xmlenc#" version="1.0">
<data>
[...]
</data>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
[...]
</Signature>
</CFE>
but when the node is imported in the container document it get modified (actually I've noticed that it loose namespace declarations):
<?xml version="1.0" encoding="UTF-8"?>
<EnvioCFE xmlns="http://org.importe.test" xmlns:ns2="http://www.w3.org/2000/09/xmldsig#" xmlns:ns3="http://www.w3.org/2001/04/xmlenc#" version="1.0">
<Header version="1.0">
</Header>
<CFE version="1.0">
<data>
[...]
</data>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
[...]
</Signature>
</CFE>
</EnvioCFE>
How can I add a node to the container keeping the namespace declaration? More generally can I add it being sure it's added "as is", without any modifications?
(I'm using glassfish 4 and Java 7)
Thank you
EDIT:
This is the code of loadXMLFromString:
public Document loadXMLFromString(String xml) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new InputSource(new StringReader(xml)));
}
Ok, I've done research and I've found the problem:
if I use xalan-2.7.1.jar, xercesImpl-2.9.1.jar, xml-apis-1.3.04.jar and xmlsec-1.5.6.jar for DOM and XML security implementation, it works as I expect: adding a node does not modify the contents, so the digital signature is valid;
if I use the standard java libraries not only adding a node change the node itself, but sometimes the digital signature doesn't pass the xmlsec1 check, I imagine for a problem in the transformation from DOM to String.
I hope this can help someone, anyway I don't know why it happens, I'm sure I'm not doing weird things, but I've no time now to going deeply.
Actually I'm stuck to find how tell glassfish to load the DOM implementation I want, or to set a priority: I've found no way to say "load xerces instead of standard lib".

java xalan transform with and without xinclude in the same app

I write a book database application in java. The books stored in XML format. Every book a XML. The books can contains short stories and a short story can be a XML.
In that case the book look like:
<?xml version="1.0" encoding="UTF-8"?>
...
<book>
<content>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shortStory.xml"/>
</content>
</book>
Because the user can upload only one xml, and it can be the "book.xml" wihtout the "shortStory.xml" (the shortStory.xml always uploaded before) I need to do the XSLT Transform without xinclude. (et case the two file is not the same path)
But after the upload (in other usecase) I need to do the XSLT transform with the XInclude (the two file is the same path)
Every solution what use the Xinclude set the System Property before get a instance from Transformerfactory:
System.setProperty(
"org.apache.xerces.xni.parser.XMLParserConfiguration",
"org.apache.xerces.parsers.XIncludeParserConfiguration");
Or use DocumentBuilderFactory.setXIncludeAware().
I'd like two javax.xml.transform.Transformer one set up use the xinclude and one without.
Or one transformert but a simple method for javax.xml.transform.stream.StreamSource to turn on/ot the xinclude.
thanx
EDIT
Try out Martin Honnen's solution, but there was problem with the transform, so I change the SAXreader to Documentbuilder:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setXIncludeAware(true);
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.parse(input);
Source source = new DOMSource(doc);
...
transformer.transform(source, result);
I think that is more a question related to the XML parser than to the XSLT processor. According to http://xerces.apache.org/xerces2-j/features.html you can set
import javax.xml.parsers.SAXParser;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
SAXParser parser = /* created from SAXParserFactory */;
XMLReader reader = parser.getXMLReader();
try {
reader.setFeature("http://apache.org/xml/features/xinclude",
true);
}
catch (SAXException e) {
System.err.println("could not set parser feature");
}
Then you can build a SAXSource with that reader I think:
SAXSource source = new SAXSource();
source.setXMLReader(reader);
that way you have a way to build a source with XInclude turned on or off, without needing to set a system property.

Categories