java xalan transform with and without xinclude in the same app


I am writing a book database application in Java. The books are stored in XML format, one XML file per book. A book can contain short stories, and a short story can be a separate XML file.
In that case the book looks like this:
<?xml version="1.0" encoding="UTF-8"?>
...
<book>
<content>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="shortStory.xml"/>
</content>
</book>
Because the user can upload only one XML file at a time, and it can be "book.xml" without "shortStory.xml" (shortStory.xml is always uploaded beforehand), I need to run the XSLT transform without XInclude. (In this case the two files are not in the same path.)
But after the upload (in another use case) I need to run the XSLT transform with XInclude (the two files are in the same path).
Every solution I have found that uses XInclude sets a system property before getting an instance from TransformerFactory:
System.setProperty(
"org.apache.xerces.xni.parser.XMLParserConfiguration",
"org.apache.xerces.parsers.XIncludeParserConfiguration");
Or they use DocumentBuilderFactory.setXIncludeAware().
I'd like two javax.xml.transform.Transformer instances, one set up to use XInclude and one without.
Or one transformer, plus a simple way to turn XInclude on/off for a javax.xml.transform.stream.StreamSource.
Thanks
EDIT
I tried Martin Honnen's solution, but there was a problem with the transform, so I changed the SAX reader to a DocumentBuilder:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setXIncludeAware(true);
DocumentBuilder docBuilder = factory.newDocumentBuilder();
Document doc = docBuilder.parse(input);
Source source = new DOMSource(doc);
...
transformer.transform(source, result);

I think this question relates more to the XML parser than to the XSLT processor. According to http://xerces.apache.org/xerces2-j/features.html you can set the XInclude feature on the reader:
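import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
// inside a method that declares ParserConfigurationException and SAXException
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true); // XInclude processing relies on namespace support
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
try {
    // Xerces feature that switches XInclude processing on for this reader only
    reader.setFeature("http://apache.org/xml/features/xinclude", true);
}
catch (SAXException e) {
    System.err.println("could not set parser feature");
}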
Then I think you can build a SAXSource with that reader:
SAXSource source = new SAXSource();
source.setXMLReader(reader);
source.setInputSource(new InputSource("book.xml")); // org.xml.sax.InputSource for the document to transform
That way you can build a source with XInclude turned on or off, without needing to set a system property.
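The idea can be sketched end to end as follows. This is only a minimal sketch under some assumptions: it uses the standard JAXP switch SAXParserFactory.setXIncludeAware() instead of the Xerces feature string, an identity transform in place of a real stylesheet, and hypothetical sample files (book.xml, shortStory.xml) written to a temporary directory. The class and method names are made up for illustration.

```java
import java.io.File;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class XIncludeToggle {

    /** Builds a Source whose parser has XInclude processing switched on or off. */
    static SAXSource newSource(File xml, boolean xinclude) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);    // XInclude relies on namespace support
        spf.setXIncludeAware(xinclude); // per-factory switch, no system property
        XMLReader reader = spf.newSAXParser().getXMLReader();
        // The file URI is the base for resolving the relative href of xi:include.
        return new SAXSource(reader, new InputSource(xml.toURI().toString()));
    }

    /** Identity transform; a real stylesheet Source would go into newTransformer(...). */
    static String transform(File xml, boolean xinclude) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(newSource(xml, xinclude), new StreamResult(out));
        return out.toString();
    }

    /** Writes hypothetical sample files into dir and returns book.xml. */
    static File writeSample(Path dir) throws Exception {
        Files.write(dir.resolve("shortStory.xml"),
                "<story>Once upon a time</story>".getBytes(StandardCharsets.UTF_8));
        String book = "<book><content>"
                + "<xi:include xmlns:xi=\"http://www.w3.org/2001/XInclude\""
                + " href=\"shortStory.xml\"/>"
                + "</content></book>";
        Path bookPath = dir.resolve("book.xml");
        Files.write(bookPath, book.getBytes(StandardCharsets.UTF_8));
        return bookPath.toFile();
    }

    public static void main(String[] args) throws Exception {
        File book = writeSample(Files.createTempDirectory("xinclude-demo"));
        System.out.println(transform(book, false)); // xi:include element left as-is
        System.out.println(transform(book, true));  // short story pulled in
    }
}
```

Because the switch lives on the parser factory that produces each Source, the same Transformer setup can serve both use cases.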

Related

Java - How does an xml document loads a DTD using XML Catalogs?

I want to know this so I can apply XSL transformations to the XML document without losing some entities like –
How do I tell the parser (any parser, I don't care which) which catalog to use and then execute the XSL transformations? How do I connect the newly configured parser to the transformation factory?
The code below represents the transformations I want to execute on the XML file (it works fine). I just want to know how I can add the XML Catalog approach so that the XML document loads its DTD correctly and then continues with the XSL transformation steps.
try {
SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
Templates step1Template = stf.newTemplates(new StreamSource(
this.getClass().getResourceAsStream("xsltransformation_step1.xsl")
));
Templates step2Template = stf.newTemplates(new StreamSource(
this.getClass().getResourceAsStream("xsltransformation_step2.xsl")
));
Templates step3Template = stf.newTemplates(new StreamSource(
this.getClass().getResourceAsStream("xsltransformation_step3.xsl")
));
TransformerHandler th1 = stf.newTransformerHandler(step1Template);
TransformerHandler th2 = stf.newTransformerHandler(step2Template);
TransformerHandler th3 = stf.newTransformerHandler(step3Template);
StreamSource xmlStreamSource = new StreamSource(new File(xmlInputFile));
StreamResult outputStreamSource1 = new StreamResult(new File (outputNewFile1));
StreamResult outputStreamSource2 = new StreamResult(new File (outputNewFile2));
th1.setResult(new SAXResult(th2));
th2.setResult(new SAXResult(th3));
th3.setResult(outputStreamSource1);
Transformer t = stf.newTransformer();
t.transform(xmlStreamSource, new SAXResult(th1));
}catch (TransformerException e){
e.printStackTrace();
return false;
}
This is an example of the xmlInputFile containing entities
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE manual PUBLIC '-//docufy//Docufy Standard DTD 20080125//EN' '/system/cosimago/dtd/manual.dtd'>
<chapter>
<title>LEDs „5 – 8“ am CPU-Board prüfen</title>
<body>
<!-- just content -->
</body>
</chapter>
I would be really thankful if someone could help me out with this.
Thank you in advance.
Andres

It's simplest to create your own XML parser (XMLReader) using SAXParserFactory.newInstance(). Then set the CatalogResolver on the parser using XMLReader.setEntityResolver(). Then wrap the XMLReader in a SAXSource, and supply this as the Source object to Transformer.transform().
With Saxon it's also possible to supply the entity resolver indirectly via a configuration property, but this is much more convoluted and is only needed if you aren't able to control the creation and configuration of the XMLReader yourself.
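The wiring described in the first paragraph can be sketched as follows. This is a minimal sketch, not a catalog implementation: a hand-written EntityResolver stands in for the CatalogResolver so that the example needs no third-party library, and the public ID, system ID and DTD text are hypothetical. A real CatalogResolver would be passed to setEntityResolver() in the same place.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class ResolverSource {

    // Hypothetical DTD text standing in for the local copy a catalog would point to.
    static final String LOCAL_DTD = "<!ENTITY ndash \"&#8211;\">";

    // Stand-in for a CatalogResolver: maps one public ID to the local DTD text.
    static final EntityResolver RESOLVER = (publicId, systemId) -> {
        if ("-//example//DTD Manual//EN".equals(publicId)) {
            InputSource local = new InputSource(new StringReader(LOCAL_DTD));
            local.setSystemId(systemId);
            return local;
        }
        return null; // anything else falls back to the parser's default lookup
    };

    /** Parses xml with the resolver attached, runs an identity transform, returns the text. */
    static String transformedText(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setEntityResolver(RESOLVER);

        // The SAXSource carries the configured reader into the transformation.
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return ((Document) result.getNode()).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE manual PUBLIC \"-//example//DTD Manual//EN\" "
                + "\"http://example.invalid/manual.dtd\">"
                + "<manual>5 &ndash; 8</manual>";
        System.out.println(transformedText(xml));
    }
}
```

In a chained setup like the one in the question, the same SAXSource would replace the StreamSource passed to t.transform(...).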

Trying to use XInclude with Java and resolving the fragment with xml:id

I've been trying to get XInclude working in my XML document and finally have it working in Oxygen XML, which I'm using to author the XML documents.
I then went to my app, written in Java, but it doesn't seem to support any form of XPointer resolution except using something like: element(/1/2).
This is, obviously, an awful scheme to have to use since every time the document is edited the XPointer needs changing to reflect the new position of the node in the XML!
The scheme I had working simply used xml:id in the target document:
<foo>
<bar xml:id="ABCD" />
</foo>
and then, in the other document:
<lorem>
<ipsum>
<xi:include href="target.xml" xpointer="ABCD" />
</ipsum>
</lorem>
Which I anticipate (and am getting in Oxygen) results in something like:
<lorem>
<ipsum>
<bar xml:id="ABCD" />
</ipsum>
</lorem>.
However, in Java it fails with:
Resource error reading file as XML (href='data/target.xml'). Reason:
XPointer resolution unsuccessful.
If, however, I change the include tag to use
xpointer="element(/1/1)"
then it works very nicely - but, as I've said, that's a very poor solution.
I'm simply using the implementations that are included with the Java runtime (1.8).
Here's the code I'm using:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setXIncludeAware(true);
Source resultSource = new
StreamSource(Gdx.files.internal("data/result.xsd").read());
Source targetSource = new
StreamSource(Gdx.files.internal("data/target.xsd").read());
Source[] schemaFiles = {targetSource, resultSource};
schema =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema")
.newSchema(schemaFiles);
factory.setSchema(schema);
builder = factory.newDocumentBuilder();
itemDoc = builder.parse(new
InputSource(Gdx.files.internal("data/result.xml").read()));

According to Apache Xerces's docs on XInclude (which is used internally for XML parsing by Java)
for shorthand pointers and element() XPointers, currently only DTD-determined IDs are supported.
This means that you need to put markup declarations such as the following into your target.xml file (telling the XML parser that the id attribute is to be treated as attribute with ID semantics, and telling XInclude to interpret "bare" XPointers as ID references):
<!DOCTYPE foo [
<!ATTLIST bar id ID #IMPLIED>
]>
<foo>
<bar id="ABCD"/>
</foo>
If you now use the following document as your source XML (which you've named result.xml in your example code, and which I've edited to contain an XInclude namespace URI binding for xi)
<lorem xmlns:xi="http://www.w3.org/2001/XInclude">
<ipsum>
<xi:include href="target.xml" xpointer="ABCD"/>
</ipsum>
</lorem>
then Xerces will build up a Document where XInclude processing has been performed as desired (where I've put your example data into the target.xml file in the same directory as the result.xml file):
<lorem xmlns:xi="http://www.w3.org/2001/XInclude">
<ipsum>
<bar id="ABCD" xml:base="target.xml"/>
</ipsum>
</lorem>
The Java code I've used to produce the document is simplified from your example and doesn't contain third-party libs:
import java.io.*;
import javax.xml.*;
import javax.xml.parsers.*;
import javax.xml.validation.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;
import org.w3c.dom.*;
public class t {
public static void main(String[] args) {
try {
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setXIncludeAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document itemDoc = builder.parse(new File("result.xml"));
System.out.println(serialize(itemDoc));
}
catch (Exception ex) {
ex.printStackTrace();
}
}
static String serialize(Document doc) throws Exception {
Transformer transformer =
TransformerFactory.newInstance().newTransformer();
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
return result.getWriter().toString();
}
}
Seeing as you also use XML Schema validation, I'd also like to point out the potential interaction of XInclude with XML Schema, as discussed e.g. in "XInclude Schema/Namespace Validation?", and a potential alternative discussed in "Duplicate some parts of XML without rewriting them".

Java replacing / removing / writing text in a certain spot in a txt file

The file I want to edit is this:
<?xml version="1.0" encoding="utf-16"?>
<UserSettingsXml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<AuthType>Google</AuthType> <!-- Google/Ptc -->
<DefaultLatitude>425</DefaultLatitude>
<DefaultLongitude>5555</DefaultLongitude>
The places I want to edit, and remove text from, are:
<DefaultLatitude>425</DefaultLatitude>
<DefaultLongitude>5555</DefaultLongitude>
I've googled a bit but couldn't find what I was looking for.

If you want to alter an XML file you should be using something like DOM.
That's easier and safer to do than just altering the text in the file.
Reading the xml-file into a DOM-Document is quite easy:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new File(filename));
After that you'll have to find the right spot in the XML-Document and alter it:
Element root = doc.getDocumentElement();
NodeList found = root.getElementsByTagName("DefaultLatitude");
Node element = found.item(0);
String textContent = element.getTextContent(); // contains 425 now
element.setTextContent("987"); // set new text
After that you'll have to write out the changed document to a file:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(outputFile));
transformer.transform(source, result);
Please note that this is just an example of how to read, alter and save an XML file. To work with DOM properly you'll have to read some tutorials!

The file you are trying to edit is in XML format. The best way to do what you want is to:
1. parse the file as XML
2. manipulate the XML
3. write the file out again
A good page for learning how to read an XML file with the DOM parser is this:
http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
The rest is easier. I hope this helps.
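The three steps can be sketched in one small method. This is a minimal sketch that works on an in-memory string rather than a file, and the sample document in main is a shortened, hypothetical version of the settings file from the question; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class EditSetting {

    /** Replaces the text of the first element named tagName and returns the new XML. */
    static String setElementText(String xml, String tagName, String newText) throws Exception {
        // 1. parse the text as XML
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // 2. manipulate the XML
        Node target = doc.getElementsByTagName(tagName).item(0);
        if (target == null) {
            throw new IllegalArgumentException("No <" + tagName + "> element found");
        }
        target.setTextContent(newText);

        // 3. write the document out again
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<UserSettingsXml>"
                + "<DefaultLatitude>425</DefaultLatitude>"
                + "<DefaultLongitude>5555</DefaultLongitude>"
                + "</UserSettingsXml>";
        System.out.println(setElementText(xml, "DefaultLatitude", "987"));
    }
}
```

Passing an empty string as newText clears the value while keeping the element in place.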

Java DOM appendChild: namespaces and signed documents

I'm having a problem generating a document with signed nodes. I have a bunch of signed XMLs, in text format. Their signatures are valid; I've tested them with xmlsec1. I have to load all the XMLs and put them in another XML document, to send it to another service.
So, first of all I create the container document ("sobre" is a local variable, a JAXB root element):
JAXBContext context = JAXBContext.newInstance("org.importe.test");
StringWriter writer = new StringWriter();
Marshaller marshaller = context.createMarshaller();
marshaller.marshal(sobre, writer);
String xml = writer.toString();
Document doc = loadXMLFromString(xml);
and then I add the XMLs to the container:
for (String cfexml : cfexmls) {
Document cfe = loadXMLFromString(cfexml);
Node newNode = doc.importNode(cfe.getDocumentElement(), true);
doc.getElementsByTagName("EnvioCFE").item(0).appendChild(newNode);
}
finally I get the xml from the container document:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
StringWriter outputWriter = new StringWriter();
trans.transform(new DOMSource(doc), new StreamResult(outputWriter));
String signedxml = outputWriter.toString();
The point is that the child node comes out modified, without its namespace declarations, and so the signature validation of the extracted node fails. Here is an excerpt of the XML that I have to add as a child node:
<?xml version="1.0" encoding="UTF-8"?>
<CFE xmlns="http://org.importe.test" xmlns:ns2="http://www.w3.org/2000/09/xmldsig#" xmlns:ns3="http://www.w3.org/2001/04/xmlenc#" version="1.0">
<data>
[...]
</data>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
[...]
</Signature>
</CFE>
but when the node is imported into the container document it gets modified (actually I've noticed that it loses its namespace declarations):
<?xml version="1.0" encoding="UTF-8"?>
<EnvioCFE xmlns="http://org.importe.test" xmlns:ns2="http://www.w3.org/2000/09/xmldsig#" xmlns:ns3="http://www.w3.org/2001/04/xmlenc#" version="1.0">
<Header version="1.0">
</Header>
<CFE version="1.0">
<data>
[...]
</data>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
[...]
</Signature>
</CFE>
</EnvioCFE>
How can I add a node to the container keeping the namespace declaration? More generally can I add it being sure it's added "as is", without any modifications?
(I'm using glassfish 4 and Java 7)
Thank you
EDIT:
This is the code of loadXMLFromString:
public Document loadXMLFromString(String xml) throws ParserConfigurationException, SAXException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new InputSource(new StringReader(xml)));
}

OK, I've done some research and found the problem:
if I use xalan-2.7.1.jar, xercesImpl-2.9.1.jar, xml-apis-1.3.04.jar and xmlsec-1.5.6.jar as the DOM and XML security implementation, it works as I expect: adding a node does not modify its contents, so the digital signature stays valid;
if I use the standard Java libraries, not only does adding a node change the node itself, but sometimes the digital signature doesn't pass the xmlsec1 check, I imagine because of a problem in the transformation from DOM to String.
I hope this helps someone. I don't know why it happens; I'm sure I'm not doing anything weird, but I have no time right now to dig deeper.
At the moment I'm stuck on how to tell GlassFish to load the DOM implementation I want, or to set a priority: I've found no way to say "load Xerces instead of the standard library".
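A small diagnostic that may help separate the DOM import step from the DOM-to-String step is to inspect the imported element directly, before any serialization. This is only a sketch with hypothetical, shortened sample documents; it checks the state of the DOM tree and makes no claim about what a particular serializer or GlassFish version then writes out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ImportNamespaceCheck {

    /** Namespace-aware parse, as in loadXMLFromString from the question. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Imports the root of childXml under the root of containerXml and returns the copy. */
    static Element importInto(String containerXml, String childXml) throws Exception {
        Document container = parse(containerXml);
        Document child = parse(childXml);
        Node imported = container.importNode(child.getDocumentElement(), true);
        container.getDocumentElement().appendChild(imported);
        return (Element) imported;
    }

    public static void main(String[] args) throws Exception {
        Element cfe = importInto(
                "<EnvioCFE xmlns=\"http://org.importe.test\"/>",
                "<CFE xmlns=\"http://org.importe.test\" version=\"1.0\"><data/></CFE>");
        // Namespace of the imported element inside the container document
        System.out.println(cfe.getNamespaceURI());
        // Whether the xmlns declaration is still an attribute of the imported element
        System.out.println(cfe.hasAttribute("xmlns"));
    }
}
```

If the namespace and the declaration attribute are intact at this point, the difference between the two library sets is more likely to come from the serialization step than from importNode.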

TransformerFactory - avoiding network lookups to verify DTDs

I need to implement offline transformation of XML documents.
I have been able to stop DTD network lookups when loading the original XML file with the following :
DocumentBuilderFactory factory;
factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
factory.setFeature("http://xml.org/sax/features/namespaces", false);
factory.setFeature("http://xml.org/sax/features/validation", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// open up the xml document
docbuilder = factory.newDocumentBuilder();
doc = docbuilder.parse(new FileInputStream(m_strFilePath));
However, I am unable to apply this to the TransformerFactory object.
The DTDs are available locally, but I do not know how to direct the transformer to look at the local files as opposed to trying to do a network lookup.
From what I can see, the transformer needs these documents to correctly do the transformation.
For information, I am transforming MusicXML documents from Partwise to Timewise.
As you have probably guessed, XSLT is not my strong point (far from it).
Do I need to modify the XSLT files to reference local files, or can this be done differently ?
Further to the comments below, here is an excerpt of the xsl file. It is the only place that I see which refers to an external file :
<!--
XML output, with a DOCTYPE refering the timewise DTD.
Here we use the full Internet URL.
-->
<xsl:output method="xml" indent="yes" encoding="UTF-8"
omit-xml-declaration="no" standalone="no"
doctype-system="http://www.musicxml.org/dtds/timewise.dtd"
doctype-public="-//Recordare//DTD MusicXML 2.0 Timewise//EN" />
Is the mentioned technique valid for this also ?
The DTD file contains references to a number of MOD files like this :
<!ENTITY % layout PUBLIC
"-//Recordare//ELEMENTS MusicXML 2.0 Layout//EN"
"layout.mod">
I presume that these files will be imported in turn as well.

Ok, here is the answer which works for me.
1st step : load the original document, turning off validation and dtd loading within the factory.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// stop the network loading of DTD files
factory.setValidating(false);
factory.setNamespaceAware(true);
factory.setFeature("http://xml.org/sax/features/namespaces", false);
factory.setFeature("http://xml.org/sax/features/validation", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// open up the xml document
DocumentBuilder docbuilder = factory.newDocumentBuilder();
Document doc = docbuilder.parse(new FileInputStream(m_strFilePath));
2nd step : Now that I have got the document in memory ... and after having detected that I need to transform it -
TransformerFactory transformfactory = TransformerFactory.newInstance();
Templates xsl = transformfactory.newTemplates(new StreamSource(new FileInputStream((String)m_XslFile)));
Transformer transformer = xsl.newTransformer();
Document newdoc = docbuilder.newDocument();
Result XmlResult = new DOMResult(newdoc);
// now transform
transformer.transform(
new DOMSource(doc.getDocumentElement()),
XmlResult);
I needed to do this as I have further processing going on afterwards and did not want the overhead of outputting to file and reloading.
Little explanation :
The trick is to use the original DOM object which has had all the validation features turned off. You can see this here :
transformer.transform(
new DOMSource(doc.getDocumentElement()), // <<-----
XmlResult);
This has been tested with network access TURNED OFF.
So I know that there are no more network lookups.
However, if the DTDs, MODs, etc. are available locally, then, as per the suggestions, the use of an EntityResolver is the answer. This is to be applied, again, to the original docbuilder object.
I now have a transformed document stored in newdoc, ready to play with.
I hope this will help others.
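For the case where the DTD and MOD files are available locally, the EntityResolver approach can be sketched as follows. This is a minimal sketch under stated assumptions: the resolver simply looks up the last path segment of the system ID in a local directory, and returns an empty source when nothing is found so that the parser does not go to the network. The class name, the directory layout and the sample DTD in main are hypothetical.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolver implements EntityResolver {

    private final File localDir;

    public LocalDtdResolver(File localDir) {
        this.localDir = localDir;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        // Use the file name part of the system ID, e.g. "timewise.dtd" or "layout.mod".
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        File local = new File(localDir, name);
        if (local.isFile()) {
            return new InputSource(local.toURI().toString());
        }
        // Unknown entity: hand back empty content instead of fetching it remotely.
        return new InputSource(new StringReader(""));
    }

    /** Parses xml with every external entity redirected to localDir. */
    static Document parseOffline(String xml, File localDir) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new LocalDtdResolver(localDir));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("local-dtds");
        Files.write(dir.resolve("sample.dtd"),
                "<!ENTITY greeting \"hello\">".getBytes(StandardCharsets.UTF_8));
        String xml = "<!DOCTYPE r SYSTEM \"http://example.invalid/dtds/sample.dtd\">"
                + "<r>&greeting;</r>";
        Document doc = parseOffline(xml, dir.toFile());
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Because a resolved DTD is returned with its local file URI as system ID, relative references inside it are resolved against the local directory and pass through the same resolver.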

You can use a library like Apache xml-commons-resolver and write a catalog file to map web URLs to your local copy of the relevant files. To wire this catalog up to the transformer mechanism you would need to use a SAXSource instead of a StreamSource as the source of your stylesheet:
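// A SAXSource built from an InputSource alone has no XMLReader, so create one explicitly
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true); // stylesheets must be parsed namespace-aware
XMLReader reader = spf.newSAXParser().getXMLReader();
CatalogResolver resolver = new CatalogResolver();
reader.setEntityResolver(resolver);
SAXSource styleSource = new SAXSource(reader, new InputSource("file:/path/to/stylesheet.xsl"));
TransformerFactory tf = TransformerFactory.newInstance();
tf.setURIResolver(resolver);
Transformer transformer = tf.newTransformer(styleSource);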

The usual way to do this in Java is to use an LSResourceResolver to resolve the system ID (and/or public ID) to your local file. This is documented at http://docs.oracle.com/javase/7/docs/api/org/w3c/dom/ls/LSResourceResolver.html. You shouldn't need anything outside of standard Java XML parser features to get this working.
