Schema-aware XSLT transform on Saxon-EE 9 with Scala/Java

What am I missing? I can't see why my transformation is not schema-aware.
Ref:
http://www.saxonica.com/documentation/schema-processing/satransformapi.html
http://www.saxonica.com/documentation9.4-demo/html/changes/intro92/install92.html
I know the documents are fine: the XSD/XSLT/XML files are processed by other systems without problems. I am trying to create a desktop command-line tool for myself.
Source code:
def main(args: Array[String])
{
    System.setProperty( "javax.xml.transform.TransformerFactory", "com.saxonica.config.EnterpriseTransformerFactory")
    val factory = new EnterpriseTransformerFactory()
    factory.setAttribute(FeatureKeys.SCHEMA_VALIDATION, new Integer(Validation.STRICT))
    val schemaXXX = new StreamSource( new File("PATH/to/xxx.xsd") )
    val schemaYYY = new StreamSource( new File("PATH/to/yyy.xsd") )
    factory.addSchema(schemaXXX)
    factory.addSchema(schemaYYY)
    val XSLT = new StreamSource(new File("PATH/to/zzz.xslt"))
    val transformer = factory.newTransformer(XSLT)
    val input = new StreamSource(new File("PATH/to/file.xml"))
    val result = new StringWriter();
    transformer.transform(input, new StreamResult(result))
    println(result.toString())
}
Result:
The transformation is not schema-aware, so the source document must be untyped

A stylesheet in Saxon-EE is considered schema-aware if it explicitly uses xsl:import-schema, or if the XSLT compiler used to compile it is explicitly set to be schema aware. This is easier done using the s9api interface (XsltCompiler.setSchemaAware(true)), but it can also be done using JAXP by setting the property FeatureKeys.XSLT_SCHEMA_AWARE ("http://saxon.sf.net/feature/xsltSchemaAware") on the TransformerFactory. The reason you have to set this explicitly is that processing untyped documents is faster if the stylesheet knows at compile time that everything will be untyped, so we don't want people to incur extra costs when they move to Saxon-EE if they aren't using this feature.
In future please feel free to raise support questions at saxonica.plan.io where we aim to give a response within 24 hours.
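The JAXP route described above might look roughly like the sketch below. It uses only the feature URI quoted in the answer; the class and method names (SchemaAwareSetup, enableSchemaAware) are made up for illustration, and passing Boolean.TRUE as the value is an assumption about what Saxon-EE accepts. The helper returns false rather than failing when the factory does not recognise the attribute, which is what a non-Saxon factory is expected to do.

```java
import javax.xml.transform.TransformerFactory;

final class SchemaAwareSetup {
    // Feature URI quoted in the answer (FeatureKeys.XSLT_SCHEMA_AWARE).
    static final String XSLT_SCHEMA_AWARE = "http://saxon.sf.net/feature/xsltSchemaAware";

    /**
     * Asks the factory to compile stylesheets as schema-aware.
     * Assumption: the factory accepts a Boolean value for this attribute.
     * Returns false if the factory rejects the attribute as unknown.
     */
    static boolean enableSchemaAware(TransformerFactory factory) {
        try {
            factory.setAttribute(XSLT_SCHEMA_AWARE, Boolean.TRUE);
            return true;
        } catch (IllegalArgumentException notRecognised) {
            // JAXP factories throw this for attributes they do not know.
            return false;
        }
    }
}
```

In the question's code the call would have to happen before factory.newTransformer(XSLT), because the setting affects how the stylesheet is compiled.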

Related

Document to String using DocumentBuilderFactory?

I am trying to find a way to convert a Document to a String and found the post "XML Document to String?". However, I want to do the conversion using only DocumentBuilderFactory, without TransformerFactory, because of XXE vulnerabilities. I cannot upgrade to JDK 8 because of other limitations.
I haven't had any luck so far with it; all the searches are returning the same code shown in the above link.
Is it possible to do this?
This is difficult to do. Since your actual problem is the security vulnerability rather than TransformerFactory itself, fixing the vulnerability may be a better way to go.
You should be able to configure TransformerFactory to ignore entities to prevent this sort of problem. See: Preventing XXE Injection
Another thing that may address your security concerns is to call TransformerFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true). This should prevent the problems that you're worried about. See also this forum thread on coderanch.
Setting FEATURE_SECURE_PROCESSING may or may not help, depending on which implementation TransformerFactory.newInstance() actually returns.
For example, in Java 7 with no additional XML libraries on the classpath, setting transformerFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true); does not help.
You can fix this by providing a Source other than StreamSource (which the factory would have to parse itself, using settings that you do not control).
For example you can use StAXSource like this:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
transformerFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true); // does not help in Java 7
Transformer transformer = transformerFactory.newTransformer();
// StreamSource is insecure by default:
// Source source = new StreamSource(new StringReader(xxeXml));
// Source configured to be secure:
XMLInputFactory xif = XMLInputFactory.newFactory();
xif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
xif.setProperty(XMLInputFactory.SUPPORT_DTD, false);
XMLEventReader xmlEventReader = xif.createXMLEventReader(new StringReader(xxeXml));
Source source = new StAXSource(xmlEventReader);
transformer.transform(
        source,
        new StreamResult(new ByteArrayOutputStream()));
Note that the actual TransformerFactory may not support StAXSource, so you need to test your code with the classpath as it will be in production. For example, Saxon 9 (an old one, I know) does not support StAXSource, and the only clean way of "fixing" that I know of is to provide a custom net.sf.saxon.Configuration instance.
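Another Source that avoids StreamSource is a DOMSource built from a DocumentBuilderFactory that you have hardened yourself, which also fits the original wish to rely on DocumentBuilderFactory for parsing. The sketch below is one possible shape, not a definitive fix: the names SafeXml and roundTrip are made up, and the disallow-doctype-decl feature URI is an assumption that holds for Xerces-based parsers such as the one bundled with the JDK. The Transformer is created without a stylesheet and only serialises the already parsed tree.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class SafeXml {
    /** Parses xml with DOCTYPE declarations rejected, then serialises the tree. */
    static String roundTrip(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Assumption: Xerces-style feature; any <!DOCTYPE ...> makes parsing fail.
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            dbf.setExpandEntityReferences(false);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Identity transformer: serialises the DOM, does no parsing of its own.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalStateException("XML rejected or not serialisable", e);
        }
    }
}
```

As with StAXSource, whether a given TransformerFactory handles DOMSource the way you expect is worth testing on the production classpath.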

Usage of compiled XSL transformations

I am producing compiled .class files (translets) from XSL transformation files using the TransformerFactory implemented by org.apache.xalan.xsltc.trax.TransformerFactoryImpl.
Unfortunately, despite hours of searching, I couldn't find out how to use these translet classes for XML transformation.
Is there a code example or reference documentation you could point me to? This document is insufficient and complicated.
Thanks.
A standard transformation in XSLT looks like this:
public void translate(InputStream xmlStream, InputStream styleStream, OutputStream resultStream)
        throws TransformerException {
    Source source = new StreamSource(xmlStream);
    Source style = new StreamSource(styleStream);
    Result result = new StreamResult(resultStream);
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer t = tFactory.newTransformer(style);
    t.transform(source, result);
}
So, given that you don't use a TransformerFactory but a ready-made Java class (which is an additional maintenance headache and doesn't give you much better performance, since you can keep your transformer object after the initial compilation), the same function would look like this:
public void translate(InputStream xmlStream, OutputStream resultStream) {
    Source source = new StreamSource(xmlStream);
    Result result = new StreamResult(resultStream);
    Translet t = new YourTransletClass();
    t.transform(source, result);
}
In your search you missed typing the interface specification into Google, where the 3rd link shows the interface definition, which has the same call signature as Transformer. So you can swap a transformer object for your custom object (or keep your transformer objects in memory for reuse).
Hope that helps
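The "keep your transformer objects in memory" option can be sketched with the standard JAXP Templates interface, which holds the compiled stylesheet and hands out a fresh Transformer for each run. This is a minimal illustration under stated assumptions, not the translet-class approach itself: the class name CompiledStylesheet and its methods are invented for the example, and whichever TransformerFactory is on the classpath does the compiling.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CompiledStylesheet {
    // Compiled form of the stylesheet; Templates may be shared between threads.
    private final Templates templates;

    CompiledStylesheet(String xslt) {
        try {
            templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Stylesheet does not compile", e);
        }
    }

    /** Runs the already compiled stylesheet against one input document. */
    String apply(String xml) {
        try {
            StringWriter out = new StringWriter();
            // A Transformer is not thread-safe, so create one per call.
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Compilation then happens once in the constructor, and every later call to apply() reuses the result.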

Parse velocity variables within java

The following is an example of what I want to do.
I have a bunch of files such as test1.vm:
Welcome ${name}. This is test1.
Then I have a file called defaults.vm:
#set($name = "nmore")
I want to render test1.vm (and the other test files) with the variable(s) in defaults.vm without using #parse, as I would have to modify all the test files.
Is there a way to do this from within the accompanying java file?
I'm not sure whether you have any constraints or other specific requirements, but if not, have you tried using the Velocity API? Something like this:
Context context = new VelocityContext();
Template template = Velocity.getTemplate("src/main/resources/defaults.vm");
template.merge(context, NullWriter.NULL_WRITER);
StringWriter writer = new StringWriter();
Template toBeParsedTemplate = Velocity.getTemplate("src/main/resources/test1.vm");
toBeParsedTemplate.merge(context, writer);
String renderedContent = writer.getBuffer().toString();
System.out.println(renderedContent);
The idea is that you fill in the Context object with the variables generated from defaults.vm and use the same context to evaluate test1.vm.
I've tried this with Velocity 1.7 and commons-io 2.4 (for the NullWriter) and it seems to work fine, but I'm not sure whether it fits your requirements or whether you're looking for other alternatives (not using the Velocity API).
More info on the Context object here:
http://velocity.apache.org/engine/devel/developer-guide.html#The_Context
Hope that helps.

Saxon in Java: XSLT for CSV to XML

Mostly continued from this question: XSLT: CSV (or Flat File, or Plain Text) to XML
So, I have an XSLT from here: http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html
And it converts a CSV file to an XML document. It does this when used with the following command on the command line:
java -jar saxon9he.jar -xsl:csv-to-xml.xsl -it:main -o:output.xml
So now the question becomes: How do I do this in my Java code?
Right now I have code that looks like this:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
StreamSource xsltSource = new StreamSource(new File("location/of/csv-to-xml.xsl"));
Transformer transformer = transformerFactory.newTransformer(xsltSource);
StringWriter stringWriter = new StringWriter();
transformer.transform(documentSource, new StreamResult(stringWriter));
String transformedDocument = stringWriter.toString().trim();
(The Transformer is an instance of net.sf.saxon.Controller.)
The trick on the command line is to specify "-it:main" to point right at the named template in the XSLT. This means you don't have to provide the source file with the "-s" flag.
The problem starts again on the Java side. Where/how would I specify this "-it:main"? Wouldn't doing so break other XSLT's that don't need that specified? Would I have to name every template in every XSLT file "main?" Given the method signature of Transformer.transform(), I have to specify the source file, so doesn't that defeat all the progress I've made in figuring this thing out?
Edit: I found the s9api hidden inside the saxon9he.jar, if anyone is looking for it.
You are using the JAXP API, which was designed for XSLT 1.0. If you want to make use of XSLT 2.0 features, like the ability to start a transformation at a named template, I would recommend using the s9api interface instead, which is much better designed for this purpose.
However, if you've got a lot of existing JAXP code and you don't want to rewrite it, you can usually achieve what you want by downcasting the JAXP objects to the underlying Saxon implementation classes. For example, you can cast the JAXP Transformer as net.sf.saxon.Controller, and that gives you access to controller.setInitialTemplate(); when it comes to calling the transform() method, just supply null as the Source parameter.
Incidentally, if you're writing code that requires a 2.0 processor then I wouldn't use TransformerFactory.newInstance(), which will give you any old XSLT processor that it finds on the classpath. Use new net.sf.saxon.TransformerFactoryImpl() instead, which (a) is more robust, and (b) much much faster.
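A quick way to see which processor TransformerFactory.newInstance() picks up on a given classpath is to print the class of the factory it returns. A minimal sketch; the class name WhichProcessor is made up for illustration, and the output depends entirely on what is on the classpath when it runs.

```java
import javax.xml.transform.TransformerFactory;

final class WhichProcessor {
    /** Returns the class name of the factory JAXP selects by default. */
    static String defaultFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // With Saxon first on the classpath this is expected to name a Saxon class.
        System.out.println(defaultFactoryClass());
    }
}
```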

Java: need help with optimizing a part of code

I have simple code for transforming XML, but it is very time-consuming (I have to repeat it many times). Does anyone have a recommendation for how to optimize this code? Thanks.
EDIT: This is a new version of the code. Unfortunately, I can't reuse the Transformer, since XSLTRule is different in most cases. I'm now reusing the TransformerFactory. I'm not reading from files before this, so I can't use a file-based StreamSource. Most of the time is spent on initialization of the Transformer.
private static TransformerFactory tFactory = TransformerFactory.newInstance();

public static String transform(String XML, String XSLTRule) throws TransformerException {
    Source xmlInput = new StreamSource(new StringReader(XML));
    Source xslInput = new StreamSource(new StringReader(XSLTRule));
    Transformer transformer = tFactory.newTransformer(xslInput);
    StringWriter resultWriter = new StringWriter();
    Result result = new StreamResult(resultWriter);
    transformer.transform(xmlInput, result);
    return resultWriter.toString();
}
The first thing you should do is to skip the unnecessary conversion of the XML string to bytes (especially with a hardcoded, potentially incorrect encoding). You can use a StringReader and pass that to the StreamSource constructor. The same for the result: use a StringWriter and avoid the conversion.
Of course, if you call the method after converting your XML from a file (bytes) to a String in the first place (again with a potentially wrong encoding), it would be even better to have the StreamSource read from the file directly.
It seems like you apply an XSLT to an XML file. To speed things up, you can try compiling the XSLT, like with XSLTC.
I can only think of a couple of minor things:
The TransformerFactory could be reused.
The Transformer could be reused if it is thread confined, and the XSL input is the same each time.
If you can estimate the output size reasonably accurately, you could create the ByteArrayOutputStream with an initial size hint.
As stated in Michael's answer, you could potentially speed things up by not loading the input or output XML entirely into memory yourself, and by making your API stream-based.
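If some XSLT rules do repeat, even though most differ, one way to avoid recompiling them is to cache the compiled Templates keyed by the stylesheet text. The following is a sketch under that assumption, not a measured optimisation: the class name CachingTransformer and the cachedStylesheets() helper are hypothetical, and the cache is unbounded, so a real version would need some eviction policy.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CachingTransformer {
    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();
    // Compiled stylesheets keyed by their source text (unbounded in this sketch).
    private static final ConcurrentMap<String, Templates> CACHE = new ConcurrentHashMap<>();

    static String transform(String xml, String xsltRule) throws TransformerException {
        Templates templates = CACHE.get(xsltRule);
        if (templates == null) {
            // The factory is not guaranteed to be thread-safe, so compile under a lock.
            synchronized (FACTORY) {
                templates = CACHE.get(xsltRule);
                if (templates == null) {
                    templates = FACTORY.newTemplates(
                            new StreamSource(new StringReader(xsltRule)));
                    CACHE.put(xsltRule, templates);
                }
            }
        }
        StringWriter out = new StringWriter();
        // Creating a Transformer from Templates skips stylesheet compilation.
        templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    /** Number of distinct stylesheets compiled so far. */
    static int cachedStylesheets() {
        return CACHE.size();
    }
}
```

When every rule really is unique the cache never hits, and the compile cost stays where it was.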
