Random error with OpenSAML XML Parser Configuration - java

I'm running a webapp in Tomcat 8 that uses OpenSAML. I've endorsed Xerces within Tomcat and checked that the endorsed directory path is set correctly. Everything appears to be working:
[ajp-apr-8009-exec-22] DEBUG org.opensaml.xml.Configuration - VM using JAXP parser org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
Several requests run through that section of code without error. Then, all of a sudden, I start getting this error:
OpenSAML requires an xml parser that supports JAXP 1.3 and DOM3.
The JVM is currently configured to use the Sun XML parser, which is known
to be buggy and can not be used with OpenSAML. Please endorse a functional
JAXP library(ies) such as Xerces and Xalan. For instructions on how to endorse
a new parser see http://java.sun.com/j2se/1.5.0/docs/guide/standards/index.html
at org.opensaml.xml.Configuration.validateNonSunJAXP(Configuration.java:278)
at org.opensaml.xml.parse.BasicParserPool.<init>(BasicParserPool.java:126)
Once the error starts, I get it on every request, but I haven't been able to isolate what triggers it. (Edit: it appears to be related to docx4j usage. The errors start after a request that uses docx4j to generate a Word document. Since docx4j relies so heavily on XML, that makes some sense.)
What validateNonSunJAXP() does is simple: it checks the class name of the DocumentBuilderFactory and throws the error if the name starts with "com.sun".
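The check described above can be approximated in a few lines. This is only a sketch of the idea, not OpenSAML's actual source; the class and method names (JaxpFactoryCheck, currentFactoryName, isSunFactory) are hypothetical. Logging the result on each request may help pinpoint the request after which the factory changes.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper: approximates the check described above, not OpenSAML's real code.
final class JaxpFactoryCheck {

    // Class name of the DocumentBuilderFactory implementation JAXP resolves right now.
    static String currentFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // True when the implementation lives under com.sun, i.e. the JDK's built-in parser.
    static boolean isSunFactory(String className) {
        return className.startsWith("com.sun");
    }

    public static void main(String[] args) {
        String name = currentFactoryName();
        System.out.println(name + " -> built-in parser? " + isSunFactory(name));
    }
}
```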
Any ideas on what could be going on that would cause the VM to stop using the endorsed library?

docx4j manipulates:
javax.xml.parsers.SAXParserFactory
javax.xml.parsers.DocumentBuilderFactory
javax.xml.transform.TransformerFactory
You can see what it does at https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/XmlUtils.java
javax.xml.parsers.SAXParserFactory
In summary, you can prevent docx4j from touching this value via a docx4j properties setting.
We found Crimson fails to parse docx4j XSLT files, which is why docx4j by default tries to use Xerces, where it is included in the JDK. (Things may be better in more recent JDKs)
If you don't want this, you can specify different behaviour via docx4j.properties:
docx4j.javax.xml.parsers.SAXParserFactory.donotset=true stops docx4j from changing the setting, or
javax.xml.parsers.SAXParserFactory allows you to specify what you want
Note that we don't restore the value to its original setting since we want to avoid Crimson being used for the life of the application.
javax.xml.parsers.DocumentBuilderFactory
This works similarly to SAXParserFactory
The relevant docx4j properties are as follows:
docx4j.javax.xml.parsers.DocumentBuilderFactory.donotset
javax.xml.parsers.DocumentBuilderFactory
We don't restore the value to its original setting (though maybe we could; would need to review whether docx4j always uses XmlUtils.getNewDocumentBuilder() )
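A plausible mechanism for the symptom in the question: JAXP consults the javax.xml.parsers.DocumentBuilderFactory system property before any other lookup, so once something sets it, every later DocumentBuilderFactory.newInstance() call in the JVM returns that implementation. The sketch below only demonstrates that lookup order. It is not docx4j's code, the class and method names are hypothetical, and it assumes a JDK that bundles the internal Xerces copy under the class name shown.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo of how a system property redirects JAXP's factory lookup.
final class FactoryPropertyDemo {
    static final String KEY = "javax.xml.parsers.DocumentBuilderFactory";
    // Assumption: the JDK's bundled Xerces copy is present under this name.
    static final String JDK_INTERNAL =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    // Sets the property, asks JAXP for a factory, then restores the old value.
    static String factoryNameWithProperty(String factoryClass) {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, factoryClass);
        try {
            return DocumentBuilderFactory.newInstance().getClass().getName();
        } finally {
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default:       "
                + DocumentBuilderFactory.newInstance().getClass().getName());
        System.out.println("with property: " + factoryNameWithProperty(JDK_INTERNAL));
    }
}
```

Unlike this sketch, docx4j leaves the property set, as noted above, which would explain why the error persists for the rest of the application's life once it appears.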

Related

Java: Deprecated JAXB in Java 9+. Using JAXB with DOCX4J to bind XML data with a MS Word template. Anyone know of an alternative in Java 9+?

I inherited some Java code that gets data from an XML file and inserts the data into Microsoft Word document. The Java code uses two files as input. One is the XML file with data and the other is a Microsoft Word Document which is used as a template for the output file. The Word template has Content Control objects which are mapped to XML tags from the XML file. When the Java program is executed, it uses the Word template and the XML file to "bind" the two files together using the XML mappings in the Word template. The output from the code is a Word document in the format of the template, but with updated data from the XML file.
The binding comes from the DOCX4J library, namely "Docx4J.bind()". This method uses JAXBContext from the javax.xml.bind package. The problem is that the JAXB library and its methods are all deprecated and completely removed from Java 9 onward. So it appears the only way to make this bind method work is to stay on Java 8. I have been asked to find an alternative to JAXBContext so that the Java code can be compiled and run in a Java 13 environment.
My question is this:
Is there a comparable replacement for JAXB in Java 11, 12 or 13? If so, can you please point me to some documentation to find out more about it?
Thanks for your input and help.
JAXB is no longer shipped in recent JDKs, but JAXB itself is still readily available, in the reference implementation, and in MOXy. JAXB is not deprecated.
Have a look at https://www.docx4java.org/downloads.html
As it says there, add exactly ONE of docx4j-JAXB-ReferenceImpl or docx4j-JAXB-MOXy. (It is only docx4j-JAXB-Internal which won't work on the versions of Java of interest to you.)
You can use either the 8.x series or the 11.x series. The 8.x series is compiled for Java 8, whereas the 11.x series is compiled for Java 11+ and includes module-info.
Either will work on Java 13 (docx4j 8.x at least when used on the class path). At present, the 8.x series tends to get new features first.
So, in summary, you should not encounter any problems at all running Docx4J.bind on Java 13!

Making Saxon produce new result document when run from Java

I am trying to run Saxon HE from Java, using code that can be found in the Saxon resources. I have tried changing it so that the Java code doesn't create an additional file, and the XSLT file does that instead through the use of "result-document".
My XSLT worked as intended in Altova XMLSpy, but I wanted to see if I could get Saxon to do the same thing. No luck there, just a massive headache, loads of frustration, and lots of wishes that Python will get support for this some day soon...
I get the following error message: The system identifier of the principal output file is unknown.
When I google it, I find an answer saying that the base URI can't be found, but nowhere does it say how to set the base URI...
So my first question is: where is the base URI set? Is it in the Java class or in the XSLT file? I cannot see where I would set this in the XSLT file, so my guess is that I would have to set it as a property of the compiler/transformer?
Another question is about the actual href attribute of result-document. If I want to point to a relative path, what is the syntax, and what would an example look like?
And what about absolute paths?
In my file that works in Altova, I somehow get the base URI of the source XML file that is to be transformed, and then I direct the output to a relative directory. In Saxon, the base URI instead seems to be the location of the XSLT file... No idea why this is the case.
When setting an absolute path, I get an error stating that I'm using an unknown protocol, so I entered "file:///" before the path. Now I get a warning complaining about a document not being available at a path that is a concatenation of the XSLT file path and a lookup path I'm using during the transform.
As you can see, I'm all over the place here, so some guidelines and help would be greatly appreciated.
There are two APIs for running a Saxon transformation, and you haven't said which of them you are using.
Either way, a relative URI used in the href attribute of xsl:result-document is resolved relative to the "base output URI" of the transformation.
If you're using the JAXP transformation API, this was designed for XSLT 1.0, which doesn't recognize the concept of a base output URI. Saxon therefore uses the SystemID of the JAXP Result object provided as the destination of the transformation. If the JAXP Result object doesn't have a system ID, for example if you supply a DOMResult or StreamResult with no system ID specified, you're likely to get an error.
By contrast the s9api API was designed for XSLT 2.0 (with extensions for 3.0), and its XsltTransformer object therefore has an explicit setBaseOutputURI() method.
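For the JAXP route, the point about the Result's system ID can be checked without running a transformation. The sketch below uses only standard JAXP classes; the class name and the output path out/main.xml are hypothetical. A StreamResult built from a File carries a system ID, while one wrapping a Writer does not.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.stream.StreamResult;

// Hypothetical demo: which JAXP Result objects carry a system ID.
final class ResultSystemIdDemo {

    // Returns the system ID of the result, or null if none was set.
    static String systemIdOf(StreamResult result) {
        return result.getSystemId();
    }

    public static void main(String[] args) {
        // No system ID: there is nothing to resolve a relative href against.
        StreamResult inMemory = new StreamResult(new StringWriter());
        // Built from a File: the system ID is the file's URI (the file need not exist yet).
        StreamResult onDisk = new StreamResult(new File("out/main.xml"));

        System.out.println("in-memory: " + systemIdOf(inMemory));
        System.out.println("on-disk:   " + systemIdOf(onDisk));
    }
}
```

Going by the explanation above, passing the second kind of Result as the principal output should let an href such as chapter1.xml resolve next to out/main.xml. An in-memory Result can also be given a system ID explicitly with setSystemId().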
If you did something and it didn't work, then please tell us exactly what you did and exactly how it failed, and then we can help you get it right next time. It's hard to debug code that we can't see.

Issues with JDK 1.8 and javax.xml.transform.Transformer

I'm using javax.xml.transform.Transformer to add an authentication header to a SOAP request. My application is running on JDK 1.8.
When the line of code below is executed I get the subsequent warning message logged to the console
Code snippet:
transformer.transform(authenticationHeader, header.getResult());
Logged warning:
XML Parser does not recognize the feature http://xml.org/sax/features/validation
The output is only a warning and does not prevent the request from completing successfully; however, I would like to remove it from the log files.
I'm using Maven and do not explicitly specify Xerces, Saxon, Xalan, etc. My application uses code that was generated by the Maven cxf-codegen-plugin.
When I debug this issue I can see that the warning is generated in net.sf.saxon.event.Sender.
Can anyone suggest either:
A solution whereby my code uses a valid feature name, or
A means to suppress the warning message from my log output?
We may need to attack this one from both ends: (a) why is the validation feature being requested, and (b) why is the parser not recognizing it.
(a) Why is it being requested? Saxon will request this feature if the application requests DTD-based validation, for example by doing Configuration.setValidation(true). There are probably various other places this request can be made.
(b) Why is it not recognized? The name http://xml.org/sax/features/validation is documented as a feature of Apache Xerces, but I've no idea if the built-in JDK parser supports it or not: it's not easy to find that documentation. In the XMLReader javadoc, it's not described as a feature that every parser must recognize, but it is used as an example feature name. I always have my environment configured to use Apache Xerces by default, so it would require some effort to run tests to see what features are supported if Apache Xerces isn't present.
Perhaps the parser isn't actually the JDK default but some other parser, perhaps a user-written filter that isn't the "real" parser but filters the output from the real parser (this is quite common, and it's a common mistake for such filters not to pass on configuration settings to the underlying parser). There are some paths where Saxon reports which parser has rejected the request, but unfortunately this particular path isn't one of them. I'll fix that.
It's not as simple as ignoring the warning. If the application is requesting DTD validation and it isn't happening, that could have serious consequences. Children could die.
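To help answer (b) in a given environment, one can ask the default JAXP SAX parser directly whether it recognizes the feature. The following is a diagnostic sketch using only standard JAXP and SAX calls; the class and method names are hypothetical. It reports on whichever parser SAXParserFactory resolves, which is not necessarily the parser Saxon ends up using in the application.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical diagnostic: does the default SAX parser recognize a given feature?
final class SaxFeatureProbe {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Describes how the default JAXP SAX parser responds to getFeature(feature).
    static String probe(String feature) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            boolean value = reader.getFeature(feature);
            return reader.getClass().getName() + ": recognized, current value=" + value;
        } catch (SAXNotRecognizedException e) {
            return "not recognized: " + e.getMessage();
        } catch (SAXNotSupportedException e) {
            return "recognized, but its value cannot be read now: " + e.getMessage();
        } catch (ParserConfigurationException | SAXException e) {
            return "could not create a parser: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(VALIDATION));
    }
}
```

If this reports the feature as recognized, the warning is more likely coming from a different XMLReader, such as a filter that does not pass the setting on, as suggested above.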

Invoke HSSF Serializer Invocation

I have to write a very large XLS file, I have tried Apache POI but it simply takes up too much memory for me to use.
I had a quick look through StackOverflow and I noticed some references to the Cocoon project and, specifically the HSSFSerializer. It seems that this is a more memory-efficient way to write XLS files to disk (from what I've read, please correct me if I'm wrong!).
I'm interested in the use case described here: http://cocoon.apache.org/2.1/userdocs/xls-serializer.html . I've already written the code to write out the file in the Gnumeric format, but I can't seem to find how to invoke the HSSFSerializer to convert it to XLS.
On further reading it seems like the Cocoon project is a web framework of sorts. I may very well be barking up the wrong tree, but:
Could you provide an example of reading in a file, running the HSSFSerializer on it and writing that output to another file? It's not clear how to do so from the documentation.
My friend, the HSSF serializer is part of POI. You are just setting certain attributes in the XML to be serialized (but you need a whole process to create it). Also, setting up a whole pipeline using this framework just to create an XLS seems odd, as it changes the app's architecture. Is that your decision?
From the docs:
An alternate way of generating a spreadsheet is via the Cocoon
serializer (yet you'll still be using HSSF indirectly). With Cocoon
you can serialize any XML datasource (which might be a ESQL page
outputting in SQL for instance) by simply applying the stylesheet and
designating the serializer.
If memory is an issue, try XSSF or SXSSF in POI.
I don't know if by "XLS" you mean a specific, prior to Office 2007, version of this "Horrible SpreadSheet Format" (which is what HSSF stands for), or just anything you can open with a recent version of MS Office, OpenOffice, ...
So depending on your client requirements (i.e. those that will open your Excel file), another option might be available : generating a .XLSX file.
It comes down to producing an XML file in the proper grammar, which seems to fit your situation, since you have apparently already done that with the Gnumeric XML-based file format without technical trouble and without hitting memory-efficiency issues.
Please note other XML-based spreadsheet formats exist, that Excel and other clients would be able to use. You might want to dig into the open document file formats.
As to whether to use Apache Cocoon or something else:
Cocoon can certainly host the XSL processing; batch (Cocoon CLI) processing is available if you require Cocoon but need it not to run as a webapp (though as far as I remember, the CLI feature was broken in the latest builds of the 2.1 series); and Cocoon comes with a load of features and technologies that could address further requirements.
Cocoon might be overkill if it just comes down to running an XSL transformation, for which there is a bunch of well-known, lighter tools you can pick from.
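As an illustration of the lighter route, the JDK's own JAXP API can run an XSL transformation with no framework at all. The sketch below is a toy: the stylesheet merely joins cell values with commas and does not produce any spreadsheet format, and the class name, stylesheet, and element names (row, c) are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: a plain XSLT 1.0 run using only the JDK.
final class PlainXsltSketch {
    // Toy stylesheet (illustrative only): joins the <c> children of <row> with commas.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/row'>"
        + "<xsl:for-each select='c'>"
        + "<xsl:value-of select='.'/>"
        + "<xsl:if test='position() != last()'>,</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the result as a string.
    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSLT, "<row><c>1</c><c>2</c></row>"));
    }
}
```

For large inputs, file-based StreamSource and StreamResult objects would replace the in-memory strings used here.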

Problems with Xalan and Java JDK 1.5

From what I have read online, Sun decided to include Xalan in JDK 1.5. I am trying to take advantage of this to perform an XSLT transformation that writes multiple files. The problem I encounter:
Unrecognized XSLTC extension 'org.apache.xalan.xslt.extensions.Redirect:write'
From what I have read on Google, I needed to change:
xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect"
to
xmlns:redirect="http://xml.apache.org/xalan/redirect"
in XSL transforms
When I apply this change to my .XSL file, I still get the same error. I need to get this working ASAP and can't seem to find an answer online. Any help will be greatly appreciated.
I ended up ignoring the JDK's default Xalan and just added the files from Xalan itself. That is better anyway: I can use that version rather than depending on what a single JDK ships.
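After adding the Xalan files, it is worth confirming which TransformerFactory JAXP actually resolves. The "XSLTC" in the error message suggests the JDK's bundled compiling processor was in use. The sketch below prints the active factory class; the class and method names are hypothetical. The JDK's bundled XSLTC lives under com.sun.org.apache.xalan.internal.xsltc, whereas standalone Xalan's interpretive processor is org.apache.xalan.processor.TransformerFactoryImpl.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical check: which XSLT implementation does JAXP resolve?
final class TransformerFactoryCheck {

    // Class name of the TransformerFactory implementation currently selected.
    static String activeFactory() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // True for the XSLTC processor bundled inside the JDK.
    static boolean isJdkXsltc(String className) {
        return className.startsWith("com.sun.org.apache.xalan.internal.xsltc");
    }

    public static void main(String[] args) {
        String name = activeFactory();
        System.out.println(name + " -> JDK-bundled XSLTC? " + isJdkXsltc(name));
    }
}
```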
