How to match and process unknown XML elements in XSLT 1.0? - java

I have a simple XSLT 1.0 stylesheet that turns XML documents into XHTML. I really want to be able to "include" the content of one XML file in another when needed. AFAIK this is simply not possible in XSLT 1.0, so I decided to move my processing to a simple Java app that pre-processes the XML, executing the "includes" recursively, and passes the result to the default JDK XSLT processor. I have an XML schema that my documents must conform to.
The most used element is called "text", and can have an "id" and/or a "class" attribute, which gets used for XHTML styling with CSS. This element gets turned into "p", "div", or "span" depending on the context.
What I would like to add is the ability to define "unknown" elements in my input files and have them transformed into a "text" element for further processing. If the "unknown" element's name starts with a capital letter, it becomes a "text" with "id" set to the original name; otherwise it becomes a "text" with "class" set to the original name. Everything else in the unknown element should be kept as-is, and it should then be processed by XSLT as if it was originally in the input file. In other words, I would like to transform all unknown elements to form a valid XML document, and then process it with my stylesheet.
Can this be done in XSLT, possibly in a pre-processing "stylesheet", or should I do that as pre-processing in Java? Performance here is not important. I would prefer an XSLT solution, but not if it's much more complicated than doing it in Java.

Well, since no one answered, I just tried it. While it is easier to do in Java, the Java approach has one major drawback: since the code needs to know the valid elements in order to recognize the unknown ones, you end up hardcoding them in your code and having to recompile whenever the XSLT template changes.
So, I tried in XSLT and it also works. Let's say you have:
<xsl:template match="text">
<!-- processing -->
<xsl:call-template name="id_and_class"/>
<!-- processing -->
</xsl:template>
where the template named id_and_class copies the id and class attributes into the generated element, and you want unknown elements to be mapped to "text" elements, you can do this:
<xsl:template match="text">
<xsl:call-template name="text_processing"/>
</xsl:template>
<xsl:template name="text_processing">
<!-- processing -->
<xsl:call-template name="text_id_and_class"/>
<!-- processing -->
</xsl:template>
...
<xsl:template name="text_id_and_class">
<xsl:choose>
<!-- If name() is not "text", then we have an unknown element. -->
<xsl:when test="name()!='text'">
<!-- Processing of ID and class omitted ... -->
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="id_and_class"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
...
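<!-- Catch-all: process unknown elements like a "text" element. Its default priority (-0.5) is lower than that of match="text" (0), so it only fires for elements no other template matches. -->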
<xsl:template match="*">
<xsl:call-template name="text_processing"/>
</xsl:template>
If you process the content of one specific element with a named template, you can check in that template whether the element name matches and use that for your special processing. Then you just have to add an <xsl:template match="*"> to your stylesheet and call the named template from there; because of its low default priority, it only applies to elements that no more specific template matches.
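The question also asked whether the renaming itself can be done as a separate pre-processing stylesheet. A minimal sketch of that, run through the JDK's built-in XSLT 1.0 processor via javax.xml.transform, is below. Assumptions: the known element names (`doc|text`) are hypothetical placeholders for whatever the real schema defines, the class name `UnknownToText` is made up, and the embedded stylesheet uses Java 15+ text blocks.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class UnknownToText {
    // Pre-processing stylesheet. "doc|text" is a hypothetical list of known
    // elements; replace it with the element names your schema defines.
    static final String XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
          <!-- Known elements: copied unchanged (priority 0 beats the catch-all). -->
          <xsl:template match="doc|text">
            <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
          </xsl:template>
          <!-- Attributes and other non-element nodes: copied unchanged. -->
          <xsl:template match="@*|text()|comment()|processing-instruction()">
            <xsl:copy/>
          </xsl:template>
          <!-- Any other element is unknown: emit a "text" element instead. -->
          <xsl:template match="*">
            <text>
              <xsl:choose>
                <xsl:when test="contains($upper, substring(local-name(), 1, 1))">
                  <xsl:attribute name="id"><xsl:value-of select="local-name()"/></xsl:attribute>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:attribute name="class"><xsl:value-of select="local-name()"/></xsl:attribute>
                </xsl:otherwise>
              </xsl:choose>
              <!-- Keep the unknown element's own attributes and content as-is. -->
              <xsl:apply-templates select="@*|node()"/>
            </text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Runs the pre-processing step and returns the rewritten XML as a string.
    static String preprocess(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("pre-processing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(preprocess(
                "<doc><text>plain</text><Warning>Careful</Warning><note>aside</note></doc>"));
    }
}
```

The output of this step can then be fed to the main stylesheet unchanged. Note that if an unknown element already carries its own `id` or `class` attribute, the copied attribute replaces the generated one, since it is added later.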

Related

XSLT check if String is inside a set

I am trying to check if a String is contained within a set. I have an Excel sheet that I convert to an xml file; example:
Excel sheet on left and converted sheet on right (RowData.xml):
So I have an xml file where those set of numbers may or may not be there. For example, the source xml may look like this:
Source.xml:
<Data>
<Number>5556781234</Number>
<Number>5556781235</Number>
<Number>5556781236</Number>
</Data>
As you see it can stop anywhere. The source xml file may have all the numbers listed in RowData.xml or it may have only 1 or more. So my question is, how would I check for that in my xslt file?
I want to do this:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- This is the Excel sheet converted to an XML file -->
<xsl:param name="sheet-uri" select="'RowData.xml'"/>
<xsl:param name="sheet-doc" select="document($sheet-uri)"/>
<xsl:template match="Data">
<xsl:for-each select="Number">
<xsl:variable name="continue" select="$sheet-doc//Sheet/Row[Number = current()]/Continue"/>
<xsl:if test="">
<!-- Check the Source.xml against the RowData.xml and
see if the set contains any "No"'s in it. -->
<!-- If it does then don't do the following -->
<Data2>
<Number><xsl:value-of select="."/></Number>
<Timestamp>125222</Timestamp>
</Data2>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
So basically, before making the <Data2> element, check the numbers in Source.xml and see if any of those numbers have a value of No for the column Continue in RowData.xml. I don't know how to make the if statement above. I know there's a contains() function in xslt; however, I don't know how I can use it here.
Is this possible? Please let me know if anything was confusing. Thanks in advance!
check the numbers in Source.xml and see if any of those numbers have a value of No for the column Continue in RowData.xml.
You can take advantage of XSLT's "existential equal" operator here:
test="doc('source.xml')/Data/Number =
$sheet-doc//Sheet/Row[Continue='No']/Number"
Essentially, if A and B are sets of values, then A = B returns true if some value in A is equal to some value in B.
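The same "existential" semantics already hold for node-sets in XPath 1.0, so they can be tried with the JDK's javax.xml.xpath API without a 2.0 processor. This sketch assumes, for simplicity, that the source numbers and the sheet rows live in one hypothetical `<root>` document instead of two files; the class name is made up.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class ExistentialEquals {
    // True if ANY Number in Data equals ANY Number of a Row whose Continue is "No".
    static boolean anyBlocked(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(
                    "/root/Data/Number = /root/Sheet/Row[Continue='No']/Number",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><Data><Number>5556781234</Number><Number>5556781235</Number></Data>"
                + "<Sheet><Row><Number>5556781234</Number><Continue>Yes</Continue></Row>"
                + "<Row><Number>5556781235</Number><Continue>No</Continue></Row></Sheet></root>";
        System.out.println(anyBlocked(xml));
    }
}
```

No explicit loop is needed: the `=` operator compares every pair of string values from the two node-sets and succeeds on the first match.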
I would suggest you use the key mechanism - esp. if you're using XSLT 2.0.
Define a key as:
<xsl:key name="row" match="Row" use="Number" />
then do:
<xsl:template match="/Data">
<xsl:for-each select="Number[not(key('row', ., $sheet-doc))]">
<Data2>
<xsl:copy-of select="."/>
<Timestamp>125222</Timestamp>
</Data2>
</xsl:for-each>
</xsl:template>
This selects only Number elements that do not have a corresponding Row in the RowData.xml document.

Xalan and the document() function

Since I have spent almost a full day debugging this, I hope to get some valuable insight on SO into the following problem:
I am running an XSL transformation on an input document; my stylesheet loads an external XML document which contains lookup values I need for some comparisons.
I am loading the external document like this:
<xsl:variable name="dictionary"
select="document('myDict.xml', document(''))/path/to/LookupElement" />
LookupElement is an element which contains the complete XML-Fragment I need to access.
Throughout the stylesheet various comparison expressions are accessing $dictionary.
What happens is that the transformation with this document() call in place takes about 12 (!) minutes using Xalan (2.7.?, latest version, downloaded from the Apache website, not the one contained in the JRE).
The same stylesheet without the document() call (and without my comparisons accessing data in $dictionary) completes in seconds.
The same stylesheet using Saxon-B 9.1.0.8 completes in seconds as well.
Information: The external document has 25MB(!) and there is no possibility for me to reduce its size.
I am running the transformations using the xslt-Task of ant under JRE 6.
I am not sure if this has anything to do with above mentioned problem: Throughout my stylesheet I have expressions that test for existence of certain attributes in the external XML-Document. These expressions always evaluate to true, regardless of whether the attributes exist or not:
<xsl:variable name="myAttExists" select="boolean($dictionary/path/to/@myAttribute)"/>
I am at my wits' end. I know that Xalan correctly reads the document; all references go to $dictionary, so I am not calling document() multiple times.
Anybody any idea?
Edit:
I have removed the reference to the XML-Schema from the external XML-Document to prevent Schema-Lookups of Xalan or the underlying (Xerces) Parser.
Edit:
I have verified that myAttExists will always be true, even when specifying an attribute name that definitely does not exist anywhere in the external XML document.
I have even changed the above expression to:
<xsl:variable name="myAttExists" select="count($dictionary/path/to/@unknownAttribute) != 0"/>
which still yields true.
Edit:
I have removed the call to the document() function and all references to $dictionary for testing purposes. This reduces transformation runtime with Xalan to 16 seconds.
Edit:
Interesting detail: The Xalan version shipped with Oxygen 12.1 completes within seconds loading the external XML-Document. However, it also evaluates the existence of attributes incorrectly...
Edit:
I have the following variable declaration which always yields true:
<xsl:variable name="expectedDefaultValueExists">
<xsl:choose>
<xsl:when test="@index">
<xsl:value-of select="boolean($dictionary/epl:Object[@index = $index]/@defaultValue)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="boolean($dictionary/epl:Object[@index = $index]/epl:SubObject[@subIndex = $subIndex]/@defaultValue)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
Is this possible in XSLT/XPath 1.0? $index and $subIndex are calculated from the @index and @subIndex attributes of the context node. I want to load the defaultValue attribute from the element in the external XML document that has an equal index and/or subIndex.
Is it possible to use variables in predicates in XPath 1.0? This works in XPath 2.0.
Regarding the incorrect variable assignment, I don't believe in a processor (Xalan) issue anymore, since PHP's XSLTProcessor does the same. It must be an issue in the variable declaration...
This only answers the last part of the question, but it's getting too unwieldy for comments...
I have the following variable declaration which always yields true when used as the test of an xsl:if or xsl:when:
<xsl:variable name="expectedDefaultValueExists">
<xsl:choose>
<xsl:when test="@index">
<xsl:value-of select="boolean($dictionary/epl:Object[@index = $index]/@defaultValue)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="boolean($dictionary/epl:Object[@index = $index]/epl:SubObject[@subIndex = $subIndex]/@defaultValue)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
In XSLT 1.0 a variable with a body rather than a select always becomes a "result tree fragment", in this case with a single text node child that will contain the string "true" or "false" as appropriate. Any non-empty RTF is considered true when converted to boolean.
In XSLT 2.0 it's a similar story - 2.0 doesn't distinguish between node sets and result tree fragments, but still the variable will be a "temporary tree" with a single text node child whose value is the string "true" or "false", and both these trees are true when converted to boolean. If you want to get an actual boolean value out of the variable then you need to change two things - add as="xs:boolean" to the variable declaration and use xsl:sequence instead of xsl:value-of:
<xsl:variable name="expectedDefaultValueExists" as="xs:boolean">
<xsl:choose>
<xsl:when test="@index">
<xsl:sequence select="boolean($dictionary/epl:Object[@index = $index]/@defaultValue)"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="boolean($dictionary/epl:Object[@index = $index]/epl:SubObject[@subIndex = $subIndex]/@defaultValue)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
The xsl:value-of instruction converts the result of its select into a string and constructs a text node containing that string. The xsl:sequence instruction simply returns the value from the select directly as whatever type it happens to be.
But there are simpler ways to achieve the same thing. In XPath 2.0 you can write if/then/else constructs directly in the XPath expression:
<xsl:variable name="expectedDefaultValueExists"
select="if (@index)
then $dictionary/epl:Object[@index = $index]/@defaultValue
else $dictionary/epl:Object[@index = $index]/epl:SubObject[@subIndex = $subIndex]/@defaultValue" />
In 1.0 you need to be slightly more creative:
<xsl:variable name="expectedDefaultValueExists"
select="(@index and $dictionary/epl:Object[@index = $index]/@defaultValue)
or (not(@index) and $dictionary/epl:Object[@index = $index]/epl:SubObject[@subIndex = $subIndex]/@defaultValue)" />
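To see the result-tree-fragment pitfall in isolation, here is a small sketch run through the JDK's built-in XSLT 1.0 processor. It compares a variable with a body (an RTF holding the text "false") against a variable with a `select` (a real boolean) for an attribute that does not exist. The class name and the `@missing` attribute are illustrative assumptions; Java 15+ is assumed for the text block.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RtfBoolean {
    static final String XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <!-- Body form: a result tree fragment containing the text "false". -->
            <xsl:variable name="asTree">
              <xsl:value-of select="boolean(/root/@missing)"/>
            </xsl:variable>
            <!-- select form: an actual boolean value. -->
            <xsl:variable name="asBoolean" select="boolean(/root/@missing)"/>
            <xsl:if test="$asTree">tree:true </xsl:if>
            <xsl:if test="not($asTree)">tree:false </xsl:if>
            <xsl:if test="$asBoolean">select:true</xsl:if>
            <xsl:if test="not($asBoolean)">select:false</xsl:if>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Applies the stylesheet to the given XML and returns the text output.
    static String run(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<root/>"));
    }
}
```

The body form tests true even though its text is "false", because any non-empty RTF converts to true; the select form behaves as expected.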

Validate XML against XSD and enrich with validation results

I'm building a java tool to validate an xml document and build an html report containing the input data and validation results.
I think that a possible way is:
1) validate XML with XSD
2) enrich the XML with validation results
3) transform the enriched XML into the final HTML report (this point is not the subject of the question)
First and foremost, is this a valid approach? Or are there more suitable ways to get this done in Java?
If this is a viable solution, how can I implement step 2?
For example, if I start from this input document:
<parent>
<child id="a correct id" type="a correct type"/>
<child id="an incorrect id" type="an incorrect type"/>
</parent>
How can I produce an enriched output document like that:
<parent>
<child id="a correct id" type="a correct type">
<results>
<result>id is correct</result>
<result>type is correct</result>
</results>
</child>
<child id="an incorrect id" type="an incorrect type">
<results>
<result>id is NOT correct</result>
<result>type is NOT correct</result>
</results>
</child>
</parent>
First, there are many ways of going about this. There are other tools, like Schematron, that provide languages for describing validation rules and the ability to transform the results of validation into pretty HTML. There are numerous Java packages that actually do schema validation, so most of what you're trying to accomplish should be "glue code". Make sure you don't attempt to implement schema validation yourself in your Java code.
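As an illustration of that glue code, this sketch uses the JDK's built-in javax.xml.validation API (XSD 1.0) and an ErrorHandler that collects every problem instead of stopping at the first one. The schema is a hypothetical one written for the question's sample document, in which only "a correct type" is an allowed `type` value; the class name is also made up. Java 15+ is assumed for the text block.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class CollectingValidator {
    // Hypothetical schema for the question's <parent>/<child> sample.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="parent">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="child" maxOccurs="unbounded">
                  <xs:complexType>
                    <xs:attribute name="id" type="xs:string" use="required"/>
                    <xs:attribute name="type" use="required">
                      <xs:simpleType>
                        <xs:restriction base="xs:string">
                          <xs:enumeration value="a correct type"/>
                        </xs:restriction>
                      </xs:simpleType>
                    </xs:attribute>
                  </xs:complexType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    // Returns every warning/error reported by the validator (empty list = valid).
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { problems.add("warning: " + e.getMessage()); }
                public void error(SAXParseException e) { problems.add("line " + e.getLineNumber() + ": " + e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // Malformed input (or a broken schema) stops validation entirely.
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }
}
```

The collected messages (with line numbers) are what a later step would attach to the document or write out as a separate result file.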
So next, I'm not sure what your requirements are for wanting to transform the original XML file after validation. Usually you'd dump a validation result set as a separate file. Does the schema for the original XML permit the additions you're putting in?
In general, if you wanted to transform the original input, you could go about this by writing an XSLT program that takes the validation results file, and the original source file, and then transforms the original file using those validation results. But I don't recommend that because I think your situation might call for a different design that doesn't transform the original file, unless you have more requirements you want to go into more depth about.
Another option would be straightforward DOM manipulation. After validation, you could load the DOM for the input document, manipulate it, then write it back to the same original file.
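A minimal sketch of that DOM route with the JDK's built-in parser and an identity Transformer for serialization is below. The rule deciding "correct" (a value containing "incorrect" fails) is a hypothetical stand-in for real validation results, the class name is made up, and the result is returned as a string rather than written back to the original file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class DomEnricher {
    static String enrich(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getElementsByTagName("child");
            for (int i = 0; i < children.getLength(); i++) {
                Element child = (Element) children.item(i);
                Element results = doc.createElement("results");
                for (String attr : new String[] {"id", "type"}) {
                    // Hypothetical rule: any value containing "incorrect" fails.
                    boolean ok = !child.getAttribute(attr).contains("incorrect");
                    Element result = doc.createElement("result");
                    result.setTextContent(attr + " is " + (ok ? "" : "NOT ") + "correct");
                    results.appendChild(result);
                }
                child.appendChild(results);
            }
            // Serialize the modified DOM with an identity transformer.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```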
But seriously -- before you adopt any approach for step 2, make sure that your requirements really call for it.
One approach worth exploring: Xerces-J provides access to the post-schema-validation infoset (PSVI), and can in fact serialize it as XML. For small documents, at least, you may find that XML representation of the PSVI suffices for your purposes.
The PSVI representation made available by Xerces-J (and by xsv) is not, it should be said, anything like an annotated copy of the input. But it can be transformed into a form like the one you show using normal XML processing.
I'm returning to this question after I got some deeper understanding and experience of XSD and XSLT and eventually built my project on that knowledge.
My original question had some misleading points.
My objective was to process an XML with the ONLY objective to produce a corresponding HTML report, containing the XML data in a readable form along with the "validation" results against a set of business rules.
My wrong assumption was that I necessarily had to validate the XML against an XSD, but that brought me some significant challenges in development:
XSD 1.0 did not fully cover my business rules, so I had to switch to XSD 1.1
then I had to set up an XSD 1.1-compliant validator in Java (Xerces2J)
and finally I started thinking about how to build a fine-grained validator on those premises.
It was at that point that I realized this process was overkill: what I really needed was just a TRANSFORMATION from XML to HTML, and the fine-grained validation could have been done inside the transformation process, all in one XSLT.
To answer my own question in the most generic form:
I would still use XSD for a very basic preliminary validation, and then use XSLT to check for more complex validation rules and enrich the XML.
This is the XSLT to transform the source xml of my own question into the result (still XML).
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="parent">
<!-- copy the parent element itself so the result keeps its root -->
<xsl:copy>
<xsl:for-each select="child">
<xsl:call-template name="processChildren"/>
</xsl:for-each>
</xsl:copy>
</xsl:template>
<!-- this template processes the children nodes, applying a sample test clause -->
<xsl:template name="processChildren">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
<results>
<xsl:choose>
<xsl:when test="contains(@id,'incorrect')">
<result>id is NOT correct</result>
</xsl:when>
<xsl:otherwise>
<result>id is correct</result>
</xsl:otherwise>
</xsl:choose>
<xsl:choose>
<xsl:when test="contains(@type,'incorrect')">
<result>type is NOT correct</result>
</xsl:when>
<xsl:otherwise>
<result>type is correct</result>
</xsl:otherwise>
</xsl:choose>
</results>
</xsl:copy>
</xsl:template>
<!-- this template copies the contents of the node unaltered -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
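To try this kind of enrichment from the Java tool, a compacted variant of the same rules can be run through the JDK's built-in XSLT 1.0 processor. In this sketch, `match="child"` and `xsl:if` replace the named template and `xsl:choose`, and `<parent>` is copied so the output keeps its root element. The class name is made up, and Java 15+ is assumed for the text block.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltEnricher {
    // Same sample rule as above: a value containing "incorrect" fails.
    static final String XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="parent">
            <xsl:copy><xsl:apply-templates select="child"/></xsl:copy>
          </xsl:template>
          <xsl:template match="child">
            <xsl:copy>
              <xsl:copy-of select="@*"/>
              <results>
                <result>id is <xsl:if test="contains(@id,'incorrect')">NOT </xsl:if>correct</result>
                <result>type is <xsl:if test="contains(@type,'incorrect')">NOT </xsl:if>correct</result>
              </results>
            </xsl:copy>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String enrich(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(enrich("<parent><child id=\"a correct id\" type=\"a correct type\"/>"
                + "<child id=\"an incorrect id\" type=\"an incorrect type\"/></parent>"));
    }
}
```

A second stylesheet (or a second template mode) can then turn this enriched XML into the HTML report.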

unclosed html tag inside xslt

I'd like to output an unclosed HTML tag from XSLT; I'll add the closing tag later in the stylesheet. How can I achieve this? This doesn't compile:
<xsl:when test="$href">
<xsl:text><a href='{$href}'></xsl:text>
</xsl:when>
Thanx
This is the kind of thing you should avoid at all costs. I don't know your requirements, but perhaps you want a link or a span tag depending on some condition.
In these instances you can use something like this:
<xsl:apply-templates select="tag"/>
then two templates, i.e.:
<xsl:template match="tag">
<span>hello king dave</span>
</xsl:template>
<xsl:template match="tag[@href]">
<a href="{@href}">link text....</a>
</xsl:template>
It's hard to give a definite answer without a better idea of the precise use case, but it's worth noting that you can use match and name on the same <xsl:template>. For example, if you want to produce some particular output for all <tag> elements, but also wrap this output in an <a> tag in certain cases, then you could use an idiom like
<xsl:template match="tag[@href]">
<a href="{@href}"><xsl:call-template name="tagbody" /></a>
</xsl:template>
<xsl:template match="tag" name="tagbody">
Tag content was "<xsl:value-of select="."/>"
</xsl:template>
The idea here is that tag elements with an href will match the first template, which does some additional processing before and after calling the general tag template. Tags without an href will just hit the normal template without the wrapping logic. I.e. for an input like
<root>
<tag>foo</tag>
<tag href="#">bar</tag>
</root>
you would get an output like
Tag content was "foo"
<a href="#">Tag content was "bar"</a>
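A runnable sketch of the match-plus-name idiom with the JDK's built-in XSLT 1.0 processor is below. The `<out>` wrapper and the `<span>` body are illustrative assumptions, and the class name is made up; Java 15+ is assumed for the text block. The point is that the `<a>` element is always written as one complete literal result element, so no unclosed tag is ever needed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class WrapWithLink {
    static final String XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/root">
            <out><xsl:apply-templates select="tag"/></out>
          </xsl:template>
          <!-- Tags with an href (priority 0.5): wrap the shared body in a complete link. -->
          <xsl:template match="tag[@href]">
            <a href="{@href}"><xsl:call-template name="tagbody"/></a>
          </xsl:template>
          <!-- Matched for plain tags, and callable by name from the wrapper above. -->
          <xsl:template match="tag" name="tagbody">
            <span><xsl:value-of select="."/></span>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String render(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("<root><tag>foo</tag><tag href=\"#\">bar</tag></root>"));
    }
}
```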
I had the same problem before and was only able to solve it by copying the entire <a href='{$href}'>...</a> for each when branch.
Maybe you could try setting the doctype of your XSL to some loose XML standard, but afaik XSLT is pretty strict.
Edit: apparently you can set the doctype with a <xsl:output> tag.
Found solution on the net:
<xsl:text disable-output-escaping="yes"><![CDATA[<a href=']]></xsl:text>
<xsl:value-of select="$href"/>
<xsl:text disable-output-escaping="yes"><![CDATA['>]]></xsl:text>

Formatted HTML as output from method invocation from MX4J HTTP page

I have a huge set of data and want to display the data with some formatting.
This is what the method basically looks like:
@ManagedOperation(description = "return html")
@ManagedOperationParameters({@ManagedOperationParameter(name = "someVal", description = "text")})
public String returnAsHtml(String someVal)
{
return "some formatted xml";
}
It looks like XSLTProcessor can be configured to use an XSLT template. However, I could not find any examples on the internet of using XSLT for HTML transformation in the context of MX4J. Could anyone provide a sample XSLT template?
In case anyone comes back to this question, two things come to mind:
1) MX4J has several default implementations of HttpCommandProcessorAdaptor. These operations are mapped from the path. For JMX operations (aka ManagedOperation in Spring parlance), MX4J uses URLs like /invoke?operation=returnAsHtml
This will be passed to the InvokeOperationCommandProcessor to create an XML document with the result being just the toString() of whatever you returned, in an attribute called 'return'. It also passes back the return type in an attribute called 'returnclass'. You can see all this if you just add &template=identity to the invoke URL.
I mention all this because one option is to implement your own 'invoke.xsl'. The one in MX4J just calls the renderobject template:
Lo and behold, you find this in mbean_attributes.xsl, with a comment showing you exactly what you need to do:
<xsl:template name="renderobject">
<xsl:param name="objectclass"/>
<xsl:param name="objectvalue"/>
<xsl:choose>
<xsl:when test="$objectclass='javax.management.ObjectName'">
<xsl:variable name="name_encoded">
<xsl:call-template name="uri-encode">
<xsl:with-param name="uri">
<xsl:value-of select="$objectvalue"/>
</xsl:with-param>
</xsl:call-template>
</xsl:variable>
<a href="/mbean?objectname={$name_encoded}">
<xsl:value-of select="$objectvalue"/>
</a>
</xsl:when>
<xsl:otherwise>
<!-- Use the following line when the result of an invocation
returns e.g. HTML or XML data
<xsl:value-of select="$objectvalue" disable-output-escaping="yes" />
-->
<xsl:value-of select="$objectvalue"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Setting disable-output-escaping="yes" will do the trick.
2) Another option is to write your own HttpCommandProcessorAdaptor, and set it on the HttpAdapter. This could either replace the 'invoke' processor, or you could have an entirely new one.
Hope that helps
One way I figured out is to use JavaScript in the XSL template to extract and parse the string. Make sure you test for the browser (IE vs. non-IE) and use the proper parser.
