I'm working with a DSL based on an XML schema that supports functional language features such as loops, variable state with context, and calls to external Java classes. I'd like to write a tool which takes the XML document and converts it to, at the very least, something that looks like Java, where the <set> tags get converted to variable assignments, loops get converted to for loops, and so on.
I've been looking into ANTLR as well as standard XML parsers, and I'm wondering whether there's a recommended way to go about this. Can such an XML document be converted into Java, or at least into something that can then be converted to Java?
I'm willing to write the parsing through SAX that writes an intermediate language based on each tag, if that's the recommended way, but the part that's giving me pause is the fact that it's context-based in the same way a language like Scheme is, with child elements of any tag being fully evaluated before the parent.
You can do it with XSLT. Then just use templates to generate the code snippets you need.
(Remember to set the output method to plain text, i.e. <xsl:output method="text"/>.)
EDIT: Sample XSLT script
Input - a.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="b.xsl"?>
<set name='myVar'>
<concat>
<s>newText_</s>
<ref>otherVar</ref>
</concat>
</set>
Script - b.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:output method="text" />
<xsl:template match="set">
<xsl:value-of select="@name"/>=<xsl:apply-templates/>
</xsl:template>
<xsl:template match="concat">
<xsl:for-each select="*">
<xsl:if test="position() > 1">+</xsl:if>
<xsl:apply-templates select="."/>
</xsl:for-each>
</xsl:template>
<xsl:template match="ref">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="s">
<xsl:text>"</xsl:text>
<xsl:apply-templates/>
<xsl:text>"</xsl:text>
</xsl:template>
</xsl:stylesheet>
Note that a.xml contains an instruction that lets XSLT-capable browsers render it with the stylesheet b.xsl. Firefox is such a browser. Open a.xml in Firefox and you will see:
myVar="newText_"+otherVar
Note that XSLT is quite a capable programming language, so there is a lot you can do.
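If you would rather run the transformation from Java than in a browser, the JDK's standard javax.xml.transform API can apply the same templates. This is a minimal sketch, assuming b.xsl is inlined as a string and the input is passed without its XML declaration and xml-stylesheet instruction; the class name XmlToJavaSketch is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies the b.xsl templates (inlined below) with JAXP.
final class XmlToJavaSketch {
    // Same templates as b.xsl, inlined as one string.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='set'><xsl:value-of select='@name'/>=<xsl:apply-templates/></xsl:template>"
      + "<xsl:template match='concat'><xsl:for-each select='*'>"
      + "<xsl:if test='position() &gt; 1'>+</xsl:if><xsl:apply-templates select='.'/>"
      + "</xsl:for-each></xsl:template>"
      + "<xsl:template match='ref'><xsl:apply-templates/></xsl:template>"
      + "<xsl:template match='s'><xsl:text>\"</xsl:text><xsl:apply-templates/><xsl:text>\"</xsl:text></xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms a DSL document into Java-like source text. */
    static String translate(String dslXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(dslXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String dsl = "<set name='myVar'><concat><s>newText_</s><ref>otherVar</ref></concat></set>";
        System.out.println(translate(dsl)); // myVar="newText_"+otherVar
    }
}
```

For a real tool you would load b.xsl from disk with new StreamSource(new File("b.xsl")) instead of inlining it.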
I want to do the following:
We currently receive some XML files in which some tags are filled in wrongly.
To help our partner, we want to catch these wrong values using a "pass-through" folder where all the XML files are placed before they are imported into our application.
This folder would be read every X minutes, and each file would need to go through some checks, such as the length of the value within a tag, the value of the tag, etc.
Because this is only a temporary solution, we don't want to implement it in our application.
I was thinking of 2 possible setups:
Using Java and calling an XSLT file to transform every file and put it in another folder
Using only Java to check the XML file and do the transformation.
Both cases would be called by a .bat file that runs every X minutes.
Now my questions:
What do you think would be the best solution, i.e. the quickest, the most secure, etc.? (Maybe something other than what I suggested?)
Could you also provide some examples of how to do something like this?
I'm not like other people who ask strictly for code. If you can give me something similar, I can make it work on my own.
At the time of writing, I'm already looking for solutions on other websites, but because it is urgent, it's also helpful to ask the community.
Thank you for your answer,
Kind regards,
Maarten
EDIT: Both answers helped me a lot. Thank you, guys.
http://www.ibm.com/developerworks/xml/library/x-javaxmlvalidapi/index.html
or
http://www.java-tips.org/java-se-tips/javax.xml.validation/how-to-create-xml-validator-from-xml-s.html
or
http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
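As a rough illustration of the javax.xml.validation API those links describe, the sketch below validates a string against an inline XML Schema and returns the first error message. The schema (a hypothetical code element limited to 5 characters, standing in for the "length of the value within a tag" check) and the class name XsdCheck are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper for the "check the files before import" step.
final class XsdCheck {
    // Hypothetical schema: a single <code> element of at most 5 characters.
    static final String SAMPLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='code'><xs:simpleType>"
      + "<xs:restriction base='xs:string'><xs:maxLength value='5'/></xs:restriction>"
      + "</xs:simpleType></xs:element></xs:schema>";

    /** Returns null if xml is valid against xsd, otherwise the first validation message. */
    static String firstError(String xsd, String xml) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validation error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstError(SAMPLE_XSD, "<code>TOOLONG</code>"));
    }
}
```

A Schema object is thread-safe and can be built once and reused for every file in the folder; Validator instances should not be shared between threads.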
If you want to run your XSLT, using a .bat script, on every XML file in a given folder (your first option in the OP), I can think of 3 ways:
A. Basically do a "for" loop to process each individual file via the command line. (Eww.)
B. Use collection() to point to an input folder and use xsl:result-document to create the output files in a new folder.
Here's an example XSLT 2.0 (tested with Saxon 9):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pInputDir" select="'input'"/>
<xsl:param name="pOutputDir" select="'output'"/>
<xsl:variable name="vCollection" select="collection(concat($pInputDir,'/?*.xml'))"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:for-each select="$vCollection">
<xsl:variable name="vOutFile" select="tokenize(document-uri(.),'/')[last()]"/>
<xsl:result-document href="{concat($pOutputDir,'/',$vOutFile)}">
<xsl:apply-templates/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Notes:
This stylesheet is just doing an identity transform. It's passing the XML through unchanged. You would need to override the identity template by adding new templates to do your checks/changes.
Also notice that there are 2 parameters for the input and output folder names.
You may run into memory issues using collection() because it loads all of the XML files in the folder into memory. If this is an issue, see below...
C. Have your XSLT process a list of all the files in the directory. Use a combination of document() and the Saxon extension function saxon:discard-document() to load and discard the documents.
Here's an example I used a while back for testing.
XML file listing (input to the XSLT):
<files>
<file>file:///C:/input_xml/file1.xml</file>
<file>file:///C:/input_xml/file2.xml</file>
<file>file:///C:/input_xml/file3.xml</file>
<file>file:///C:/input_xml/file4.xml</file>
<file>file:///C:/input_xml/file5.xml</file>
<file>file:///C:/input_xml/file6.xml</file>
<file>file:///C:/input_xml/file7.xml</file>
<file>file:///C:/input_xml/file8.xml</file>
<file>file:///C:/input_xml/file9.xml</file>
<file>file:///C:/input_xml/file10.xml</file>
</files>
XSLT 2.0 (tested with Saxon 9):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pOutputDir" select="'output'"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="files">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="file">
<xsl:variable name="vOutFile" select="tokenize(document-uri(document(.)),'/')[last()]"/>
<xsl:result-document href="{concat($pOutputDir,'/',$vOutFile)}">
<xsl:apply-templates select="document(.)/saxon:discard-document(.)" xmlns:saxon="http://saxon.sf.net/"/>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
Notes:
Again, this stylesheet is just doing an identity transform. It's passing the XML through unchanged. You would need to override the identity template by adding new templates to do your checks/changes.
Also notice that there is only a parameter for the output folder name.
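For the OP's first setup (Java calling an XSLT file), the per-file loop of option A can also live in Java instead of a .bat "for" loop. This is a sketch assuming the standard JAXP API bundled with the JDK, whose built-in processor only supports XSLT 1.0, so the 2.0 stylesheets above would need Saxon on the classpath instead. The class name FolderTransform and its argument order are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical batch runner: applies one stylesheet to every *.xml file in a folder.
final class FolderTransform {
    /** Writes each transformed file under the same name into outDir; returns the file count. */
    static int run(Path xsl, Path inDir, Path outDir) throws IOException, TransformerException {
        // Compile the stylesheet once; Templates can be reused for every file.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(xsl.toFile()));
        Files.createDirectories(outDir);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inDir, "*.xml")) {
            for (Path in : files) {
                // Transformer is not thread-safe, so create a fresh one per file.
                Transformer t = templates.newTransformer();
                try (InputStream is = Files.newInputStream(in);
                     OutputStream os = Files.newOutputStream(outDir.resolve(in.getFileName()))) {
                    t.transform(new StreamSource(is, in.toUri().toString()), new StreamResult(os));
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java FolderTransform checks.xsl input output
        int n = run(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
        System.out.println(n + " file(s) transformed");
    }
}
```

As written, the first file that fails to parse or transform aborts the whole run; for a pass-through folder you would probably catch the exception per file and move the bad file aside instead.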
I'm trying to validate an XML file, but I get the following error:
Can not find declaration of element 'xsl:stylesheet'.
This is the XML:
<?xml version='1.0' encoding='utf-8'?>
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:msxsl='urn:schemas-microsoft-com:xslt' exclude-result-prefixes='msxsl' xmlns:ns='http://www.ibm.com/wsla'>
<xsl:strip-space elements='*'/>
<xsl:output method='xml' indent='yes'/>
<xsl:template match='@* | node()'>
<xsl:copy>
<xsl:apply-templates select='@* | node()'/>
</xsl:copy>
</xsl:template>
<xsl:template match="/ns:SLA/ns:ServiceDefinition/ns:WSDLSOAPOperation/ns:SLAParameter/@name[.='TotalMemoryConsumption']">
<xsl:attribute name='{name()}'>
<xsl:text>MemConsumption</xsl:text>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Where is the mistake?
EDIT: I want to parse this XML in Java with SAX, but I get the following error:
Element type "xsl:template" must be followed by either attribute specifications, ">" or "/>".
How do I get rid of it?
Assuming you are actually trying to validate your XSL as an XML document, it looks like that website requires you to point to a schema or DTD in order to validate the XML against it. You can get a non-normative schema here: http://www.w3.org/TR/xslt20/#schema-for-xslt. Here are instructions on how to reference a schema from an XML file: http://www.ibm.com/developerworks/xml/library/x-tipsch.html
You could also tick "Well-Formedness only" and check the document for well-formedness rather than validity.
Generally, any XSL engine will report any errors in your XSL document, so you don't need to validate it separately.
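Since a well-formedness check is usually all you need for a stylesheet, and the EDIT mentions parsing with SAX, here is a minimal sketch of such a check with the JDK's SAX parser. No DTD or schema is involved, so it reports only well-formedness errors, returning the parser's message. The class name WellFormedCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: well-formedness only, no DTD/XSD validation.
final class WellFormedCheck {
    /** Returns null if the document is well-formed, otherwise the parser's error message. */
    static String check(String xml) throws IOException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // resolve the xsl: prefix properly
        factory.setValidating(false);    // do not look for a DTD
        try {
            // DefaultHandler ignores content events and throws on fatal errors.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }
}
```

When the stylesheet lives in a Java string literal, it is worth printing the string first and checking this on exactly what the parser receives, since quoting inside the literal can change the markup.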
Your XSL is OK, don't worry. There is just no normative DTD/XSD for XSLT 1.0 stylesheets (the specification only includes a non-normative DTD fragment), and no one bothers checking XSLT 1.0 stylesheets for validity. Well-formedness is enough.
This question is a follow up to my earlier question:
Creating a valid XSD that is open using <all> and <any> elements
Given that I have a Java String containing an XML document of the following form:
<TRADE>
<TIME>12:12</TIME>
<MJELLO>12345</MJELLO>
<OPTIONAL>12:12</OPTIONAL>
<DATE>25-10-2011</DATE>
<HELLO>hello should be ignored</HELLO>
</TRADE>
How can I use XSLT or something similar (in Java, using JAXB) to remove all elements that are not in a given set of elements?
In the above example I am only interested in (TIME, OPTIONAL, DATE), so I would like to transform it into:
<TRADE>
<TIME>12:12</TIME>
<OPTIONAL>12:12</OPTIONAL>
<DATE>25-10-2011</DATE>
</TRADE>
The order of the elements is not fixed.
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pNames" select="'|TIME|OPTIONAL|DATE|'"/>
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*/*">
<xsl:if test="contains($pNames, concat('|', name(), '|'))">
<xsl:call-template name="identity"/>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
when applied on the provided XML document:
<TRADE>
<TIME>12:12</TIME>
<MJELLO>12345</MJELLO>
<OPTIONAL>12:12</OPTIONAL>
<DATE>25-10-2011</DATE>
<HELLO>hello should be ignored</HELLO>
</TRADE>
produces the wanted, correct result:
<TRADE>
<TIME>12:12</TIME>
<OPTIONAL>12:12</OPTIONAL>
<DATE>25-10-2011</DATE>
</TRADE>
Explanation:
The identity rule (template) copies every node "as-is".
The identity rule is overridden by a template matching any element that is not the top element of the document. Inside the template, it checks whether the name of the matched element is one of the names specified in the external parameter $pNames, a pipe-delimited string of wanted names.
See the documentation of your XSLT processor on how to pass a parameter to a transformation -- this is implementation-dependent and differs from processor to processor.
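With the JDK's built-in JAXP processor, for instance, the parameter is passed with Transformer.setParameter before calling transform. A minimal sketch, assuming the stylesheet above is inlined as a string (with indentation turned off so the output is compact); the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper around the filtering stylesheet (indent switched off).
final class FilterElements {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:param name='pNames' select=\"'|TIME|OPTIONAL|DATE|'\"/>"
      + "<xsl:template match='node()|@*' name='identity'>"
      + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='*/*'>"
      + "<xsl:if test=\"contains($pNames, concat('|', name(), '|'))\">"
      + "<xsl:call-template name='identity'/>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Keeps only non-root elements whose names appear in a list like "|TIME|DATE|". */
    static String keepOnly(String xml, String pipeDelimitedNames) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        // Overrides the default value of the top-level xsl:param named pNames.
        t.setParameter("pNames", pipeDelimitedNames);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String trade = "<TRADE><TIME>12:12</TIME><MJELLO>12345</MJELLO>"
                + "<OPTIONAL>12:12</OPTIONAL><DATE>25-10-2011</DATE>"
                + "<HELLO>hello should be ignored</HELLO></TRADE>";
        System.out.println(keepOnly(trade, "|TIME|OPTIONAL|DATE|"));
    }
}
```

Because the names arrive as a parameter, the same compiled stylesheet can serve different element sets without editing the XSLT.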
I haven't tried it yet, but maybe the javax.xml.transform package can help:
http://download.oracle.com/javase/6/docs/api/javax/xml/transform/package-summary.html
JAXB & XSLT
JAXB integrates very cleanly with XSLT; for an example, see:
How to get jaxb to Ignore certain data during unmarshalling
Your Other Question
Based on your previous question (see link below), the transform is really unnecessary as JAXB will just ignore attributes and elements that are not mapped to fields/properties in your domain object.
Creating a valid XSD that is open using <all> and <any> elements
I am having a certain issue with special characters in my XML.
Basically, I am splitting an XML file into multiple XML files using the Xalan processor.
When splitting the documents, I use the value of the name tag as the name of the generated file. The problem is that the name contains characters that aren't recognized by the XML processor, like ™ (TM) and ® (R). I want to remove those characters ONLY when naming the files.
<xsl:template match="products">
<redirect:write select="concat('..\\xml\\product\\en\\',translate(string(name),'&lt;/> ',''),'.xml')">
The above is the XSL code I have written to split the XML into multiple XML files. As you can see, I am using the translate function to remove '/', '<', '>' and spaces from the name. I was hoping I could do the same with ™ (TM) and ® (R), but it doesn't seem to work.
Please advise me how I could do that.
Thanks for your help in advance.
I don't have Xalan, but with 8 other XSLT processors this transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="text()">
<xsl:value-of select="translate(., '&lt;/>™®', '')"/>
===================
<xsl:value-of select="translate(., '&lt;/&gt;&#x2122;&#xAE;', '')"/>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<t>XXX™ My Trademark®</t>
produces the wanted result:
XXX My Trademark
===================
XXX My Trademark
I suggest that you try one of the two expressions above; at least the second may work.
Following Dimitre's answer: if you are not sure which special characters could appear in the name, you may prefer to keep only the characters you consider legal in a file name.
For example:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="text()">
<xsl:value-of select="translate(.,
translate(.,
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ',
''),
'')"/>
</xsl:template>
</xsl:stylesheet>
With input:
<t>XXX™ My > Trademark®</t>
Result:
XXX My  Trademark
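To see why the nested translate() works: the inner call deletes every allowed character, leaving only the unwanted ones, and the outer call then deletes exactly those (which is also why the space on each side of the removed ">" survives, giving two spaces). The sketch below mirrors XPath 1.0 translate() semantics in Java purely to illustrate this; it is not an XSLT API, the names TranslateDemo and keepAllowed are hypothetical, and it only handles characters in the Basic Multilingual Plane, which covers ™ and ®.

```java
// Hypothetical illustration of XPath translate() and the "double translate" whitelist trick.
final class TranslateDemo {
    static final String ALLOWED = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ";

    /** Like XPath translate(s, from, to): map or delete characters listed in from. */
    static String translate(String s, String from, String to) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int idx = from.indexOf(c);            // first occurrence wins, as in XPath
            if (idx < 0) {
                sb.append(c);                     // not listed: keep unchanged
            } else if (idx < to.length()) {
                sb.append(to.charAt(idx));        // listed with a replacement
            }                                     // listed, no replacement: deleted
        }
        return sb.toString();
    }

    /** translate(., translate(., ALLOWED, ''), '') from the stylesheet above. */
    static String keepAllowed(String s) {
        String unwanted = translate(s, ALLOWED, ""); // inner call: only unwanted chars remain
        return translate(s, unwanted, "");           // outer call: delete those chars
    }

    public static void main(String[] args) {
        System.out.println(keepAllowed("XXX\u2122 My > Trademark\u00AE"));
    }
}
```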
Say I have a very simple XML with an empty tag 'B':
<Root>
<A>foo</A>
<B></B>
<C>bar</C>
</Root>
I'm currently using XSLT to remove a few tags, like 'C' for example:
<?xml version="1.0" ?>
<xsl:stylesheet version="2.0" xmlns="http://www.w3.org/1999/XSL/Transform" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no" encoding="utf-8" omit-xml-declaration="yes" />
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*" />
<xsl:apply-templates />
</xsl:copy>
</xsl:template>
<xsl:template match="C" />
</xsl:stylesheet>
So far OK, but the problem is I end up having an output like this:
<Root>
<A>foo</A>
<B/>
</Root>
when I actually really want:
<Root>
<A>foo</A>
<B></B>
</Root>
Is there a way to prevent 'B' from collapsing?
Thanks.
OK, so here is what worked for me:
<xsl:output method="html"/>
Try this:
<script type="..." src="..."> </script>
Your HTML output will be:
<script type="..." src="..."> </script>
The character inside the element prevents the collapsing, but it is output as a blank space. It has worked for me in the past.
There is no standard way, as they are equivalent; you might be able to find an XSLT engine that has an option for this behaviour, but I'm not aware of any.
If you're passing this to a third party that cannot accept empty tags in this syntax, then you may have to post-process the output yourself (or convince the third party to fix their XML parsing).
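If you do go the post-processing route, a very naive sketch is a regular expression that rewrites self-closing tags such as <B/> into <B></B>. It is an illustration under strong assumptions, not a real serializer: it does not understand comments, CDATA sections, or '>' characters inside attribute values, so only use it on output whose shape you control. The class name ExpandEmptyTags is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical, naive post-processor for serialized XML text.
final class ExpandEmptyTags {
    // Group 1: element name; group 2: attributes (if any); then optional spaces and "/>".
    // Limitation: not aware of comments, CDATA, or '>' inside attribute values.
    private static final Pattern SELF_CLOSING =
            Pattern.compile("<([A-Za-z_][\\w.:-]*)([^<>]*?)\\s*/>");

    /** Rewrites every <name .../> into <name ...></name>. */
    static String expand(String xml) {
        return SELF_CLOSING.matcher(xml).replaceAll("<$1$2></$1>");
    }

    public static void main(String[] args) {
        System.out.println(expand("<Root><A>foo</A><B/></Root>"));
    }
}
```

The result is the same document as far as any XML parser is concerned; only the textual form of the empty elements changes.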
It is up to the XSLT engine to decide how the XML tag is rendered, because a parser should see no difference between the two variations. However, when outputting HTML this is a common problem (for <textarea> and <script> tags for example.) The simplest (but ugly) solution is to add a single whitespace inside the tag (this does change the meaning of the tag slightly though.)
This had been a long-standing issue, and I finally made it work with a simple solution.
Add <xsl:text/> when the value is a single space character (I set the value to a space in my helper class).
<xsl:choose>
<xsl:when test="$textAreaValue=' '">
<xsl:text/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$textAreaValue"/>
</xsl:otherwise>
</xsl:choose>
They are NOT always equivalent. Many browsers can't deal with <script type="..." src="..." /> and want a separate closing tag. I ran into this problem while using XML/XSL with PHP. Output method "html" didn't work; I'm still looking for a solution.
No. The two forms are equivalent to any XML parser, so you shouldn't have to worry.
It should not be a problem whether it is <B/> or <B></B>. However, if you are using another tool that expects empty XML tags in one form only, then you have a problem. A not very elegant way to handle this is to add a space between the starting and ending 'B' tags in the XSLT code.
<xsl:text disable-output-escaping="yes">
<![CDATA[<div></div>]]>
</xsl:text>
This works fine with C#'s XslCompiledTransform class in .NET 2.0, but may very well fail almost anywhere else. Do not use it unless you are programmatically doing the transform yourself; it is not portable at all.
It's 7 years late, but for future readers I will buck the trend here and propose an actual solution to the original question: one that changes neither the content (no added spaces) nor the output directive.
The idea is to use an empty variable to trick the XSLT processor.
If you only want to do it for the one tag B, my first thought was to use something like this to attach a dummy variable:
<xsl:variable name="dummyempty" select="''"/>
<xsl:template match="B">
<xsl:copy>
<xsl:apply-templates select="@*" />
<xsl:value-of select="concat(., $dummyempty)"/>
</xsl:copy>
</xsl:template>
But I found that the dummy variable is not even necessary. This preserved empty tags, at least when tested with xsltproc on Linux:
<xsl:template match="B">
<xsl:copy>
<xsl:apply-templates select="@*" />
<xsl:value-of select="."/>
</xsl:copy>
</xsl:template>
For a more generic solution to handle ALL empty tags, try this:
<xsl:variable name="dummyempty" select="''"/>
<xsl:template match="*[. = '']">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
<xsl:value-of select="$dummyempty"/>
</xsl:copy>
</xsl:template>
Again, depending on your XSLT processor, you may not even need the dummy variable.