Most performant way to go through an XML transformation - Java

I'm new to Java, and I'd like the community's opinion.
I have a huge XML file of approximately 140 MB.
Much of the information in it is no longer valid, so I need to filter it and keep only the valid entries. To decide whether a node must be deleted, I need to cross-check information between nodes. In some cases the entire parent (main) node needs to be deleted.
I'm already doing this with a DOM parser: I loop over the nodes, save values in variables inside the loops, cross-check them, and delete either the current node or the entire parent node.
Basically, the structure is like this:
<source>
<main>
<id>98567</id>
<block_information>
<name>Block A</name>
<start_date>20120210</start_date>
<end_date>20150210</end_date>
</block_information>
<block_information>
<name>Block A.01</name>
<start_date>20150210</start_date>
<end_date>20251005</end_date>
</block_information>
<city_information>
<name>Manchester</name>
<start_date>20150210</start_date>
<end_date>20150212</end_date>
</city_information>
<city_information>
<name>New Manchester</name>
<start_date>20150212</start_date>
<end_date>20251005</end_date>
</city_information>
<phone>
<type>C</type>
<number>987466321</number>
<name></name>
</phone>
<phone>
<type>P</type>
<number>36547821</number>
<name></name>
</phone>
</main>
<main>
<id>19587</id>
<block_information>
<name>Che</name>
<start_date>20090210</start_date>
<end_date>20100210</end_date>
</block_information>
<block_information>
<name></name>
<start_date>20100210</start_date>
<end_date>20351005</end_date>
</block_information>
<city_information>
<name></name>
<start_date>20150210</start_date>
<end_date>20150212</end_date>
</city_information>
<city_information>
<name>No Name</name>
<start_date>20150212</start_date>
<end_date>20191005</end_date>
</city_information>
<phone>
<type>C</type>
<number>987466321</number>
<name>Mom</name>
</phone>
<phone>
<type>P</type>
<number>36547821</number>
<name></name>
</phone>
</main>
</source>
The output is like this:
<result>
<main>
<id>98567</id>
<block_name>Block A.01</block_name>
<city_name>New Manchester</city_name>
<cellphone></cellphone>
<phone>36547821</phone>
<contact_phone></contact_phone>
<contact_phone_name></contact_phone_name>
</main>
</result>
For a <main> to appear in the result, it must have exactly one valid <block_information> and exactly one valid <city_information> (valid means <start_date> is earlier than the current date and <end_date> is later than the current date), and the <name> is required for both.
If there is none, or more than one valid, the <main> is deleted.
For the phone number, <type> is one of 'C' (contact), 'P' (personal phone) or 'M' (mobile). If the <type> is 'C' but <name> is empty, the phone does not go to the result. 'P' goes to <phone> and 'M' goes to <cellphone>.
I'd like your thoughts on the most performant way to do this, in a way that anyone can easily adjust later if needed.
Thanks in advance for the input!
As asked by @kjhughes, I added some values to the sample XML and described some of the filters I need. Thanks!
PS: the XML structure used as an example is far simpler than the actual one, which has many more complex types.

I would go with the following approach:
Find a library that lets you stream the XML (file or InputStream) and produce a Stream<Main>.
Process the Stream<Main> and filter each Main node according to your validation logic.
Depending on whether you are I/O or CPU bottlenecked, use a .parallel() stream to process it (read: test whether .parallel() helps you at all).
This should suffice for any sane performance requirements in the context of XML parsing (I guess?). Search for "Java XML Stream" and go from there (or maybe this Stack Overflow question can give some pointers).
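As a rough illustration of the streaming idea, here is a minimal sketch using the StAX API that ships with the JDK (javax.xml.stream) instead of a Stream<Main>. The class name MainStreamFilter and both method names are made up for this example. It assumes the element names from the question, dates in yyyyMMdd form, and that every date element is filled in. It only collects the ids of the <main> elements that pass the "exactly one valid block and one valid city" rule; writing the result document is left out.

```java
import java.io.Reader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; names and signatures are illustrative only.
class MainStreamFilter {

    /** True when the name is non-empty and start < today < end (dates as yyyyMMdd). */
    static boolean isValid(String name, String start, String end, LocalDate today) {
        if (name == null || name.isEmpty() || start == null || end == null) {
            return false;
        }
        LocalDate s = LocalDate.parse(start, DateTimeFormatter.BASIC_ISO_DATE);
        LocalDate e = LocalDate.parse(end, DateTimeFormatter.BASIC_ISO_DATE);
        return s.isBefore(today) && e.isAfter(today);
    }

    /** Reads the input once, front to back, and returns the ids of the <main> elements to keep. */
    static List<String> validMainIds(Reader xml, LocalDate today) {
        List<String> ids = new ArrayList<>();
        String id = null, section = null, name = null, start = null, end = null;
        int validBlocks = 0, validCities = 0;
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    text.setLength(0);
                    String tag = r.getLocalName();
                    if (tag.equals("main")) {
                        // A new record starts: reset the per-record state.
                        id = null; validBlocks = 0; validCities = 0;
                    } else if (tag.equals("block_information") || tag.equals("city_information")) {
                        section = tag; name = null; start = null; end = null;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    text.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String tag = r.getLocalName();
                    String value = text.toString().trim();
                    text.setLength(0);
                    if (section == null) {
                        // Outside block/city sections only <id> and </main> matter here.
                        if (tag.equals("id")) {
                            id = value;
                        } else if (tag.equals("main") && validBlocks == 1 && validCities == 1) {
                            ids.add(id);
                        }
                    } else if (tag.equals("name")) {
                        name = value;
                    } else if (tag.equals("start_date")) {
                        start = value;
                    } else if (tag.equals("end_date")) {
                        end = value;
                    } else if (tag.equals(section)) {
                        // The section is closed: count it if it is currently valid.
                        if (isValid(name, start, end, today)) {
                            if (section.equals("block_information")) validBlocks++; else validCities++;
                        }
                        section = null;
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML stream", e);
        }
        return ids;
    }
}
```

Calling MainStreamFilter.validMainIds(new FileReader("source.xml"), LocalDate.now()) (the file name is a placeholder) keeps only a handful of strings in memory per record, which is the main point of streaming compared with loading the whole 140 MB document into a DOM.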

XSLT is a transformation language that has existed since 1999 and now has three versions: 1.0, 2.0, and 3.0. The latest version was published as a W3C Recommendation in 2017 and is supported on the Java platform by Saxon 9.8 and later, available in the open-source HE edition on SourceForge and Maven. XSLT 1.0 is supported in the Oracle/Sun Java JRE, which incorporates Apache Xalan.
So instead of using DOM you have the option to use XSLT. Here is an example using XSLT 3 (online at https://xsltfiddle.liberty-development.net/bFN1yab/0):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
version="3.0">
<xsl:output indent="yes"/>
<xsl:function name="mf:date" as="xs:date">
<xsl:param name="input-date" as="xs:string"/>
<xsl:sequence
select="xs:date(replace($input-date, '([0-9]{4})([0-9]{2})([0-9]{2})', '$1-$2-$3'))"/>
</xsl:function>
<xsl:function name="mf:select-valid-info" as="element()*">
<xsl:param name="infos" as="element()*"/>
<xsl:sequence
select="$infos[name/normalize-space()
and mf:date(start_date) lt current-date()
and mf:date(end_date) gt current-date()]"/>
</xsl:function>
<xsl:function name="mf:valid-main" as="xs:boolean">
<xsl:param name="main" as="element(main)"/>
<xsl:sequence
select="let $valid-blocks := mf:select-valid-info($main/block_information),
$valid-cities := mf:select-valid-info($main/city_information)
return count($valid-blocks) eq 1 and count($valid-cities) eq 1"/>
</xsl:function>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="main[not(mf:valid-main(.))]"/>
<xsl:template match="main[mf:valid-main(.)]">
<xsl:copy>
<xsl:apply-templates
select="id,
mf:select-valid-info(block_information)/name,
mf:select-valid-info(city_information)/name,
phone"/>
</xsl:copy>
</xsl:template>
<xsl:template match="block_information/name | city_information/name">
<xsl:element name="{substring-before(local-name(..), '_')}_name">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="main/phone[type = 'C']">
<contact_phone>
<xsl:value-of select="number[current()/normalize-space(name)]"/>
</contact_phone>
<contact_name>
<xsl:value-of select="name"/>
</contact_name>
</xsl:template>
<xsl:template match="main/phone[type = 'P']">
<phone>
<xsl:value-of select="number"/>
</phone>
</xsl:template>
<xsl:template match="main/phone[type = 'M']">
<cellphone>
<xsl:value-of select="number"/>
</cellphone>
</xsl:template>
</xsl:stylesheet>
I hope I have grasped the conditions for the main elements. I have not quite understood the rules for the various phone data, but the code is meant as an example anyway.
Of course, performance depends very much on the implementation, but I think XSLT is a more structured and maintainable approach than DOM coding.
If you can afford it, you can also look into Saxon 9.8 or 9.9 EE, which supports streaming XSLT 3. With some rewrites of the code above, an XSLT-based approach can stream forwards-only through the huge document, materializing each main element as an element node to transform. This keeps the memory footprint low because, unlike DOM or normal XSLT processing, it doesn't first parse the whole XML document into a complete in-memory tree:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode streamable="yes" on-no-match="shallow-copy"/>
<xsl:template match="source">
<xsl:copy>
<xsl:apply-templates select="main!copy-of()" mode="main"/>
</xsl:copy>
</xsl:template>
<xsl:output indent="yes"/>
<xsl:function name="mf:date" as="xs:date">
<xsl:param name="input-date" as="xs:string"/>
<xsl:sequence
select="xs:date(replace($input-date, '([0-9]{4})([0-9]{2})([0-9]{2})', '$1-$2-$3'))"/>
</xsl:function>
<xsl:function name="mf:select-valid-info" as="element()*">
<xsl:param name="infos" as="element()*"/>
<xsl:sequence
select="$infos[name/normalize-space()
and mf:date(start_date) lt current-date()
and mf:date(end_date) gt current-date()]"/>
</xsl:function>
<xsl:function name="mf:valid-main" as="xs:boolean">
<xsl:param name="main" as="element(main)"/>
<xsl:sequence
select="let $valid-blocks := mf:select-valid-info($main/block_information),
$valid-cities := mf:select-valid-info($main/city_information)
return count($valid-blocks) eq 1 and count($valid-cities) eq 1"/>
</xsl:function>
<xsl:mode name="main" on-no-match="shallow-copy"/>
<xsl:template match="main[not(mf:valid-main(.))]" mode="main"/>
<xsl:template match="main[mf:valid-main(.)]" mode="main">
<xsl:copy>
<xsl:apply-templates
select="id,
mf:select-valid-info(block_information)/name,
mf:select-valid-info(city_information)/name,
phone" mode="#current"/>
</xsl:copy>
</xsl:template>
<xsl:template match="block_information/name | city_information/name" mode="main">
<xsl:element name="{substring-before(local-name(..), '_')}_name">
<xsl:value-of select="."/>
</xsl:element>
</xsl:template>
<xsl:template match="main/phone[type = 'C']" mode="main">
<contact_phone>
<xsl:value-of select="number[current()/normalize-space(name)]"/>
</contact_phone>
<contact_name>
<xsl:value-of select="name"/>
</contact_name>
</xsl:template>
<xsl:template match="main/phone[type = 'P']" mode="main">
<phone>
<xsl:value-of select="number"/>
</phone>
</xsl:template>
<xsl:template match="main/phone[type = 'M']" mode="main">
<cellphone>
<xsl:value-of select="number"/>
</cellphone>
</xsl:template>
</xsl:stylesheet>

Related

How to update a range of an XML file tag?

My problem is that I don't know how to update an XML file. In the following XML file I want to wrap some existing tags inside another tag that already exists in the file.
My XML file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<PayrunDetails>
<PayrunNumber>000777</PayrunNumber>
</PayrunDetails>
<PayLocation>
<LocationCode>ACT</LocationCode>
<LocationDescription>ACT</LocationDescription>
<CompanyDetails>
<CName>APPLE Limited</CName>
<Payslip>
<StaffNumber>12345</StaffNumber>
<PayDetails>
<AmountGross>9999</AmountGross>
<ComponentDetails>
<ComponentType>SALARY</ComponentType>
<Amount>1999</Amount>
<YTDAmount>10616</YTDAmount>
</ComponentDetails>
<ComponentDetails>
<ComponentType>SALARY</ComponentType>
<Amount>7305</Amount>
<YTDAmount>76703</YTDAmount>
</ComponentDetails>
</PayDetails>
</Payslip>
</CompanyDetails>
</PayLocation>
</root>
My desired output file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<PayrunDetails>
<PayrunNumber>000777</PayrunNumber>
</PayrunDetails>
<PayLocation>
<LocationCode>ACT</LocationCode>
<LocationDescription>ACT</LocationDescription>
<CompanyDetails>
<CName>APPLE Limited</CName>
<Payslip>
<StaffNumber>12345</StaffNumber>
<PayDetails>
<AmountGross>9999</AmountGross>
<ComponentDetails>
<ComponentType ID="SALARY">
<Amount>1999</Amount>
<YTDAmount>10616</YTDAmount>
</ComponentType>
</ComponentDetails>
<ComponentDetails>
<ComponentType ID="TAX">
<Amount>7305</Amount>
<YTDAmount>76703</YTDAmount>
</ComponentType>
</ComponentDetails>
</PayDetails>
</Payslip>
</CompanyDetails>
</PayLocation>
</root>
In the desired output above, the ComponentType tag now wraps the other tags that were inside the ComponentDetails tag.
I want to use XSLT for this, but I don't know what code to write.
I'm fairly new to XSLT so please excuse the potential novice question. Any guidance would be appreciated here.
Thanks in advance.
First read up on the identity transform in XSLT, which involves this template
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
(If you could use XSLT 3.0, you could just write <xsl:mode on-no-match="shallow-copy"/> instead)
This will copy across all your nodes and attributes as-is, which in your case gets you almost there.
There are a number of ways you could transform the nodes you want. One way is to match the ComponentDetails tag and create a new ComponentType in the output, along with code to select the other child nodes.
<xsl:template match="ComponentDetails">
<xsl:copy>
<ComponentType ID="{ComponentType}">
<xsl:apply-templates />
</ComponentType>
</xsl:copy>
</xsl:template>
This makes use of Attribute Value Templates to create the ID attribute.
Note that <xsl:apply-templates /> is short-hand for <xsl:apply-templates select="node()" /> and so this will still select the existing ComponentType element in the input document, which will then be matched by the identity template. To stop ComponentType being output twice, you need to add a template to match and ignore it.
<xsl:template match="ComponentType" />
Try this XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="ComponentDetails">
<xsl:copy>
<ComponentType ID="{ComponentType}">
<xsl:apply-templates />
</ComponentType>
</xsl:copy>
</xsl:template>
<xsl:template match="ComponentType" />
</xsl:stylesheet>
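
To try the stylesheet from Java, a minimal sketch with the JAXP API bundled in the JDK (javax.xml.transform) could look like this. The class name ComponentTypeWrapper is made up for the example, and the stylesheet is inlined as a string only to keep the snippet self-contained. Indentation is switched off here so the output is easy to compare.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical wrapper class; only the javax.xml.transform calls are real API.
class ComponentTypeWrapper {

    // Identity transform plus the two overriding templates from the answers above.
    static final String XSLT =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Wrap the children in a new ComponentType whose ID comes from the old element's text.
      + "<xsl:template match='ComponentDetails'>"
      + "<xsl:copy><ComponentType ID='{ComponentType}'><xsl:apply-templates/></ComponentType></xsl:copy>"
      + "</xsl:template>"
      // Suppress the original ComponentType so it is not output twice.
      + "<xsl:template match='ComponentType'/>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the serialized result. */
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

For real files you would pass StreamSource and StreamResult objects built from File instances instead of strings.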

XSLT format-date for an attribute

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<cd created_at="2016-12-15T15:02:55Z">
<title created_at="2016-12-15T15:02:55Z">Empire Burlesque</title>
<artist created_at="2016-12-15T15:02:55Z">Bob Dylan</artist>
<cover created_at="2016-12-15T15:02:55Z"/>
<company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd>
</catalog>
I want to format all occurrences of the created_at attribute:
input format YYYY-MM-DDTHH:MM:SSZ
output format YYYY-MM-DD HH:MM:SS
I am currently using the following XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- Edit dates to conform to dbunit format-->
<xsl:template match="@created_at">
<xsl:copy>
<xsl:call-template name="formatdate">
<xsl:with-param name="datestr" select="@created_at"/>
</xsl:call-template>
</xsl:copy>
</xsl:template>
<xsl:template name="formatdate">
<xsl:param name="datestr" />
<!-- input format YYYY-MM-DDTHH:MM:SSZ -->
<!-- output format YYYY-MM-DD HH:MM:SS -->
<xsl:variable name="datetext">
<xsl:value-of select="substring-before($datestr,'T')" />
</xsl:variable>
<xsl:variable name="timetext">
<xsl:value-of select="substring($datestr,12,18)" />
</xsl:variable>
<xsl:value-of select="concat($datetext, ' ', $timetext)" />
</xsl:template>
</xsl:stylesheet>
However as I debug through the transformation xslt it does not seem to enter the formatdate call-template. Is my xpath wrong? I found articles on modifying the node, but not the attribute. Any help would be much appreciated.
Thank you
Why not simply:
<xsl:template match="@created_at">
<xsl:attribute name="created_at">
<xsl:value-of select="substring(translate(., 'T', ' '), 1, 19)" />
</xsl:attribute>
</xsl:template>
Note: you cannot use xsl:copy if you want to change an attribute's value.
From your post, it sounds like all you need is simple string processing.
Why your code isn't working the way you want
You're handling the @created_at attributes with this template:
<xsl:template match="@created_at">
<xsl:copy>
<xsl:call-template name="formatdate">
<xsl:with-param name="datestr" select="@created_at"/>
</xsl:call-template>
</xsl:copy>
</xsl:template>
The kicker here is that you're using <xsl:copy>. When used with attributes, <xsl:copy> copies the entire attribute, name and value both. And since attributes can't contain any children, the children of your <xsl:copy> instruction are ignored -- so the XSLT processor never evaluates the <xsl:call-template name="formatdate"> instruction.
A different approach that works
Instead of using <xsl:copy>, you need to instead use <xsl:attribute> to create an attribute in a way where you can also specify the value. In this case, you already know the name of the attribute you want to create, so you could hard-code the name value as created_at. For a more flexible approach, you could instead give the name value as {name(.)} -- this just grabs the name of the attribute being processed, which is closer in behavior to what you probably thought <xsl:copy> would do. :)
It is also possible to produce the desired string in a single xsl:value-of expression, without relying on so many variables.
<xsl:template match="@created_at">
<xsl:attribute name="{name(.)}">
<xsl:value-of select="concat(substring-before(., 'T'), ' ', substring-before(substring-after(., 'T'), 'Z'))"/>
</xsl:attribute>
</xsl:template>
Breaking down that select statement:
Use concat() to stitch together multiple bits of string.
Use substring-before(., 'T') to grab everything before the T -- that's the date portion.
' ' adds the single space in the middle.
substring-before(substring-after(., 'T'), 'Z') --
The inner expression substring-after(., 'T') grabs everything after the T -- that's the time portion.
However, there's that pesky Z on the end, so we use substring-before as the outer expression to lop that off.
No need for variables, and it gets the job done. Confirmed to work with XSLT 1.0.
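If you want to check the xsl:attribute template quickly from Java, here is a small sketch using the JDK's built-in XSLT 1.0 support. The class name CreatedAtFormatter is invented for the example, and the stylesheet is inlined as a string purely for convenience. It combines the identity transform with the single-expression template shown above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class; the stylesheet mirrors the templates discussed above.
class CreatedAtFormatter {

    static final String XSLT =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Identity transform: copies everything that no other template handles.
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Rebuild the attribute with xsl:attribute so that its value can be changed.
      + "<xsl:template match='@created_at'>"
      + "<xsl:attribute name='{name(.)}'>"
      + "<xsl:value-of select=\"concat(substring-before(., 'T'), ' ', "
      + "substring-before(substring-after(., 'T'), 'Z'))\"/>"
      + "</xsl:attribute>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns the input XML with every created_at attribute reformatted. */
    static String format(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Note that this template assumes every value contains both the T and the trailing Z. A value without them would produce an empty date or time part.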
Try this
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method='xml' indent='yes'/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- Edit dates to conform to dbunit format-->
<xsl:template match="@created_at">
<xsl:call-template name="formatdate">
<xsl:with-param name="datestr" select="."/>
</xsl:call-template>
</xsl:template>
<xsl:template name="formatdate">
<xsl:param name="datestr" />
<!-- input format YYYY-MM-DDTHH:MM:SSZ -->
<!-- output format YYYY-MM-DD HH:MM:SS -->
<xsl:variable name="datetext">
<xsl:value-of select="substring-before($datestr,'T')" />
</xsl:variable>
<xsl:variable name="timetext">
<xsl:value-of select="substring($datestr,12,8)" />
</xsl:variable>
<xsl:attribute name="created_at">
<xsl:value-of select="concat($datetext, ' ', $timetext)" />
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>

CDATA XML masking with XSLT should return same XML with few masked fields

I need to mask a few fields in XML that is embedded as CDATA inside another XML document, using XSLT.
The resulting XML should be the same as the input XML, with only a few fields masked.
I followed this link, which masks as expected but produces XML in a different format.
I tried many other solutions from SO; almost all of them output the new XML/HTML in a format different from the input XML.
Please check the following example for better understanding.
Input XML with CDATA content.
<XML>
<LogLevel>info</LogLevel>
<Content><![CDATA[ <Msg>
<AccountNo>2701000098983</AccountNo>
<ApplName>Testing</ApplName>
</Msg>]]></Content>
<Date>20140909</Date>
</XML>
Output XML should be:
<XML>
<LogLevel>info</LogLevel>
<Content><![CDATA[ <Msg>
<AccountNo>XXXXXXXXXX983</AccountNo>
<ApplName>Testing</ApplName>
</Msg>]]></Content>
<Date>20140909</Date>
</XML>
Edit:
I used the following XSLT
<xsl:template match="node()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:choose>
<xsl:when test="contains(.,'&lt;AccountNo&gt;')">
<!-- This is the CDATA that I want to mask and write back out as CDATA -->
<xsl:variable name="tcontent">
<xsl:value-of
select="substring-after(substring-before(.,'&lt;/AccountNo&gt;'),'&lt;AccountNo&gt;') " />
</xsl:variable>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[&lt;AccountNo&gt;</xsl:text>
<xsl:call-template name="maskVariable">
<xsl:with-param name="tvar" select="$tcontent" />
</xsl:call-template>
<xsl:text disable-output-escaping="yes">&lt;/AccountNo&gt;]]&gt;</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:copy />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="maskVariable">
<xsl:param name="tvar" />
<xsl:variable name="length" select="string-length($tvar)" />
<xsl:choose>
<xsl:when test="$length > 3">
<xsl:value-of
select="concat ('************', substring($tvar,$length - 1, 2))" />
</xsl:when>
<xsl:when test="$length > 1">
***
</xsl:when>
<xsl:otherwise />
</xsl:choose>
</xsl:template>
Output of using this XSLT is :
<LogLevel>info</LogLevel>
<Content><![CDATA[<AccountNo>************02</AccountNo>]]></Content>
<Date>20140909</Date>
In this output, only the masked <AccountNo> is displayed.
How can I get the rest of the content displayed as well?
Please give me some idea of how to do it.
Any help is highly appreciated.
Why don't you try:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Content">
<xsl:copy>
<xsl:value-of select="substring-before(.,'&lt;AccountNo&gt;')" />
<xsl:text>&lt;AccountNo&gt;</xsl:text>
<xsl:variable name="acct-num" select="substring-before(substring-after(.,'&lt;AccountNo&gt;'), '&lt;/AccountNo&gt;')" />
<xsl:value-of select="concat('************', substring($acct-num, string-length($acct-num) - 2))" />
<xsl:text>&lt;/AccountNo&gt;</xsl:text>
<xsl:value-of select="substring-after(.,'&lt;/AccountNo&gt;')" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Applied to your input, the result will be:
<?xml version="1.0" encoding="UTF-8"?>
<XML>
<LogLevel>info</LogLevel>
<Content> &lt;Msg&gt;
&lt;AccountNo&gt;************983&lt;/AccountNo&gt;
&lt;ApplName&gt;Testing&lt;/ApplName&gt;
&lt;/Msg&gt;</Content>
<Date>20140909</Date>
</XML>
Alternatively, you could use:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" cdata-section-elements="Content"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Content">
<xsl:copy>
<xsl:variable name="content">
<xsl:value-of select="substring-before(.,'&lt;AccountNo&gt;')" />
<xsl:text>&lt;AccountNo&gt;</xsl:text>
<xsl:variable name="acct-num" select="substring-before(substring-after(.,'&lt;AccountNo&gt;'), '&lt;/AccountNo&gt;')" />
<xsl:value-of select="concat('************', substring($acct-num, string-length($acct-num) - 2))" />
<xsl:text>&lt;/AccountNo&gt;</xsl:text>
<xsl:value-of select="substring-after(.,'&lt;/AccountNo&gt;')" />
</xsl:variable>
<xsl:value-of select="$content" disable-output-escaping="yes"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
to produce:
<?xml version="1.0" encoding="UTF-8"?>
<XML>
<LogLevel>info</LogLevel>
<Content><![CDATA[ <Msg>
<AccountNo>************983</AccountNo>
<ApplName>Testing</ApplName>
</Msg>]]></Content>
<Date>20140909</Date>
</XML>
although this might not work with every processor (tested to work with Xalan 2.7.1: http://xsltransform.net/jyH9rMk).
The stuff inside the CDATA section is XML disguised as text. XSLT is good at transforming XML, it's not so good at transforming text, especially text with a complex grammar. So my approach would be: extract the text from the outer XML document, parse it as XML, transform it using XSLT (real XSLT that works on nodes rather than on markup), serialize it back to text, then stuff the text back into the original (outer) XML document.
Raw XSLT 1.0 can't do this within a single stylesheet. You need functions such as parse() and serialize() to turn lexical XML into a node tree and back again. These are available as extensions in some processors (such as Saxon), they become available as standard functions in XPath 3.0 (as parse-xml() and serialize()), and they can be written as simple extension functions (e.g. in JavaScript) in most other processors.
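If the surrounding application is written in Java, the same extract, parse, transform, serialize, re-insert pipeline can also be done outside XSLT. The following is only a sketch under that assumption, using the DOM and JAXP classes from the JDK. The class name CdataMasker and its methods are invented for illustration, and the masking rule (replace everything except the last three characters with X) is taken from the desired output in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; element names Content and AccountNo come from the question.
class CdataMasker {

    /** Replaces every character except the last three with 'X'. */
    static String maskAllButLastThree(String value) {
        int keepFrom = Math.max(0, value.length() - 3);
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            masked.append(i < keepFrom ? 'X' : value.charAt(i));
        }
        return masked.toString();
    }

    /** Serializes a DOM node to text without an XML declaration. */
    static String serialize(Node node) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    /** Masks AccountNo inside the XML text held by the first Content element. */
    static String maskAccountNo(String outerXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document outer = builder.parse(new InputSource(new StringReader(outerXml)));
            Element content = (Element) outer.getElementsByTagName("Content").item(0);

            // 1. Extract the text and parse it as a document of its own.
            String innerXml = content.getTextContent().trim();
            Document inner = builder.parse(new InputSource(new StringReader(innerXml)));

            // 2. Work on real nodes instead of on markup-as-text.
            Node account = inner.getElementsByTagName("AccountNo").item(0);
            account.setTextContent(maskAllButLastThree(account.getTextContent()));

            // 3. Serialize the inner document and put it back as a CDATA section.
            while (content.hasChildNodes()) {
                content.removeChild(content.getFirstChild());
            }
            content.appendChild(outer.createCDATASection(serialize(inner)));
            return serialize(outer);
        } catch (Exception e) {
            throw new IllegalStateException("Could not mask the embedded XML", e);
        }
    }
}
```

Because the inner XML is re-serialized, whitespace at the very start of the CDATA section is not preserved exactly. If byte-for-byte fidelity outside the masked field matters, the string-based templates above may be the better fit.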

Populate XML template-file from XPath Expressions?

What would be the best way to populate (or generate) an XML template-file from a mapping of XPath expressions?
The requirements are that we will need to start with a template (since this might contain information not otherwise captured in the XPath expressions).
For example, a starting template might be:
<s11:Envelope xmlns:s11='http://schemas.xmlsoap.org/soap/envelope/'>
<s11:Body>
<ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
<article xmlns:ns1='http://predic8.com/material/1/'>
<name>?XXX?</name>
<description>?XXX?</description>
<price xmlns:ns1='http://predic8.com/common/1/'>
<amount>?999.99?</amount>
<currency xmlns:ns1='http://predic8.com/common/1/'>???</currency>
</price>
<id xmlns:ns1='http://predic8.com/material/1/'>???</id>
</article>
</ns1:create>
</s11:Body>
</s11:Envelope>
Then we are supplied, something like:
expression: /create/article[1]/id => 1
expression: /create/article[1]/description => bar
expression: /create/article[1]/name[1] => foo
expression: /create/article[1]/price[1]/amount => 00.00
expression: /create/article[1]/price[1]/currency => USD
expression: /create/article[2]/id => 2
expression: /create/article[2]/description => some name
expression: /create/article[2]/name[1] => some description
expression: /create/article[2]/price[1]/amount => 00.01
expression: /create/article[2]/price[1]/currency => USD
We should then generate:
<ns1:create xmlns:ns1='http://predic8.com/wsdl/material/ArticleService/1/'>
<article xmlns:ns1='http://predic8.com/material/1/'>
<name xmlns:ns1='http://predic8.com/material/1/'>foo</name>
<description>bar</description>
<price xmlns:ns1='http://predic8.com/common/1/'>
<amount>00.00</amount>
<currency xmlns:ns1='http://predic8.com/common/1/'>USD</currency>
</price>
<id xmlns:ns1='http://predic8.com/material/1/'>1</id>
</article>
<article xmlns:ns1='http://predic8.com/material/2/'>
<name>some name</name>
<description>some description</description>
<price xmlns:ns1='http://predic8.com/common/2/'>
<amount>00.01</amount>
<currency xmlns:ns1='http://predic8.com/common/2/'>USD</currency>
</price>
<id xmlns:ns1='http://predic8.com/material/2/'>2</id>
</article>
</ns1:create>
I am implementing this in Java, although I would prefer an XSLT-based solution if one is possible.
PS: This question is the reverse of another question I recently asked.
This transformation creates from the "expressions" an XML document that has the structure of the wanted result -- it remains to transform this result into the final result:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:my="my:my">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vPop" as="element()*">
<item path="/create/article[1]/id">1</item>
<item path="/create/article[1]/description">bar</item>
<item path="/create/article[1]/name[1]">foo</item>
<item path="/create/article[1]/price[1]/amount">00.00</item>
<item path="/create/article[1]/price[1]/currency">USD</item>
<item path="/create/article[1]/price[2]/amount">11.11</item>
<item path="/create/article[1]/price[2]/currency">AUD</item>
<item path="/create/article[2]/id">2</item>
<item path="/create/article[2]/description">some name</item>
<item path="/create/article[2]/name[1]">some description</item>
<item path="/create/article[2]/price[1]/amount">00.01</item>
<item path="/create/article[2]/price[1]/currency">USD</item>
</xsl:variable>
<xsl:template match="/">
<xsl:sequence select="my:subTree($vPop/@path/concat(.,'/',string(..)))"/>
</xsl:template>
<xsl:function name="my:subTree" as="node()*">
<xsl:param name="pPaths" as="xs:string*"/>
<xsl:for-each-group select="$pPaths"
group-adjacent=
"substring-before(substring-after(concat(., '/'), '/'), '/')">
<xsl:if test="current-grouping-key()">
<xsl:choose>
<xsl:when test=
"substring-after(current-group()[1], current-grouping-key())">
<xsl:element name=
"{substring-before(concat(current-grouping-key(), '['), '[')}">
<xsl:sequence select=
"my:subTree(for $s in current-group()
return
concat('/',substring-after(substring($s, 2),'/'))
)
"/>
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="current-grouping-key()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
</xsl:for-each-group>
</xsl:function>
</xsl:stylesheet>
When this transformation is applied on any XML document (not used), the result is:
<create>
<article>
<id>1</id>
<description>bar</description>
<name>foo</name>
<price>
<amount>00.00</amount>
<currency>USD</currency>
</price>
<price>
<amount>11.11</amount>
<currency>AUD</currency>
</price>
</article>
<article>
<id>2</id>
<description>some name</description>
<name>some description</name>
<price>
<amount>00.01</amount>
<currency>USD</currency>
</price>
</article>
</create>
Note:
You need to transform the "expressions" you are given into the format used in this transformation -- this is easy and straightforward.
In the final transformation you need to copy every node "as-is" (using the identity rule), with the exception that the top node should be generated in the "http://predic8.com/wsdl/material/ArticleService/1/" namespace. Note that the other namespaces present in the "template" are not used and can be safely omitted.
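Since the question mentions a Java implementation, here is a rough Java counterpart of the tree-building step, for cases where an XSLT 2.0 processor is not available. It is a sketch only: the class XPathTreeBuilder and its methods are invented for this example, and it only understands simple paths made of name or name[n] steps, as in the expressions listed in the question. Namespaces and the template file are ignored.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper; supports only steps of the form name or name[n].
class XPathTreeBuilder {

    /** Builds a document from path => value pairs and returns it as XML text. */
    static String build(Map<String, String> values) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            for (Map.Entry<String, String> entry : values.entrySet()) {
                Node current = doc;
                // Drop the leading slash, then walk (and create) one step at a time.
                for (String step : entry.getKey().substring(1).split("/")) {
                    int bracket = step.indexOf('[');
                    String name = bracket < 0 ? step : step.substring(0, bracket);
                    int index = bracket < 0
                            ? 1
                            : Integer.parseInt(step.substring(bracket + 1, step.length() - 1));
                    current = nthChild(doc, current, name, index);
                }
                current.setTextContent(entry.getValue());
            }
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the document", e);
        }
    }

    /** Returns the index-th child element called name, appending empty ones as needed. */
    static Node nthChild(Document doc, Node parent, String name, int index) {
        int seen = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(name) && ++seen == index) {
                return c;
            }
        }
        Node last = null;
        while (seen < index) {
            last = parent.appendChild(doc.createElement(name));
            seen++;
        }
        return last;
    }
}
```

Pass a LinkedHashMap if the order of sibling elements should follow the order of the expressions. The sketch assumes that all paths share a single root step without an index greater than 1, because a DOM document can hold only one root element.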
This solution requires you to re-organise your XPATH input information slightly, and to allow a 2-step transformation. The first transformation will write the stylesheet, which will be executed in the second transformation - Thus the client is required to do two invocations of the XSLT engine. Let us know if this is a problem.
Step One
Please re-organise your XPATH information into an XML document like so. It should not be difficult to do, and even an XSLT script could be written to do the job.
<paths>
<rule>
<match>article[1]/id[1]</match>
<namespaces>
<namespace prefix="ns1">http://predic8.com/wsdl/material/ArticleService/1/</namespace>
<!-- The namespace node declares a namespace that is used in the match expression.
There can be many of these. It is not required to define the s11: namespace,
nor the ns1 namespace. -->
</namespaces>
<replacement>1</replacement>
</rule>
<rule>
<match>article[1]/description[1]</match>
<namespaces/>
<replacement>bar</replacement>
</rule>
... etc ...
</paths>
Solution constraints
In the above rules document we are constrained so that:
The match is implicitly prefixed with 'expression: /create/'. Don't include that prefix explicitly.
All matches must begin with article[n], where n is some ordinal number.
We can't have zero rules.
Any prefixes that you use in the match, other than s11="http://schemas.xmlsoap.org/soap/envelope/" and ns1="http://predic8.com/wsdl/material/ArticleService/1/", are defined in the namespaces node. (Note: I don't think it is valid for namespaces to end in '/', but I'm not sure about that.)
The above is the input document to the step one transformation. Apply this document to this style-sheet ...
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:step2="http://www.w3.org/1999/XSL/Transform-step2"
xmlns:s11="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes='xsl'>
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:namespace-alias stylesheet-prefix="step2" result-prefix="xsl"/>
<xsl:template match="/">
<step2:stylesheet version="2.0">
<step2:output method="xml" indent="yes" encoding="UTF-8" />
<step2:variable name="replicated-template" as="element()*">
<step2:apply-templates select="/" mode="replication" />
</step2:variable>
<step2:template match="@*|node()" mode="replication">
<step2:copy>
<step2:apply-templates select="@*|node()" mode="replication" />
</step2:copy>
</step2:template>
<step2:template match="/s11:Envelope/s11:Body/ns1:create/article" mode="replication">
<step2:variable name="replicant" select="." />
<step2:for-each select="for $i in 1 to
{max(for $m in /paths/rule/match return
xs:integer(substring-before(substring-after($m,'article['),']')))}
return $i">
<step2:for-each select="$replicant">
<step2:copy>
<step2:apply-templates select="@*|node()" mode="replication" />
</step2:copy>
</step2:for-each>
</step2:for-each>
</step2:template>
<step2:template match="@*|node()">
<step2:copy>
<step2:apply-templates select="@*|node()"/>
</step2:copy>
</step2:template>
<step2:template match="/">
<step2:apply-templates select="$replicated-template" />
</step2:template>
<xsl:apply-templates select="paths/rule" />
</step2:stylesheet>
</xsl:template>
<xsl:template match="rule">
<step2:template match="s11:Envelope/s11:Body/ns1:create/{match}">
<xsl:for-each select="namespaces/namespace">
<xsl:namespace name="{@prefix}" select="." />
</xsl:for-each>
<step2:copy>
<step2:apply-templates select="@*"/>
<step2:value-of select="'{replacement}'"/>
<step2:apply-templates select="*"/>
</step2:copy>
</step2:template>
</xsl:template>
</xsl:stylesheet>
Step Two
Apply your SOAP envelope file, as the input document, to the style-sheet that was output by step one. The result is the original SOAP document, altered as required. Below is a sample step-two style-sheet with only the first rule (/create/article[1]/id => 1) considered, to keep the illustration simple.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:s11="http://schemas.xmlsoap.org/soap/envelope/"
version="2.0">
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template xmlns:ns1="http://predic8.com/wsdl/material/ArticleService/1/"
match="/s11:Envelope/s11:Body/ns1:create[1]/article[1]/id[1]">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:value-of select="'1'"/>
<xsl:apply-templates select="*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
More solution constraints
The template document must contain at least one /s11:Envelope/s11:Body/ns1:create/article. Only the article node is replicated (deeply) as required by the rules. Other than that, it can be any structure.
The template document cannot contain nested levels of the s11:Envelope/s11:Body/ns1:create node.
Explanation
You will notice that your XPath expressions are not far removed from the match condition of a template. It is therefore not too difficult to write a style-sheet which re-expresses your XPath and replacement values as template rules. When writing a style-sheet that itself writes a style-sheet, xsl:namespace-alias lets us disambiguate "xsl:" as an instruction from "xsl:" as intended output. When XSLT 3.0 comes along, we are quite likely to be able to reduce this algorithm to one step, as it will allow dynamic XPath evaluation, which is really the nub of your problem. But for the moment we must be content with a two-step process.
The second style-sheet is a two-phase transformation. The first stage replicates the template from the article level, as many times as needed by the rules. The second phase parses this replicated template, and applies the dynamic rules substituting text values as indicated by the XPATHs.
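To make the two-step flow concrete, here is a minimal Java (JAXP) sketch. It is not the solution above verbatim. Assumptions: it uses a cut-down XSLT 1.0 version of the step-one style-sheet so that the JDK's built-in processor can run it (the full style-sheet above is XSLT 2.0 and needs a 2.0 processor such as Saxon), the sample document has no namespaces, and the article replication phase is left out. The names TwoStepTransform, transform and run are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TwoStepTransform {

    // Simplified rules document (assumption: no namespaces node needed).
    static final String RULES =
        "<paths>"
      + "<rule><match>article[1]/id</match><replacement>1</replacement></rule>"
      + "<rule><match>article[1]/description[1]</match><replacement>bar</replacement></rule>"
      + "</paths>";

    // Step one: turns each rule into a template of the step-two style-sheet.
    // The step2 prefix is aliased to xsl in the output via xsl:namespace-alias.
    static final String STEP_ONE =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:step2='http://www.w3.org/1999/XSL/Transform-step2'>"
      + "<xsl:namespace-alias stylesheet-prefix='step2' result-prefix='xsl'/>"
      + "<xsl:template match='/'>"
      + "<step2:stylesheet version='1.0'>"
      // Identity template: copies everything no rule touches.
      + "<step2:template match='@*|node()'>"
      + "<step2:copy><step2:apply-templates select='@*|node()'/></step2:copy>"
      + "</step2:template>"
      + "<xsl:apply-templates select='paths/rule'/>"
      + "</step2:stylesheet>"
      + "</xsl:template>"
      + "<xsl:template match='rule'>"
      // The attribute value templates {match} and {replacement} are filled in here.
      + "<step2:template match='/create/{match}'>"
      + "<step2:copy><step2:value-of select=\"'{replacement}'\"/></step2:copy>"
      + "</step2:template>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies a style-sheet (given as text) to an input document (given as text).
    static String transform(String stylesheet, String input) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step one generates the step-two style-sheet; step two rewrites the document.
    static String run(String rules, String document) {
        String stepTwoStylesheet = transform(STEP_ONE, rules);
        return transform(stepTwoStylesheet, document);
    }

    public static void main(String[] args) {
        System.out.println(run(RULES,
            "<create><article><id>x</id><description>y</description></article></create>"));
    }
}
```

The point of the sketch is only the chaining: the output of the first transformation is compiled as a style-sheet and then applied to the real input document.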
UPDATE
My original post was wrong. Thanks to Dimitre for pointing out the error. Please find updated solution above.
After-thought
If a two-step solution is too complicated, and you are running on a Wintel platform, you may consider purchasing the commercial version of Saxon. I believe that the commercial version has a dynamic XPath evaluation function. I can't give you such a solution because I don't have the commercial version. I imagine a solution using an evaluate() function would be a lot simpler. XSLT is just a hobby for me. But if you are using XSLT for business purposes, the price is quite reasonable.
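Since the question is tagged Java, dynamic XPath evaluation is also available outside XSLT. The following is a rough sketch using the JDK's javax.xml.xpath API, which is a different technique from Saxon's evaluate() function. Assumptions: the document has no namespaces (otherwise a NamespaceContext would have to be set on the XPath object), the target nodes already exist (no article replication), and the class and method names are invented for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DynamicXPathRules {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // For each rule (XPath -> replacement), sets the text of every matching node.
    static void applyRules(Document doc, Map<String, String> rules) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                NodeList hits = (NodeList) xpath.evaluate(rule.getKey(), doc, XPathConstants.NODESET);
                for (int i = 0; i < hits.getLength(); i++) {
                    hits.item(i).setTextContent(rule.getValue());
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Small demonstration with two rules like the ones in the rules document.
    static String demo() {
        Document doc = parse(
            "<create><article><id>x</id><description>y</description></article></create>");
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("/create/article[1]/id", "1");
        rules.put("/create/article[1]/description[1]", "bar");
        applyRules(doc, rules);
        return doc.getElementsByTagName("id").item(0).getTextContent()
            + "," + doc.getElementsByTagName("description").item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This trades the declarative style-sheet for a loop over the rules, which may or may not suit a pipeline that is otherwise pure XSLT.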

Write at the end of an xml

I have multiple XML files in a List<File>. I want to transform them into a single XML document with this XSL:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="testsuites">
<xsl:call-template name="summary"/>
</xsl:template>
<xsl:template name="summary">
<xsl:variable name="testCount" select="sum(testsuite/@tests)"/>
<xsl:variable name="errorCount" select="sum(testsuite/@errors)"/>
<xsl:variable name="failureCount" select="sum(testsuite/@failures)"/>
<xsl:variable name="timeCount" select="sum(testsuite/@time)"/>
<xsl:variable name="successRate" select="($testCount - $failureCount - $errorCount) div $testCount"/>
<xsl:attribute name="class">
<xsl:choose>
<xsl:when test="$failureCount > 0">Failure</xsl:when>
<xsl:when test="$errorCount > 0">Error</xsl:when>
</xsl:choose>
</xsl:attribute>
<Build>
<NombreTest><xsl:value-of select="$testCount"/></NombreTest>
<Failures><xsl:value-of select="$failureCount"/></Failures>
<Erreurs><xsl:value-of select="$errorCount"/></Erreurs>
<PercentSucces><xsl:call-template name="display-percent">
<xsl:with-param name="value" select="$successRate"/>
</xsl:call-template></PercentSucces>
<ExecTime><xsl:call-template name="display-time">
<xsl:with-param name="value" select="$timeCount"/>
</xsl:call-template> </ExecTime>
</Build>
</xsl:template>
<xsl:template match="failure">
<xsl:call-template name="display-failures"/>
</xsl:template>
<xsl:template match="error">
<xsl:call-template name="display-failures"/>
</xsl:template>
<xsl:template name="display-time">
<xsl:param name="value"/>
<xsl:value-of select="format-number($value,'0.000')"/>
</xsl:template>
<xsl:template name="display-percent">
<xsl:param name="value"/>
<xsl:value-of select="format-number($value,'0.00%')"/>
</xsl:template>
<xsl:template name="display-failures">
<xsl:choose>
<xsl:when test="not(@message)">N/A</xsl:when>
<xsl:otherwise>
<xsl:value-of select="@message"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
My problem is that when I loop over the files and apply the transform with a TransformerFactory, each run overwrites the output XML. I want to append to the output instead.
I know that I can do it in Java with a temporary XML file and merge afterwards, but I'm almost sure it is possible in XSL?
Thanks for helping.
You need to pass all document URLs within a single external parameter and you will typically have a transformation like this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:param name="pdocNames">
<name>doc1.xml</name>
<name>doc2.xml</name>
<name>doc3.xml</name>
</xsl:param>
<!-- you can directly use $pdocNames/name
if the param is provided externally -->
<xsl:variable name="vDocNames" select=
"document('')/*/xsl:param[@name='pdocNames']/name"/>
<xsl:template match="/">
<combinedDocs>
<xsl:copy-of select="document($vDocNames)"/>
</combinedDocs>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on any XML document (not used), it performs the following:
Obtains the name elements that contain the document URIs provided in the parameter $pdocNames. These elements are held in the variable vDocNames.
Creates the top element for the output document (in this case named combinedDocs).
Copies all XML documents, whose URIs are in the name elements contained in the vDocNames variable. The standard XSLT function document() is used here.
Do note:
The URLs of all wanted XML documents must be passed externally via a parameter to the transformation. It is vendor-dependent how to pass a parameter to the transformation. You need to read the documentation provided for your particular XSLT processor.
You have to load your documents with the XSLT document(URI) function.
See also: http://www.w3schools.com/Xsl/func_document.asp
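On the Java side, one way to run such a single merging transformation is sketched below. It is a variation rather than the exact approach described above: instead of passing the URIs through a parameter, the list of name elements is built in Java and supplied as the source document, so the style-sheet can call document(/docs/name) directly. The docs/name element names, the MergeDocs class and its methods are assumptions made for the example, and it relies on the JDK's default JAXP settings allowing document() to read local files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MergeDocs {

    // Copies every document listed in /docs/name under one root element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<combinedDocs><xsl:copy-of select='document(/docs/name)'/></combinedDocs>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Builds the list of absolute file URIs and runs the transformation once.
    static String merge(List<Path> files) {
        StringBuilder index = new StringBuilder("<docs>");
        for (Path file : files) {
            // Absolute URI, with '&' escaped so the list stays well-formed XML.
            String uri = file.toAbsolutePath().toUri().toString().replace("&", "&amp;");
            index.append("<name>").append(uri).append("</name>");
        }
        index.append("</docs>");
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(index.toString())),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes two small temporary documents and merges them.
    static String demo() {
        try {
            Path first = Files.createTempFile("doc1", ".xml");
            Path second = Files.createTempFile("doc2", ".xml");
            Files.write(first, "<a>1</a>".getBytes(StandardCharsets.UTF_8));
            Files.write(second, "<b>2</b>".getBytes(StandardCharsets.UTF_8));
            String merged = merge(Arrays.asList(first, second));
            Files.deleteIfExists(first);
            Files.deleteIfExists(second);
            return merged;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because all files are read inside one transformation, the output is written once and nothing is overwritten between iterations. The copy-of could be replaced by the summary templates from the question once all test suites sit under a common root.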
