XML - Optimization of tokenizing XSLT


The input I have:

I'm working with a SharePoint list that produces RSS feeds in the following form:

<?xml version="1.0"?>
<rss>
  <channel>
    <!-- irrelevant fields -->
    <item>
      <title type="text">title</title>
      <description type="html">
        &lt;div&gt;&lt;b&gt;field1:&lt;/b&gt; value 1&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field2:&lt;/b&gt; value 2&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field3:&lt;/b&gt; value 3&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field4:&lt;/b&gt; value 4&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field5:&lt;/b&gt; value 5&lt;/div&gt;
      </description>
    </item>
    <item>
      <title type="text">title</title>
      <description type="html">
        &lt;div&gt;&lt;b&gt;field1:&lt;/b&gt; value 1&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field3:&lt;/b&gt; value 3&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field4:&lt;/b&gt; value 4&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field5:&lt;/b&gt; value 5&lt;/div&gt;
      </description>
    </item>
    <item>
      <title type="text">title</title>
      <description type="html">
        &lt;div&gt;&lt;b&gt;field1:&lt;/b&gt; value 1&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field2:&lt;/b&gt; value 2&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field3:&lt;/b&gt; value 3&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field4:&lt;/b&gt; value 4&lt;/div&gt;
        &lt;div&gt;&lt;b&gt;field5:&lt;/b&gt; value 5&lt;/div&gt;
      </description>
    </item>
    <!-- more <item> elements -->
  </channel>
</rss>

Note that the <description> element seems to define a set of elements. Furthermore, note that not all <description> elements contain markup for "field2".
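Because the markup inside <description> is entity-escaped, an XML parser treats it as character data rather than as child elements, which is why the fields have to be pulled apart with string functions. A quick check is sketched below with Python's standard xml.etree.ElementTree (Python is used here only because it runs without an XSLT processor); SAMPLE is a trimmed-down, hypothetical feed in the same shape as the one above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample in the same shape as the feed above.
SAMPLE = """<rss><channel><item>
  <title type="text">title</title>
  <description type="html">
    &lt;div&gt;&lt;b&gt;field1:&lt;/b&gt; value 1&lt;/div&gt;
    &lt;div&gt;&lt;b&gt;field3:&lt;/b&gt; value 3&lt;/div&gt;
  </description>
</item></channel></rss>"""

root = ET.fromstring(SAMPLE)
desc = root.find("channel/item/description")

# No child elements: the escaped <div>/<b> markup is plain text.
print(len(desc))

# The parser has already turned &lt; / &gt; back into < / >.
for line in desc.text.splitlines():
    if line.strip():
        print(line.strip())
```

So an XSLT processor sees strings like <div><b>field1:</b> value 1</div> in the text node, not <div> elements it could select with a path.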

What I need:

I need XML of the following form:

<?xml version="1.0"?>
<events>
  <event>
    <category>title</category>
    <field1>value 1</field1>
    <field2>value 2</field2>
    <field3>value 3</field3>
    <field4>value 4</field4>
    <field5>value 5</field5>
  </event>
  <event>
    <category>title</category>
    <field1>value 1</field1>
    <field2/>
    <field3>value 3</field3>
    <field4>value 4</field4>
    <field5>value 5</field5>
  </event>
  <event>
    <category>title</category>
    <field1>value 1</field1>
    <field2>value 2</field2>
    <field3>value 3</field3>
    <field4>value 4</field4>
    <field5>value 5</field5>
  </event>
</events>

The rules (updated):

  1. This needs to be an XSLT 1.0 solution.
  2. The only valid extension function available to me is xxx:node-set; this restriction includes extension functions written in other languages, such as C# or JavaScript.
  3. If a field's information is missing, a blank element should be output. Note the empty <field2> child within the second <event> element in the desired output.
  4. We cannot assume that the field names follow any particular pattern; they may be <peanutbutter>, <jelly>, etc.

What I have so far:

<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="exsl"
  version="1.0">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/*">
    <events>
      <xsl:apply-templates select="*/item"/>
    </events>
  </xsl:template>

  <xsl:template match="item[contains(description, 'field2')]">
    <event>
      <xsl:variable name="velements">
        <xsl:call-template name="tokenize">
          <xsl:with-param name="text" select="description"/>
          <xsl:with-param name="delimiter" select="'&#10;'"/>
        </xsl:call-template>
      </xsl:variable>

      <category>
        <xsl:value-of select="title"/>
      </category>
      <xsl:apply-templates
        select="exsl:node-set($velements)/*[normalize-space()]" mode="token"/>
    </event>
  </xsl:template>

  <!-- Note how this template is identical to the last one,
       apart from the blank <field2>; that's not elegant. -->
  <xsl:template match="item[not(contains(description, 'field2'))]">
    <event>
      <xsl:variable name="velements">
        <xsl:call-template name="tokenize">
          <xsl:with-param name="text" select="description"/>
          <xsl:with-param name="delimiter" select="'&#10;'"/>
        </xsl:call-template>
      </xsl:variable>

      <category>
        <xsl:value-of select="title"/>
      </category>
      <xsl:apply-templates
        select="exsl:node-set($velements)/*[normalize-space()]" mode="token"/>
      <field2/>
    </event>
  </xsl:template>

  <!-- Turns one "<div><b>name:</b> value</div>" token into <name>value</name>. -->
  <xsl:template match="*" mode="token">
    <xsl:element
      name="{substring-after(
               substring-before(normalize-space(), ':'),
               '&lt;div&gt;&lt;b&gt;')}">
      <xsl:value-of
        select="substring-before(
                  substring-after(., ':&lt;/b&gt; '),
                  '&lt;/div&gt;')"/>
    </xsl:element>
  </xsl:template>

  <!-- Splits $text on $delimiter into a sequence of <token> elements. -->
  <xsl:template name="tokenize">
    <xsl:param name="text"/>
    <xsl:param name="delimiter" select="' '"/>
    <xsl:choose>
      <xsl:when test="contains($text,$delimiter)">
        <xsl:element name="token">
          <xsl:value-of select="substring-before($text,$delimiter)"/>
        </xsl:element>
        <xsl:call-template name="tokenize">
          <xsl:with-param
            name="text"
            select="substring-after($text,$delimiter)"/>
          <xsl:with-param
            name="delimiter"
            select="$delimiter"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:when test="$text">
        <xsl:element name="token">
          <xsl:value-of select="$text"/>
        </xsl:element>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
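To see why the empty <field2> lands at the bottom, it may help to model what this stylesheet does in plain Python. This is a rough sketch, not XSLT: substring_before, substring_after and normalize_space are hypothetical stand-ins written to follow the XPath 1.0 functions of the same names, tokenize and token_to_field imitate the "tokenize" and mode="token" templates, and DESCRIPTION is a made-up second <item> with field2 missing.

```python
# Hypothetical description text for an <item> whose field2 is missing,
# as the XSLT processor sees it (entities already decoded).
DESCRIPTION = (
    "\n"
    "  <div><b>field1:</b> value 1</div>\n"
    "  <div><b>field3:</b> value 3</div>\n"
    "  <div><b>field4:</b> value 4</div>\n"
    "  <div><b>field5:</b> value 5</div>\n"
)


def substring_before(s, sep):
    """Like XPath 1.0 substring-before(): '' when sep is not in s."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    """Like XPath 1.0 substring-after(): '' when sep is not in s."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def normalize_space(s):
    """Like XPath 1.0 normalize-space(): trim and collapse whitespace."""
    return " ".join(s.split())


def tokenize(text, delimiter="\n"):
    """Iterative version of the recursive 'tokenize' template."""
    tokens = []
    while delimiter in text:
        tokens.append(substring_before(text, delimiter))
        text = substring_after(text, delimiter)
    if text:  # the second xsl:when: emit any non-empty remainder
        tokens.append(text)
    return tokens


def token_to_field(token):
    """Mirrors the mode="token" template: returns (element name, value)."""
    name = substring_after(substring_before(normalize_space(token), ":"), "<div><b>")
    value = substring_before(substring_after(token, ":</b> "), "</div>")
    return name, value


# Only non-blank tokens are processed, in the order they occur in the text...
fields = [token_to_field(t) for t in tokenize(DESCRIPTION) if normalize_space(t)]
# ...and the second item template then appends <field2/> after all of them.
fields.append(("field2", ""))
print([name for name, _ in fields])
```

Because the output order simply follows the tokens found in the text, a field that is absent from the text can only be added before or after them, not in its original position.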

...which produces:

<?xml version="1.0"?>
<events>
  <event>
    <category>title</category>
    <field1>value 1</field1>
    <field2>value 2</field2>
    <field3>value 3</field3>
    <field4>value 4</field4>
    <field5>value 5</field5>
  </event>
  <event>
    <category>title</category>
    <field1>value 1</field1>
    <field3>value 3</field3>
    <field4>value 4</field4>
    <field5>value 5</field5>
    <field2/>
  </event>
  <event>
    <category>title</category>
    <field1>value 1</field1>
    <field2>value 2</field2>
    <field3>value 3</field3>
    <field4>value 4</field4>
    <field5>value 5</field5>
  </event>
</events>

There are two primary issues with this solution:

  1. It feels clunky; there's repetitive code, and it seems a tad unwieldy. I'm thinking some optimization could occur here?
  2. Notice that it outputs the empty <field2> elements in the incorrect order and places them at the bottom. This could be remedied, I suppose, but all of my solutions seem silly and are therefore not included. :)

Ready, set, go!

I'd appreciate a more elegant solution (or, at least, a solution that fixes issue #2 above). Thanks!


Conclusion

Based on the observations made by @Borodin in their own solution, I decided to go with the following:

<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="exsl"
  version="1.0">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Maps each field name in the feed to the element name to output. -->
  <xsl:variable name="vfieldnames">
    <name oldname="field1" newname="fielda" />
    <name oldname="field2" newname="fieldb" />
    <name oldname="field3" newname="fieldc" />
    <name oldname="field4" newname="fieldd" />
    <name oldname="field5" newname="fielde" />
  </xsl:variable>

  <xsl:template match="/">
    <events>
      <xsl:apply-templates select="*/*/item" />
    </events>
  </xsl:template>

  <xsl:template match="item">
    <event>
      <category>
        <xsl:value-of select="title" />
      </category>
      <!-- Walk the fixed list of names, so every field is output, in order. -->
      <xsl:apply-templates select="exsl:node-set($vfieldnames)/*">
        <xsl:with-param
          name="pdescriptiontext"
          select="current()/description" />
      </xsl:apply-templates>
    </event>
  </xsl:template>

  <xsl:template match="name">
    <xsl:param name="pdescriptiontext" />
    <!-- First bite: the text between the field name and the next 'div'. -->
    <xsl:variable
      name="vrough"
      select="substring-before(
                substring-after($pdescriptiontext, @oldname),
                'div')"/>
    <!-- Second bite: the text between the first '>' and the next '<'. -->
    <xsl:variable
      name="vvalue"
      select="substring-before(
                substring-after($vrough, '&gt;'),
                '&lt;')"/>
    <xsl:element name="{@newname}">
      <xsl:value-of select="normalize-space($vvalue)" />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

This solution adds one more layer: it allows me to change the field names nicely (via the oldname and newname attributes on each <name> element).
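The renaming layer can be modelled outside XSLT as an ordered table of (oldname, newname) pairs that drives the output, so every field is emitted in table order and a missing one comes out as an empty element. The sketch below uses Python's xml.etree.ElementTree; FIELD_NAMES, build_event and DESCRIPTION are hypothetical names chosen for illustration, and the two helper functions only imitate XPath 1.0 substring-before and substring-after.

```python
import xml.etree.ElementTree as ET

# Hypothetical rename table, mirroring vfieldnames: (oldname, newname),
# listed in the order the output elements should appear.
FIELD_NAMES = [
    ("field1", "fielda"),
    ("field2", "fieldb"),
    ("field3", "fieldc"),
    ("field4", "fieldd"),
    ("field5", "fielde"),
]


def substring_before(s, sep):
    """Like XPath 1.0 substring-before(): '' when sep is not in s."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    """Like XPath 1.0 substring-after(): '' when sep is not in s."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def build_event(title, description):
    """Builds one <event>, driven by the rename table rather than the text."""
    event = ET.Element("event")
    ET.SubElement(event, "category").text = title
    for old, new in FIELD_NAMES:
        rough = substring_before(substring_after(description, old), "div")
        value = substring_before(substring_after(rough, ">"), "<")
        # Always emitted, so a missing field becomes an empty element in place.
        ET.SubElement(event, new).text = " ".join(value.split())
    return event


# Hypothetical description with field2 missing (entities already decoded).
DESCRIPTION = (
    "<div><b>field1:</b> value 1</div>\n"
    "<div><b>field3:</b> value 3</div>\n"
    "<div><b>field4:</b> value 4</div>\n"
    "<div><b>field5:</b> value 5</div>\n"
)

event = build_event("title", DESCRIPTION)
print(ET.tostring(event, encoding="unicode"))
```

With field2 absent, fieldb is still written, as an empty element between fielda and fieldc; changing an output name only means editing the table.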

Thanks to all who answered!

You may be interested in this solution. I have used the literal field names field1 through field5 and, since you have access to node-set, I have added these names in a variable so that they can be conveniently modified.

The code processes the description text to extract the value for each field name by taking two bites at it. The first pass creates $rough by selecting the text after the field name and before the text div. This gives :&lt;/b&gt; value 1&lt;/ (or :</b> value 1</). The next refinement takes everything in $rough after &gt; and before &lt;, giving value 1. Spaces are trimmed from the final value using normalize-space in the xsl:value-of element.
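Here is the same two-bite extraction traced step by step as a small Python sketch. substring_before and substring_after are hypothetical helpers that imitate the XPath 1.0 functions, and text is a made-up description with the entities already decoded, as the XSLT processor sees it.

```python
def substring_before(s, sep):
    """Like XPath 1.0 substring-before(): '' when sep is not in s."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    """Like XPath 1.0 substring-after(): '' when sep is not in s."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


text = "<div><b>field1:</b> value 1</div>\n<div><b>field2:</b> value 2</div>"

# First bite: after the field name, before the next "div".
rough = substring_before(substring_after(text, "field1"), "div")
print(repr(rough))   # ':</b> value 1</'

# Second bite: after the first ">", before the next "<".
value = substring_before(substring_after(rough, ">"), "<")
print(repr(value))   # ' value 1'

# normalize-space() then trims the surrounding spaces.
print(" ".join(value.split()))  # value 1
```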

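This XSLT takes care of the missing field2 (or any other missing field) because substring-after and substring-before return an empty string when the search string isn't found in the target string, so the element is still written, just empty.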
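A small Python model makes the missing-field behaviour concrete. In this sketch, substring_before and substring_after are hypothetical helpers imitating the XPath 1.0 functions, extract is a hypothetical stand-in for the "extract" template, and text is a made-up description without field2. For field2 the first substring-after already yields the empty string, and every later step keeps it empty.

```python
def substring_before(s, sep):
    """Like XPath 1.0 substring-before(): '' when sep is not in s."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    """Like XPath 1.0 substring-after(): '' when sep is not in s."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def extract(text, field_name):
    """Both bites plus normalize-space, as in the 'extract' template."""
    rough = substring_before(substring_after(text, field_name), "div")
    value = substring_before(substring_after(rough, ">"), "<")
    return " ".join(value.split())


# Hypothetical description in which field2 does not appear.
text = "<div><b>field1:</b> value 1</div>\n<div><b>field3:</b> value 3</div>"

for name in ("field1", "field2", "field3"):
    print(name, repr(extract(text, name)))
```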

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ext="http://exslt.org/common"
    exclude-result-prefixes="ext"
    version="1.0">

    <xsl:strip-space elements="*"/>
    <xsl:output method="xml" indent="yes"/>

    <!-- The field names to look for, in output order. -->
    <xsl:variable name="names">
        <name>field1</name>
        <name>field2</name>
        <name>field3</name>
        <name>field4</name>
        <name>field5</name>
    </xsl:variable>

    <xsl:template match="/">
        <events>
            <xsl:apply-templates select="rss/channel/item"/>
        </events>
    </xsl:template>

    <xsl:template match="item">
        <xsl:variable name="description" select="description"/>
        <event>
            <category>
                <xsl:value-of select="title"/>
            </category>
            <xsl:for-each select="ext:node-set($names)/name">
                <xsl:call-template name="extract">
                    <xsl:with-param name="text" select="$description"/>
                    <xsl:with-param name="field-name" select="."/>
                </xsl:call-template>
            </xsl:for-each>
        </event>
    </xsl:template>

    <xsl:template name="extract">
        <xsl:param name="text"/>
        <xsl:param name="field-name"/>
        <!-- e.g. ":</b> value 1</" -->
        <xsl:variable name="rough" select="substring-before(substring-after($text, $field-name), 'div')"/>
        <!-- e.g. " value 1"; empty if the field is missing -->
        <xsl:variable name="value" select="substring-before(substring-after($rough, '&gt;'), '&lt;')"/>
        <xsl:element name="{$field-name}">
            <xsl:value-of select="normalize-space($value)"/>
        </xsl:element>
    </xsl:template>

</xsl:stylesheet>

Output

<?xml version="1.0" encoding="utf-8"?>
<events>
   <event>
      <category>title</category>
      <field1>value 1</field1>
      <field2>value 2</field2>
      <field3>value 3</field3>
      <field4>value 4</field4>
      <field5>value 5</field5>
   </event>
   <event>
      <category>title</category>
      <field1>value 1</field1>
      <field2/>
      <field3>value 3</field3>
      <field4>value 4</field4>
      <field5>value 5</field5>
   </event>
   <event>
      <category>title</category>
      <field1>value 1</field1>
      <field2>value 2</field2>
      <field3>value 3</field3>
      <field4>value 4</field4>
      <field5>value 5</field5>
   </event>
</events>
