XSLT - Remove full parent node if child contains specific text - xslt-1.0

I need to transform one XML file into another XML file that contains only the wanted data.
The transformation should remove every parent node whose child node doesn't contain the correct category.
Input XML:
<SHOP>
<SHOPITEM>
<ITEM_ID>142</ITEM_ID>
<PRODUCT>Mora MV 1251</PRODUCT>
<DESCRIPTION>XXXXX</DESCRIPTION>
<URL>http://www.xxx.sk/o</URL>
<IMGURL>http://xxx.jpg</IMGURL>
<PRICE>6.74</PRICE>
<PRICE_VAT>8.10</PRICE_VAT>
<VAT>0.20</VAT>
<MANUFACTURER>Mora</MANUFACTURER>
<CATEGORYTEXT>Accessories / Mobile</CATEGORYTEXT>
<EAN>8590371028526</EAN>
<PRODUCTNO></PRODUCTNO>
<WARRANTY>24</WARRANTY>
<DELIVERY_DATE>24</DELIVERY_DATE>
</SHOPITEM>
<SHOPITEM>
<ITEM_ID>XXX</ITEM_ID>
<PRODUCT>Hyundai LLF 22924 DVDR</PRODUCT>
<DESCRIPTION>XXXXX</DESCRIPTION>
<URL>http://www.xxx.sk/t</URL>
<IMGURL>http://xxx.jpg</IMGURL>
<PRICE>173.35</PRICE>
<PRICE_VAT>208.00</PRICE_VAT>
<VAT>0.20</VAT>
<MANUFACTURER>Hyundai</MANUFACTURER>
<CATEGORYTEXT>Main category / TVs</CATEGORYTEXT>
<EAN>xxxxx</EAN>
<PRODUCTNO/>
<WARRANTY>24</WARRANTY>
<DELIVERY_DATE>99999</DELIVERY_DATE>
</SHOPITEM>
</SHOP>
I need to remove every SHOPITEM node whose CATEGORYTEXT is not Main category / TVs.
I have the below XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="CATEGORYTEXT[not(text() = 'Main category / TVs')]"/>
</xsl:stylesheet>
But this only removes the CATEGORYTEXT node, not the full SHOPITEM node.
Can anyone help me, please? :)
I'm new to XSLT; any help is appreciated.
Desired output is:
<SHOP>
<SHOPITEM>
<ITEM_ID>XXX</ITEM_ID>
<PRODUCT>Hyundai LLF 22924 DVDR</PRODUCT>
<DESCRIPTION>XXXXX</DESCRIPTION>
<URL>http://www.xxx.sk/t</URL>
<IMGURL>http://xxx.jpg</IMGURL>
<PRICE>173.35</PRICE>
<PRICE_VAT>208.00</PRICE_VAT>
<VAT>0.20</VAT>
<MANUFACTURER>Hyundai</MANUFACTURER>
<CATEGORYTEXT>Main category / TVs</CATEGORYTEXT>
<EAN>xxxxx</EAN>
<PRODUCTNO/>
<WARRANTY>24</WARRANTY>
<DELIVERY_DATE>99999</DELIVERY_DATE>
</SHOPITEM>
</SHOP>

Replace this
<xsl:template match="CATEGORYTEXT[not(text() = 'Main category / TVs')]"/>
with this:
<xsl:template match="SHOPITEM[not(CATEGORYTEXT = 'Main category / TVs')]"/>
Your current template filters CATEGORYTEXT elements based on the value of their text node children, whereas what you need is to filter SHOPITEM elements based on the value of their CATEGORYTEXT element children.
Note that you only very rarely need text() in an XPath expression. text() gives you the set of individual text nodes that are direct children of the element you're dealing with, whereas what you usually care about is the complete string value of the element itself (which is, by definition, the concatenation of all its descendant text nodes).
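Putting the fix together, a complete stylesheet might look like the sketch below. It assumes the input has the structure shown in the question and that CATEGORYTEXT holds exactly the string 'Main category / TVs' with no surrounding whitespace; if the real data may be padded, comparing normalize-space(CATEGORYTEXT) instead would be safer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop any SHOPITEM that has no CATEGORYTEXT child
       whose string value equals the wanted category -->
  <xsl:template match="SHOPITEM[not(CATEGORYTEXT = 'Main category / TVs')]"/>
</xsl:stylesheet>
```

Because the predicate tests for the absence of a matching CATEGORYTEXT, a SHOPITEM with no CATEGORYTEXT child at all is removed as well.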

Related

xpath1.0: post-filtering a node set by change in attribute when preceding sibling is not possible

Below, I have a subset of car nodes. I then want to process this further by filtering-in by a change in an attribute.
Because the preceding-sibling axis walks the siblings in the source document rather than the members of the $yellowCars node-set, the second XPath below will not work if the immediately preceding sibling is a non-yellow car. Is there a simple way to achieve this in XPath or with minimal XSLT?
<xsl:template match="/">
<xsl:variable name="yellowCars" select="/cars/car[@color = 'yellow']"/>
<changedinspectiondate>
<xsl:apply-templates select="$yellowCars[not(@inspectiondate = preceding-sibling::car[1]/@inspectiondate)]"/>
</changedinspectiondate>
</xsl:template>
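One possible approach, offered as a sketch rather than a tested answer: apply the color filter on the preceding-sibling axis as well, so that each yellow car is compared with the nearest preceding yellow car instead of the nearest preceding car of any color. This assumes all car elements are siblings under /cars, as the variable definition suggests.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:variable name="yellowCars" select="/cars/car[@color = 'yellow']"/>
    <changedinspectiondate>
      <!-- Compare with the nearest preceding yellow car, skipping other colors -->
      <xsl:apply-templates select="$yellowCars[not(@inspectiondate = preceding-sibling::car[@color = 'yellow'][1]/@inspectiondate)]"/>
    </changedinspectiondate>
  </xsl:template>
</xsl:stylesheet>
```

The first yellow car has no preceding yellow sibling, so the comparison against an empty node-set is false and that car is kept.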

XSLT to put one particular XML element before all others

XSLT 1.0 solution required. My question is similar to XSLT Change element order and I'll take this answer if I have to, but I hope I can do something like 'put this_element first, and retain the original order of all the rest of them'. The input is something like this, where ... can be any set of simple elements or text nodes, but no processing instructions nor comments. See below also.
<someXML>
<recordList>
<record priref="1" created="2009-06-04T16:54:35" modification="2014-12-16T14:56:51" selected="False">
...
<collection_type>3D</collection_type>
...
<object_category>headgear</object_category>
<object_name>hat</object_name>
<object_number>060998</object_number>
...
</record>
<record priref="3" created="2009-06-04T11:54:35" modification="2020-08-05T18:24:33" selected="False">
...
<collection_type>3D</collection_type>
<description>a very elaborate coat</description>
<object_category>clothing</object_category>
<object_name>coat</object_name>
<object_number>060998</object_number>
</record>
</recordList>
</someXML>
This would be the desired output.
<someXML>
<recordList>
<record priref="1" created="2009-06-04T16:54:35" modification="2014-12-16T14:56:51" selected="False">
<object_category>headgear</object_category>
...
<collection_type>3D</collection_type>
...
<object_name>hat</object_name>
<object_number>060998</object_number>
...
</record>
<record priref="3" created="2009-06-04T11:54:35" modification="2020-08-05T18:24:33" selected="False">
<object_category>clothing</object_category>
...
<collection_type>3D</collection_type>
<description>a very elaborate coat</description>
<object_name>coat</object_name>
<object_number>060998</object_number>
</record>
</recordList>
</someXML>
It's probably OK if object_category is put first, and then occurs again later on in the record, i.e. in the tags in their original order.
I'll add some background. There's an API producing about 900,000 XML records, with the tags (element names) in alphabetical order within each record. There are about 170 different element names (that's why I don't want to have to list them all individually, unless there's no other way). The XML is ingested into a graph database. That takes time, but it could be sped up if object_category were the first element in each record.
Edit: We can configure the API, but not the C# code behind the API. We step through the database, step by step ingesting chunks of ~100 records. If we specify nothing else, we get the XML as exemplified above. We can also specify an XSL sheet to transform the XML. That's what we want to do here.
The example is ambiguous, because we don't know what all those ... placeholders stand for. I suppose this should work for you:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="object_category"/>
<xsl:apply-templates select="node()[not(self::object_category)]"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

how to transform a parameter using value-of and select

I am using xslt to transform an xml document into another xml document.
In the code below, it's only getting the email address of the first review and using it for all the reviews, instead of getting the email address of each review. I know it's not good to use //, but when I just use Review/UserEmailAddress the value is blank and I don't know how else to do it.
Here's my input xml:
<Product id="867776000050">
<ExternalId>867776000050</ExternalId>
<Reviews>
<Review id="3924" removed="false">
<UserProfileReference id="Haliley">
<ExternalId>Haliley</ExternalId>
<DisplayName>Haliley</DisplayName>
</UserProfileReference>
<UserEmailAddress>hbonb#yahoo.com</UserEmailAddress>
</Review>
<Review id="3919" removed="false">
<UserProfileReference id="PaulineTincher">
<ExternalId>PaulineTincher</ExternalId>
<DisplayName>PaulineTincher</DisplayName>
</UserProfileReference>
<UserEmailAddress>pt59#msn.com</UserEmailAddress>
</Review>
</Reviews>
</Product>
Here's my stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Review/UserProfileReference">
<xsl:variable name="userid"><xsl:value-of select="ExternalId"/>
<xsl:value-of select="substring-before(//Review/UserEmailAddress, '#')"/></xsl:variable>
<UserProfileReference id="{$userid}">
<ExternalId><xsl:value-of select="ExternalId"/><xsl:value-of select="substring-before(//Review/UserEmailAddress, '#')"/></ExternalId>
<DisplayName><xsl:value-of select="DisplayName"/></DisplayName>
</UserProfileReference>
</xsl:template>
</xsl:stylesheet>
I am trying to make the UserProfileReference unique by appending the first part of the email address to the existing value.
In my results below the UserProfileReference id value for the first review is correct, it appends the value of the UserEmailAddress to the id.
But for review 2, it uses the email address from review 1, not review 2. I've spent a ton of time on this and just can't figure it out. Please help!
<?xml version="1.0" encoding="UTF-8"?>
<Product id="867776000050">
<ExternalId>867776000050</ExternalId>
<Reviews>
<Review id="3924" removed="false">
<UserProfileReference id="Halileyhbonb">
<ExternalId>Halileyhbonb</ExternalId>
<DisplayName>Haliley</DisplayName>
</UserProfileReference>
<UserEmailAddress>hbonb#yahoo.com</UserEmailAddress>
</Review>
<Review id="3919" removed="false">
<UserProfileReference id="PaulineTincherhbonb">
<ExternalId>PaulineTincherhbonb</ExternalId>
<DisplayName>PaulineTincher</DisplayName>
</UserProfileReference>
<UserEmailAddress>pt59#msn.com</UserEmailAddress>
</Review>
</Reviews>
</Product>
You are using the wrong expression: // searches from the document root, so //Review/UserEmailAddress returns a node-set with every Review/UserEmailAddress in the document. substring-before(...) converts that node-set to a string by taking its first node in document order, so in both cases you get the same item.
A solution is to use a relative path (relative to the context node) like ../UserEmailAddress. So replace both occurrences of
substring-before(//Review/UserEmailAddress, '#')
with
substring-before(../UserEmailAddress, '#')
and it will work as desired.
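For illustration, here is a sketch of the whole stylesheet with the relative path applied. It keeps the '#' separator exactly as it appears in the sample data above, and it computes the combined id once in a variable with concat() and reuses it, which is a small simplification of the original template rather than a required change.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity transform -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Review/UserProfileReference">
    <!-- ../UserEmailAddress is the email of the Review that contains
         this UserProfileReference, not the first one in the document -->
    <xsl:variable name="userid"
        select="concat(ExternalId, substring-before(../UserEmailAddress, '#'))"/>
    <UserProfileReference id="{$userid}">
      <ExternalId><xsl:value-of select="$userid"/></ExternalId>
      <DisplayName><xsl:value-of select="DisplayName"/></DisplayName>
    </UserProfileReference>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, the second review's id becomes PaulineTincherpt59 instead of PaulineTincherhbonb.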

xslt copy deep xml if exists if not create

<root>
<xnode>
<Node1/>
<Node2/>
<Node3>
<CNode1>
<CCNode1>
<CCField1>
<CCField2>
<CCCNode1/>
</CCNode1>
<CCNode2>
<CCCNode3/>
</Node3>
<Node4/>
</xnode>
<xnode>
<Node1/>
<Node2/>
<Node3>
<CNode1>
<CCNode2>
<CCCNode3/>
</Node3>
<Node4/>
</xnode>
<xnode>
<Node1/>
<Node2/>
<Node3>
<CNode1>
<CCNode1>
<CCField1>
<CCField2>
<CCCNode1/>
</CCNode1>
<CCNode2>
<CCCNode3/>
</Node3>
<Node4/>
</xnode>
</root>
In the above XML, I need to copy all the nodes and values, with special handling for Node3 - CNode1 - CCNode1: if CCNode1 exists, copy it as it is, including its child elements; if not, create CCNode1 with the corresponding fields and child elements. For example, here the first and third xnode have CCNode1, whereas it is missing in the second xnode. So copy the first and third xnode as they are, and create CCNode1 and its child elements in the second xnode with some dummy values.
Please suggest how to achieve this with XSLT.
Thanks
So, as best I can tell, you need an identity template to copy everything. Then you need a template like the following to match Node3/CNode1 nodes that do not have a CCNode1 node. There you can add your nodes.
<xsl:template match="Node3/CNode1[not(.//CCNode1)]">
<xsl:copy>
<!-- Copy attributes first: they must be written before any child nodes -->
<xsl:apply-templates select="@*"/>
<!-- Add your CCNode1 and child nodes here. -->
<!-- Output other child nodes of CNode1 -->
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
<!-- Identity. -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
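As a fuller sketch of the same idea, the stylesheet below fills in the placeholder. The dummy values (dummy1, dummy2) are hypothetical, and the shape of the generated CCNode1 (two simple fields plus an empty CCCNode1) is only a guess based on the sample in the question, so adjust it to the real structure.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity: copy everything that no other template handles -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- CNode1 elements that contain no CCNode1 get one created -->
  <xsl:template match="Node3/CNode1[not(.//CCNode1)]">
    <xsl:copy>
      <!-- Attributes must be written before any child elements -->
      <xsl:apply-templates select="@*"/>
      <!-- Hypothetical dummy content; the real field layout is an assumption -->
      <CCNode1>
        <CCField1>dummy1</CCField1>
        <CCField2>dummy2</CCField2>
        <CCCNode1/>
      </CCNode1>
      <!-- Then copy the existing children of CNode1 -->
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

A CNode1 that already contains a CCNode1 does not match the second template, so it falls through to the identity template and is copied unchanged.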

How costly is usage of unnecessary variables in XSLT?

This is more of a clarification that I need.
As per this answer to a question, XSLT variables are cheap! My question is: is this statement valid for all scenarios? Short-lived variables that are created and destroyed within four lines of code aren't bothersome, but loading a root node or child entities into a variable is, in my opinion, bad practice.
I have two XSLT files, designed for same input and output requirement:
XSLT1 (without unnecessary variable):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<Collection>
<xsl:for-each select="CATALOG/CD">
<DVD>
<Cover>
<xsl:value-of select="string(TITLE)"/>
</Cover>
<Author>
<xsl:value-of select="string(ARTIST)"/>
</Author>
<BelongsTo>
<xsl:value-of select="concat(concat(string(COUNTRY), ' '), string(COMPANY))"/>
</BelongsTo>
<SponsoredBy>
<xsl:value-of select="string(COMPANY)"/>
</SponsoredBy>
<Price>
<xsl:value-of select="string(number(string(PRICE)))"/>
</Price>
<Year>
<xsl:value-of select="string(floor(number(string(YEAR))))"/>
</Year>
</DVD>
</xsl:for-each>
</Collection>
</xsl:template>
</xsl:stylesheet>
XSLT2 (with unnecessary variable "root" in which whole XML is loaded):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<xsl:variable name="root" select="."/>
<Collection>
<xsl:for-each select="$root/CATALOG/CD">
<DVD>
<Cover>
<xsl:value-of select="string(TITLE)"/>
</Cover>
<Author>
<xsl:value-of select="string(ARTIST)"/>
</Author>
<BelongsTo>
<xsl:value-of select="concat(concat(string(COUNTRY), ' '), string(COMPANY))"/>
</BelongsTo>
<SponsoredBy>
<xsl:value-of select="string(COMPANY)"/>
</SponsoredBy>
<Price>
<xsl:value-of select="string(number(string(PRICE)))"/>
</Price>
<Year>
<xsl:value-of select="string(floor(number(string(YEAR))))"/>
</Year>
</DVD>
</xsl:for-each>
</Collection>
</xsl:template>
</xsl:stylesheet>
Approach 2 is what exists in the live system, and in fact the XML ranges from several KB to a few MB. In those XSLT files the use of variables extends to child entities as well.
To put forward my proposal to change the approach, I need to verify the theory behind it.
As per my understanding, in approach 2 the system reloads the XML data into memory over and over (and when multiple variables are used to load child entities the situation gets worse), thereby slowing down the transformation.
Before posting this question I tested the performance of the two XSLTs using a timer. The first approach takes a few milliseconds less than approach 2. (I used copies of the XML file to test the two XSL files, to avoid complications with the system cache.) But the system cache might still play a confusing role here.
Despite this analysis, I still have a question: do we really need to avoid using variables? And as far as my system is concerned, is it worth modifying the live XSLT files to use approach 1?
Or are XSLT variables different from variables in other programming languages (in case I'm not aware)? For example, maybe XSLT variables don't actually store the data when you do select="." but just point to it, or something like that, and hence I can continue using XSLT variables without hesitation?
What is your suggestion on this?
Quick Info on current system:
Host Programming Language or System: Siebel (C++ is the backend code)
XSLT Processor: Xalan (unless Saxon is used explicitly)
I agree with the comments made that you need to measure performance with your particular XSLT processor.
But your description or expectation that in "approach-2, system is reloading the XML data over and over in memory" seems wrong to me. The XSLT processor builds an input tree of the primary input XML document anyway, and I can't imagine that any implementation would respond to <xsl:variable name="root" select="."/> by loading the document completely again. That would even be wrong, as node identity and generate-id would not work. The variable simply keeps a reference to the document node of the existing input tree.
Of course, in your sample, where you have a single input document and a single template whose current node is the document node anyway, the variable is superfluous. But there are cases where you need to store the document node of the primary input document, in particular when you deal with multiple documents.
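To illustrate the multiple-document case, here is a minimal sketch. The file name extra.xml and the InPrimaryCatalog element are hypothetical, and the second file is assumed to have the same CATALOG/CD structure as the primary input.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Global variable: a reference to the document node of the primary input -->
  <xsl:variable name="root" select="/"/>

  <xsl:template match="/">
    <Collection>
      <!-- 'extra.xml' is a hypothetical second input file -->
      <xsl:for-each select="document('extra.xml')/CATALOG/CD">
        <DVD>
          <Cover><xsl:value-of select="TITLE"/></Cover>
          <!-- Inside this loop "/" selects the root of extra.xml,
               so $root is needed to reach the primary input -->
          <InPrimaryCatalog>
            <xsl:value-of select="boolean($root/CATALOG/CD[TITLE = current()/TITLE])"/>
          </InPrimaryCatalog>
        </DVD>
      </xsl:for-each>
    </Collection>
  </xsl:template>
</xsl:stylesheet>
```

Here the variable is not redundant: once the context node belongs to another document, an absolute path no longer reaches the primary input, and $root is the usual way to get back to it.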