Loop Through Collection of XML Records in SQL

Loop Through Collection of XML Records in SQL - sql

I have a dataset that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<return>
<approved>
<callbackTable>
<tableRow max="100" value="10" selectedRow="true" maxRow="112.0">
<tableCell term="72" selectedCell="false" maxCell="73">
<number>21.7</number>
</tableCell>
<tableCell term="74" selectedCell="true" maxCell="75">
<number>21.7</number>
</tableCell>
</tableRow>
<tableRow max="200" value="15" selectedRow="false" maxRow="113.0">
<tableCell term="76" selectedCell="false" maxCell="77">
<number>14.5</number>
</tableCell>
<tableCell term="78" selectedCell="false" maxCell="79">
<number>22.5</number>
</tableCell>
</tableRow>
<tableRow max="300" value="20" selectedRow="false" maxRow="114.0">
<tableCell term="80" selectedCell="false" maxCell="81">
<number>14.6</number>
</tableCell>
<tableCell term="82" selectedCell="false" maxCell="83">
<number>15.7</number>
</tableCell>
</tableRow>
</callbackTable>
</approved>
</return>
</process>
I need to be able to loop through the callbackTable entries and add them to a table named Options.
Here is what I need the data to ultimately look like in the Options table.
Id
Max
Value
SelectedRow
MaxRow
Term
SelectedCell
MaxCell
Number
1
100
10
true
112.0
72
false
73
21.7
2
100
10
true
112.0
74
true
75
21.7
3
200
15
false
113.0
76
false
77
14.5
4
200
15
false
113.0
78
false
79
22.5
5
300
20
false
114.0
80
false
81
14.6
6
300
20
false
114.0
82
false
83
15.7
(Note that the Id column is an identity key and does not need to be populated)
The tricky part is that I don't know exactly how many rows or how many cells are in the callbackTable collection so I will need to loop through the results and insert based on the number of items in the collection.
I could really use some help as I'm not entirely sure where to start.
Thanks in advance!

If you can change the encoding in the XML processing instruction to utf-16 or omit it, try the set-based query below. Note the Id column of the target table is omitted from the column list so that SQL Server will assign the IDENTITY value.
DECLARE #xml xml =
<process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<return>
<approved>
<callbackTable>
<tableRow max="100" value="10" selectedRow="true" maxRow="112.0">
<tableCell term="72" selectedCell="false" maxCell="73">
<number>21.7</number>
</tableCell>
<tableCell term="74" selectedCell="true" maxCell="75">
<number>21.7</number>
</tableCell>
</tableRow>
<tableRow max="200" value="15" selectedRow="false" maxRow="113.0">
<tableCell term="76" selectedCell="false" maxCell="77">
<number>14.5</number>
</tableCell>
<tableCell term="78" selectedCell="false" maxCell="79">
<number>22.5</number>
</tableCell>
</tableRow>
<tableRow max="300" value="20" selectedRow="false" maxRow="114.0">
<tableCell term="80" selectedCell="false" maxCell="81">
<number>14.6</number>
</tableCell>
<tableCell term="82" selectedCell="false" maxCell="83">
<number>15.7</number>
</tableCell>
</tableRow>
</callbackTable>
</approved>
</return>
</process>';
INSERT INTO dbo.TargetTable([Max],[Value],[SelectedRow],[MaxRow],[Term],[SelectedCell],[MaxCell],[Number])
SELECT
tableRow.value('data(./#max)', 'varchar(10)')
,tableRow.value('data(./#value)', 'int')
,tableRow.value('data(./#selectedRow)', 'varchar(10)')
,tableRow.value('data(./#maxRow)', 'decimal(10,1)')
,tableCell.value('data(./#term)', 'int')
,tableCell.value('data(./#selectedCell)', 'varchar(10)')
,tableCell.value('data(./#maxCell)', 'int')
,tableCell.value('./number[1]', 'decimal(10,1)')
FROM #xml.nodes('//tableRow') AS tableRow(tableRow)
CROSS APPLY tableRow.nodes('//tableCell') AS tableCell(tableCell);

Related

Parse xml file in pandas

I have this xml file (it's called "LogReg.xml" and it contains some information about a logistic regression (I am interested in the name of the features and their coefficient - I'll explain in more detail below):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
<Header>
<Application name="JPMML-SkLearn" version="1.6.35"/>
<Timestamp>2022-02-15T09:44:54Z</Timestamp>
</Header>
<MiningBuildTask>
<Extension name="repr">PMMLPipeline(steps=[('classifier', LogisticRegression())])</Extension>
</MiningBuildTask>
<DataDictionary>
<DataField name="Target" optype="categorical" dataType="integer">
<Value value="0"/>
<Value value="1"/>
</DataField>
<DataField name="const" optype="continuous" dataType="double"/>
<DataField name="grade" optype="continuous" dataType="double"/>
<DataField name="emp_length" optype="continuous" dataType="double"/>
<DataField name="dti" optype="continuous" dataType="double"/>
<DataField name="Orig_FicoScore" optype="continuous" dataType="double"/>
<DataField name="inq_last_6mths" optype="continuous" dataType="double"/>
<DataField name="acc_open_past_24mths" optype="continuous" dataType="double"/>
<DataField name="mort_acc" optype="continuous" dataType="double"/>
<DataField name="mths_since_recent_bc" optype="continuous" dataType="double"/>
<DataField name="num_rev_tl_bal_gt_0" optype="continuous" dataType="double"/>
<DataField name="percent_bc_gt_75" optype="continuous" dataType="double"/>
</DataDictionary>
<RegressionModel functionName="classification" algorithmName="sklearn.linear_model._logistic.LogisticRegression" normalizationMethod="logit">
<MiningSchema>
<MiningField name="Target" usageType="target"/>
<MiningField name="const"/>
<MiningField name="grade"/>
<MiningField name="emp_length"/>
<MiningField name="dti"/>
<MiningField name="Orig_FicoScore"/>
<MiningField name="inq_last_6mths"/>
<MiningField name="acc_open_past_24mths"/>
<MiningField name="mort_acc"/>
<MiningField name="mths_since_recent_bc"/>
<MiningField name="num_rev_tl_bal_gt_0"/>
<MiningField name="percent_bc_gt_75"/>
</MiningSchema>
<Output>
<OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
<OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
</Output>
<RegressionTable intercept="0.8064694059338298" targetCategory="1">
<NumericPredictor name="const" coefficient="0.8013433785974717"/>
<NumericPredictor name="grade" coefficient="0.9010481046582982"/>
<NumericPredictor name="emp_length" coefficient="0.9460686056314133"/>
<NumericPredictor name="dti" coefficient="0.5117062988491518"/>
<NumericPredictor name="Orig_FicoScore" coefficient="0.07944303372859234"/>
<NumericPredictor name="inq_last_6mths" coefficient="0.20516234445402765"/>
<NumericPredictor name="acc_open_past_24mths" coefficient="0.4852503249658917"/>
<NumericPredictor name="mort_acc" coefficient="0.6673203078463711"/>
<NumericPredictor name="mths_since_recent_bc" coefficient="0.1962158305958366"/>
<NumericPredictor name="num_rev_tl_bal_gt_0" coefficient="0.12964661294856686"/>
<NumericPredictor name="percent_bc_gt_75" coefficient="0.04534570018290847"/>
</RegressionTable>
<RegressionTable intercept="0.0" targetCategory="0"/>
</RegressionModel>
</PMML>
I have parsed it using this code:
from lxml import objectify
path = 'LogReg.xml'
parsed = objectify.parse(open(path))
root = parsed.getroot()
data = []
if True:
for elt in root.RegressionModel.RegressionTable:
el_data = {}
for child in elt.getchildren():
el_data[child.tag] = child.text
data.append(el_data)
perf = pd.DataFrame(data)
I am interested in parsing this bit:
<RegressionTable intercept="0.8064694059338298" targetCategory="1">
<NumericPredictor name="const" coefficient="0.8013433785974717"/>
<NumericPredictor name="grade" coefficient="0.9010481046582982"/>
<NumericPredictor name="emp_length" coefficient="0.9460686056314133"/>
<NumericPredictor name="dti" coefficient="0.5117062988491518"/>
<NumericPredictor name="Orig_FicoScore" coefficient="0.07944303372859234"/>
<NumericPredictor name="inq_last_6mths" coefficient="0.20516234445402765"/>
<NumericPredictor name="acc_open_past_24mths" coefficient="0.4852503249658917"/>
<NumericPredictor name="mort_acc" coefficient="0.6673203078463711"/>
<NumericPredictor name="mths_since_recent_bc" coefficient="0.1962158305958366"/>
<NumericPredictor name="num_rev_tl_bal_gt_0" coefficient="0.12964661294856686"/>
<NumericPredictor name="percent_bc_gt_75" coefficient="0.04534570018290847"/>
</RegressionTable>
so that I can build the following dictionary:
myDict = {
"const : 0.8013433785974717,
"grade" : 0.9010481046582982,
"emp_length" : 0.9460686056314133,
"dti" : 0.5117062988491518,
"Orig_FicoScore" : 0.07944303372859234,
"inq_last_6mths" : 0.20516234445402765,
"acc_open_past_24mths" : 0.4852503249658917,
"mort_acc" : 0.6673203078463711,
"mths_since_recent_bc" : 0.1962158305958366,
"num_rev_tl_bal_gt_0" : 0.12964661294856686,
"percent_bc_gt_75" : 0.04534570018290847
}
Basically, in the dictionary the Key is the name of the feature and the value is the coefficient of the logistic regression.
Please can anyone help me with the code?

I'm not sure you need pandas for this, but you do need to handle the namespaces in your xml.
Try something along these lines:
myDict = {}
#register the namespace
ns = {'xx': 'http://www.dmg.org/PMML-4_4'}
#you could collapse the next two into one line, but I believe it's clearer this way
rt = root.xpath('//xx:RegressionTable[.//xx:NumericPredictor]',namespaces=ns)[0]
nps = rt.xpath('./xx:NumericPredictor',namespaces=ns)
for np in nps:
myDict[np.attrib['name']]=np.attrib['coefficient']
myDict
The output should be your expected output.

Using 2 CROSS APPLY in XML

I'm using below XML to fetch the ClauseCode but it's not returning any data for the below query.
<Policy>
<Plans>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>3</ClauseCode>
</ProductEndorsementClauses>
</Plan>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>1</ClauseCode>
<ClauseCode>2</ClauseCode>
</ProductEndorsementClauses>
</Plan>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>5</ClauseCode>
<ClauseCode>4</ClauseCode>
</ProductEndorsementClauses>
</Plan>
</Plans>
</Policy>
Here is my query :
select proposaid,
,Col1.value('(/*/ProductEndorsementClauses/ClauseCode)[1]','nvarchar(max)')
from Policy p
CROSS APPLY data.nodes('/*/Plans/Plan') AS Tbl(Col)
CROSS APPLY Tbl.Col.nodes('/ProductEndorsementClauses/ClauseCode') AS TblPec(Col1)
where Col1.value('(ProductEndorsementClauses/ClauseCode)[1]', 'nvarchar(max)') in ('1','3')

For inspiration:
declare #x xml = N'
<Policy>
<Plans>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>3</ClauseCode>
</ProductEndorsementClauses>
</Plan>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>1</ClauseCode>
<ClauseCode>2</ClauseCode>
</ProductEndorsementClauses>
</Plan>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>5</ClauseCode>
<ClauseCode>4</ClauseCode>
</ProductEndorsementClauses>
</Plan>
</Plans>
</Policy>'
select #x
--all clause code values
select t.col.value('.[1]', 'nvarchar(10)')
from #x.nodes('Policy/Plans/Plan/ProductEndorsementClauses/ClauseCode') as t(col);
--only clause code values that have at least a 1, 3 clause code in their parent ProductEndorsementClauses
select t.col.value('.[1]', 'nvarchar(10)')
from #x.nodes('Policy/Plans/Plan/ProductEndorsementClauses[(./ClauseCode/text() = "1") or (./ClauseCode/text() = "3")]/ClauseCode') as t(col);
--result only when exists at least a 1 or 3
select t.col.value('.[1]', 'nvarchar(max)')
from #x.nodes('Policy/Plans/Plan/ProductEndorsementClauses/ClauseCode') as t(col)
where #x.exist('Policy/Plans/Plan/ProductEndorsementClauses/ClauseCode[text() = "1" or text() = "3"]') = 1;
declare #Policy table
(
proposaid int,
data xml
);
insert into #Policy(proposaid, data)
values(1, N'
<Policy>
<Plans>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>3</ClauseCode>
</ProductEndorsementClauses>
</Plan>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>1</ClauseCode>
<ClauseCode>2</ClauseCode>
</ProductEndorsementClauses>
</Plan>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>5</ClauseCode>
<ClauseCode>4</ClauseCode>
</ProductEndorsementClauses>
</Plan>
</Plans>
</Policy>'),
(2, N'
<Policy>
<Plans>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>13</ClauseCode>
</ProductEndorsementClauses>
</Plan>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>11</ClauseCode>
<ClauseCode>12</ClauseCode>
</ProductEndorsementClauses>
</Plan>
<Plan>
<ProductEndorsementClauses>
<ClauseCode>15</ClauseCode>
<ClauseCode>14</ClauseCode>
</ProductEndorsementClauses>
</Plan>
</Plans>
</Policy>');
select *
from #Policy;
select p.proposaid,
t.col.value('.[1]', 'nvarchar(max)')
from #Policy as p
cross apply p.data.nodes('Policy/Plans/Plan/ProductEndorsementClauses/ClauseCode') as t(col)
where p.data.exist('Policy/Plans/Plan/ProductEndorsementClauses/ClauseCode[text() = "1" or text() = "3"]') = 1;

Convert Tables into XML using T-SQL

If found several questions about how to convert a table (or query) into XML, but none that showed how to start with one main table and join several one:many satellite tables, and from that generate XML that represents the hierarchical structure of the data. So I thought I'd share this solution now that I've figured it out. If someone else has another way of doing this, please post another answer.
Given this contrived data:
create table #recipe (id int, name varchar(10))
create table #ingredient (recipe_id int, name varchar(30), quantity varchar(20), sort int)
create table #instruction (recipe_id int, task varchar(32), sort int)
insert into #recipe values (1, 'pizza'), (2, 'omelet')
insert into #ingredient values (1, 'pizza dough', '1 package', 1),
(1, 'tomato sauce', '1 can', 2),
(1, 'favorite toppings', 'you choose', 3),
(2, 'eggs', 'three', 1),
(2, 'a bunch of other ingredients', 'you choose', 2)
insert into #instruction values (1, 'pre-bake pizza dough', 1),
(1, 'add tomato sauce', 2),
(1, 'add toppings', 3),
(1, 'bake a little longer', 4),
(2, 'break eggs into mixing bowl', 1),
(2, 'beat yolks and whites together', 2),
(2, 'pour into large sauce pan', 3),
(2, 'add other ingredients', 4),
(2, 'fold in half', 5),
(2, 'cook until done', 6)
.
Which looks like this in tabular form:
#recipe
id name
----------- ----------
1 pizza
2 omelet
.
#ingredient
recipe_id name quantity sort
----------- ------------------------------ -------------------- -----------
1 pizza dough 1 package 1
1 tomato sauce 1 can 2
1 favorite toppings you choose 3
2 eggs three 1
2 a bunch of other ingredients you choose 2
.
#instruction
recipe_id task sort
----------- -------------------------------- -----------
1 pre-bake pizza dough 1
1 add tomato sauce 2
1 add toppings 3
1 bake a little longer 4
2 break eggs into mixing bowl 1
2 beat yolks and whites together 2
2 pour into large sauce pan 3
2 add other ingredients 4
2 fold in half 5
2 cook until done 6
.
I want to create an XML document that has one record for each recipe, and within each recipe element, I want a group of ingredients and another group of instructions, like this:
<recipes>
<recipe id="2" name="omelet">
<ingredients>
<ingredient name="eggs" quantity="three" />
<ingredient name="a bunch of other ingredients" quantity="you choose" />
</ingredients>
<instructions>
<instruction task="break eggs into mixing bowl" />
<instruction task="beat yolks and whites together" />
<instruction task="pour into large sauce pan" />
<instruction task="add other ingredients" />
<instruction task="fold in half" />
<instruction task="cook until done" />
</instructions>
</recipe>
<recipe id="1" name="pizza">
<ingredients>
<ingredient name="pizza dough" quantity="1 package" />
<ingredient name="tomato sauce" quantity="1 can" />
<ingredient name="favorite toppings" quantity="you choose" />
</ingredients>
<instructions>
<instruction task="pre-bake pizza dough" />
<instruction task="add tomato sauce" />
<instruction task="add toppings" />
<instruction task="bake a little longer" />
</instructions>
</recipe>
</recipes>

This SQL creates the desired XML verbatim:
select recipe.*,
(
select ingredient.name, ingredient.quantity
from #ingredient ingredient
where recipe.id = ingredient.recipe_id
order by ingredient.sort
for xml auto, root('ingredients'), type
),
(
select instruction.task
from #instruction instruction
where recipe.id = instruction.recipe_id
order by instruction.sort
for xml auto, root('instructions'), type
)
from #recipe as recipe
order by recipe.name
for xml auto, root('recipes'), type
I aliased the temp table names because using for xml auto on temp tables creates poorly named XML elements. This is how it looks:
<recipes>
<recipe id="2" name="omelet">
<ingredients>
<ingredient name="eggs" quantity="three" />
<ingredient name="a bunch of other ingredients" quantity="you choose" />
</ingredients>
<instructions>
<instruction task="break eggs into mixing bowl" />
<instruction task="beat yolks and whites together" />
<instruction task="pour into large sauce pan" />
<instruction task="add other ingredients" />
<instruction task="fold in half" />
<instruction task="cook until done" />
</instructions>
</recipe>
<recipe id="1" name="pizza">
<ingredients>
<ingredient name="pizza dough" quantity="1 package" />
<ingredient name="tomato sauce" quantity="1 can" />
<ingredient name="favorite toppings" quantity="you choose" />
</ingredients>
<instructions>
<instruction task="pre-bake pizza dough" />
<instruction task="add tomato sauce" />
<instruction task="add toppings" />
<instruction task="bake a little longer" />
</instructions>
</recipe>
</recipes>
.
This SQL creates another version of the XML with all data as values instead of attributes, but in the same basic hierarchical structure:
select recipe.*,
(
select ingredient.name, ingredient.quantity
from #ingredient ingredient
where recipe.id = ingredient.recipe_id
order by ingredient.sort
for xml path('ingredient'), root('ingredients'), type
),
(
select instruction.task
from #instruction instruction
where recipe.id = instruction.recipe_id
order by instruction.sort
for xml path('instruction'), root('instructions'), type
)
from #recipe as recipe
order by recipe.name
for xml path('recipe'), root('recipes'), type
.
This is how it looks:
<recipes>
<recipe>
<id>2</id>
<name>omelet</name>
<ingredients>
<ingredient>
<name>eggs</name>
<quantity>three</quantity>
</ingredient>
<ingredient>
<name>a bunch of other ingredients</name>
<quantity>you choose</quantity>
</ingredient>
</ingredients>
<instructions>
<instruction>
<task>break eggs into mixing bowl</task>
</instruction>
<instruction>
<task>beat yolks and whites together</task>
</instruction>
<instruction>
<task>pour into large sauce pan</task>
</instruction>
<instruction>
<task>add other ingredients</task>
</instruction>
<instruction>
<task>fold in half</task>
</instruction>
<instruction>
<task>cook until done</task>
</instruction>
</instructions>
</recipe>
<recipe>
<id>1</id>
<name>pizza</name>
<ingredients>
<ingredient>
<name>pizza dough</name>
<quantity>1 package</quantity>
</ingredient>
<ingredient>
<name>tomato sauce</name>
<quantity>1 can</quantity>
</ingredient>
<ingredient>
<name>favorite toppings</name>
<quantity>you choose</quantity>
</ingredient>
</ingredients>
<instructions>
<instruction>
<task>pre-bake pizza dough</task>
</instruction>
<instruction>
<task>add tomato sauce</task>
</instruction>
<instruction>
<task>add toppings</task>
</instruction>
<instruction>
<task>bake a little longer</task>
</instruction>
</instructions>
</recipe>
</recipes>
Originally I tried placing the ingredients and instructions in the main query's from clause with an inner join to the recipe table. But the instructions were all nested within the ingredients, which were nested within the recipe. When I moved them up to the select part of the query it straightened out the XML.

XSLT2.0 grouping based on three fields

<passengergroup>
<passengerList>
<passDetails>
<route>LONDON</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>1A</SeatNo>
</seatDetails>
<customervalue>AB</customervalue>
</passengerList
<passengerList>
<passDetails>
<route>LONDON</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>1B</SeatNo>
</seatDetails>
<customervalue>good</customervalue>
</passengerList
<passengerList>
<passDetails>
<route>DELHI</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>2C</SeatNo>
</seatDetails>
<customervalue>BC</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>DELHI</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>2D</SeatNo>
</seatDetails>
<customervalue>okey</customervalue>
</passengerList>
</passengergroup>
<xsl:for-each select="passengergroup/passengerList">
<xsl:if test="customervalue='good'
<xsl:value-of select="route"/><xsl:text> </xsl:text>
<xsl:value-of select="customervalue"/><xsl:text> </xsl:text>
<xsl:value-of select="seatDetails/SeatNo"/>
</for-each>
<xsl:for-each select="passengergroup/passengerList">
<xsl:if test="customervalue='ok'
<xsl:value-of select="route"/><xsl:text> </xsl:text>
<xsl:value-of select="customervalue"/><xsl:text> </xsl:text>
<xsl:value-of select="seatDetails/SeatNo"/>
</for-each>
Output
It will produce output like this
LONDON good 1A
LONDON good 1B
DELHI okey 2C
DELHI okey 2D
But i need the output like this
LONDON good 1A 1B
DELHI okey 2C 2D
If 'LONDON good 'is repeating many times,it has to be printed only once.but we got to repeat the seat no like' 1A 1B 1C 1D 1F 2G and so on'.i AM using xslt2.0 AND MY OUTPUT TYPE IS text. Thing is no need to display the items many times
I tried lot ..not able to figure out the solutions please help me out.

IMO your input xml doesn't correspond to desired output (e.g. there is only one LONDON with customervalue = good. But may be I don't understand well what you needs. But following xslt could make a job.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<xsl:output method="text" />
<xsl:template match="/">
<xsl:apply-templates select="passengergroup" />
</xsl:template>
<xsl:template match="passengergroup">
<xsl:for-each-group select="passengerList" group-by="concat(passDetails/route, ' ', customervalue)">
<xsl:value-of select="current-grouping-key()" />
<xsl:text> </xsl:text>
<xsl:value-of select="current-group()/seatDetails/SeatNo" separator=" " />
<xsl:value-of select="'
'" />
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
For input
<?xml version="1.0" encoding="UTF-8"?>
<passengergroup>
<passengerList>
<passDetails>
<route>LONDON</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>1A</SeatNo>
</seatDetails>
<customervalue>good</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>LONDON</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>1B</SeatNo>
</seatDetails>
<customervalue>good</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>DELHI</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>2C</SeatNo>
</seatDetails>
<customervalue>BC</customervalue>
</passengerList>
<passengerList>
<passDetails>
<route>DELHI</route>
<lastname>RAY</lastname>
</passDetails>
<seatDetails>
<SeatNo>2D</SeatNo>
</seatDetails>
<customervalue>okey</customervalue>
</passengerList>
</passengergroup>
it produces following output
LONDON good 1A 1B
DELHI BC 2C
DELHI okey 2D

Using VB.NET Regular Expressions to Remove Excel XML Conversion

I have the following lines showing up in files that have been converted to XML from an Excel worksheet:
<Worksheet ss:Name="Sheet1">
<Names>
<NamedRange ss:Name="Print_Area" ss:RefersTo="=Sheet1!R30C1:R8642C15"/>
</Names>
<Table ss:ExpandedColumnCount="14" ss:ExpandedRowCount="8655" x:FullColumns="1"
x:FullRows="1" ss:StyleID="s16">
<Column ss:Index="2" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="41.25"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="36"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="35.25"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="38.25" ss:Span="1"/>
<Column ss:Index="8" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="31.5"/>
<Column ss:Index="11" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="30"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="33.75"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="28.5"/>
<Row ss:StyleID="s18">
<Cell ss:StyleID="s17"><Data ss:Type="String">UNITED STATES</Data></Cell>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
</Row>
I am attempting to only remove the <Column .. /> lines. I "thought" I had a pretty good handle on Regular Expressions in VB.NET, but I cannot seem to match these lines. I have tried the following match strings:
'Using (RegexOptions.Multiline)
Private Const Column_MatchExpression As String = "^[\s]*<Column[\s\S]+$"
Private Const Column_MatchExpression As String = " <Column[\s\S]+$"
Private Const Column_MatchExpression As String = "^ <Column[\s\S]+$"
Private Const Column_MatchExpression As String = "^[\s]+<Column[\s\w\W]+$"
Any thoughts on the matter would be appreciated.

What about
"^\s*<Column.*/>\s*$"
?

\<Column[^>]*\>
Should work

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Loop Through Collection of XML Records in SQL - sql

Related

Parse xml file in pandas

Using 2 CROSS APPLY in XML

Convert Tables into XML using T-SQL

XSLT2.0 grouping based on three fields

Using VB.NET Regular Expressions to Remove Excel XML Conversion

Categories

Resources