Parse xml file in pandas


I have this XML file (it's called "LogReg.xml"), which contains some information about a logistic regression. I am interested in the names of the features and their coefficients; I'll explain in more detail below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
<Header>
<Application name="JPMML-SkLearn" version="1.6.35"/>
<Timestamp>2022-02-15T09:44:54Z</Timestamp>
</Header>
<MiningBuildTask>
<Extension name="repr">PMMLPipeline(steps=[('classifier', LogisticRegression())])</Extension>
</MiningBuildTask>
<DataDictionary>
<DataField name="Target" optype="categorical" dataType="integer">
<Value value="0"/>
<Value value="1"/>
</DataField>
<DataField name="const" optype="continuous" dataType="double"/>
<DataField name="grade" optype="continuous" dataType="double"/>
<DataField name="emp_length" optype="continuous" dataType="double"/>
<DataField name="dti" optype="continuous" dataType="double"/>
<DataField name="Orig_FicoScore" optype="continuous" dataType="double"/>
<DataField name="inq_last_6mths" optype="continuous" dataType="double"/>
<DataField name="acc_open_past_24mths" optype="continuous" dataType="double"/>
<DataField name="mort_acc" optype="continuous" dataType="double"/>
<DataField name="mths_since_recent_bc" optype="continuous" dataType="double"/>
<DataField name="num_rev_tl_bal_gt_0" optype="continuous" dataType="double"/>
<DataField name="percent_bc_gt_75" optype="continuous" dataType="double"/>
</DataDictionary>
<RegressionModel functionName="classification" algorithmName="sklearn.linear_model._logistic.LogisticRegression" normalizationMethod="logit">
<MiningSchema>
<MiningField name="Target" usageType="target"/>
<MiningField name="const"/>
<MiningField name="grade"/>
<MiningField name="emp_length"/>
<MiningField name="dti"/>
<MiningField name="Orig_FicoScore"/>
<MiningField name="inq_last_6mths"/>
<MiningField name="acc_open_past_24mths"/>
<MiningField name="mort_acc"/>
<MiningField name="mths_since_recent_bc"/>
<MiningField name="num_rev_tl_bal_gt_0"/>
<MiningField name="percent_bc_gt_75"/>
</MiningSchema>
<Output>
<OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
<OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
</Output>
<RegressionTable intercept="0.8064694059338298" targetCategory="1">
<NumericPredictor name="const" coefficient="0.8013433785974717"/>
<NumericPredictor name="grade" coefficient="0.9010481046582982"/>
<NumericPredictor name="emp_length" coefficient="0.9460686056314133"/>
<NumericPredictor name="dti" coefficient="0.5117062988491518"/>
<NumericPredictor name="Orig_FicoScore" coefficient="0.07944303372859234"/>
<NumericPredictor name="inq_last_6mths" coefficient="0.20516234445402765"/>
<NumericPredictor name="acc_open_past_24mths" coefficient="0.4852503249658917"/>
<NumericPredictor name="mort_acc" coefficient="0.6673203078463711"/>
<NumericPredictor name="mths_since_recent_bc" coefficient="0.1962158305958366"/>
<NumericPredictor name="num_rev_tl_bal_gt_0" coefficient="0.12964661294856686"/>
<NumericPredictor name="percent_bc_gt_75" coefficient="0.04534570018290847"/>
</RegressionTable>
<RegressionTable intercept="0.0" targetCategory="0"/>
</RegressionModel>
</PMML>
I have parsed it using this code:
import pandas as pd
from lxml import objectify

path = 'LogReg.xml'
parsed = objectify.parse(path)
root = parsed.getroot()
data = []
for elt in root.RegressionModel.RegressionTable:
    el_data = {}
    for child in elt.getchildren():
        el_data[child.tag] = child.text
    data.append(el_data)
perf = pd.DataFrame(data)
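As far as I can tell, the loop above cannot yield the coefficients for two reasons. First, each NumericPredictor is an empty element, so its text is None and the name and coefficient live in attributes. Second, every child has the same namespaced tag, so el_data[child.tag] keeps overwriting one key. A minimal sketch using the standard library's xml.etree.ElementTree (swapped in for lxml.objectify so it runs without extra installs) on a trimmed, inlined copy of the file shows this:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for LogReg.xml: one RegressionTable with two predictors.
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
<RegressionModel functionName="classification">
<RegressionTable intercept="0.8064694059338298" targetCategory="1">
<NumericPredictor name="const" coefficient="0.8013433785974717"/>
<NumericPredictor name="grade" coefficient="0.9010481046582982"/>
</RegressionTable>
</RegressionModel>
</PMML>"""

root = ET.fromstring(PMML)
NS = "{http://www.dmg.org/PMML-4_4}"  # Clark notation prefix for the default namespace
table = root.find(f"{NS}RegressionModel/{NS}RegressionTable")

for child in table:
    # tag is namespaced, text is None (empty element), the data sits in attrib
    print(child.tag, child.text, child.attrib)
```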
I am interested in parsing this bit:
<RegressionTable intercept="0.8064694059338298" targetCategory="1">
<NumericPredictor name="const" coefficient="0.8013433785974717"/>
<NumericPredictor name="grade" coefficient="0.9010481046582982"/>
<NumericPredictor name="emp_length" coefficient="0.9460686056314133"/>
<NumericPredictor name="dti" coefficient="0.5117062988491518"/>
<NumericPredictor name="Orig_FicoScore" coefficient="0.07944303372859234"/>
<NumericPredictor name="inq_last_6mths" coefficient="0.20516234445402765"/>
<NumericPredictor name="acc_open_past_24mths" coefficient="0.4852503249658917"/>
<NumericPredictor name="mort_acc" coefficient="0.6673203078463711"/>
<NumericPredictor name="mths_since_recent_bc" coefficient="0.1962158305958366"/>
<NumericPredictor name="num_rev_tl_bal_gt_0" coefficient="0.12964661294856686"/>
<NumericPredictor name="percent_bc_gt_75" coefficient="0.04534570018290847"/>
</RegressionTable>
so that I can build the following dictionary:
myDict = {
"const" : 0.8013433785974717,
"grade" : 0.9010481046582982,
"emp_length" : 0.9460686056314133,
"dti" : 0.5117062988491518,
"Orig_FicoScore" : 0.07944303372859234,
"inq_last_6mths" : 0.20516234445402765,
"acc_open_past_24mths" : 0.4852503249658917,
"mort_acc" : 0.6673203078463711,
"mths_since_recent_bc" : 0.1962158305958366,
"num_rev_tl_bal_gt_0" : 0.12964661294856686,
"percent_bc_gt_75" : 0.04534570018290847
}
Basically, in the dictionary the Key is the name of the feature and the value is the coefficient of the logistic regression.
Please can anyone help me with the code?

I'm not sure you need pandas for this, but you do need to handle the namespaces in your xml.
Try something along these lines:
from lxml import etree

root = etree.parse('LogReg.xml').getroot()
myDict = {}
# register the namespace under a prefix that XPath can use
ns = {'xx': 'http://www.dmg.org/PMML-4_4'}
# you could collapse the next two into one line, but I believe it's clearer this way
rt = root.xpath('//xx:RegressionTable[.//xx:NumericPredictor]', namespaces=ns)[0]
nps = rt.xpath('./xx:NumericPredictor', namespaces=ns)
for pred in nps:
    # attribute values are strings; convert the coefficient to float
    myDict[pred.attrib['name']] = float(pred.attrib['coefficient'])
myDict
The output should be your expected output.
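If lxml is not available, the same idea works with the standard library's ElementTree, which supports namespace prefixes and simple attribute predicates in find(). A minimal sketch, assuming the table you want is the one with targetCategory="1" (the prefix "pmml" and the helper name coefficients are arbitrary choices, and a trimmed copy of the file is inlined so it runs as-is):

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI map; the prefix name "pmml" is arbitrary.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

# Trimmed stand-in for LogReg.xml (two predictors only).
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
<RegressionModel functionName="classification">
<RegressionTable intercept="0.8064694059338298" targetCategory="1">
<NumericPredictor name="const" coefficient="0.8013433785974717"/>
<NumericPredictor name="grade" coefficient="0.9010481046582982"/>
</RegressionTable>
<RegressionTable intercept="0.0" targetCategory="0"/>
</RegressionModel>
</PMML>"""


def coefficients(root):
    """Map feature name -> coefficient for the table with targetCategory="1"."""
    table = root.find(".//pmml:RegressionTable[@targetCategory='1']", NS)
    if table is None:
        raise ValueError("no RegressionTable with targetCategory='1'")
    return {
        pred.get("name"): float(pred.get("coefficient"))
        for pred in table.findall("pmml:NumericPredictor", NS)
    }


print(coefficients(ET.fromstring(SAMPLE)))
# With the real file: coefficients(ET.parse("LogReg.xml").getroot())
```

Selecting by targetCategory rather than "the first table that has predictors" is a judgment call; both pick the same table in this file.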

Related

Loop Through Collection of XML Records in SQL

I have a dataset that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<return>
<approved>
<callbackTable>
<tableRow max="100" value="10" selectedRow="true" maxRow="112.0">
<tableCell term="72" selectedCell="false" maxCell="73">
<number>21.7</number>
</tableCell>
<tableCell term="74" selectedCell="true" maxCell="75">
<number>21.7</number>
</tableCell>
</tableRow>
<tableRow max="200" value="15" selectedRow="false" maxRow="113.0">
<tableCell term="76" selectedCell="false" maxCell="77">
<number>14.5</number>
</tableCell>
<tableCell term="78" selectedCell="false" maxCell="79">
<number>22.5</number>
</tableCell>
</tableRow>
<tableRow max="300" value="20" selectedRow="false" maxRow="114.0">
<tableCell term="80" selectedCell="false" maxCell="81">
<number>14.6</number>
</tableCell>
<tableCell term="82" selectedCell="false" maxCell="83">
<number>15.7</number>
</tableCell>
</tableRow>
</callbackTable>
</approved>
</return>
</process>
I need to be able to loop through the callbackTable entries and add them to a table named Options.
Here is what I need the data to ultimately look like in the Options table.
Id  Max  Value  SelectedRow  MaxRow  Term  SelectedCell  MaxCell  Number
--  ---  -----  -----------  ------  ----  ------------  -------  ------
1   100  10     true         112.0   72    false         73       21.7
2   100  10     true         112.0   74    true          75       21.7
3   200  15     false        113.0   76    false         77       14.5
4   200  15     false        113.0   78    false         79       22.5
5   300  20     false        114.0   80    false         81       14.6
6   300  20     false        114.0   82    false         83       15.7
(Note that the Id column is an identity key and does not need to be populated)
The tricky part is that I don't know exactly how many rows or how many cells are in the callbackTable collection, so I will need to loop through the results and insert based on the number of items in the collection.
I could really use some help as I'm not entirely sure where to start.
Thanks in advance!

If you can change the encoding in the XML processing instruction to utf-16 or omit it, try the set-based query below. Note the Id column of the target table is omitted from the column list so that SQL Server will assign the IDENTITY value.
DECLARE @xml xml =
'<process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<return>
<approved>
<callbackTable>
<tableRow max="100" value="10" selectedRow="true" maxRow="112.0">
<tableCell term="72" selectedCell="false" maxCell="73">
<number>21.7</number>
</tableCell>
<tableCell term="74" selectedCell="true" maxCell="75">
<number>21.7</number>
</tableCell>
</tableRow>
<tableRow max="200" value="15" selectedRow="false" maxRow="113.0">
<tableCell term="76" selectedCell="false" maxCell="77">
<number>14.5</number>
</tableCell>
<tableCell term="78" selectedCell="false" maxCell="79">
<number>22.5</number>
</tableCell>
</tableRow>
<tableRow max="300" value="20" selectedRow="false" maxRow="114.0">
<tableCell term="80" selectedCell="false" maxCell="81">
<number>14.6</number>
</tableCell>
<tableCell term="82" selectedCell="false" maxCell="83">
<number>15.7</number>
</tableCell>
</tableRow>
</callbackTable>
</approved>
</return>
</process>';
INSERT INTO dbo.TargetTable([Max],[Value],[SelectedRow],[MaxRow],[Term],[SelectedCell],[MaxCell],[Number])
SELECT
tableRow.value('data(./@max)', 'varchar(10)')
,tableRow.value('data(./@value)', 'int')
,tableRow.value('data(./@selectedRow)', 'varchar(10)')
,tableRow.value('data(./@maxRow)', 'decimal(10,1)')
,tableCell.value('data(./@term)', 'int')
,tableCell.value('data(./@selectedCell)', 'varchar(10)')
,tableCell.value('data(./@maxCell)', 'int')
,tableCell.value('./number[1]', 'decimal(10,1)')
FROM @xml.nodes('//tableRow') AS tableRow(tableRow)
-- relative path: only the cells belonging to the current tableRow
CROSS APPLY tableRow.nodes('./tableCell') AS tableCell(tableCell);
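The set-based query produces one output row per (tableRow, tableCell) pair, so the number of rows and cells never has to be known in advance. The cell path must be evaluated relative to the current tableRow: a path starting with // goes back to the document root and pairs every row with all six cells (18 rows instead of 6). As a hedged sanity check outside SQL Server, here is the same row-by-cell flattening in Python with ElementTree (flatten is a hypothetical helper; the int columns are converted and the rest are kept as strings):

```python
import xml.etree.ElementTree as ET

XML = """<process>
<return><approved><callbackTable>
<tableRow max="100" value="10" selectedRow="true" maxRow="112.0">
<tableCell term="72" selectedCell="false" maxCell="73"><number>21.7</number></tableCell>
<tableCell term="74" selectedCell="true" maxCell="75"><number>21.7</number></tableCell>
</tableRow>
<tableRow max="200" value="15" selectedRow="false" maxRow="113.0">
<tableCell term="76" selectedCell="false" maxCell="77"><number>14.5</number></tableCell>
<tableCell term="78" selectedCell="false" maxCell="79"><number>22.5</number></tableCell>
</tableRow>
<tableRow max="300" value="20" selectedRow="false" maxRow="114.0">
<tableCell term="80" selectedCell="false" maxCell="81"><number>14.6</number></tableCell>
<tableCell term="82" selectedCell="false" maxCell="83"><number>15.7</number></tableCell>
</tableRow>
</callbackTable></approved></return>
</process>"""


def flatten(root):
    rows = []
    # outer loop ~ @xml.nodes('//tableRow')
    for row in root.iter("tableRow"):
        # inner loop ~ CROSS APPLY tableRow.nodes('./tableCell'): this row's cells only
        for cell in row.findall("tableCell"):
            rows.append((
                row.get("max"), int(row.get("value")), row.get("selectedRow"),
                row.get("maxRow"), int(cell.get("term")), cell.get("selectedCell"),
                int(cell.get("maxCell")), cell.findtext("number"),
            ))
    return rows


for r in flatten(ET.fromstring(XML)):
    print(r)
```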

How to create unique id for users during extract from oracle xml

I have XML data in my Oracle DB, and there are different applicants for a particular appID. Note that appID is a column in the Oracle table, while the applicants are inside the XML data (there are multiple applicants in this XML). I would like to create a unique ID for each applicant.
In the sample data there are 3 applicants. How do I create unique IDs in my SELECT statement?
WITH t( xml ) AS
(
SELECT XMLType('<loanApplication xmlns="http://www.abcdef.com/Schema/FCX/1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<applicantGroup>
<applicantGroupTypeDd>0</applicantGroupTypeDd>
<applicant>
<asset>
<assetDescription>neweg</assetDescription>
<assetTypeDd>1</assetTypeDd>
<assetValue>1500.0</assetValue>
</asset>
<asset>
<assetDescription>RayM</assetDescription>
<assetTypeDd>6</assetTypeDd>
<assetValue>60000</assetValue>
</asset>
<asset>
<assetDescription>TDC</assetDescription>
<assetTypeDd>8</assetTypeDd>
<assetValue>100</assetValue>
</asset>
<asset>
<assetDescription>2007 Hyundai</assetDescription>
<assetTypeDd>4</assetTypeDd>
<assetValue>2500</assetValue>
</asset>
</applicant>
</applicantGroup>
<applicantGroup>
<applicantGroupTypeDd>1</applicantGroupTypeDd>
<applicant>
<asset>
<assetDescription>neweg</assetDescription>
<assetTypeDd>2</assetTypeDd>
<assetValue>15000.0</assetValue>
</asset>
<asset>
<assetDescription>Bay</assetDescription>
<assetTypeDd>6</assetTypeDd>
<assetValue>60000</assetValue>
</asset>
<asset>
<assetDescription>TDC</assetDescription>
<assetTypeDd>9</assetTypeDd>
<assetValue>100</assetValue>
</asset>
<asset>
<assetDescription>2007 car</assetDescription>
<assetTypeDd>3</assetTypeDd>
<assetValue>2500</assetValue>
</asset>
</applicant>
</applicantGroup>
<applicantGroup>
<applicantGroupTypeDd>3</applicantGroupTypeDd>
<applicant>
<asset>
<assetDescription>neweg</assetDescription>
<assetTypeDd>6</assetTypeDd>
<assetValue>100.0</assetValue>
</asset>
<asset>
<assetDescription>RayM</assetDescription>
<assetTypeDd>8</assetTypeDd>
<assetValue>60000</assetValue>
</asset>
<asset>
<assetDescription>TDC</assetDescription>
<assetTypeDd>7</assetTypeDd>
<assetValue>100</assetValue>
</asset>
<asset>
<assetDescription>2007 Hyundai</assetDescription>
<assetTypeDd>5</assetTypeDd>
<assetValue>2500</assetValue>
</asset>
</applicant>
</applicantGroup>
</loanApplication>')
FROM dual
)
SELECT JSON_OBJECT (
KEY 'Assets' value y.Assets
) assets
FROM t,
XMLTABLE(XMLNAMESPACES(DEFAULT 'http://www.abcdef.com/Schema/FCX/1'), '/loanApplication/applicantGroup/applicant/asset'
PASSING xml
COLUMNS
Assets INT PATH 'assetValue') y
The result I need:
AppId  applicantId  assetTypeDd
-----  -----------  -----------
1      1            [1,6,8,4]
1      2            [1,2,6,9,3]
1      3            [3,6,8,7,5]
Thanks

Consider XPath's ancestor axis and a count of preceding siblings, since it appears that exactly one applicant node falls under each applicantGroup:
WITH t( xml_data ) AS
(
SELECT XMLType('<loanApplication xmlns="http://www.abcdef.com/Schema/FCX/1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<applicantGroup>
<applicantGroupTypeDd>0</applicantGroupTypeDd>
<applicant>
<asset>
<assetDescription>neweg</assetDescription>
<assetTypeDd>1</assetTypeDd>
<assetValue>1500.0</assetValue>
</asset>
<asset>
<assetDescription>RayM</assetDescription>
<assetTypeDd>6</assetTypeDd>
<assetValue>60000</assetValue>
</asset>
<asset>
<assetDescription>TDC</assetDescription>
<assetTypeDd>8</assetTypeDd>
<assetValue>100</assetValue>
</asset>
<asset>
<assetDescription>2007 Hyundai</assetDescription>
<assetTypeDd>4</assetTypeDd>
<assetValue>2500</assetValue>
</asset>
</applicant>
</applicantGroup>
<applicantGroup>
<applicantGroupTypeDd>1</applicantGroupTypeDd>
<applicant>
<asset>
<assetDescription>neweg</assetDescription>
<assetTypeDd>2</assetTypeDd>
<assetValue>15000.0</assetValue>
</asset>
<asset>
<assetDescription>Bay</assetDescription>
<assetTypeDd>6</assetTypeDd>
<assetValue>60000</assetValue>
</asset>
<asset>
<assetDescription>TDC</assetDescription>
<assetTypeDd>9</assetTypeDd>
<assetValue>100</assetValue>
</asset>
<asset>
<assetDescription>2007 car</assetDescription>
<assetTypeDd>3</assetTypeDd>
<assetValue>2500</assetValue>
</asset>
</applicant>
</applicantGroup>
<applicantGroup>
<applicantGroupTypeDd>3</applicantGroupTypeDd>
<applicant>
<asset>
<assetDescription>neweg</assetDescription>
<assetTypeDd>6</assetTypeDd>
<assetValue>100.0</assetValue>
</asset>
<asset>
<assetDescription>RayM</assetDescription>
<assetTypeDd>8</assetTypeDd>
<assetValue>60000</assetValue>
</asset>
<asset>
<assetDescription>TDC</assetDescription>
<assetTypeDd>7</assetTypeDd>
<assetValue>100</assetValue>
</asset>
<asset>
<assetDescription>2007 Hyundai</assetDescription>
<assetTypeDd>5</assetTypeDd>
<assetValue>2500</assetValue>
</asset>
</applicant>
</applicantGroup>
</loanApplication>')
FROM dual
)
SELECT y.ApplicantId AS "applicantId",
LISTAGG(y.AssetTypeDd, ',') AS "assetTypeDd",
LISTAGG(y.Assets, ',') AS "assets"
FROM t,
XMLTABLE(
XMLNAMESPACES('http://www.abcdef.com/Schema/FCX/1' AS "d",
DEFAULT 'http://www.abcdef.com/Schema/FCX/1'),
'//d:asset'
PASSING xml_data
COLUMNS
ApplicantId INT PATH 'count(ancestor::applicantGroup/preceding-sibling::*)+1',
AssetTypeDd INT PATH 'assetTypeDd',
Assets INT PATH 'assetValue'
) y
GROUP BY y.ApplicantId
ORDER BY y.ApplicantId
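The ApplicantId expression numbers each applicantGroup by its position: count(preceding-sibling::*)+1 is 1 for the first group, 2 for the second, and so on. Here every preceding sibling of an applicantGroup is itself an applicantGroup, so this equals its ordinal. A hedged Python sketch of the same numbering with ElementTree (the helper name and the trimmed sample are illustrative; AppId would still come from the table column, not the XML):

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://www.abcdef.com/Schema/FCX/1"}

# Trimmed copy of the sample: only assetTypeDd is kept for each asset.
SAMPLE = """<loanApplication xmlns="http://www.abcdef.com/Schema/FCX/1">
<applicantGroup><applicantGroupTypeDd>0</applicantGroupTypeDd><applicant>
<asset><assetTypeDd>1</assetTypeDd></asset><asset><assetTypeDd>6</assetTypeDd></asset>
<asset><assetTypeDd>8</assetTypeDd></asset><asset><assetTypeDd>4</assetTypeDd></asset>
</applicant></applicantGroup>
<applicantGroup><applicantGroupTypeDd>1</applicantGroupTypeDd><applicant>
<asset><assetTypeDd>2</assetTypeDd></asset><asset><assetTypeDd>6</assetTypeDd></asset>
<asset><assetTypeDd>9</assetTypeDd></asset><asset><assetTypeDd>3</assetTypeDd></asset>
</applicant></applicantGroup>
<applicantGroup><applicantGroupTypeDd>3</applicantGroupTypeDd><applicant>
<asset><assetTypeDd>6</assetTypeDd></asset><asset><assetTypeDd>8</assetTypeDd></asset>
<asset><assetTypeDd>7</assetTypeDd></asset><asset><assetTypeDd>5</assetTypeDd></asset>
</applicant></applicantGroup>
</loanApplication>"""


def asset_types_by_applicant(root):
    """Number applicantGroups 1, 2, 3, ... in document order and list their assetTypeDd values."""
    result = {}
    groups = root.findall("d:applicantGroup", NS)
    # enumerate(start=1) plays the role of count(preceding-sibling::*)+1
    for applicant_id, group in enumerate(groups, start=1):
        result[applicant_id] = [
            int(t.text) for t in group.findall("d:applicant/d:asset/d:assetTypeDd", NS)
        ]
    return result


print(asset_types_by_applicant(ET.fromstring(SAMPLE)))
```

Note that for applicants 2 and 3 the XML itself contains [2,6,9,3] and [6,8,7,5], which differ from the lists in the expected-result table above.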

Convert Tables into XML using T-SQL

I found several questions about how to convert a table (or query) into XML, but none that showed how to start with one main table, join several one-to-many satellite tables, and from that generate XML that represents the hierarchical structure of the data. So I thought I'd share this solution now that I've figured it out. If someone else has another way of doing this, please post another answer.
Given this contrived data:
create table #recipe (id int, name varchar(10))
create table #ingredient (recipe_id int, name varchar(30), quantity varchar(20), sort int)
create table #instruction (recipe_id int, task varchar(32), sort int)
insert into #recipe values (1, 'pizza'), (2, 'omelet')
insert into #ingredient values (1, 'pizza dough', '1 package', 1),
(1, 'tomato sauce', '1 can', 2),
(1, 'favorite toppings', 'you choose', 3),
(2, 'eggs', 'three', 1),
(2, 'a bunch of other ingredients', 'you choose', 2)
insert into #instruction values (1, 'pre-bake pizza dough', 1),
(1, 'add tomato sauce', 2),
(1, 'add toppings', 3),
(1, 'bake a little longer', 4),
(2, 'break eggs into mixing bowl', 1),
(2, 'beat yolks and whites together', 2),
(2, 'pour into large sauce pan', 3),
(2, 'add other ingredients', 4),
(2, 'fold in half', 5),
(2, 'cook until done', 6)
Which looks like this in tabular form:
#recipe
id  name
--  ------
1   pizza
2   omelet
#ingredient
recipe_id  name                          quantity    sort
---------  ----------------------------  ----------  ----
1          pizza dough                   1 package   1
1          tomato sauce                  1 can       2
1          favorite toppings             you choose  3
2          eggs                          three       1
2          a bunch of other ingredients  you choose  2
#instruction
recipe_id  task                            sort
---------  ------------------------------  ----
1          pre-bake pizza dough            1
1          add tomato sauce                2
1          add toppings                    3
1          bake a little longer            4
2          break eggs into mixing bowl     1
2          beat yolks and whites together  2
2          pour into large sauce pan       3
2          add other ingredients           4
2          fold in half                    5
2          cook until done                 6
I want to create an XML document that has one record for each recipe, and within each recipe element, I want a group of ingredients and another group of instructions, like this:
<recipes>
<recipe id="2" name="omelet">
<ingredients>
<ingredient name="eggs" quantity="three" />
<ingredient name="a bunch of other ingredients" quantity="you choose" />
</ingredients>
<instructions>
<instruction task="break eggs into mixing bowl" />
<instruction task="beat yolks and whites together" />
<instruction task="pour into large sauce pan" />
<instruction task="add other ingredients" />
<instruction task="fold in half" />
<instruction task="cook until done" />
</instructions>
</recipe>
<recipe id="1" name="pizza">
<ingredients>
<ingredient name="pizza dough" quantity="1 package" />
<ingredient name="tomato sauce" quantity="1 can" />
<ingredient name="favorite toppings" quantity="you choose" />
</ingredients>
<instructions>
<instruction task="pre-bake pizza dough" />
<instruction task="add tomato sauce" />
<instruction task="add toppings" />
<instruction task="bake a little longer" />
</instructions>
</recipe>
</recipes>

This SQL creates the desired XML verbatim:
select recipe.*,
(
select ingredient.name, ingredient.quantity
from #ingredient ingredient
where recipe.id = ingredient.recipe_id
order by ingredient.sort
for xml auto, root('ingredients'), type
),
(
select instruction.task
from #instruction instruction
where recipe.id = instruction.recipe_id
order by instruction.sort
for xml auto, root('instructions'), type
)
from #recipe as recipe
order by recipe.name
for xml auto, root('recipes'), type
I aliased the temp table names because using for xml auto on temp tables creates poorly named XML elements. This is how it looks:
<recipes>
<recipe id="2" name="omelet">
<ingredients>
<ingredient name="eggs" quantity="three" />
<ingredient name="a bunch of other ingredients" quantity="you choose" />
</ingredients>
<instructions>
<instruction task="break eggs into mixing bowl" />
<instruction task="beat yolks and whites together" />
<instruction task="pour into large sauce pan" />
<instruction task="add other ingredients" />
<instruction task="fold in half" />
<instruction task="cook until done" />
</instructions>
</recipe>
<recipe id="1" name="pizza">
<ingredients>
<ingredient name="pizza dough" quantity="1 package" />
<ingredient name="tomato sauce" quantity="1 can" />
<ingredient name="favorite toppings" quantity="you choose" />
</ingredients>
<instructions>
<instruction task="pre-bake pizza dough" />
<instruction task="add tomato sauce" />
<instruction task="add toppings" />
<instruction task="bake a little longer" />
</instructions>
</recipe>
</recipes>
This SQL creates another version of the XML with all data as element values instead of attributes, but in the same basic hierarchical structure:
select recipe.*,
(
select ingredient.name, ingredient.quantity
from #ingredient ingredient
where recipe.id = ingredient.recipe_id
order by ingredient.sort
for xml path('ingredient'), root('ingredients'), type
),
(
select instruction.task
from #instruction instruction
where recipe.id = instruction.recipe_id
order by instruction.sort
for xml path('instruction'), root('instructions'), type
)
from #recipe as recipe
order by recipe.name
for xml path('recipe'), root('recipes'), type
This is how it looks:
<recipes>
<recipe>
<id>2</id>
<name>omelet</name>
<ingredients>
<ingredient>
<name>eggs</name>
<quantity>three</quantity>
</ingredient>
<ingredient>
<name>a bunch of other ingredients</name>
<quantity>you choose</quantity>
</ingredient>
</ingredients>
<instructions>
<instruction>
<task>break eggs into mixing bowl</task>
</instruction>
<instruction>
<task>beat yolks and whites together</task>
</instruction>
<instruction>
<task>pour into large sauce pan</task>
</instruction>
<instruction>
<task>add other ingredients</task>
</instruction>
<instruction>
<task>fold in half</task>
</instruction>
<instruction>
<task>cook until done</task>
</instruction>
</instructions>
</recipe>
<recipe>
<id>1</id>
<name>pizza</name>
<ingredients>
<ingredient>
<name>pizza dough</name>
<quantity>1 package</quantity>
</ingredient>
<ingredient>
<name>tomato sauce</name>
<quantity>1 can</quantity>
</ingredient>
<ingredient>
<name>favorite toppings</name>
<quantity>you choose</quantity>
</ingredient>
</ingredients>
<instructions>
<instruction>
<task>pre-bake pizza dough</task>
</instruction>
<instruction>
<task>add tomato sauce</task>
</instruction>
<instruction>
<task>add toppings</task>
</instruction>
<instruction>
<task>bake a little longer</task>
</instruction>
</instructions>
</recipe>
</recipes>
Originally I tried placing the ingredients and instructions in the main query's from clause with an inner join to the recipe table. But the instructions were all nested within the ingredients, which were nested within the recipe. When I moved them up to the select part of the query it straightened out the XML.
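The difference can be sketched outside SQL Server. With an inner join of both satellite tables in the FROM clause, each recipe is multiplied out into ingredients times instructions rows, and FOR XML AUTO then nests the later table under the earlier one. Correlated subqueries in the SELECT list instead give one independent group per satellite table. A hedged Python illustration with ElementTree, using the temp-table rows copied into lists (build_xml and joined_rows are hypothetical helpers, not part of the SQL solution):

```python
import xml.etree.ElementTree as ET

recipes = [(1, "pizza"), (2, "omelet")]
ingredients = [
    (1, "pizza dough", "1 package", 1),
    (1, "tomato sauce", "1 can", 2),
    (1, "favorite toppings", "you choose", 3),
    (2, "eggs", "three", 1),
    (2, "a bunch of other ingredients", "you choose", 2),
]
instructions = [
    (1, "pre-bake pizza dough", 1), (1, "add tomato sauce", 2),
    (1, "add toppings", 3), (1, "bake a little longer", 4),
    (2, "break eggs into mixing bowl", 1), (2, "beat yolks and whites together", 2),
    (2, "pour into large sauce pan", 3), (2, "add other ingredients", 4),
    (2, "fold in half", 5), (2, "cook until done", 6),
]


def build_xml(recipes, ingredients, instructions):
    root = ET.Element("recipes")
    for rid, name in sorted(recipes, key=lambda r: r[1]):  # order by recipe.name
        recipe = ET.SubElement(root, "recipe", id=str(rid), name=name)
        # like the first correlated subquery: one <ingredients> group per recipe
        group = ET.SubElement(recipe, "ingredients")
        for _, iname, qty, _ in sorted((i for i in ingredients if i[0] == rid), key=lambda i: i[3]):
            ET.SubElement(group, "ingredient", name=iname, quantity=qty)
        # like the second subquery: a sibling group, not nested inside <ingredients>
        group = ET.SubElement(recipe, "instructions")
        for _, task, _ in sorted((i for i in instructions if i[0] == rid), key=lambda i: i[2]):
            ET.SubElement(group, "instruction", task=task)
    return root


def joined_rows(rid):
    """Rows one recipe would contribute if both satellites were inner-joined in FROM."""
    return [(i, s) for i in ingredients if i[0] == rid for s in instructions if s[0] == rid]


root = build_xml(recipes, ingredients, instructions)
print(ET.tostring(root, encoding="unicode"))
print(len(joined_rows(1)), len(joined_rows(2)))
```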

xml file data imported to sql with script

I have this kind of XML:
<?xml version="1.0"?>
<recordedData>
<machine>ZSK40-2</machine>
<date>2013/09/21</date>
<hour>05:32</hour>
<CollectedData>
<variable>
<Name>PRODUCT</Name>
<Value>FILLER 580</Value>
</variable>
<variable>
<Name>LOT_NUMBER</Name>
<Value>CG 00063 0</Value>
</variable>
<variable>
<Name>SHIFT_SUPERVISOR</Name>
<Value> covaliu l</Value>
</variable>
<variable>
<Name>KGH_ALL_SET</Name>
<Value>0</Value>
</variable>
<variable>
<Name>KGH_ALL_REAL</Name>
<Value>0</Value>
</variable>
<variable>
<Name>KGH_F1_SET</Name>
<Value>0</Value>
</variable>
<variable>
<Name>KGH_F1_REAL</Name>
<Value>0</Value>
</variable>
<variable>
<Name>K_F1</Name>
<Value>43</Value>
</variable>
<variable>
<Name>SCREW_RPM_SET</Name>
<Value>550</Value>
</variable>
<variable>
<Name>SCREW_RPM_REAL</Name>
<Value>550.085388183594</Value>
</variable>
<variable>
<Name>TORQUE</Name>
<Value>1.21340000629425</Value>
</variable>
<variable>
<Name>CURRENT</Name>
<Value>60.1959991455078</Value>
</variable>
<variable>
<Name>KW_KG</Name>
<Value>0</Value>
</variable>
<variable>
<Name>KW</Name>
<Value>-0.990000009536743</Value>
</variable>
<variable>
<Name>MELT_PRESSURE</Name>
<Value>0</Value>
</variable>
<variable>
<Name>MELT_TEMPERATURE</Name>
<Value>214</Value>
</variable>
<variable>
<Name>PV1</Name>
<Value>216</Value>
</variable>
<variable>
<Name>SP1</Name>
<Value>210</Value>
</variable>
<variable>
<Name>PV2</Name>
<Value>239</Value>
</variable>
<variable>
<Name>SP2</Name>
<Value>220</Value>
</variable>
<variable>
<Name>PV3</Name>
<Value>220</Value>
</variable>
<variable>
<Name>SP3</Name>
<Value>220</Value>
</variable>
<variable>
<Name>PV4</Name>
<Value>220</Value>
</variable>
<variable>
<Name>SP4</Name>
<Value>220</Value>
</variable>
<variable>
<Name>PV5</Name>
<Value>209</Value>
</variable>
<variable>
<Name>SP5</Name>
<Value>210</Value>
</variable>
<variable>
<Name>PV6</Name>
<Value>210</Value>
</variable>
<variable>
<Name>SP6</Name>
<Value>210</Value>
</variable>
<variable>
<Name>PV7</Name>
<Value>210</Value>
</variable>
<variable>
<Name>SP7</Name>
<Value>210</Value>
</variable>
<variable>
<Name>PV8</Name>
<Value>210</Value>
</variable>
<variable>
<Name>SP8</Name>
<Value>210</Value>
</variable>
<variable>
<Name>PV9</Name>
<Value>210</Value>
</variable>
<variable>
<Name>SP9</Name>
<Value>210</Value>
</variable>
<variable>
<Name>PV10</Name>
<Value>210</Value>
</variable>
<variable>
<Name>SP10</Name>
<Value>210</Value>
</variable>
<variable>
<Name>PV11</Name>
<Value>220</Value>
</variable>
<variable>
<Name>SP11</Name>
<Value>220</Value>
</variable>
</CollectedData>
</recordedData>
Can anyone provide a sample SQL script for extracting all the data from it, please?
I would really appreciate this since I'm new to XML.
Thanks in advance.

If you have your data in a table already, you can use something like this:
DECLARE @Tmp TABLE (ID INT NOT NULL, XmlContent XML)
INSERT INTO @Tmp VALUES(1, '......(your entire XML here).......')
SELECT
ID,
MACHINE = XmlContent.value('(/recordedData/machine)[1]', 'varchar(50)'),
RecordingDate = XmlContent.value('(/recordedData/date)[1]', 'varchar(50)'),
RecordingTime = XmlContent.value('(/recordedData/hour)[1]', 'varchar(50)'),
VariableName = XVar.value('(Name)[1]', 'varchar(50)'),
VariableValue = XVar.value('(Value)[1]', 'varchar(50)')
FROM
@Tmp
CROSS APPLY
XmlContent.nodes('/recordedData/CollectedData/variable') AS XTbl(XVar)
This gives you one row per <variable> element, listing each variable's name and value alongside the machine, date and hour.
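The pattern in this query is: read the header fields once per document, then emit one row per variable node (the CROSS APPLY ... nodes() part). A hedged Python sketch of the same shredding with ElementTree, on a trimmed copy of the XML with three variables (shred is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0"?>
<recordedData>
<machine>ZSK40-2</machine>
<date>2013/09/21</date>
<hour>05:32</hour>
<CollectedData>
<variable><Name>PRODUCT</Name><Value>FILLER 580</Value></variable>
<variable><Name>LOT_NUMBER</Name><Value>CG 00063 0</Value></variable>
<variable><Name>SCREW_RPM_REAL</Name><Value>550.085388183594</Value></variable>
</CollectedData>
</recordedData>"""


def shred(root):
    # header values, like the singleton .value('(/recordedData/...)[1]') calls
    machine = root.findtext("machine")
    date = root.findtext("date")
    hour = root.findtext("hour")
    # one tuple per <variable>, like CROSS APPLY ... nodes('/recordedData/CollectedData/variable')
    return [
        (machine, date, hour, v.findtext("Name"), v.findtext("Value"))
        for v in root.findall("CollectedData/variable")
    ]


for row in shred(ET.fromstring(XML)):
    print(row)
```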

Using VB.NET Regular Expressions to Remove Excel XML Conversion

I have the following lines showing up in files that have been converted to XML from an Excel worksheet:
<Worksheet ss:Name="Sheet1">
<Names>
<NamedRange ss:Name="Print_Area" ss:RefersTo="=Sheet1!R30C1:R8642C15"/>
</Names>
<Table ss:ExpandedColumnCount="14" ss:ExpandedRowCount="8655" x:FullColumns="1"
x:FullRows="1" ss:StyleID="s16">
<Column ss:Index="2" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="41.25"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="36"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="35.25"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="38.25" ss:Span="1"/>
<Column ss:Index="8" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="31.5"/>
<Column ss:Index="11" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="30"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="33.75"/>
<Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="28.5"/>
<Row ss:StyleID="s18">
<Cell ss:StyleID="s17"><Data ss:Type="String">UNITED STATES</Data></Cell>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
<Cell ss:StyleID="s17"/>
</Row>
I am attempting to only remove the <Column .. /> lines. I "thought" I had a pretty good handle on Regular Expressions in VB.NET, but I cannot seem to match these lines. I have tried the following match strings:
'Using (RegexOptions.Multiline)
Private Const Column_MatchExpression As String = "^[\s]*<Column[\s\S]+$"
Private Const Column_MatchExpression As String = " <Column[\s\S]+$"
Private Const Column_MatchExpression As String = "^ <Column[\s\S]+$"
Private Const Column_MatchExpression As String = "^[\s]+<Column[\s\w\W]+$"
Any thoughts on the matter would be appreciated.

What about "^\s*<Column.*/>\s*$"?

\<Column[^>]*\>
Should work
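Why the original attempts fail: [\s\S] matches newlines too, so the greedy + runs to the very end of the text, where $ is trivially satisfied, and the "line" match swallows everything after the first <Column. A pattern built from . or [^>] stays within one tag. A hedged demonstration using Python's re module rather than VB.NET (^, $, \b, [^>] and multiline mode behave the same way for these patterns in .NET, but that is worth verifying there); the trailing \r?\n also removes the emptied line:

```python
import re

EXCEL_XML = """<Table ss:ExpandedColumnCount="14" ss:StyleID="s16">
 <Column ss:Index="2" ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="41.25"/>
 <Column ss:StyleID="s16" ss:AutoFitWidth="0" ss:Width="36"/>
 <Row ss:StyleID="s18">
 <Cell ss:StyleID="s17"><Data ss:Type="String">UNITED STATES</Data></Cell>
 </Row>
</Table>"""

# The first attempt from the question: matches from the first <Column to end of text.
greedy = re.search(r"^[\s]*<Column[\s\S]+$", EXCEL_XML, re.MULTILINE)

# Line-bounded version: [^>]* cannot cross the end of the tag; \r?\n drops the line break.
COLUMN_LINE = re.compile(r"^[ \t]*<Column\b[^>]*/>[ \t]*\r?\n", re.MULTILINE)
cleaned = COLUMN_LINE.sub("", EXCEL_XML)
print(cleaned)
```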
