Parsing nested XML into SQL table - sql

What would be the right way to parse the following XML block into a SQL Server table with the desired layout (below)? Is it possible to do it with a single SELECT statement, without UNION or a loop? Any takers? Thanks in advance.
Input XML:
<ObjectData>
<Parameter1>some value</Parameter1>
<Parameter2>other value</Parameter2>
<Dates>
<dateTime>2011-02-01T00:00:00</dateTime>
<dateTime>2011-03-01T00:00:00</dateTime>
<dateTime>2011-04-01T00:00:00</dateTime>
</Dates>
<Values>
<double>0.019974</double>
<double>0.005395</double>
<double>0.004854</double>
</Values>
<Description>
<string>this is row 1</string>
<string>this is row 2</string>
<string>this is row 3</string>
</Description>
</ObjectData>
Desired table output:
Parameter1 Parameter2 Dates Values Description
Some value Other value 2011-02-01 00:00:00.0 0.019974 this is row 1
Some value Other value 2011-03-01 00:00:00.0 0.005395 this is row 2
Some value Other value 2011-04-01 00:00:00.0 0.004854 this is row 3
I am after a SELECT statement that uses OPENXML or the xml.nodes() functionality. For example, the following SELECT statement produces a cross product of Values and Dates (that is, every combination of a value with a date), which is something I want to avoid.
SELECT
doc.col.value('Parameter1[1]', 'varchar(20)') Parameter1,
doc.col.value('Parameter2[1]', 'varchar(20)') Parameter2,
doc1.col.value('.', 'datetime') Dates ,
doc2.col.value('.', 'float') [Values]
FROM
@xml.nodes('/ObjectData') doc(col),
@xml.nodes('/ObjectData/Dates/dateTime') doc1(col),
@xml.nodes('/ObjectData/Values/double') doc2(col);

You can make use of a numbers table to pick the first, second, third, etc. row from the child elements. In this query I have limited the rows returned to the number of dates provided. If there are more values or descriptions than dates, you have to modify the join to take that into account.
declare @XML xml = '
<ObjectData>
<Parameter1>some value</Parameter1>
<Parameter2>other value</Parameter2>
<Dates>
<dateTime>2011-02-01T00:00:00</dateTime>
<dateTime>2011-03-01T00:00:00</dateTime>
<dateTime>2011-04-01T00:00:00</dateTime>
</Dates>
<Values>
<double>0.019974</double>
<double>0.005395</double>
<double>0.004854</double>
</Values>
<Description>
<string>this is row 1</string>
<string>this is row 2</string>
<string>this is row 3</string>
</Description>
</ObjectData>'
;with Numbers as
(
select number
from master..spt_values
where type = 'P'
)
select T.N.value('Parameter1[1]', 'varchar(50)') as Parameter1,
T.N.value('Parameter2[1]', 'varchar(50)') as Parameter2,
T.N.value('(Dates/dateTime[position()=sql:column("N.Number")])[1]', 'datetime') as Dates,
T.N.value('(Values/double[position()=sql:column("N.Number")])[1]', 'float') as [Values],
T.N.value('(Description/string[position()=sql:column("N.Number")])[1]', 'varchar(max)') as [Description]
from @XML.nodes('/ObjectData') as T(N)
cross join Numbers as N
where N.number between 1 and (T.N.value('count(Dates/dateTime)', 'int'))
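One way to make that adjustment, sketched under the assumption that the row count should follow the longest of the three lists: compute the three counts once in a CROSS APPLY and range the numbers up to the largest one. The shortened sample XML and the aliases Nums, C, d, v and s are made up for illustration. Positions past the end of a shorter list come back as NULL, because value() returns NULL when the path matches nothing.

```sql
-- Illustrative sample only: 2 dates, 3 values, 1 description.
declare @XML xml = N'
<ObjectData>
  <Parameter1>some value</Parameter1>
  <Dates><dateTime>2011-02-01T00:00:00</dateTime><dateTime>2011-03-01T00:00:00</dateTime></Dates>
  <Values><double>0.019974</double><double>0.005395</double><double>0.004854</double></Values>
  <Description><string>this is row 1</string></Description>
</ObjectData>';

with Numbers as
(
  select number
  from master..spt_values
  where type = 'P'
)
select T.N.value('Parameter1[1]', 'varchar(50)') as Parameter1,
       T.N.value('(Dates/dateTime[position()=sql:column("Nums.number")])[1]', 'datetime') as Dates,
       T.N.value('(Values/double[position()=sql:column("Nums.number")])[1]', 'float') as [Values],
       T.N.value('(Description/string[position()=sql:column("Nums.number")])[1]', 'varchar(max)') as [Description]
from @XML.nodes('/ObjectData') as T(N)
  -- Count each child list once per ObjectData element.
  cross apply (select T.N.value('count(Dates/dateTime)', 'int') as d,
                      T.N.value('count(Values/double)', 'int') as v,
                      T.N.value('count(Description/string)', 'int') as s) as C
  cross join Numbers as Nums
-- Upper bound is the largest of the three counts.
where Nums.number between 1 and
      case when C.d >= C.v and C.d >= C.s then C.d
           when C.v >= C.s then C.v
           else C.s
      end
order by Nums.number;
```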

Use the OPENXML function. It is a rowset provider (it returns the set of rows parsed from the XML) and thus can be utilized in SELECT or INSERT like:
INSERT INTO table SELECT * FROM OPENXML(source, rowpattern, flags)
Please see the first example in the documentation link for clarity.
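As a rough sketch of what that looks like in practice: OPENXML reads from a document handle, which has to be created with sp_xml_preparedocument and released with sp_xml_removedocument. The variable names and the cut-down XML below are illustrative. Note that this only shreds one list of elements; on its own it does not pair dates with values.

```sql
declare @doc xml = N'
<ObjectData>
  <Dates>
    <dateTime>2011-02-01T00:00:00</dateTime>
    <dateTime>2011-03-01T00:00:00</dateTime>
  </Dates>
</ObjectData>';
declare @handle int;

-- Parse the document and get a handle to the in-memory tree.
exec sp_xml_preparedocument @handle output, @doc;

-- One row per <dateTime>; the column pattern '.' reads the element's own text.
select Dates
from openxml(@handle, '/ObjectData/Dates/dateTime', 2)
     with (Dates datetime '.');

-- Release the memory held by the parsed document.
exec sp_xml_removedocument @handle;
```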

Typically, if you wanted to parse XML, you'd do it in a programming language like Perl, Python, Java or C# that a) has an XML DOM, and b) can communicate with a relational database.
Here's a short article that shows you some of the basics of reading and writing XML in C# ... and even has an example of how to create an XML document from a SQL query (in one line!):
http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx

Related

Can't parse XML with outer apply

I have an XML column in a table, and I am trying to parse values out of it into a flat table structure.
I am trying to input the XML here, but Stack Overflow sees it as code, and when I try to format it as code it still won't accept it.
I can't even get data from "Header" level.
<RequestMessage xmlns="http://iec.ch/TC57/2011/schema/message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Message.xsd">
<Header>
<Verb>created</Verb>
<Noun>MeterReadings</Noun>
<Timestamp>2021-03-08T00:57:18+01:00</Timestamp>
<Source>Ipsum Lorum</Source>
<AsyncReplyFlag>true</AsyncReplyFlag>
<AckRequired>true</AckRequired>
<MessageID>Ipsum Lorum</MessageID>
<CorrelationID />
</Header>
<Payload>
<MeterReadings xmlns:MeterReadings="http://iec.ch/TC57/2011/MeterReadings#" xmlns="http://iec.ch/TC57/2011/MeterReadings#">
<MeterReading>
<IntervalBlocks>
<IntervalReadings>
<timeStamp>2021-03-07T01:00:00+01:00</timeStamp>
<value>480.196</value>
<ReadingQualities>
<ReadingQualityType ref="3.0.0" />
</ReadingQualities>
</IntervalReadings>
<IntervalReadings>
<ReadingType ref="11.0.7.3.1.2.12.1.1.0.0.0.0.101.0.3.72.0" />
</IntervalReadings>
</IntervalBlocks>
<Meter>
<mRID>0000000000000</mRID>
<status>
<remark>Ipsum Lorum</remark>
<value>ESP</value>
</status>
</Meter>
<UsagePoint>
<mRID>73599900000000</mRID>
</UsagePoint>
</MeterReading>
</MeterReadings>
</Payload>
</RequestMessage>
I have not been able to parse it, although I have tried examples from other threads. I am trying to avoid the OPENXML solution because it requires a DECLARE and executing the built-in procedure that clears the XML from memory periodically. I am trying to use the OUTER APPLY solution instead,
like Shugo's solution in "How to parse XML data in SQL server table" or "Query XML with nested nodes on Cross Apply".
It doesn't work: it returns NULL for the timestamp column.
select
t.file_created_time
,c.value('(Timestamp)[1]','varchar(max)') as timestamp
from load.t t
OUTER APPLY t.xml_data.nodes('RequestMessage/Header') as m(c)
Please try the following solution.
Starting from SQL Server 2005 onwards, it is better to use the XQuery language, based on the W3C standards, when dealing with the XML data type.
Microsoft's proprietary OPENXML and its companions sp_xml_preparedocument and sp_xml_removedocument are kept just for backward compatibility with the obsolete SQL Server 2000. Their use is limited to a few fringe cases.
I had to comment out the following tag <!--<IntervalReadings>--> to make your XML well-formed.
XML Header fragment has a default namespace:
xmlns="http://iec.ch/TC57/2011/schema/message"
XML Payload fragment has its own two additional namespaces:
xmlns:MeterReadings="http://iec.ch/TC57/2011/MeterReadings#"
xmlns="http://iec.ch/TC57/2011/MeterReadings#"
Namespaces should be taken into account.
Check it out below.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xml_data XML);
INSERT INTO @tbl (xml_data) VALUES
(N'<RequestMessage xmlns="http://iec.ch/TC57/2011/schema/message"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="Message.xsd">
<Header>
<Verb>created</Verb>
<Noun>MeterReadings</Noun>
<Timestamp>2021-03-08T00:57:18+01:00</Timestamp>
<Source>Ipsum Lorum</Source>
<AsyncReplyFlag>true</AsyncReplyFlag>
<AckRequired>true</AckRequired>
<MessageID>Ipsum Lorum</MessageID>
<CorrelationID/>
</Header>
<Payload>
<MeterReadings xmlns:MeterReadings="http://iec.ch/TC57/2011/MeterReadings#"
xmlns="http://iec.ch/TC57/2011/MeterReadings#">
<MeterReading>
<IntervalBlocks>
<IntervalReadings>
<timeStamp>2021-03-07T01:00:00+01:00</timeStamp>
<value>480.196</value>
<ReadingQualities>
<ReadingQualityType ref="3.0.0"/>
</ReadingQualities>
</IntervalReadings>
<!--<IntervalReadings>-->
<ReadingType ref="11.0.7.3.1.2.12.1.1.0.0.0.0.101.0.3.72.0"/>
</IntervalBlocks>
<Meter>
<mRID>0000000000000</mRID>
<status>
<remark>Ipsum Lorum</remark>
<value>ESP</value>
</status>
</Meter>
<UsagePoint>
<mRID>73599900000000</mRID>
</UsagePoint>
</MeterReading>
</MeterReadings>
</Payload>
</RequestMessage>');
-- DDL and sample data population, end
WITH XMLNAMESPACES(DEFAULT 'http://iec.ch/TC57/2011/schema/message')
SELECT id
, c.value('(Noun/text())[1]','VARCHAR(30)') AS Noun
, c.value('(Timestamp/text())[1]','DATETIMEOFFSET(0)') AS [timestamp]
FROM @tbl
CROSS APPLY xml_data.nodes('/RequestMessage/Header') AS t(c);
Output
+----+---------------+----------------------------+
| id | Noun | timestamp |
+----+---------------+----------------------------+
| 1 | MeterReadings | 2021-03-08 00:57:18 +01:00 |
+----+---------------+----------------------------+
You need to respect and include the XML namespace in your XML document in your XQuery!
<RequestMessage xmlns="http://iec.ch/TC57/2011/schema/message"
**********************************************
Try something like this:
WITH XMLNAMESPACES(DEFAULT N'http://iec.ch/TC57/2011/schema/message')
SELECT
t.id,
c.value('(Timestamp)[1]','varchar(max)') as timestamp
FROM
load.t t
CROSS APPLY
t.xml_data.nodes('RequestMessage/Header') AS m(c)
Also when trying to run this on my SQL Server, I get an error that the XML as shown is malformed.....
UPDATE:
If you need to also access bits in the Payload section - you need to also respect that XML namespace there:
<MeterReadings xmlns:MeterReadings="http://iec.ch/TC57/2011/MeterReadings#"
xmlns="http://iec.ch/TC57/2011/MeterReadings#">
***********************************************
Try this:
WITH XMLNAMESPACES(N'http://iec.ch/TC57/2011/schema/message' as hdr,
N'http://iec.ch/TC57/2011/MeterReadings#' as mr)
SELECT
t.id,
c.value('(hdr:Timestamp)[1]', 'varchar(50)') AS timestamp,
col.value('(mr:MeterReading/mr:IntervalBlocks/mr:IntervalReadings/mr:timeStamp)[1]', 'varchar(50)') AS MeterReadingsTimestamp
FROM
load.t t
CROSS APPLY
t.xml_data.nodes('/hdr:RequestMessage/hdr:Header') AS m(c)
CROSS APPLY
t.xml_data.nodes('/hdr:RequestMessage/hdr:Payload/mr:MeterReadings') AS mr(col)
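To see the effect of the namespace in isolation, here is a minimal sketch using a cut-down copy of the message in a variable (the variable name and the trimmed XML are only for illustration). The same path matches nothing until the default namespace is declared, which is also why the original OUTER APPLY query returned NULL.

```sql
DECLARE @x XML = N'<RequestMessage xmlns="http://iec.ch/TC57/2011/schema/message">
  <Header><Timestamp>2021-03-08T00:57:18+01:00</Timestamp></Header>
</RequestMessage>';

-- No namespace declared: the path looks for elements in no namespace, so it finds none.
SELECT COUNT(*) AS matches_without_namespace   -- 0
FROM @x.nodes('/RequestMessage/Header') AS m(c);

-- Default namespace declared: the same path now finds the Header element.
WITH XMLNAMESPACES(DEFAULT 'http://iec.ch/TC57/2011/schema/message')
SELECT COUNT(*) AS matches_with_namespace      -- 1
FROM @x.nodes('/RequestMessage/Header') AS m(c);
```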

Convert XML data into a table SQL, Different tag in one script

I have some XML data that I want to load into a single table.
The XML data looks like this:
<return>
<start>
<name>Sara</name>
<familyname>Moradi</familyname>
<age>22</age>
</start>
<start>
<name>Sam</name>
<familyname>Mic</familyname>
<age>32</age>
</start>
<errorCode>0</errorCode>
<resultStatus/>
<extra>22255</extra>
</return>
and I want to create a table like this:
name familyname age errorCode extra
Sara Moradi 22 0 22255
Sam Mic 32 0 22255
I checked the previous questions, but they didn't help me.
Try below:
declare @data xml = convert(xml, '<return>
<start>
<name>Sara</name>
<familyname>Moradi</familyname>
<age>22</age>
</start>
<start>
<name>Sam</name>
<familyname>Mic</familyname>
<age>32</age>
</start>
<errorCode>0</errorCode>
<resultStatus/>
<extra>22255</extra>
</return>')
SELECT X.Y.value('(name)[1]', 'VARCHAR(20)') as name,
X.Y.value('(familyname)[1]', 'VARCHAR(20)') as familyname,
X.Y.value('(age)[1]', 'int') as age,
A.B.value('(errorCode)[1]','int') as errorCode,
A.B.value('(extra)[1]','int') as extra
FROM @data.nodes('return') as A(B)
cross apply A.B.nodes('start') as X(Y)
As a quick solution, I stored the XML data in a variable; you need to replace it with the original column from your table.

T-SQL/XML Query to get for each row in the query a subfield as well as the whole element

I have an XML string like this:
<Root>
<Elem>
<RecTime>2016-08-17 12:30PM</RecTime>
<Otherfield>blah blah</Otherfield>
.. other fields ..
</Elem>
<Elem>
<RecTime>2016-08-17 15:30PM</RecTime>
<Otherfield>more blah</Otherfield>
.. other fields ..
</Elem>
</Root>
Obviously this describes a list of elements. I want to extract the record time as well as the entire element for every element in the XML - this is because I want to insert in a table the record time as well as the entire element: the table could be declared as
CREATE TABLE myTable (SampleTime datetime, Data xml)
I tried the query
declare @xml xml
set @xml='<Root>
<Elem>
<RecordTime>2016-08-17 12:30:00PM</RecordTime>
<Otherfield>2</Otherfield>
<field2/>
</Elem>
<Elem>
<RecordTime>2016-08-17 15:30:00PM</RecordTime>
<Otherfield>3</Otherfield>
<field2>hello there</field2>
</Elem>
</Root>'
--INSERT INTO myTable
SELECT SampleTime = T.Item.value('RecordTime[1]', 'dateTime'),
Data = @xml.query('//Root/Elem[1]')
FROM @xml.nodes('//Root/Elem') T(item)
It returns the correct time for each row, but the 'whole element' part of the query always contains the first element in the list, not the element that belongs to that row.
How should I shape the query to get in response the sample time for each element as well as the corresponding element?
Thanks for the help!
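The issue is Data = @xml.query('//Root/Elem[1]'): that expression is evaluated against the whole variable, so it returns the first Elem on every row. To get the element that belongs to each row, query the current node instead, something like this.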
--INSERT INTO myTable
SELECT SampleTime = T.Item.value('RecordTime[1]', 'dateTime'),
Data = T.item.query('.')
FROM @xml.nodes('Root/Elem') T(item) --No // before Root. Thanks to Shnugo
This is the result.
2016-08-17 12:30:00.000 <Elem><RecordTime>2016-08-17 12:30:00PM</RecordTime><Otherfield>2</Otherfield><field2 /></Elem>
2016-08-17 15:30:00.000 <Elem><RecordTime>2016-08-17 15:30:00PM</RecordTime><Otherfield>3</Otherfield><field2>hello there</field2></Elem>

Parsing Dynamic XML to SQL Server tables with Parent and child relation

I have an XML document in a source table. I need to parse this XML into 3 different tables that have a parent-child relationship. I could do this in C#, but in this case I need to implement it on the SQL Server side.
The sample xml looks like:
<ROWSET>
<ROW>
<HEADER_ID>5001507</HEADER_ID>
<ORDER_NUMBER>42678548</ORDER_NUMBER>
<CUST_PO_NUMBER>LSWQWE1</CUST_PO_NUMBER>
<CUSTOMER_NUMBER>38087</CUSTOMER_NUMBER>
<CUSTOMER_NAME>UNIVERSE SELLER</CUSTOMER_NAME>
<LINE>
<LINE_ROW>
<HEADER_ID>5001507</HEADER_ID>
<LINE_ID>12532839</LINE_ID>
<LINE_NUMBER>1</LINE_NUMBER>
<ITEM_NUMBER>STAGEPAS 600I-CA</ITEM_NUMBER>
<ORDER_QUANTITY>5</ORDER_QUANTITY>
</LINE_ROW>
<LINE_ROW>
<HEADER_ID>5001507</HEADER_ID>
<LINE_ID>12532901</LINE_ID>
<LINE_NUMBER>3</LINE_NUMBER>
<ITEM_NUMBER>CD-C600 RK</ITEM_NUMBER>
<ORDER_QUANTITY>6</ORDER_QUANTITY>
</LINE_ROW>
<LINE_ROW>
<HEADER_ID>5001507</HEADER_ID>
<LINE_ID>12532902</LINE_ID>
<LINE_NUMBER>4</LINE_NUMBER>
<ITEM_NUMBER>CD-S300 RK</ITEM_NUMBER>
<ORDER_QUANTITY>8</ORDER_QUANTITY>
</LINE_ROW>
</LINE>
<PRCADJ>
<PRCADJ_ROW>
<PRICE_ADJUSTMENT_ID>43095064</PRICE_ADJUSTMENT_ID>
<HEADER_ID>5001507</HEADER_ID>
<LINE_ID>12532839</LINE_ID>
<ADJUSTED_AMOUNT>-126</ADJUSTED_AMOUNT>
</PRCADJ_ROW>
<PRCADJ_ROW>
<PRICE_ADJUSTMENT_ID>43095068</PRICE_ADJUSTMENT_ID>
<HEADER_ID>5001507</HEADER_ID>
<LINE_ID>12532840</LINE_ID>
<ADJUSTED_AMOUNT>-96.6</ADJUSTED_AMOUNT>
</PRCADJ_ROW>
</PRCADJ>
</ROW>
</ROWSET>
The issue is that a parent can have multiple children and each child can have multiple sub-children. How can I write a query to transfer this into SQL Server 2005?
You need to use three CROSS APPLY operators to break up the "list of XML elements" into separate pseudo tables of XML rows, so you can access their properties - something like this:
SELECT
HeaderID = XCRow.value('(HEADER_ID)[1]', 'int'),
OrderNumber = XCRow.value('(ORDER_NUMBER)[1]', 'int'),
LineHeaderID = XCLine.value('(HEADER_ID)[1]', 'int'),
LineID = XCLine.value('(LINE_ID)[1]', 'int'),
LineNumber = XCLine.value('(LINE_NUMBER)[1]', 'int'),
PriceAdjustmentID = XCPrc.value('(PRICE_ADJUSTMENT_ID)[1]', 'int'),
AdjustedAmount = XCPrc.value('(ADJUSTED_AMOUNT)[1]', 'decimal(20,4)')
FROM
dbo.YourTableNameHere
CROSS APPLY
Data.nodes('/ROWSET/ROW') AS XTRow(XCRow)
CROSS APPLY
XCRow.nodes('LINE/LINE_ROW') AS XTLine(XCLine)
CROSS APPLY
XCRow.nodes('PRCADJ/PRCADJ_ROW') AS XTPrc(XCPrc)
With this, the first CROSS APPLY will handle all the elements that are found directly under <ROWSET> / <ROW> (the header information), the second one will enumerate all instances of <LINE> / <LINE_ROW> below that header element, and the third CROSS APPLY handles the <PRCADJ> / <PRCADJ_ROW> elements, also below the header.
You might need to tweak the outputs a bit - and I only picked two or three of the possible values - extend and adapt to your own needs! But this should show you the basic mechanism - the .nodes() method returns a "pseudo table" of XML fragments, one for each match of the XPath expression you define.
You can do something like this. Using CROSS APPLY you get the node elements, and then you extract each value with the value() method. You need to specify the column type, i.e. int, varchar, etc.
The result can then be inserted using an INSERT INTO ... SELECT query.
-- SourceTable stands for your source table and XMLdata for its XML column
insert into Table1 (header_id, order_number, cust_po_number)
select R.value('(HEADER_ID)[1]', 'int') as header_id,
R.value('(ORDER_NUMBER)[1]', 'int') as order_number,
R.value('(CUST_PO_NUMBER)[1]', 'varchar(256)') as cust_po_number
from SourceTable
cross apply XMLdata.nodes('/ROWSET/ROW') AS P(R);
insert into Table2 (header_id, line_id, line_number)
select R.value('(HEADER_ID)[1]', 'int') as header_id,
R.value('(LINE_ID)[1]', 'int') as line_id,
R.value('(LINE_NUMBER)[1]', 'int') as line_number
from SourceTable
cross apply XMLdata.nodes('/ROWSET/ROW/LINE/LINE_ROW') AS P(R);
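The third table can presumably be filled the same way. The sketch below follows the same pattern for the price adjustments; the table variables @SourceTable and @Table3, their column names and the trimmed XML are all hypothetical stand-ins so that the example runs on its own.

```sql
-- Hypothetical source table holding the XML document.
declare @SourceTable table (XMLdata xml);
insert into @SourceTable (XMLdata)
values (N'<ROWSET><ROW><HEADER_ID>5001507</HEADER_ID>
  <PRCADJ>
    <PRCADJ_ROW><PRICE_ADJUSTMENT_ID>43095064</PRICE_ADJUSTMENT_ID><HEADER_ID>5001507</HEADER_ID><LINE_ID>12532839</LINE_ID><ADJUSTED_AMOUNT>-126</ADJUSTED_AMOUNT></PRCADJ_ROW>
    <PRCADJ_ROW><PRICE_ADJUSTMENT_ID>43095068</PRICE_ADJUSTMENT_ID><HEADER_ID>5001507</HEADER_ID><LINE_ID>12532840</LINE_ID><ADJUSTED_AMOUNT>-96.6</ADJUSTED_AMOUNT></PRCADJ_ROW>
  </PRCADJ></ROW></ROWSET>');

-- Hypothetical target table for the price adjustments.
declare @Table3 table (price_adjustment_id int, header_id int, line_id int, adjusted_amount decimal(20,4));

-- One row per PRCADJ_ROW element, same pattern as the two inserts above.
insert into @Table3 (price_adjustment_id, header_id, line_id, adjusted_amount)
select R.value('(PRICE_ADJUSTMENT_ID)[1]', 'int'),
       R.value('(HEADER_ID)[1]', 'int'),
       R.value('(LINE_ID)[1]', 'int'),
       R.value('(ADJUSTED_AMOUNT)[1]', 'decimal(20,4)')
from @SourceTable as S
  cross apply S.XMLdata.nodes('/ROWSET/ROW/PRCADJ/PRCADJ_ROW') as P(R);

select * from @Table3;
```

Because each child row carries its own HEADER_ID and LINE_ID, the parent-child links survive even though each table is loaded by a separate statement.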

Do nodes() or OPENXML return rows in the same order as they appear in the XML?

I have an XML document which I need to parse using OPENXML or nodes(). The XML contains a few child tags that repeat with different values, as below.
<root>
<value>10</value>
<value>12</value>
<value>11</value>
<value>1</value>
<value>15</value>
</root>
For my code it is very important that I get all these rows returned in the same order as in the XML. I googled and googled, but nothing tells me whether @mp:id is always returned in the same order as in the XML, or whether nodes() returns values in the same order as it encounters them.
All I want to know is whether I can trust either of those two methods to return the rows in the proper order.
P.S. Excuse any errors or mistakes in the above text; I don't enjoy typing code in an Android window either.
You can use row_number on the shredded XML like this.
declare @XML xml=
'<root>
<value>10</value>
<value>12</value>
<value>11</value>
<value>1</value>
<value>15</value>
</root>'
select value
from
(
select T.N.value('.', 'int') as value,
row_number() over(order by T.N) as rn
from @xml.nodes('/root/value') as T(N)
) as T
order by T.rn
Uniquely Identifying XML Nodes with DENSE_RANK
Update:
You can also use a numbers table like this;
declare @XML xml=
'<root>
<value>10</value>
<value>12</value>
<value>11</value>
<value>1</value>
<value>15</value>
</root>';
with N(Number) as
(
select Number
from master..spt_values
where type = 'P'
)
select @XML.value('(/root/value[sql:column("N.Number")])[1]', 'int')
from N
where N.Number between 1 and @XML.value('count(/root/value)', 'int')
order by N.Number
XPath allows you to select nodes explicitly by ordinal: '/root[1]/value[1]' is the first element, '/root[1]/value[2]' is the second, etc. You could also use '(/root/value)[1]' and '(/root/value)[2]'. This way you can select exactly the element you want, and selecting element 1, then element 2, then element 3, etc. will give you controlled order. Slow, but controlled.
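A small sketch of selecting by ordinal, using a shortened copy of the sample document (the variable name and column aliases are only for illustration):

```sql
declare @x xml = N'<root><value>10</value><value>12</value><value>11</value></root>';

-- Each expression addresses one element by its position in document order.
select @x.value('(/root/value)[1]', 'int') as first_value,   -- 10
       @x.value('(/root/value)[2]', 'int') as second_value,  -- 12
       @x.value('(/root/value)[3]', 'int') as third_value;   -- 11
```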
Updated P.S. Wouldn't it be nice if this worked?
declare @x xml = '<root>
<value>10</value>
<value>12</value>
<value>11</value>
<value>1</value>
<value>15</value>
</root>';
select x.value(N'position()', N'int') as position,
x.value(N'.', 'int') as value
from @x.nodes(N'//root/value') t(x)
Unfortunately, it is not...
Msg 2371, Level 16, State 1, Line 9
XQuery [value()]: 'position()' can only be used within a predicate or XPath selector
And the existence of this error makes me worry that order may be broken sometimes...