How to iterate through every potential XML subelement using SQL Server

I have a large XML file of over 45K contacts, and I need to load each contact's transaction subelements into a SQL table. I've looked at several solutions using value(), nodes(), etc., but none of the examples has an XML structure close to mine:
<Contacts>
<Contact>
<ContactID>1234</ContactID>
<ContactName>’John Doe’</ContactName>
<DOB>09031978</DOB>
<Address>’123 Main Street’</Address>
<Transactions>
<Transaction>
<TransactionID>4490</TransactionID>
<ProductName>’Recliner’</ProductName>
<Cost>123.00</Cost>
<PurchaseDate>07042020
</Transaction>
<Transaction>
<TransactionID>5678</TransactionID>
<ProductName>’Lamp’</ProductName>
<Cost>45.00</Cost>
<PurchaseDate>07042020
<Transaction>
</Transactions>
</Contact>
<Contact>
<ContactID>4567</ContactID>
<ContactName>’Jane Doe’</ContactName>
<DOB>05191984</DOB>
<Address>’567 Fake Street’</Address>
<Transactions>
<Transaction>
<TransactionID>4378</TransactionID>
<ProductName>’Coffee Table’</ProductName>
<Cost>225.00</Cost>
<PurchaseDate>07042018
</Transaction>
</Transactions>
</Contact>
</Contacts>
I need these data in a result like below:
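+-----------+---------------+--------------+--------+--------------+
| ContactID | TransactionID | ProductName  | Cost   | PurchaseDate |
+-----------+---------------+--------------+--------+--------------+
| 1234      | 4490          | Recliner     | 123.00 | 4 July 2020  |
| 1234      | 5678          | Lamp         | 45.00  | 4 July 2020  |
| 4567      | 4378          | Coffee Table | 225.00 | 4 July 2018  |
+-----------+---------------+--------------+--------+--------------+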
I've tried a query using the following script:
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
-- Execute a SELECT stmt using OPENXML rowset provider.
SELECT *
FROM OPENXML (@idoc, '/Contacts/Contact/Transactions/Transaction',2)
WITH (ContactID int '../ContactID',
TransactionID int 'TransactionID',
ProductName nvarchar(50) 'ProductName',
Cost float 'Cost',
PurchaseDate date 'PurchaseDate')
But this returns either NULL for ContactID or only one transaction per ContactID. I need it to iterate and return every transaction that exists for a contact.
Any insights would be most welcome!

Try to avoid sp_xml_preparedocument because it uses large amounts of memory that can't be used by SQL Server until you remember to free it up by invoking sp_xml_removedocument.
What you're asking for can be easily achieved using nodes() with a cross apply, e.g. (after fixing your XML sample):
declare @doc xml = N'<Contacts>
<Contact>
<ContactID>1234</ContactID>
<ContactName>’John Doe’</ContactName>
<DOB>09031978</DOB>
<Address>’123 Main Street’</Address>
<Transactions>
<Transaction>
<TransactionID>4490</TransactionID>
<ProductName>’Recliner’</ProductName>
<Cost>123.00</Cost>
<PurchaseDate>07042020</PurchaseDate>
</Transaction>
<Transaction>
<TransactionID>5678</TransactionID>
<ProductName>’Lamp’</ProductName>
<Cost>45.00</Cost>
<PurchaseDate>07042020</PurchaseDate>
</Transaction>
</Transactions>
</Contact>
<Contact>
<ContactID>4567</ContactID>
<ContactName>’Jane Doe’</ContactName>
<DOB>05191984</DOB>
<Address>’567 Fake Street’</Address>
<Transactions>
<Transaction>
<TransactionID>4378</TransactionID>
<ProductName>’Coffee Table’</ProductName>
<Cost>225.00</Cost>
<PurchaseDate>07042018</PurchaseDate>
</Transaction>
</Transactions>
</Contact>
</Contacts>';
select
Cont.value('ContactID[1]', 'int') as ContactID
,Trans.value('TransactionID[1]', 'int') as TransactionID
,Trans.value('ProductName[1]', 'nvarchar(50)') as ProductName
,Trans.value('Cost[1]', 'float') as Cost
,convert(date, concat(substring(purDate,1,2), '/', substring(purDate,3,2), '/', substring(purDate,5,4)), 101) as PurchaseDate
from @doc.nodes('//Contact') nodes1(Cont)
cross apply nodes1.Cont.nodes('Transactions/Transaction') nodes2(Trans)
outer apply (
select purDate = Trans.value('PurchaseDate[1]', 'nvarchar(8)')
) temp;
Which yields:
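+-----------+---------------+----------------+------+--------------+
| ContactID | TransactionID | ProductName    | Cost | PurchaseDate |
+-----------+---------------+----------------+------+--------------+
| 1234      | 4490          | ’Recliner’     | 123  | 2020-07-04   |
| 1234      | 5678          | ’Lamp’         | 45   | 2020-07-04   |
| 4567      | 4378          | ’Coffee Table’ | 225  | 2018-07-04   |
+-----------+---------------+----------------+------+--------------+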
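If OPENXML has to stay, the NULL ContactID in the question most likely comes from the column pattern: relative to a Transaction node, '../ContactID' looks inside Transactions, while ContactID sits one level higher, under Contact. A minimal sketch, assuming a shortened, well-formed copy of the sample; PurchaseDate is read as char(8) here because, as far as I know, an unseparated 8-digit string is interpreted as YYYYMMDD rather than MMDDYYYY:

```sql
-- Shortened copy of the sample (assumption: same shape as the real file)
DECLARE @doc nvarchar(max) = N'<Contacts>
  <Contact>
    <ContactID>1234</ContactID>
    <Transactions>
      <Transaction>
        <TransactionID>4490</TransactionID>
        <ProductName>Recliner</ProductName>
        <Cost>123.00</Cost>
        <PurchaseDate>07042020</PurchaseDate>
      </Transaction>
      <Transaction>
        <TransactionID>5678</TransactionID>
        <ProductName>Lamp</ProductName>
        <Cost>45.00</Cost>
        <PurchaseDate>07042020</PurchaseDate>
      </Transaction>
    </Transactions>
  </Contact>
</Contacts>';
DECLARE @idoc int;

EXEC sp_xml_preparedocument @idoc OUTPUT, @doc;

SELECT *
FROM OPENXML(@idoc, '/Contacts/Contact/Transactions/Transaction', 2)
WITH (ContactID     int           '../../ContactID',  -- two levels up: Transactions, then Contact
      TransactionID int           'TransactionID',
      ProductName   nvarchar(50)  'ProductName',
      Cost          decimal(10,2) 'Cost',
      PurchaseDate  char(8)       'PurchaseDate');     -- raw MMDDYYYY text, convert afterwards

-- Always release the parsed document, otherwise its memory stays allocated
EXEC sp_xml_removedocument @idoc;
```

The nodes() query above remains the simpler choice; this only shows why the original attempt lost the ContactID.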

Related

XML select statement to loop over all rows

I have a table, mytable, that consists of one XMLTYPE column called 'RS'. RS contains:
<test>
<mycol>
<name>a</name>
<number>1</number>
<number>2</number>
<number>50</number>
<number>60</number>
</mycol>
<mycol>
<name>b</name>
<number>5</number>
<number>820</number>
<number>601</number>
</mycol>
<mycol>
<name>c</name>
<number>6</number>
<number>8</number>
<number>62</number>
</mycol>
etc...
</test>
I'm looking for a select statement that displays ALL names and up to 2 numbers from mytable,
something like the statement below, but for all rows and without calling mycol[] several times.
select a.RS.extract('/test/mycol[1]/name[1]/text()').getstringval() as Names,
a.RS.extract('/test/mycol[1]/a[1]/text()').getstringval() ||''||
a.RS.extract('/test/mycol[1]/a[2]/text()').getstringval() ||''||
a.RS.extract('/test/mycol[1]/a[3]/text()').getstringval()
as num
from mytable a;/
output should be:
Names | num
a | 1 2
b | 5 820
c | 6 8
etc...
Thanks in advance.
XMLTABLE and string-join may be helpful.

Converting SQL query result to XML [closed]

Input data
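+----+------+------+----------------+-------------+
| id | year | Name | provid         | prov        |
+----+------+------+----------------+-------------+
| 1  | 1995 | MAC  | 1995-11_CL236  | reg 236     |
| 1  | 1995 | MAC  | 1995-11_CL230  | reg 230 (1) |
| 1  | 1995 | MAC  | 1995-11_CL229J | reg 229J    |
| 1  | 1995 | MAC  | 1995-11_CL260  | reg 260     |
+----+------+------+----------------+-------------+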
My query looks like this
select
id, year, Name, prov, provid
from
Table
for xml path ('entry'), root('legref'), elements
The above query generates a separate entry for each row, but I need to group by id, year and Name and produce a single entry that contains the different prov and provid values. Current output:
<legref>
<entry>
<id>1</id>
<year>1995</year>
<Name>MAC</Name>
<prov>reg 229J</prov>
<provid>NSW_REG_1995-11_CL229J</provid>
</entry>
<entry>
<id>1</id>
<year>1995</year>
<Name>MAC</Name>
<prov>reg 230 (1)</prov>
<provid>NSW_REG_1995-11_CL230</provid>
</entry>
<entry>
<id>1</id>
<year>1995</year>
<Name>MAC</Name>
<prov>reg 236</prov>
<provid>NSW_REG_1995-11_CL236</provid>
</entry>
<entry>
<id>1</id>
<year>1995</year>
<Name>MAC</Name>
<prov>reg 260</prov>
<provid>NSW_REG_1995-11_CL260</provid>
</entry>
</legref>
How do I convert the SQL query result to XML in the following form?
Expected result set:
<legref>
<entry>
<id>1</id>
<year>1995</year>
<Name>MAC</Name>
<prov provID="1995-11_CL230">reg 230 (1)</prov>
<prov provID="1995-11_CL236">reg 236</prov>
<prov provID="1995-11_CL260">reg 260</prov>
<prov provID="1995-11_CL229J">reg 229J</prov>
</entry>
</legref>
Try this:
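SELECT o.ID, o.Year, o.Name,
(
-- Correlate on the outer alias o; unqualified ID and Name would bind to the inner table t
SELECT t.ProvID AS 'Prov/@ProvID', t.Prov
FROM tbl t
WHERE t.ID = o.ID AND t.Year = o.Year AND t.Name = o.Name
FOR XML PATH(''),TYPE
)
FROM tbl o
GROUP BY o.ID, o.Year, o.Name
FOR XML PATH ('Entry'),ROOT('legref')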
Output
<legref>
<Entry>
<ID>1</ID>
<Year>1995</Year>
<Name>MAC</Name>
<Prov ProvID="1995-11_CL236">Reg 236</Prov>
<Prov ProvID="1995-11_CL230">Reg 230</Prov>
<Prov ProvID="1995-11_CL229J">Reg 229J</Prov>
<Prov ProvID="1995-11_CL260">Reg 260</Prov>
</Entry>
</legref>
Please try the query below:
SELECT id
,year
,Name
,prov
,provid
FROM TABLE
FOR XML RAW ,ELEMENTS
I have tried it on a different table for the same type of result.

Extract multi-value field in XML format in SQL

I'm currently working with a database that stores all of a record's fields in a single XML value; see the example below. Let's call the table CUSTOMER.
customer table
------------------------------------------------------
| RECID | XMLRECORD |
| 1 | <row id='1' xml:space="preserve"><c1>... |
| 2 | <row id='2' xml:space="preserve"><c1>... |
| 3 | <row id='3' xml:space="preserve"><c1>... |
------------------------------------------------------
Each customer's entire record is stored in one field called XMLRECORD. Below is an example of one customer's XML record.
<row id="1" xml:space="preserve">
<c1>James</c1>
<c2>Anderson</c2>
<c3>25</c3>
<c4>District 2 1657</c4>
<c4 m="2">Riverside Drive Redding</c4>
<c4 m="3">California, USA</c4>
</row>
Here c1 is the customer's first name, c2 the last name, c3 the age and c4 the address.
To extract the value of a column, I usually use the .value() method, which returns a single value.
SELECT XMLRECORD.value('(/row/c4)[1]','NVARCHAR(20)') as ADDRESS
FROM CUSTOMER
My problem is that this function returns only a single value; I want to return all the values under c4, which is a multi-value field. Can someone advise a way to do this?
Initialising the table:
declare @xml as table
(
recid int,
xmlrecord xml
)
insert into @xml
values
( 1 , '<row id="1" xml:space="preserve">
<c1>James</c1>
<c2>Anderson</c2>
<c3>25</c3>
<c4>District 2 1657</c4>
<c4 m="2">Riverside Drive Redding</c4>
<c4 m="3">California, USA</c4>
</row>' )
DECLARE @XMLRECORD as xml =
'<row id="1" xml:space="preserve">
<c1>James</c1>
<c2>Anderson</c2>
<c3>25</c3>
<c4>District 2 1657</c4>
<c4 m="2">Riverside Drive Redding</c4>
<c4 m="3">California, USA</c4>
</row>' ;
Use the .nodes() method to fetch all the c4 nodes:
SELECT T.C.value('.','NVARCHAR(1000)') as c4_nodes FROM @XMLRECORD.nodes('(/row/c4)') as T(C)
Output -
c4_nodes
---------
District 2 1657
Riverside Drive Redding
California, USA
Since this returns multiple rows, use STUFF together with FOR XML PATH to concatenate them into one value per record:
SELECT
recid
,STUFF((
SELECT ',' + T.C.value('.','NVARCHAR(1000)')
FROM XMLRECORD.nodes('(/row/c4)') as T(C)
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '') c4
FROM @xml
Output -
recid | c4
-----------
1 District 2 1657,Riverside Drive Redding,California, USA
c4 should not repeat multiple times; the data should be stored in a single node. To get the values of all nodes, you should use [*].
You can try something like this
SELECT Tmp.record.value('.','NVARCHAR(20)')
FROM [customer]
CROSS APPLY [XMLRECORD].nodes('/row/c4') as Tmp(record)
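If the position of each address line matters, the m attribute can be read alongside the value. This is only a sketch: @customer is a hypothetical table variable standing in for the CUSTOMER table, and treating a missing m as position 1 is an assumption drawn from the sample record alone.

```sql
-- Hypothetical stand-in for the CUSTOMER table
DECLARE @customer TABLE (RECID int, XMLRECORD xml);
INSERT INTO @customer VALUES
(1, '<row id="1" xml:space="preserve"><c1>James</c1><c2>Anderson</c2><c3>25</c3><c4>District 2 1657</c4><c4 m="2">Riverside Drive Redding</c4><c4 m="3">California, USA</c4></row>');

SELECT c.RECID
      ,ISNULL(Tmp.record.value('@m', 'int'), 1) AS line_no       -- assumption: no m attribute means first value
      ,Tmp.record.value('.', 'nvarchar(100)')   AS address_line  -- text content of this c4 node
FROM @customer AS c
CROSS APPLY c.XMLRECORD.nodes('/row/c4') AS Tmp(record)           -- one output row per c4 element
ORDER BY c.RECID, line_no;
```

Keeping one row per c4 element with an explicit line number leaves the choice of concatenating or pivoting to the caller.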

Grouping values under elements FOR XML

I am trying to format the output of a query using FOR XML in SQL Server 2012.
Each PART_NO can have a varying number of SUPPLIER_PART_NO values mapped to it.
The table has data in the following format.
PART_NO SUPPLIER_PART_NO
------- ----------------
AAA 1
AAA 2
BBB 3
BBB 4
BBB 5
The desired output is as follows where part AAA has two supplier part numbers and part BBB has three supplier part numbers, and the supplier part numbers are nested below the part number.
<root>
<item PartNo ="AAA">
<mpn>1</mpn>
<mpn>2</mpn>
</item>
<item PartNo ="BBB">
<mpn>3</mpn>
<mpn>4</mpn>
<mpn>5</mpn>
</item>
</root>
The closest I can get is below, but this does not group the mpn under PartNo:
SELECT
[PART_NO] as 'item/@PartNo',
[SUPPLIER_PART_NO] as 'mpn'
FROM
[dbo].[supplier_part_mapping2]
ORDER BY
PART_NO
FOR XML PATH('') , ROOT('root');
Thank you in advance
Try this:
SELECT
p1.PART_NO as 'item/@PartNo',
(SELECT
SUPPLIER_PART_NO AS 'mpn'
FROM
[dbo].[supplier_part_mapping2] p2
WHERE
p1.PART_NO = p2.PART_NO
FOR XML PATH(''), TYPE) AS 'item'
FROM
[dbo].[supplier_part_mapping2] p1
GROUP BY
PART_NO
ORDER BY
PART_NO
FOR XML PATH('') , ROOT('root');
This should produce the desired output.
You basically need to group by the PART_NO so that you get only one <item> entry for each distinct PART_NO, and you need to grab the "sub-elements" as a subquery to list them all together under one parent node.
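The same grouping idea as a self-contained sketch. It uses a hypothetical table variable @mapping in place of dbo.supplier_part_mapping2 and names the row element through PATH('item') with an attribute column, which is an alternative spelling of the answer's approach rather than a copy of it:

```sql
-- Hypothetical stand-in for dbo.supplier_part_mapping2
DECLARE @mapping TABLE (PART_NO varchar(10), SUPPLIER_PART_NO int);
INSERT INTO @mapping VALUES ('AAA',1),('AAA',2),('BBB',3),('BBB',4),('BBB',5);

SELECT p1.PART_NO AS '@PartNo'                      -- becomes an attribute of <item>
      ,(SELECT p2.SUPPLIER_PART_NO AS 'mpn'         -- one <mpn> per supplier part number
        FROM @mapping AS p2
        WHERE p2.PART_NO = p1.PART_NO               -- correlate to the outer group
        ORDER BY p2.SUPPLIER_PART_NO
        FOR XML PATH(''), TYPE)                     -- TYPE keeps the result as XML so it nests unescaped
FROM @mapping AS p1
GROUP BY p1.PART_NO                                 -- one <item> per distinct PART_NO
ORDER BY p1.PART_NO
FOR XML PATH('item'), ROOT('root');
```

Without TYPE the inner result would be returned as text and its angle brackets would be escaped in the outer document.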

How to Select distinct values in FOR XML PATH?

Given the following tables, where T_DATA.ID = PARENT_ID or CHILD_ID:
Name: T_DATA
+----+------+--------+
| ID | CODE | VALUE |
+----+------+--------+
| 1 | 3186 | value1 |
| 2 | 3186 | value2 |
| 3 | 3189 | value3 |
| 4 | 3189 | value4 |
| 5 | 3190 | value5 |
+----+------+--------+
Name: T_DATA_LINK
+-----------+----------+
| PARENT_ID | CHILD_ID |
+-----------+----------+
| 1 | 3 |
| 1 | 4 |
+-----------+----------+
I want to return an xml structure like this:
<ITEM_LIST>
<ITEM>
<CODE>3186</CODE>
<ROWS>
<ROW>
<ID>1</ID>
<ROW_INDEX>0</ROW_INDEX>
<VALUE>value1</VALUE>
</ROW>
<ROW>
<ID>2</ID>
<ROW_INDEX>1</ROW_INDEX>
<VALUE>value2</VALUE>
</ROW>
</ROWS>
</ITEM>
<ITEM>
<CODE>3189</CODE>
<ROWS>
<ROW>
<ID>3</ID>
<ROW_INDEX>0</ROW_INDEX>
<VALUE>value3</VALUE>
</ROW>
<ROW>
<ID>4</ID>
<ROW_INDEX>1</ROW_INDEX>
<VALUE>value4</VALUE>
</ROW>
</ROWS>
</ITEM>
<ITEM>
<CODE>3190</CODE>
<VALUE>value5</VALUE>
</ITEM>
</ITEM_LIST>
The ROW_INDEX is incremented by 1 for every ROW.
I need the T_DATA_LINK table to know whether an ITEM has a parent or not.
If it has a parent it means that there is more than one record with that CODE value and they need to be displayed as ROWS, otherwise it has to be displayed as a single ITEM.
UPDATE
I actually need to check the T_DATA_LINK table, since there may be cases where an ITEM has a parent and only one record, but it still needs to be displayed as a ROW.
@Shnugo I tried your solution, but even though I now get the correct values inside the ROWS, I get duplicates for each ITEM that has more than one record.
This is probably because I had to add to the GROUP BY the other fields I need to return with the SELECT, which I left out of the example in order to keep it simpler.
For example, the ID needs to be displayed at the ITEM level for the items which don't have any ROWS.
UPDATE 2
@Shnugo you are correct. Items 3 and 4 are the children of Item 1, but you don't see the relationship in the xml.
All the items are unique, always.
The items that are referenced in T_DATA_LINK are still unique, but are linked to each other in my application, where they are displayed inside a table.
Basically the PARENT is the first column of the table and the children are the other columns.
This is the updated output I want to get.
ID should always be -1 for the items that have rows.
PARENT_CODE should be the CODE of the parent (if the item is a parent, it is equal to its own CODE).
<ITEM_LIST>
<ITEM>
<ID>-1</ID>
<CODE>3186</CODE>
<PARENT_CODE>3186</PARENT_CODE>
<ROWS>
<ROW>
<ID>1</ID>
<ROW_INDEX>0</ROW_INDEX>
<VALUE>value1</VALUE>
</ROW>
<ROW>
<ID>2</ID>
<ROW_INDEX>1</ROW_INDEX>
<VALUE>value2</VALUE>
</ROW>
</ROWS>
</ITEM>
<ITEM>
<ID>-1</ID>
<CODE>3189</CODE>
<PARENT_CODE>3186</PARENT_CODE>
<ROWS>
<ROW>
<ID>3</ID>
<ROW_INDEX>0</ROW_INDEX>
<VALUE>value3</VALUE>
</ROW>
<ROW>
<ID>4</ID>
<ROW_INDEX>1</ROW_INDEX>
<VALUE>value4</VALUE>
</ROW>
</ROWS>
</ITEM>
<ITEM>
<ID>5</ID>
<CODE>3190</CODE>
<VALUE>value5</VALUE>
</ITEM>
</ITEM_LIST>
This is a new answer... Please try to put all needed information into the initial question...
DECLARE @t_data TABLE(ID INT,CODE INT,VALUE VARCHAR(100));
INSERT INTO @t_data VALUES
(1,3186,'value1')
,(2,3186,'value2')
,(3,3189,'value3')
,(4,3189,'value4')
,(5,3190,'value5');
DECLARE @t_data_link TABLE(PARENT_ID INT, CHILD_ID INT)
INSERT INTO @t_data_link VALUES
(1,3)
,(1,4);
--The CTE links the two tables and allows to handle them as one derived table
WITH Combined AS
(
SELECT d.*
,d2.CODE AS PARENT_CODE
,COUNT(*) OVER(PARTITION BY d.CODE) AS CountRows
FROM @t_data AS d
LEFT JOIN @t_data_link AS dl ON d.ID=dl.CHILD_ID
LEFT JOIN @t_data AS d2 ON dl.PARENT_ID=d2.ID
)
SELECT CASE WHEN c.CountRows>1 THEN -1 END AS ID
,CASE WHEN c.CountRows>1 THEN c.CODE END AS CODE
,CASE WHEN c.CountRows>1 THEN ISNULL(c.PARENT_CODE,c.CODE) END AS PARENT_CODE
--This part for elements with just one row per code
,(
SELECT d2.ID
,d2.CODE
,d2.VALUE
FROM @t_data AS d2
WHERE c.CODE=d2.CODE
AND c.CountRows=1
FOR XML PATH(''),TYPE
)
--This part for elements with more rows per code
,(
SELECT d2.ID
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS ROW_INDEX
,d2.VALUE
FROM @t_data AS d2
WHERE c.CODE=d2.CODE
AND c.CountRows>1
FOR XML PATH('ROW'),ROOT('ROWS'),TYPE
)
FROM Combined AS c
GROUP BY c.CODE,c.CountRows,c.PARENT_CODE
FOR XML PATH('ITEM'),ROOT('ITEM_LIST');
The result
<ITEM_LIST>
<ITEM>
<ID>-1</ID>
<CODE>3186</CODE>
<PARENT_CODE>3186</PARENT_CODE>
<ROWS>
<ROW>
<ID>1</ID>
<ROW_INDEX>0</ROW_INDEX>
<VALUE>value1</VALUE>
</ROW>
<ROW>
<ID>2</ID>
<ROW_INDEX>1</ROW_INDEX>
<VALUE>value2</VALUE>
</ROW>
</ROWS>
</ITEM>
<ITEM>
<ID>-1</ID>
<CODE>3189</CODE>
<PARENT_CODE>3186</PARENT_CODE>
<ROWS>
<ROW>
<ID>3</ID>
<ROW_INDEX>0</ROW_INDEX>
<VALUE>value3</VALUE>
</ROW>
<ROW>
<ID>4</ID>
<ROW_INDEX>1</ROW_INDEX>
<VALUE>value4</VALUE>
</ROW>
</ROWS>
</ITEM>
<ITEM>
<ID>5</ID>
<CODE>3190</CODE>
<VALUE>value5</VALUE>
</ITEM>
</ITEM_LIST>
FOR XML omits any NULL value. When a subselect finds nothing, it returns NULL, so the corresponding element is simply left out of the result.
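A small illustration of that behaviour, using made-up literal values rather than the tables above: a NULL column produces no element at all unless ELEMENTS XSINIL is requested.

```sql
-- CODE is NULL, so no <CODE> element is emitted
SELECT 5 AS ID, CAST(NULL AS int) AS CODE, 'value5' AS VALUE
FOR XML PATH('ITEM');
-- <ITEM><ID>5</ID><VALUE>value5</VALUE></ITEM>

-- With ELEMENTS XSINIL the NULL column is kept as an empty element marked xsi:nil="true"
SELECT 5 AS ID, CAST(NULL AS int) AS CODE, 'value5' AS VALUE
FOR XML PATH('ITEM'), ELEMENTS XSINIL;
```

This is why the CASE expressions and the two filtered subselects can share one SELECT list: whichever branch does not apply yields NULL and leaves no trace in the XML.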