Use SQL Server 2005 XML APIs to normalize an XML fragment - sql

I have some (untyped) XML being stored in SQL Server 2005 that I need to transform into a normalized structure. The structure of the document currently looks like so:
<wrapper>
<parent />
<node />
<node />
<node />
<parent />
<node />
<node />
<node />
</wrapper>
I want to transform it to look like this:
<wrapper>
<parent>
<node />
<node />
<node />
</parent>
<parent>
<node />
<node />
<node />
</parent>
</wrapper>
I can select the XML out into a relational structure if I need to, but the problem is that there are no attributes linking the parent and child nodes together, so order becomes an issue when using set-based operations. How can I use .nodes()/.value() or the other SQL Server XML APIs to transform this data? The transformation needs to run as part of a batch SQL script, so extracting it into another tool or language is not a reasonable option for me.

Actually, the following code works (the grouping here may not be very efficient, but anyway):
declare @xml xml = '
<wrapper>
<parent id="1" />
<node id="1" />
<node id="2" />
<node id="3" />
<parent id="2" />
<node id="4" />
<node id="5" />
<node id="6" />
</wrapper>
'
;with px as
(
select row_number() over (order by (select 1)) as RowNumber
,t.v.value('@id', 'int') as Id
,t.v.value('local-name(.)', 'nvarchar(max)') as TagName
from @xml.nodes('//wrapper/*') as t(v)
)
select p.Id as [@id],
(
select n.Id as id
from px n
where n.TagName = 'node'
and n.RowNumber > p.RowNumber
and not exists
(
select null
from px np
where np.TagName = 'parent'
and np.RowNumber > p.RowNumber
and np.RowNumber < n.RowNumber
)
order by n.RowNumber
for xml raw('node'), type
)
from px p
where p.TagName = 'parent'
order by p.RowNumber
for xml path('parent'), root('wrapper')
But I don't recommend using it. See http://msdn.microsoft.com/en-us/library/ms172038%28v=sql.90%29.aspx, which says: "In SQLXML 4.0, document order is not always determined".
So I'm not sure we can rely on the order of the tags inside wrapper, particularly since row_number() over (order by (select 1)) does not promise any specific order. The code above is more for fun than for practical use.
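An alternative that avoids row_number() altogether is to do the regrouping inside XQuery with the node order comparison operators >> and <<, which compare nodes by document order. This is only a sketch under assumptions: it uses the attribute-free sample from the question, it assumes every node belongs to the nearest preceding parent, and the column alias Normalized is made up for illustration. For each parent it keeps the node elements that come after it and have no other parent in between.

```sql
-- DECLARE followed by SET keeps the script valid on SQL Server 2005.
DECLARE @xml xml;
SET @xml = '
<wrapper>
  <parent /><node /><node /><node />
  <parent /><node /><node /><node />
</wrapper>';

-- For each <parent> ($p), copy every <node> ($n) that follows it in
-- document order and is not preceded by a later <parent>.
SELECT @xml.query('
<wrapper>{
  for $p in /wrapper/parent
  return
    <parent>{
      for $n in /wrapper/node
      where $n >> $p
        and not(/wrapper/parent[. >> $p and . << $n])
      return $n
    }</parent>
}</wrapper>') AS Normalized;
```

Because the whole transformation happens in a single .query() call, the result is an xml value that can be written back with an ordinary UPDATE in the same batch.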

Related

Unable to extract details from an XML column of MSSQL

I am new to extracting data from an XML column using SQL. I have managed it with simpler formats, but this one is a bit tricky.
The XML column is named "Data" and its value is shown below.
<SpecificationDataDto
xmlns:i="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://schemas.datacontract.org/2004/07/Dummy.Sample.API.TransferObjects.Dto">
<Narratives>
<AnswerDto>
<AddressData />
<AddressResponseType i:nil="true" />
<Answers
xmlns:d4p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<d4p1:string>Loose Print</d4p1:string>
</Answers>
<ComponentOrderNumber>3</ComponentOrderNumber>
<Index>0</Index>
<IsQuantity>false</IsQuantity>
<MasterQuestionId i:nil="true" />
<MasterQuestionTitle i:nil="true" />
<MasterQuestionTitleResourceKey i:nil="true" />
<ParentQuestionGroupId i:nil="true" />
<ParentQuestionGroupIndex i:nil="true" />
<ParentQuestionGroupTitle i:nil="true" />
<PermutationId>65922bf8-6468-40f2-967c-27547412ba2b</PermutationId>
<PermutationOrder>0</PermutationOrder>
<ProductSpecificationLabel />
<ProductSpecificationLabelResourceKey>65A4E3D8-9FF7-48D9-B3F8-0E2D4D23A1B8</ProductSpecificationLabelResourceKey>
<QuestionGroupId i:nil="true" />
<QuestionGroupIndex i:nil="true" />
<QuestionGroupOrderNumber i:nil="true" />
<QuestionGroupTitle i:nil="true" />
<QuestionId>e575d4ac-5dfb-4170-c5c0-08da85c9a9ed</QuestionId>
<QuestionOrderNumber>1</QuestionOrderNumber>
<QuestionTitle>Product category</QuestionTitle>
<QuestionTitleResourceKey>81B4FFBC-4503-4E48-83C0-17FBB227384B</QuestionTitleResourceKey>
<ResponseIds
xmlns:d4p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
<d4p1:guid>c4ed5170-6634-4aad-c054-08da7f653c36</d4p1:guid>
</ResponseIds>
<UpdateDate i:nil="true" />
</AnswerDto>
</Narratives>
</SpecificationDataDto>
I tried the query below, but with no luck:
WITH XMLNAMESPACES ('http://schemas.microsoft.com/2003/10/Serialization/Arrays' AS d4p1)
, CTE as (
SELECT Order
,t.c.value('(*/QuestionTitle)[1]', 'Varchar(250)') AS Question
,t.c.value('(*/d4p1:string)[1]', 'Varchar(250)') AS Answer
FROM SchemaName.DatabaseName.TableName
CROSS APPLY Data.nodes('*/SpecificationDataDto/Narratives/AnswerDto') t(c)
WHERE Order = '737B4994'
)
select * from CTE
I am expecting the below values to come out for this order in 2 columns,
QuestionTitle = Product category
d4p1:string = Loose Print
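One likely cause is that every element in the sample sits in the document's default namespace, which the query never declares, and that the paths start with `*/` even though SpecificationDataDto is the root element. The following is a minimal sketch of a query that declares both namespaces. The table variable @t and its OrderNo column are hypothetical stand-ins for the real table, and the XML is trimmed to the two elements of interest.

```sql
-- Hypothetical stand-in for the real table; only the shape of Data comes from the question.
DECLARE @t TABLE (OrderNo varchar(20), Data xml);
INSERT INTO @t VALUES ('737B4994', N'
<SpecificationDataDto xmlns="http://schemas.datacontract.org/2004/07/Dummy.Sample.API.TransferObjects.Dto">
  <Narratives>
    <AnswerDto>
      <Answers xmlns:d4p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
        <d4p1:string>Loose Print</d4p1:string>
      </Answers>
      <QuestionTitle>Product category</QuestionTitle>
    </AnswerDto>
  </Narratives>
</SpecificationDataDto>');

-- DEFAULT covers the unprefixed elements; d4p1 covers the string array items.
WITH XMLNAMESPACES (
    DEFAULT 'http://schemas.datacontract.org/2004/07/Dummy.Sample.API.TransferObjects.Dto',
    'http://schemas.microsoft.com/2003/10/Serialization/Arrays' AS d4p1
)
SELECT t.OrderNo,
       a.c.value('(QuestionTitle/text())[1]', 'varchar(250)') AS Question,
       a.c.value('(Answers/d4p1:string/text())[1]', 'varchar(250)') AS Answer
FROM @t AS t
CROSS APPLY t.Data.nodes('/SpecificationDataDto/Narratives/AnswerDto') AS a(c)
WHERE t.OrderNo = '737B4994';
```

The paths passed to value() are relative to each AnswerDto node returned by nodes(), so they do not repeat the leading part of the path.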

Select data using WHERE condition on XML data column in SQL table

I have a table that lists some user details.
ID  GUID       Username  Password  Data
1   a2a8s7d4d  xswe      xxxxxx    XML
2   aer335mla  user      xxxxxx    XML
The Data column contains XML. Below is a sample from the table.
<UserInfo xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/ComponentFramework">
<ActiveDirectoryUser>false</ActiveDirectoryUser>
<CanUpdateMasterData>false</CanUpdateMasterData>
<CanUploadFiles>false</CanUploadFiles>
<ChangePassword>false</ChangePassword>
<CustomDataPageSize>false</CustomDataPageSize>
<CustomMasterDataPageSize>false</CustomMasterDataPageSize>
<DataPageSize>100</DataPageSize>
<Disabled>true</Disabled>
<Displayname>Pål</Displayname>
<Email i:nil="true" />
<EnforcePasswordPolicy>false</EnforcePasswordPolicy>
<EnvironmentIdList xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays" />
<GUID i:nil="true" />
<GeoLocation>
<City i:nil="true" />
<Country i:nil="true" />
<CountryCode i:nil="true" />
<Ip i:nil="true" />
<Isp i:nil="true" />
<Lat>0</Lat>
<Lon>0</Lon>
<Org i:nil="true" />
<Query i:nil="true" />
<Region i:nil="true" />
<RegionName i:nil="true" />
<Status i:nil="true" />
<Timezone i:nil="true" />
<Zip i:nil="true" />
</GeoLocation>
<GroupIdList xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays" />
<HttpLink i:nil="true" />
<JobIdList xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays" />
<LastLoggedIn>2015-06-11T19:04:44.6407074+05:30</LastLoggedIn>
<MasterDataPageSize>1000</MasterDataPageSize>
<ModifyImages>false</ModifyImages>
<QualityControl>false</QualityControl>
<QualityControlGroupId i:nil="true" />
<Review>false</Review>
<ReviewGroupId i:nil="true" />
<SecurityToken i:nil="true" />
<ShowTrackerPage>false</ShowTrackerPage>
<StatIdList xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays" />
<Username>&lt;new user&gt;</Username>
<Usertype>Power</Usertype>
</UserInfo>
I'm trying to match users whose accounts are disabled, using the SQL query below.
select * from [ATC_Config].[dbo].[Users] where [ATC_Config].[dbo].[Users].[Data].value('/UserInfo/Disabled[1]','nvarchar(MAX)') = 'true'
But SSMS gives me the error "Cannot call methods on nvarchar(max)" and highlights my Data column. I tried a few suggestions from SO and MSDN, but nothing helped. Can someone show me what I am doing wrong?
Because your sample XML contains a default namespace definition you'll need to declare that in your value XQuery or via with xmlnamespaces.
Here's how you can do that with value...
select *
from dbo.Users
where cast(Data as xml).value(N'
declare default element namespace "http://schemas.datacontract.org/2004/07/ComponentFramework";
(/UserInfo/Disabled)[1]',N'nvarchar(max)') = N'true';
Or by using with xmlnamespaces:
with xmlnamespaces(default N'http://schemas.datacontract.org/2004/07/ComponentFramework')
select *
from dbo.Users
where cast(Data as xml).value(N'(/UserInfo/Disabled)[1]', N'nvarchar(max)') = N'true';
You need to add the namespace, and you need to cast the value to xml. Declaring the namespace is easier with with xmlnamespaces because it applies to the whole query.
A slightly more efficient version of @AlwaysLearning's answer is to use exist and text():
with xmlnamespaces (
default N'http://schemas.datacontract.org/2004/07/ComponentFramework'
)
select *
from dbo.Users u
where cast(u.Data as xml).exist(N'/UserInfo/Disabled[text() = "true"]') = 1;
I strongly suggest you store the Data column as xml in the first place, as casting is inefficient.

SQL - Extract XML from multiple nodes

I've done a load of research and cannot seem to string together the SQL to extract the required data from an XML field.
<vItem>
<jobScript>
<node guid="7606bd90-98df-4572-accd-5b41ec5605dc">
<subNodes>
<node guid="17f8e275-d4f6-47c0-a5e4-80da658f4097">
<execute taskVersionGuid="5fc17d5c-7264-461f-ae38-753d703f3c99" />
</node>
<node guid="5fe2233c-9e3a-44be-aa20-aea2c8dcbd4a">
<execute taskVersionGuid="f55dc069-46ff-427e-920f-5f1c3fc3ad09" />
</node>
<node guid="ecd6a7b5-a3be-483c-acf8-64ba1c289088">
<execute taskVersionGuid="5220d97c-6e8f-400a-b814-aa7d84942c20" />
</node>
</subNodes>
</node>
</jobScript>
I'm trying to extract the taskVersionGuid from each node. In the scenario, there could be anywhere between 1 and 10 taskVersionGuids, however the example I have above has 3.
Any help with this would be appreciated.
Thanks
Edit
I have tried the below also:
declare @XML xml
set @XML =
'
<vItem>
<jobScript>
<node guid="7606bd90-98df-4572-accd-5b41ec5605dc">
<subNodes>
<node guid="17f8e275-d4f6-47c0-a5e4-80da658f4097">
<execute taskVersionGuid="5fc17d5c-7264-461f-ae38-753d703f3c99" />
</node>
<node guid="5fe2233c-9e3a-44be-aa20-aea2c8dcbd4a">
<execute taskVersionGuid="f55dc069-46ff-427e-920f-5f1c3fc3ad09" />
</node>
<node guid="ecd6a7b5-a3be-483c-acf8-64ba1c289088">
<execute taskVersionGuid="5220d97c-6e8f-400a-b814-aa7d84942c20" />
</node>
</subNodes>
</node>
</jobScript>
</vItem>
'
select T.N.query('.')
from @XML.nodes('/vItem/jobScript/node/subNodes/node/execute') as T(N)
However, this results in the following:
<execute taskVersionGuid="5fc17d5c-7264-461f-ae38-753d703f3c99" />
<execute taskVersionGuid="f55dc069-46ff-427e-920f-5f1c3fc3ad09" />
<execute taskVersionGuid="5220d97c-6e8f-400a-b814-aa7d84942c20" />
Whereas I'm trying to receive the value of taskVersionGuid.
Thanks again.
Answer as below:
select T.N.value('@taskVersionGuid[1]', 'uniqueidentifier')
from @XML.nodes('/vItem/jobScript/node/subNodes/node/execute') as T(N)
What you need to do is turn your XML into a table so you can query it. Below is an example of the query you will need to grab the values from the nodes.
DECLARE @xml AS XML = '<jobScript>
<node guid="7606bd90-98df-4572-accd-5b41ec5605dc">
<subNodes>
<node guid="17f8e275-d4f6-47c0-a5e4-80da658f4097">
<execute taskVersionGuid="5fc17d5c-7264-461f-ae38-753d703f3c99" />
</node>
<node guid="5fe2233c-9e3a-44be-aa20-aea2c8dcbd4a">
<execute taskVersionGuid="f55dc069-46ff-427e-920f-5f1c3fc3ad09" />
</node>
<node guid="ecd6a7b5-a3be-483c-acf8-64ba1c289088">
<execute taskVersionGuid="5220d97c-6e8f-400a-b814-aa7d84942c20" />
</node>
</subNodes>
</node>
</jobScript>'
SELECT a.value('.', 'varchar(max)')
FROM @xml.nodes('/jobScript/node/subNodes/node/execute/@taskVersionGuid') a(a)

how to select all parent node in xml data

I want to select all parent nodes when I don't know how many nodes exist.
<TreeView>
<node text="a">
<node text="aa">
<node text="aaa" />
</node>
<node text="b">
<node text="bb" />
</node>
</node>
<node text="c" />
</TreeView>
What I want is: a, aa, b
DECLARE @MyXML XML
SET @MyXML = '<TreeView>
<node text="a">
<node text="aa">
<node text="aaa" />
</node>
<node text="b">
<node text="bb" />
</node>
</node>
<node text="c" />
</TreeView> '
SELECT @MyXML.value ('(//node/@text)[1]', 'VARCHAR(30)'),
@MyXML.value ('(//node/@text)[2]', 'VARCHAR(30)'),
@MyXML.value ('(//node/@text)[4]', 'VARCHAR(30)')
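The query above hard-codes positions 1, 2 and 4, so it only works for this exact document. When the number of nodes is unknown, one option is to let XPath pick every node that has at least one child node. This is a sketch under assumptions: "parent" is taken to mean "a node element with a node child", and the alias ParentText is made up for illustration.

```sql
DECLARE @MyXML xml;
SET @MyXML = '<TreeView>
<node text="a">
<node text="aa">
<node text="aaa" />
</node>
<node text="b">
<node text="bb" />
</node>
</node>
<node text="c" />
</TreeView>';

-- //node[node] matches node elements at any depth that contain a child node element.
SELECT t.n.value('@text', 'VARCHAR(30)') AS ParentText
FROM @MyXML.nodes('//node[node]') AS t(n);
```

For the sample document this yields one row each for a, aa and b; the leaves aaa, bb and c are skipped because they have no node children.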

sql select data from XML param

What is the SQL for selecting the values from this XML chunk, in the same way as in the sample further below?
<RWFCriteria reportType="OPRAProject">
<item id="88" name="" value="" type="Project" />
<item id="112" name="" value="12" type="Milestone" />
<item id="43" name="" value="11" type="Milestone" />
</RWFCriteria>
I want to select something similar to this, but using the XML data above:
DECLARE @Param XML
SET @Param = '<data>
<release id="1"><milestone id="1" /><milestone id="2" /></release>
<release id="3"><milestone id="1" /><milestone id="27"/></release>
</data>'
SELECT c.value('../@id', 'INT') AS ReleaseId, c.value('@id', 'INT') AS MilestoneId
FROM @Param.nodes('/data/release/milestone') AS T(c)
I want only the data in the nodes where type="Milestone"
Something like this:
DECLARE @Param XML
SET @Param = '<RWFCriteria reportType="OPRAProject">
<item id="88" name="" value="" type="Project" />
<item id="112" name="" value="12" type="Milestone" />
<item id="43" name="" value="11" type="Milestone" />
</RWFCriteria>'
SELECT
RWF.item.value('@id', 'INT') AS 'Id',
RWF.item.value('@name', 'VARCHAR(100)') AS 'Name',
RWF.item.value('@value', 'INT') AS 'Value',
RWF.item.value('@type', 'VARCHAR(100)') AS 'Type'
FROM
@Param.nodes('/RWFCriteria/item') AS RWF(item)
WHERE
RWF.item.value('@type', 'VARCHAR(100)') = 'Milestone'
Resulting output:
Id   Name  Value  Type
112        12     Milestone
43         11     Milestone
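The type filter can also be pushed into the XPath expression itself, so that nodes() only returns the Milestone items and no WHERE clause is needed. A minimal sketch of that variant, using the same sample data and returning only the Id and Value columns:

```sql
DECLARE @Param xml;
SET @Param = '<RWFCriteria reportType="OPRAProject">
<item id="88" name="" value="" type="Project" />
<item id="112" name="" value="12" type="Milestone" />
<item id="43" name="" value="11" type="Milestone" />
</RWFCriteria>';

-- The [@type="Milestone"] predicate filters inside nodes(), before any rows are produced.
SELECT
    RWF.item.value('@id', 'INT') AS [Id],
    RWF.item.value('@value', 'INT') AS [Value]
FROM @Param.nodes('/RWFCriteria/item[@type="Milestone"]') AS RWF(item);
```

Both forms select the same two items; the predicate version just keeps the filtering logic next to the path it applies to.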