Extracting XML data in SQL - too many cross apply statements - sql

I have an xml document containing details from a Statement:
<Statement>
<Id />
<Invoices>
<Invoice>
<Id />
<Date />
<AmountDue />
etc.
</Invoice>
<Invoice>
<Id />
<Date />
<AmountDue />
etc.
</Invoice>
<Invoice>
<Id />
<Date />
<AmountDue />
etc.
</Invoice>
</Invoices>
</Statement>
This works fine for the Statement specific details:
SET @statementId = @xml.value('(Id)[1]', 'UNIQUEIDENTIFIER');
but it requires a singleton, and only returns the first value. I need ALL of the values for the invoices, not just the first so a singleton won't work.
I am able to get the information out using cross apply statements like this:
SELECT
@statementId AS STATEMENT_ID,
Id.value('.', 'uniqueidentifier') AS INVOICE_ID,
Date.value('.', 'smalldatetime') AS INVOICE_DATE,
Due.value('.', 'decimal') AS INVOICE_AMOUNT_DUE
FROM @xml.nodes('Statement') A(S)
cross apply S.nodes('Invoices/Invoice') B(InvoiceD)
cross apply InvoiceD.nodes('Id') C(Id)
cross apply InvoiceD.nodes('Date') D(Date)
cross apply InvoiceD.nodes('AmountDue') E(Due)
This returns an Id, date, and amount from each Invoice in the Statement - perfect.
My problem comes when I try to extract all of the invoice details. I currently have seven cross apply statements and I got the following message:
"The query processor ran out of internal resources and could not
produce a query plan. This is a rare event and only expected for
extremely complex queries or queries that reference a very large
number of tables or partitions. Please simplify the query. If you
believe you have received this message in error, contact Customer
Support Services for more information."
What I want to do is have one cross apply for the Invoice and narrow down the exact field in the select statement, but unless I use '.' I must make the statement return a singleton and I don't get all of the data that I need.
I have done some research about specifying a namespace within the select statement, but all of the examples set the namespace to be an http address instead of a node in an xml document and I haven't gotten anything to return yet using this approach.
The result I'm looking for is something like this, but with more Invoice Details:
STATEMENT_ID INVOICE_ID INVOICE_DATE INVOICE_AMOUNT_DUE ...
Statement-1-Id Invoice-1-Id Invoice-1-Date Invoice-1-AmountDue ...
Statement-1-Id Invoice-2-Id Invoice-2-Date Invoice-2-AmountDue ...
Statement-1-Id Invoice-3-Id Invoice-3-Date Invoice-3-AmountDue ...
Where should I go from here?
EDIT: I removed some unnecessary information. Getting all of the invoice-specific details is my goal here.

select @XML.value('(Statement/Id/text())[1]', 'uniqueidentifier') as StatementId,
T.N.value('(Id/text())[1]', 'uniqueidentifier') as InvoiceId,
T.N.value('(Date/text())[1]', 'smalldatetime') as InvoiceDate,
T.N.value('(AmountDue/text())[1]', 'decimal') as AmountDue
from @XML.nodes('/Statement/Invoices/Invoice') as T(N)
.nodes will shred your XML to rows so that each row T.N is pointing to an Invoice node of its own. On that node there is only a single Id node so fetching the value specifying a singleton Id[1] works.
You can use Id[1] or (Id/text())[1] but the latter will give you a more efficient execution plan.
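To try this end to end, here is a minimal sketch with a sample document declared inline. The Id, Date and AmountDue values are made up for illustration, and decimal(18,2) is my assumption: a bare 'decimal' defaults to decimal(18,0), which would drop the fractional part of an amount.

```sql
-- Sample document; all element values are invented for illustration.
DECLARE @XML xml = N'
<Statement>
  <Id>11111111-1111-1111-1111-111111111111</Id>
  <Invoices>
    <Invoice>
      <Id>22222222-2222-2222-2222-222222222222</Id>
      <Date>2012-01-15T00:00:00</Date>
      <AmountDue>100.50</AmountDue>
    </Invoice>
    <Invoice>
      <Id>33333333-3333-3333-3333-333333333333</Id>
      <Date>2012-02-15T00:00:00</Date>
      <AmountDue>75.25</AmountDue>
    </Invoice>
  </Invoices>
</Statement>';

-- One nodes() call shreds the invoices; each column is a value() call on that row.
SELECT @XML.value('(/Statement/Id/text())[1]', 'uniqueidentifier') AS STATEMENT_ID,
       T.N.value('(Id/text())[1]', 'uniqueidentifier')             AS INVOICE_ID,
       T.N.value('(Date/text())[1]', 'smalldatetime')              AS INVOICE_DATE,
       T.N.value('(AmountDue/text())[1]', 'decimal(18,2)')         AS INVOICE_AMOUNT_DUE
FROM @XML.nodes('/Statement/Invoices/Invoice') AS T(N);
```

Each additional invoice detail is then just one more value() line in the select list, with no extra cross apply.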

Related

How do I get a value from XML column in SQL?

So, I have a table with a large chunk of data stored in XML.
The partial XML schema (down to where I need) looks like this:
<DecisionData>
<Customer>
<SalesAttemptNumber />
<SubLenderID>IN101_CNAC</SubLenderID>
<DecisionType>Decision</DecisionType>
<DealerID />
<CustomerNumber>468195994772076</CustomerNumber>
<CustomerId />
<ApplicationType>Personal</ApplicationType>
<ApplicationDate>9/16/2008 11:32:07 AM</ApplicationDate>
<Applicants>
<Applicant PersonType="Applicant">
<CustNum />
<CustomerSSN>999999999</CustomerSSN>
<CustLastName>BRAND</CustLastName>
<CustFirstName>ELIZABETH</CustFirstName>
<CustMiddleName />
<NumberOfDependants>0</NumberOfDependants>
<MaritalStatus>Single</MaritalStatus>
<DateOfBirth>1/1/1911</DateOfBirth>
<MilitaryRank />
<CurrentAddress>
<ZipCode>46617</ZipCode>
Unfortunately, I am unfamiliar with pulling from XML, and my google-fu has failed me.
select TransformedXML.value('(/DecisionData/Customer/Applicants/Applicant PersonType="Applicant"/CurrentAddress/ZipCode/node())[1]','nvarchar(max)') as zip
from XmlDecisionInputText as t
I believe my problem lies with the portion that goes Applicant PersonType="Applicant", but am unsure how to deal with it.
Thanks for any help.
The xpath in its simplest form would be:
TransformedXML.value('(//ZipCode)[1]', 'nvarchar(100)') AS zip
This will find the first ZipCode node anywhere inside your document. If there are multiple, just be specific (as much as you want but not any more):
TransformedXML.value('(/DecisionData/Customer/Applicants/Applicant[@PersonType="Applicant"]/CurrentAddress/ZipCode)[1]', 'nvarchar(100)') AS zip
If there are MULTIPLE applicants, you can use a CROSS APPLY
Example
Select A.ID
,B.*
From XmlDecisionInputText A
Cross Apply (
Select PersonType = x.v.value('@PersonType','VARCHAR(150)')
,CustLastName = x.v.value('CustLastName[1]','VARCHAR(150)')
,CustFirstName = x.v.value('CustFirstName[1]','VARCHAR(150)')
,ZipCode = x.v.value('CurrentAddress[1]/ZipCode[1]','VARCHAR(150)')
From A.TransformedXML.nodes('DecisionData/Customer/Applicants/*') x(v)
) B
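For anyone who wants to run the multiple-applicant case without the real table, here is a rough sketch. The table variable stands in for XmlDecisionInputText, the ID column is assumed, and the second applicant (PersonType "CoApplicant") is invented purely to show that one row comes back per Applicant node.

```sql
-- Hypothetical stand-in for the XmlDecisionInputText table; ID is an assumed key column.
DECLARE @XmlDecisionInputText TABLE (ID int IDENTITY, TransformedXML xml);

-- The second applicant is invented for illustration.
INSERT INTO @XmlDecisionInputText (TransformedXML) VALUES
(N'<DecisionData>
  <Customer>
    <Applicants>
      <Applicant PersonType="Applicant">
        <CustLastName>BRAND</CustLastName>
        <CustFirstName>ELIZABETH</CustFirstName>
        <CurrentAddress><ZipCode>46617</ZipCode></CurrentAddress>
      </Applicant>
      <Applicant PersonType="CoApplicant">
        <CustLastName>BRAND</CustLastName>
        <CustFirstName>JOHN</CustFirstName>
        <CurrentAddress><ZipCode>46601</ZipCode></CurrentAddress>
      </Applicant>
    </Applicants>
  </Customer>
</DecisionData>');

-- nodes() is applied to the xml column of each row, giving one output row per Applicant.
SELECT A.ID,
       x.v.value('@PersonType', 'varchar(150)')                       AS PersonType,
       x.v.value('(CustLastName/text())[1]', 'varchar(150)')          AS CustLastName,
       x.v.value('(CustFirstName/text())[1]', 'varchar(150)')         AS CustFirstName,
       x.v.value('(CurrentAddress/ZipCode/text())[1]', 'varchar(10)') AS ZipCode
FROM @XmlDecisionInputText AS A
CROSS APPLY A.TransformedXML.nodes('/DecisionData/Customer/Applicants/Applicant') AS x(v);
```

Note that nodes() has to be called on the xml column (here A.TransformedXML), not on the table itself.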

SQL WHERE clause with multiple XML attributes

I have table in database with XML column. Now I need to select some rows by two attributes from XML.
So far I've come up with this:
SELECT o.Id
FROM Objects o
WHERE o.SerializedObject.value('(/object/param[@id="111"]/@value)[1]', 'varchar(8)') = '-1'
AND o.SerializedObject.value('(/object/param[@id="222"]/@value)[1]', 'varchar(8)') = '8'
EDIT:
XML is like:
<object>
<param id="1" value="111"/>
<param id="2" value="222"/>
...
<param id="200" value="4545"/>
</object>
Each object has ~2k params.
I'm wondering if there is a better way to do that with single XML query.
This depends on your XML (you did not show an example, but I assume this is kind of EAV).
You can try using XML's method .exist():
DECLARE @mockup TABLE(ID INT IDENTITY,Comment VARCHAR(100),SerializedObject XML);
INSERT INTO @mockup VALUES
('just one of them','<object><param id="111" value="-1"/></object>')
,('both, but wrong values','<object><param id="111" value="-1"/><param id="222" value="-1"/></object>')
,('both, should fit','<object><param id="111" value="-1"/><param id="222" value="8"/></object>');
SELECT o.Id,o.Comment,o.SerializedObject
FROM @mockup o
WHERE o.SerializedObject.exist('/object[param[@id="111" and @value="-1"] and param[@id="222" and @value="8"]]')=1;
.exist() is the fastest here, because it does not have to extract any value; it just returns 1 as soon as the first occurrence is found. This is especially fast when there are many occurrences of a <param id="111" value="???">. Otherwise you'd have to shred the whole lot and place the filter on the whole result set.
And, of course, the necessary hint: as Jeroen Mostert noted in a comment, dealing with bigger XMLs might turn out to be a bottleneck. If you need this more often, you might think about a relational design instead of big XMLs...

Flattening xml data in sql

I'm trying to flatten XML data in a SQL query but I always seem to get nulls.
I tried the cross/outer apply method described here.
The column with XML data is called Data.
I'm guessing that the namespace links in the XML data somehow need to be included in the query as well?
Could you please help to get a proper SQL query?
Query I tried:
SELECT
v.name
,pref.value('(LocalId/text())[1]', 'nvarchar(10)') as localid
FROM [database].[requests] v
outer apply v.Data.nodes('/DataForm') x(pref)
GO
example of xml data in that column:
<Dataform xmlns="http://somelongasslink.org/hasalsosomestuffhere" xmlns:i="http://somexlmschemalink/">
<DeleteDate xmlns="http://somelongasslink.org/hasalsosomestuffhere" i:nil="true" />
<LocalId xmlns="http://somelongasslink.org/hasalsosomestuffhere">5325325</LocalId>
...
You can use this code to get the result you're looking for:
;WITH XMLNAMESPACES(DEFAULT 'http://somelongasslink.org/hasalsosomestuffhere')
SELECT
rq.Name,
LocalID = TC.value('(LocalId)[1]', 'nvarchar(10)')
FROM
[database].[requests] rq
CROSS APPLY
rq.Data.nodes('/Dataform') AS TX(TC)
GO
There were two problems with your code:
1. You're not respecting / including the XML namespace that's defined on the XML document:
<Dataform xmlns="http://somelongasslink.org/hasalsosomestuffhere"
          *******************************************************
2. You didn't pay attention to the case-sensitivity of XML in your call to .nodes(). You need to use .nodes('/Dataform') (not /DataForm; the F is not capitalized in your XML).
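To see the effect of the namespace in isolation, here is a small self-contained sketch. The table variable is a hypothetical stand-in for [database].[requests] and the single row is invented; only the namespace URI and the LocalId value come from the question.

```sql
-- Hypothetical stand-in for [database].[requests]; the row content is invented.
DECLARE @requests TABLE (Name nvarchar(50), Data xml);
INSERT INTO @requests (Name, Data) VALUES
(N'request 1',
 N'<Dataform xmlns="http://somelongasslink.org/hasalsosomestuffhere">
     <LocalId>5325325</LocalId>
   </Dataform>');

-- Without a namespace declaration the path matches nothing, so OUTER APPLY yields NULL.
SELECT rq.Name,
       TC.value('(LocalId/text())[1]', 'nvarchar(10)') AS LocalID
FROM @requests AS rq
OUTER APPLY rq.Data.nodes('/Dataform') AS TX(TC);

-- With the default namespace declared, the same path finds the element.
WITH XMLNAMESPACES (DEFAULT 'http://somelongasslink.org/hasalsosomestuffhere')
SELECT rq.Name,
       TC.value('(LocalId/text())[1]', 'nvarchar(10)') AS LocalID
FROM @requests AS rq
CROSS APPLY rq.Data.nodes('/Dataform') AS TX(TC);
```

The DEFAULT declaration applies to every unprefixed element name in the XQuery paths, which is why LocalId needs no prefix either.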

Extracting child node value from XML using T-SQL

I'm stuck trying to get the 'availability' node's value out of an envelope returned via T-SQL from a Microsoft Lync database. The usual approach of .value('(/MyElement/Something)[1]') doesn't seem to work for me.
<state xsi:type="aggregateState" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/2006/09/sip/state">
<availability>3500
</availability>
<delimiter xmlns="http://schemas.microsoft.com/2006/09/sip/commontypes" />
<timeZoneBias>-60
</timeZoneBias>
<timeZoneName>GMT Daylight Time
</timeZoneName>
<timeZoneAbbreviation>GMT Daylight Time
</timeZoneAbbreviation>
<device>computer
</device>
<end xmlns="http://schemas.microsoft.com/2006/09/sip/commontypes" />
</state>
This is the query I've been experimenting with:
SELECT TOP 1
CAST(SUBSTRING(i.Data, 0, 4000) as XML).value('(/state//availability)[1]', 'varchar(256)')
FROM dbo.PublishedCategoryInstanceView AS i
INNER JOIN dbo.CategoryDef AS d
ON (d.CategoryId = i.CategoryId)
WHERE i.PublisherId = (SELECT ResourceId FROM dbo.Resource
WHERE UserAtHost = 'my.email@mydomain.local')
ORDER BY i.LastPubTime DESC
All I get back is 'NULL' unless I do CAST(SUBSTRING(i.Data, 0, 4000) as XML).value('(/)[1]', 'varchar(256)') which returns 3500-60GMT Daylight TimeGMT Daylight Timecomputer
I do know that when I strip out the three attributes on the state element I can perform normal XML queries against the data so I can get around this by manipulating the string with a few replace statements but I'd rather learn exactly what I'm doing wrong here, if anyone can help?
You're just plain ignoring the XML namespace that exists on your XML root node:
<state xsi:type="aggregateState"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://schemas.microsoft.com/2006/09/sip/state">
******************************************************
You need to include that in your T-SQL query!
Try something like this:
;WITH XMLNAMESPACES(DEFAULT 'http://schemas.microsoft.com/2006/09/sip/state')
SELECT TOP 1
CAST(SUBSTRING(i.Data, 0, 4000) as XML).value('(/state//availability)[1]', 'varchar(256)')
FROM
dbo.PublishedCategoryInstanceView AS i
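If you would rather not change the default namespace for the whole query, two alternatives may be worth a look: binding the namespace to a prefix, or using the *: namespace wildcard. The sketch below runs against a trimmed copy of the document held in a variable; the variable name, the prefix s and the trimmed sample are my own assumptions, not part of the Lync schema.

```sql
-- Trimmed copy of the document from the question, held in a variable for illustration.
DECLARE @state xml = N'
<state xmlns="http://schemas.microsoft.com/2006/09/sip/state">
  <availability>3500</availability>
  <timeZoneBias>-60</timeZoneBias>
</state>';

-- The prefix "s" is arbitrary; it only has to map to the document's namespace URI.
WITH XMLNAMESPACES ('http://schemas.microsoft.com/2006/09/sip/state' AS s)
SELECT @state.value('(/s:state/s:availability/text())[1]', 'varchar(256)') AS ByPrefix,
       -- *: matches the element name in any namespace, so no declaration is needed for it.
       @state.value('(/*:state/*:availability/text())[1]', 'varchar(256)') AS ByWildcard;
```

The prefix form is the more precise of the two; the wildcard is convenient for ad hoc queries but will also match same-named elements from other namespaces.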

No results using with xmlnamespace exist

I am new to using WITH XMLNAMESPACES and am having some difficulty producing results using the exist() method. The xml column I am querying is as follows:
<Root>
<ProductDescription ProductID="1" ProductName="Road Bike">
<Features>
<Warranty>1 year parts and labor</Warranty>
<Maintenance>3 year parts and labor extended maintenance is
available
</Maintenance>
</Features>
</ProductDescription>
</Root>
I am attempting to return an xml namespace in the result set, working from a modified example from Microsoft. However when I execute the following query I get 0 results. Removing the alias from the WHERE clause however does produce results.
WITH XMLNAMESPACES ('uri' as pd)
SELECT
Trans_ID,
Bike_Sales.query
(
'<ProductDescription Product_ID = "{ sql:column("Trans_ID")}"/>'
) AS Result
FROM XML_EXAMPLE
WHERE
Bike_Sales.exist
(
'/pd:Root/ProductDescription[(pd:Features)]'
) = 1
So basically I just want to return rows where the Features element is present. Any suggestions?
Thanks.
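One observation on the XML as posted: it declares no namespace at all, so a step such as pd:Root, which asks for Root in the namespace 'uri', has nothing to match. A minimal sketch under that assumption follows; the table variable stands in for XML_EXAMPLE and the second row, which has no Features element, is invented to show the filter working.

```sql
-- Hypothetical stand-in for XML_EXAMPLE; the second row is invented and has no Features.
DECLARE @XML_EXAMPLE TABLE (Trans_ID int, Bike_Sales xml);
INSERT INTO @XML_EXAMPLE (Trans_ID, Bike_Sales) VALUES
(1, N'<Root><ProductDescription ProductID="1" ProductName="Road Bike"><Features><Warranty>1 year parts and labor</Warranty></Features></ProductDescription></Root>'),
(2, N'<Root><ProductDescription ProductID="2" ProductName="Mountain Bike" /></Root>');

-- The elements are in no namespace, so the path uses unprefixed names throughout.
SELECT Trans_ID,
       Bike_Sales.query('<ProductDescription Product_ID="{ sql:column("Trans_ID") }" />') AS Result
FROM @XML_EXAMPLE
WHERE Bike_Sales.exist('/Root/ProductDescription[Features]') = 1;
```

If the real column does carry a namespace, the prefix would have to be bound to that exact URI and applied to every step of the path, including ProductDescription.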