SQL Server Grabbing Value from XML parameter to use in later query - sql

I am really new to SQL Server and stored procedures to begin with. I need to be able to parse an incoming XML file for a specific element's value and compare/save it later in the procedure.
I have a few things stacked against me. For one, the element I need is buried deep inside the document. I have had no luck searching for it by name using methods similar to this:
select CurrentBOD = c.value('(local-name(.))[1]', 'VARCHAR(MAX)'),
c.value('(.)[1]', 'VARCHAR(MAX)') from @xml.nodes('PutMessage/payload/content/AcknowledgePartsOrder/ApplicationArea/BODId') as BODtable(c)
It always returns null.
So, I am trying something similar to this:
declare @BODtable TABLE(FieldName VARCHAR(MAX),
FieldValue VARCHAR(MAX))
SELECT
FieldName = nodes.value('local-name(.)', 'varchar(50)'),
FieldValue = nodes.value('(.)[1]', 'varchar(50)')
FROM
@xml.nodes('//*') AS BODtable(nodes)
declare @CurrentBOD VARCHAR(36)
set @CurrentBOD = ''
SET @CurrentBOD = (SELECT FieldValue from @BODtable WHERE FieldName = 'BODId')
This provides me the list of node names and values correctly (I test this in a query and BODtable has all elements listed with the correct values), but when I set @CurrentBOD it comes up null.
Am I missing an easier way to do this? Am I messing these two approaches up somehow?
Here is a part of the xml I am parsing for reference:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">
<soap:Header>
<payloadManifest xmlns="???">
<c contentID="Content0" namespaceURI="???" element="AcknowledgePartsOrder" version="4.0" />
</payloadManifest>
<wsa:Action>http://www.starstandards.org/webservices/2005/10/transport/operations/PutMessage</wsa:Action>
<wsa:MessageID>uuid:df8c66af-f364-4b8f-81d8-06150da14428</wsa:MessageID>
<wsa:ReplyTo>
<wsa:Address>http://schemas.xmlsoap.org/ws/2004/03/addressing/role/anonymous</wsa:Address>
</wsa:ReplyTo>
<wsa:To>???</wsa:To>
<wsse:Security soap:mustUnderstand="1">
<wsu:Timestamp wsu:Id="Timestamp-bd91e76f-c212-4555-9b23-f66f839672bd">
<wsu:Created>2013-01-03T21:52:48Z</wsu:Created>
<wsu:Expires>2013-01-03T21:53:48Z</wsu:Expires>
</wsu:Timestamp>
<wsse:UsernameToken xmlns:wsu="???" wsu:Id="???">
<wsse:Username>???</wsse:Username>
<wsse:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText">???</wsse:Password>
<wsse:Nonce>???</wsse:Nonce>
<wsu:Created>2013-01-03T21:52:48Z</wsu:Created>
</wsse:UsernameToken>
</wsse:Security>
</soap:Header>
<soap:Body>
<PutMessage xmlns="??????">
<payload>
<content id="???">
<AcknowledgePartsOrder xmlns="???" xmlns:xsi="???" xsi:schemaLocation="??? ???" revision="???" release="???" environment="???n" lang="en-US" bodVersion="???">
<ApplicationArea>
<Sender>
<Component>???</Component>
<Task>???</Task>
<ReferenceId>???</ReferenceId>
<CreatorNameCode>???</CreatorNameCode>
<SenderNameCode>???</SenderNameCode>
<DealerNumber>???</DealerNumber>
<PartyId>???</PartyId>
<LocationId />
<ServiceId />
</Sender>
<CreationDateTime>2013-01-03T21:52:47</CreationDateTime>
<BODId>71498800-c098-4885-9ddc-f58aae0e5e1a</BODId>
<Destination>
<DestinationNameCode>???</DestinationNameCode>

You need to respect the XML namespaces!
First of all, your target XML node <BODId> is inside the <soap:Envelope> and <soap:Body> tags - both need to be included in your selection.
Secondly, both the <PutMessage> as well as the <AcknowledgePartsOrder> nodes appear to have default XML namespaces (those xmlns=.... without a prefix) - and those must be respected when you select your data using XPath.
So assuming that <PutMessage xmlns="urn:pm"> and <AcknowledgePartsOrder xmlns="urn:apo"> (those are just guesses on my part; replace them with the actual XML namespaces, which you haven't shown us here), you should be able to use this XPath to get what you're looking for:
;WITH XMLNAMESPACES('http://schemas.xmlsoap.org/soap/envelope/' AS soap,
'urn:pm' AS ns, 'urn:apo' AS apo)
SELECT
XC.value('(apo:BODId)[1]', 'varchar(100)')
FROM
@YourXmlVariable.nodes('/soap:Envelope/soap:Body/ns:PutMessage/ns:payload/ns:content/apo:AcknowledgePartsOrder/apo:ApplicationArea') AS XT(XC)
This does return the expected value (71498800-c098-4885-9ddc-f58aae0e5e1a) in my case.
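If spelling out every namespace is impractical, a namespace wildcard can pull the value straight into a variable. The following is only a sketch: the document is trimmed, and 'urn:pm' and 'urn:apo' are the same placeholder URIs guessed above, not the real ones. Note also that the table-variable attempt in the question returns NULL for a separate reason: the SELECT is never inserted into @BODtable (there is no INSERT INTO), so the table variable stays empty and the alias BODtable(nodes) is unrelated to it.

```sql
-- Trimmed sample document; 'urn:pm' and 'urn:apo' are placeholder URIs (assumptions).
DECLARE @xml XML = N'
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <PutMessage xmlns="urn:pm">
      <payload>
        <content id="Content0">
          <AcknowledgePartsOrder xmlns="urn:apo">
            <ApplicationArea>
              <BODId>71498800-c098-4885-9ddc-f58aae0e5e1a</BODId>
            </ApplicationArea>
          </AcknowledgePartsOrder>
        </content>
      </payload>
    </PutMessage>
  </soap:Body>
</soap:Envelope>';

DECLARE @CurrentBOD VARCHAR(36);

-- *:BODId matches an element with local name BODId in any namespace.
-- The // axis searches at any depth: convenient, but slower on large documents
-- than a fully qualified path.
SET @CurrentBOD = @xml.value('(//*:BODId/text())[1]', 'VARCHAR(36)');

SELECT @CurrentBOD AS CurrentBOD;
```

The fully qualified path with WITH XMLNAMESPACES is the stricter choice; the wildcard trades precision for convenience and will match any BODId element regardless of where it sits.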

Related

U-SQL with XmlExtractor - elements inside elements

In U-SQL I am trying to get a list of elements inside elements, using the XmlExtractor. But I cannot get the nested collection.
It is a list of items, which has locations. With the XmlExtractor I can get a collection of elements, but I don't see how I can get a collection that contains a collection. An XML sample is shown below.
Any ideas?
<root>
<Item>
<Header>
<id>111</id>
</Header>
<Body>
<Locations>
<Location>
<Station>k4</Station>
<Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
</Location>
<Location>
<Station>k5</Station>
<Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
</Location>
</Locations>
</Body>
</Item>
<Item>
<Header>
<id>222</id>
</Header>
<Body>
<Locations>
<Location>
<Station>k4</Station>
<Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
</Location>
<Location>
<Station>k5</Station>
<Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
</Location>
</Locations>
</Body>
</Item>
</root>
Solved by making an extractor that takes the XML in one string, and then calls a method using xpath, returning an SQL.Array, where each string holds the comma-separated values of the result. The result looks like this:
111;k4,2017-08-30T02:04:18.2506945+02:00
111;k5,2017-08-30T02:04:18.2506945+02:00
222;k4,2017-08-30T02:12:36.1218601+02:00
222;k5,2017-08-30T02:12:36.1218601+02:00
The standard XmlExtractor cannot do this, and I also decided that it is better to postpone the parsing of the xml to after it has been extracted, because there can be multiple steps on the same xml.
Azure SQL Database has powerful abilities to shred XML. Maybe if this is already in your estate/architecture it might make a simple alternative to custom code? A simple example:
DECLARE @xml XML = '<root>
<Item>
<Header>
<id>111</id>
</Header>
<Body>
<Locations>
<Location>
<Station>k4</Station>
<Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
</Location>
<Location>
<Station>k5</Station>
<Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp>
</Location>
</Locations>
</Body>
</Item>
<Item>
<Header>
<id>222</id>
</Header>
<Body>
<Locations>
<Location>
<Station>k4</Station>
<Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
</Location>
<Location>
<Station>k5</Station>
<Timestamp>2017-08-30T02:12:36.1218601+02:00</Timestamp>
</Location>
</Locations>
</Body>
</Item>
</root>'
/*
111;k4,2017-08-30T02:04:18.2506945+02:00
111;k5,2017-08-30T02:04:18.2506945+02:00
222;k4,2017-08-30T02:12:36.1218601+02:00
222;k5,2017-08-30T02:12:36.1218601+02:00
*/
SELECT
r.c.value('(Header/id/text())[1]', 'int' ) id,
b.c.value('(Station/text())[1]', 'varchar(10)' ) station,
b.c.value('(Timestamp/text())[1]', 'varchar(40)' ) [timestamp],
b.c.value('(Timestamp/text())[1]', 'datetimeoffset' ) [timestamp2]
FROM @xml.nodes('root/Item') r(c)
CROSS APPLY r.c.nodes('Body/Locations/Location') b(c)
You can do something similar if the XML is stored in a table also.
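As a rough sketch of the table-based variant, assuming a hypothetical table variable @Docs with an XML column named Payload (neither name comes from the original post), the same nodes() calls can be chained with CROSS APPLY against the column:

```sql
-- @Docs and Payload are hypothetical names standing in for a real table and XML column.
DECLARE @Docs TABLE (DocId INT IDENTITY(1,1) PRIMARY KEY, Payload XML NOT NULL);

INSERT INTO @Docs (Payload)
VALUES (N'<root><Item><Header><id>111</id></Header><Body><Locations>
  <Location><Station>k4</Station><Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp></Location>
  <Location><Station>k5</Station><Timestamp>2017-08-30T02:04:18.2506945+02:00</Timestamp></Location>
</Locations></Body></Item></root>');

SELECT
    d.DocId,
    r.c.value('(Header/id/text())[1]', 'int')            AS id,
    b.c.value('(Station/text())[1]', 'varchar(10)')      AS station,
    b.c.value('(Timestamp/text())[1]', 'datetimeoffset') AS [timestamp]
FROM @Docs AS d
-- One row per <Item> in each stored document...
CROSS APPLY d.Payload.nodes('root/Item') AS r(c)
-- ...then one row per <Location> inside that item.
CROSS APPLY r.c.nodes('Body/Locations/Location') AS b(c);
```

Each stored document yields one output row per Location, with the parent item's id repeated on every row.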
Here is a script that achieves the desired results using the extractors provided.
USE master;
REFERENCE SYSTEM ASSEMBLY [System.Xml];
REFERENCE ASSEMBLY master.[Microsoft.Analytics.Samples.Formats.Xml];
@e = EXTRACT a string, b string
FROM "CollectTest.xml"
USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(rowPath:"Item",
columnPaths:new SQL.MAP<string, string> { {"Header", "a"}, {"Body", "b"} });
@f = SELECT @e.a, t.c, t.d
FROM @e
CROSS APPLY new Microsoft.Analytics.Samples.Formats.Xml.XmlApplier("b","Location", new SQL.MAP<string,string> { {"Station", "c"}, {"Timestamp", "d"} }) AS t(c string, d string);
OUTPUT @f TO "foo.txt" USING Outputters.Tsv(outputHeader:true);
OUTPUT @e TO "foo2.txt" USING Outputters.Tsv(outputHeader:true);
The first rowset @e uses the XmlDomExtractor to create a row set containing "ID" in col a and the child XML code in col b.
The second rowset @f then uses XmlApplier to extract the values from the nested xml code and cross apply it to the correct rows. The sample xml was copied from the post above and saved in the USQLDataRoot folder as "CollectTest.xml."
Note: Got lazy and the output for Header contains some unwanted node syntax but adding an intermediate xpath or XmlApplier step between @e and @f should solve this.

SQL SERVER xml with CDATA

I have a table in my database with a column containing xml. The column type is nvarchar(max). The xml is formed in this way
<root>
<child>....</child>
.
.
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
...
</special>
</root>
I have not created the db, I cannot change the way information is stored in it but I can retrieve it with a select. For the extraction I use
select cast(replace(xml,'utf-8','utf-16')as xml)
from table
It works well except for the CDATA sections, whose content appears in the query output as: text-&gt;text
Is there a way to retrieve also the CDATA tags?
Well, as far as I know, this is not possible in the normal way...
The CDATA section has one sole purpose: to include otherwise invalid characters within XML, for lazy people...
CDATA is not seen as necessary at all and is therefore not really supported by the normal XML methods. Or in other words: it is supported in the sense that the content is properly escaped. There is actually no difference between correctly escaped content and unescaped content within CDATA! (Okay, there are some minor differences, like including ]]> within a CDATA section, and a few other tiny specialties...)
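A small sketch of that equivalence, using two made-up one-element documents: one wraps the text in CDATA, the other escapes it. Once cast to the XML type, both yield the same value, and the stored form of the first no longer carries the CDATA wrapper.

```sql
-- Two illustrative documents carrying the same text, written two different ways.
DECLARE @cdata   XML = N'<event><![CDATA[text->text]]></event>';
DECLARE @escaped XML = N'<event>text-&gt;text</event>';

SELECT
    -- Both value() calls return the same plain string.
    @cdata.value('(/event/text())[1]', 'nvarchar(50)')   AS from_cdata,
    @escaped.value('(/event/text())[1]', 'nvarchar(50)') AS from_escaped,
    -- Serializing the XML back to text shows the escaped form, not the CDATA section.
    CAST(@cdata AS NVARCHAR(MAX))                         AS stored_form;
```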
The big question is: Why?
What are you trying to do with this afterwards?
Try this. The included text is returned as is:
DECLARE @xml XML =
'<root>
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
</special>
</root>'
SELECT t.c.query('text()')
FROM @xml.nodes('/root/special/event') t(c);
So: Please explain some more details: What do you really want?
If you really need nothing more than the wrapping CDATA, you might use this:
SELECT '<![CDATA[' + t.c.value('.','varchar(max)') + ']]>'
FROM @xml.nodes('/root/special/event') t(c);
Update: Same with outdated FROM OPENXML
I just tested how the outdated FROM OPENXML approach handles this and found that there is absolutely no indication in the result set that the given text was originally within a CDATA section. The "Some value here" text is returned in exactly the same way as the text within CDATA:
DECLARE @doc XML =
'<root>
<child>Some value here </child>
<special>
<event><![CDATA[text->text]]></event>
<event><![CDATA[text->text]]></event>
</special>
</root>';
DECLARE @hnd INT;
EXEC sp_xml_preparedocument @hnd OUTPUT, @doc;
SELECT * FROM OPENXML (@hnd, '/root',0);
EXEC sp_xml_removedocument @hnd;
This is how to include CDATA on child nodes in XML, using pure SQL. But it's not ideal.
SELECT 1 AS tag,
null AS parent,
'10001' AS 'Customer!1!Customer_ID!Element',
'AirBallon Captain' AS 'Customer!1!Title!cdata',
'Customer!1' = (
SELECT
2 AS tag,
NULL AS parent,
'Wrapped in cdata, using explicit' AS 'Location!2!Title!cdata'
FOR XML EXPLICIT)
FOR XML EXPLICIT, ROOT('Customers')
CDATA is included, but the child element is encoded using &gt; instead of >
Which is so weird from a sensible point of view. I'm sure there are technical explanations, but they are stupid, because there is no difference in the FOR XML specification.
You could add the TYPE option to the inner child query, but then you lose the CDATA too.
BUT WHY OH WHY?!?!?!?! would you (Microsoft) remove CDATA, when I just added it?
<Customers>
<Customer>
<Customer_ID>10001</Customer_ID>
<Title><![CDATA[AirBallon Captain]]></Title>
<Location>
<Title><![CDATA[wrapped in cdata, using explicit]]></Title>
</Location>
</Customer>
</Customers>

SQL For XML PATH - Redundant xmlns throughout result and how to return a node with an attribute AND value

Removing all the extraneous information from my initial posting and focusing on two items:
The majority of my data is just <element>value</element> but certain parts call out these namespace prefixes:
<componentList>
<InstantiatedBillingRateComponent>
<definition xsi:type="PSPM">
...
</definition>
Looking at this as a model -
How to return XML from SQL Server 2008 that is structured with multiple selections sharing a common parent
Select BillRateType as "@xsi:type"
...
for xml path('definition'),type,elements
which gave me the error:
XML name space prefix 'xsi' declaration is missing for FOR XML column name '@xsi:type'.
So I did some research on the WITH xmlnamespaces clause and added this to the beginning of the query:
WITH xmlnamespaces ('http://www.w3.org/2001/XMLSchema-instance' as xsi, 'http://www.w3.org/2001/XMLSchema' AS xsd )
I received no error but got this:
<definition xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="PSPM">
and received both xmlns: declarations ALL throughout the entire result (looks like every nested level at the XML PATH or Root if called out)
Problem 1: How can I get rid of all the xmlns (except the top level)
Problem 2: If my column called BillCat has the value COBRA how do I achieve this output?
<billingCategory>
<ID xsi:type="xsd:string">COBRA</ID>
<ID xsi:type="xsd:string">BillingCategory</ID>
</billingCategory>
Update: For this problem 2 I am getting closer - I'm unsure how to mix the element attributes and values together.
If I execute this:
WITH xmlnamespaces ('...://www.w3.org/2001/XMLSchema-instance' as xsi, '...://www.w3.org/2001/XMLSchema' AS xsd )
select
top 1
'xsd:string' as "@xsi:type"
,BillCat
from
##chris_global
for xml path('ID'),type,elements,root('billingCategory')
I get:
<billingCategory xmlns:xsd="....://www.w3.org/2001/XMLSchema" xmlns:xsi="...://www.w3.org/2001/XMLSchema-instance">
<ID xsi:type="xsd:string">
<BillCat>COBRA</BillCat>
</ID>
</billingCategory>
but i want:
<billingCategory>
<ID xsi:type="xsd:string">COBRA</ID>
<ID xsi:type="xsd:string">BillingCategory</ID>
</billingCategory>
I keep looking for query samples that show element with an attribute AND element value like above but cannot find one.
Update -
I found something that seems to work from: [can't post]
WITH xmlnamespaces ('http://www.w3.org/2001/XMLSchema-instance' as xsi, 'http://www.w3.org/2001/XMLSchema' AS xsd )
select
top 1
'xsd:string' as "@xsi:type"
,BillCat as 'text()'
from
##chris_global
for xml path('ID'),type,elements,root('billingCategory')
gave me:
<billingCategory xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ID xsi:type="xsd:string">COBRA</ID>
</billingCategory>
So I guess I'm just looking for ways to remove the redundant xmlns from everything (short of parsing it out after the fact)

How to ignore XML namespace when creating SQL request?

I have many rows in a DB, each containing an XML data field. The XML looks approximately like this:
<CabasEstimateReply xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://cabmb.cab.se/schemas/CABMBGeneralSchemas/CABASEstimateReply/2006-11-16/">
<Estimate xmlns="">
<WorkshopCompanyId>C002006893</WorkshopCompanyId>
<EstimateId>1-SE-AEB965-634921885183891313</EstimateId>
</Estimate>
<EstimateReply xmlns="">
**<EstimateReplyCode>ReplyStatus1</EstimateReplyCode>**
<EstimateReplyVersion>1</EstimateReplyVersion>
<EstimateReplyDate>2013-05-31T11:40:18.6227322+03:00</EstimateReplyDate>
<EstimateReplyComment />
<EstimateReplyMessage>Kunden betalar : 8692 Fakturaadress : Trygg Hansa</EstimateReplyMessage>
<EstimateReplyMessageCompressMethod />
<EstimateReplyReference>010704</EstimateReplyReference>
<EstimateReplyForthcomingInspectionDate />
</EstimateReply>
<Vehicle xmlns="">
<VehicleRegNo>XND108</VehicleRegNo>
<VehicleMake>BMW</VehicleMake>
<VehicleModel>525I TOURING</VehicleModel>
<VehicleModelYear />
<VehicleModelMonth />
<VehicleVINCode />
<VehicleChassiNo>NL51010CM95684</VehicleChassiNo>
<VehicleFirstRegistered>2006-02-23T00:00:00</VehicleFirstRegistered>
<Imported>null</Imported>
</Vehicle>
I need to be able to get the value of EstimateReplyCode (marked in bold) via a SQL query. I'm doing it like this:
;WITH XMLNAMESPACES(DEFAULT 'https://cabmb.cab.se/schemas/CABMBGeneralSchemas/CABASEstimateReply/2006-11-16/')
select [Data],
Data.value('(/CabasEstimateReply/EstimateReply/EstimateReplyCode)[1]', 'nvarchar(64)') AS ReplyCode
from EstimateReplyRawData
But I get only null values for ReplyCode. When I converted the XML to a string, replaced the namespaces, and converted it back to XML, everything worked well, which is why I suppose the issue is the namespace. What am I doing wrong here?
If you really want to ignore namespaces, you can use namespace wildcards.
select [Data],
Data.value('(/*:CabasEstimateReply/*:EstimateReply/*:EstimateReplyCode)[1]', 'nvarchar(64)') AS ReplyCode
from EstimateReplyRawData
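For completeness, here is a sketch of why the DEFAULT declaration returns NULL and how to be precise instead of ignoring namespaces. In the sample, only the root element is in the cabmb namespace; the children reset it with xmlns="" and are therefore in no namespace. DEFAULT applies the namespace to every unprefixed step, so the child steps no longer match. Binding the namespace to a prefix (the prefix c and the variable @Data are illustrative choices, and the document is trimmed) and using it only on the root addresses this:

```sql
-- Trimmed copy of the sample document; @Data stands in for the table column.
DECLARE @Data XML = N'
<CabasEstimateReply xmlns="https://cabmb.cab.se/schemas/CABMBGeneralSchemas/CABASEstimateReply/2006-11-16/">
  <EstimateReply xmlns="">
    <EstimateReplyCode>ReplyStatus1</EstimateReplyCode>
  </EstimateReply>
</CabasEstimateReply>';

-- Bind the namespace to a prefix instead of making it the default,
-- so unprefixed steps keep meaning "no namespace".
WITH XMLNAMESPACES ('https://cabmb.cab.se/schemas/CABMBGeneralSchemas/CABASEstimateReply/2006-11-16/' AS c)
SELECT @Data.value('(/c:CabasEstimateReply/EstimateReply/EstimateReplyCode)[1]',
                   'nvarchar(64)') AS ReplyCode;
```

The wildcard version is shorter; the prefixed version fails loudly (returns NULL) if the document's namespace ever changes, which may or may not be what you want.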

Configuring namespace for sp_xml_preparedocument

I have an RSS xml with this format:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<title></title>
<link></link>
<description></description>
<language></language>
<lastBuildDate></lastBuildDate>
<generator></generator>
<docs></docs>
<managingEditor></managingEditor>
<webMaster></webMaster>
<ttl></ttl>
<item>
<title></title>
<link></link>
<description></description>
<guid isPermaLink="false"></guid>
<pubDate></pubDate>
<author></author>
<dc:date></dc:date>
<dc:publisher></dc:publisher>
<dc:language></dc:language>
</item>
<item>
<title></title>
<link></link>
<description></description>
<guid isPermaLink="false"></guid>
<pubDate></pubDate>
<author></author>
<dc:date></dc:date>
<dc:publisher></dc:publisher>
<dc:language></dc:language>
</item>
</channel>
</rss>
And I want to parse it with sp_xml_preparedocument in SQL Server.
My problem is the namespace parameter. There are three tags in each item which have a namespace, and I don't know how to specify them.
I have tried this:
EXEC sp_xml_preparedocument @hDoc OUTPUT, @xmlContent,'<item xmlns:dc="http://purl.org/dc/elements/1.1/"/>'
but it only parses the first item and ignores the rest!
Any idea?
The fact that you are only getting one row has nothing to do with the namespace. You have some error in your OPENXML query against @hDoc.
There might be reasons for you to still use openxml but until you show the query that is not working for you I will suggest you use the XML data type instead.
with xmlnamespaces('http://purl.org/dc/elements/1.1/' as dc)
select C.N.value('(title/text())[1]', 'nvarchar(100)') as channel_title,
I.N.value('(title/text())[1]', 'nvarchar(100)') as item_title,
I.N.value('(dc:publisher/text())[1]', 'nvarchar(100)') as publisher
from @XML.nodes('/rss/channel') as C(N)
cross apply C.N.nodes('item') as I(N);
The namespace needs to be defined as a character type:
EXEC sp_xml_preparedocument @hDoc OUTPUT, @xmlContent,'<item xmlns:dc="http://purl.org/dc/elements/1.1/"/>'
[ xpath_namespaces ]
Specifies the namespace declarations that are used in row and column XPath expressions in OPENXML. xpath_namespaces is a text parameter: char, nchar, varchar, nvarchar, text, ntext or xml.
The default value is <root xmlns:mp="urn:schemas-microsoft-com:xml-metaprop">. xpath_namespaces provides the namespace URIs for the prefixes used in the XPath expressions in OPENXML by means of a well-formed XML document. xpath_namespaces declares the prefix that must be used to refer to the namespace urn:schemas-microsoft-com:xml-metaprop; this provides metadata about the parsed XML elements. Although you can redefine the namespace prefix for the metaproperty namespace by using this technique, this namespace is not lost. The prefix mp is still valid for urn:schemas-microsoft-com:xml-metaprop even if xpath_namespaces contains no such declaration.
http://msdn.microsoft.com/en-us/library/ms187367.aspx
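Putting the pieces together, a minimal OPENXML sketch that returns every item might look like the following. The feed content is invented for illustration, and the XML declaration with encoding="UTF-8" is left out because the text is passed as NVARCHAR. The row pattern points at the repeating item element, flag 2 requests element-centric mapping, and the dc prefix in the column pattern resolves through the third argument of sp_xml_preparedocument.

```sql
-- Illustrative feed; the titles and publishers are made-up sample values.
DECLARE @xmlContent NVARCHAR(MAX) = N'
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Feed</title>
    <item><title>First</title><dc:publisher>Pub A</dc:publisher></item>
    <item><title>Second</title><dc:publisher>Pub B</dc:publisher></item>
  </channel>
</rss>';

DECLARE @hDoc INT;

-- The third argument only declares prefixes for the XPath expressions;
-- the element name used to carry the declaration does not matter.
EXEC sp_xml_preparedocument @hDoc OUTPUT, @xmlContent,
     N'<root xmlns:dc="http://purl.org/dc/elements/1.1/"/>';

SELECT title, publisher
FROM OPENXML(@hDoc, '/rss/channel/item', 2)   -- one row per <item>
WITH (
    title     NVARCHAR(100) 'title',
    publisher NVARCHAR(100) 'dc:publisher'    -- prefix declared above
);

-- Always release the parsed document handle.
EXEC sp_xml_removedocument @hDoc;
```

If a query like this still returns a single row, the row pattern is the first thing to check: a pattern such as '/rss/channel' produces one row per channel, not per item.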