Reading dynamic XML nodes in SQL Server - sql

I have the following XML structure:
set @MailXML =
'<MailingCompany>
<Mailman>
<Name>Jamie</Name>
<Age> 24 </Age>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
<DestinationCountry> USA </DestinationCountry>
<OriginCountry> Australia </OriginCountry>
<OriginAddress> 120 St Kilda Road </OriginAddress>
</Letter>
</Mailman>
</MailingCompany>'
My SQL currently looks like this:
-- Mail Insertion
INSERT INTO mailDB.dbo.Mailman
SELECT
m.value('Name[1]','varchar(50)') as Name,
m.value('Age[1]','varchar(50)') as Age
FROM
@MailXML.nodes('/MailingCompany/Mailman') as A(m)
SET @MailPersonFK = SCOPE_IDENTITY();
-- Letter Insertion
INSERT INTO mailDB.dbo.Letter
SELECT
l.value('DestinationAddress[1]', 'varchar(50)') as DestinationAddress,
l.value('DestinationCountry[1]', 'varchar(50)') as DestinationCountry,
l.value('OriginCountry[1]', 'varchar(50)') as OriginCountry,
l.value('OriginAddress[1]', 'varchar(50)') as OriginAddress,
@MailPersonFK as MailID
FROM
@MailXML.nodes('MailingCompany/Mailman/Letter') as B(l)
I am trying to extract the Mailman and Letter data into their own respective tables. That works, but my issue is that the MailingCompany node is dynamic. Sometimes it may be MailVehicle, for example, and I still need to read the corresponding Mailman and Letter node data and insert them into their respective tables.
So both
FROM @MailXML.nodes('/MailingCompany/Mailman') as A(t)
and
FROM @MailXML.nodes('MailingCompany/Mailman/Letter') as B(l)
will need to be changed to allow MailingCompany to be dynamic.
I have tried to extract the parent node and concatenate it into a string to put into the .nodes function like the following:
set @DynXML = '/' + @parentNodeVar + '/Mailman'
FROM @MailXML.nodes(@DynXML) as A(t)
However I get the following error:
The argument 1 of the XML data type method "nodes" must be a string literal.
How can I overcome this dynamic XML issue?
Thank you very much in advance

Look at this reduced example:
DECLARE @xml1 XML=
N'<MailingCompany>
<Mailman>
<Name>Jamie</Name>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
</Letter>
</Mailman>
</MailingCompany>';
DECLARE @xml2 XML=
N'<OtherName>
<Mailman>
<Name>Jodie</Name>
<Letter>
<DestinationAddress> This is the other address </DestinationAddress>
</Letter>
</Mailman>
</OtherName>';
SELECT @xml1.value(N'(*/Mailman/Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml1.value(N'(*/Mailman/Letter/DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
SELECT @xml2.value(N'(*/Mailman/Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml2.value(N'(*/Mailman/Letter/DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
You can replace a node's name with *.
Another trick is the deep search with // (same result as before):
SELECT @xml1.value(N'(//Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml1.value(N'(//DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
SELECT @xml2.value(N'(//Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml2.value(N'(//DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
The general rule: Be as specific as possible.
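Applied to the queries from the question, the wildcard goes straight into the .nodes() path, so the string literal never has to be built dynamically. The following is a minimal sketch, assuming the root element is the only thing that varies. The sample document with a MailVehicle root is made up for illustration, and the INSERT targets are left out so the SELECTs can be run on their own. The last query shows one way to restrict the root to a name held in a variable while keeping the path a literal; the variable name @parentNodeVar is borrowed from the question.

```sql
-- Hypothetical sample: same structure as in the question, different root name.
DECLARE @MailXML XML =
N'<MailVehicle>
  <Mailman>
    <Name>Jamie</Name>
    <Age> 24 </Age>
    <Letter>
      <DestinationAddress> 440 Mountain View Parade </DestinationAddress>
      <DestinationCountry> USA </DestinationCountry>
    </Letter>
  </Mailman>
</MailVehicle>';

-- Mailman rows: /* matches the root element whatever it is called.
SELECT m.value('(Name/text())[1]', 'varchar(50)') AS Name,
       m.value('(Age/text())[1]',  'varchar(50)') AS Age
FROM @MailXML.nodes('/*/Mailman') AS A(m);

-- Letter rows: same idea, one level deeper.
SELECT l.value('(DestinationAddress/text())[1]', 'varchar(50)') AS DestinationAddress,
       l.value('(DestinationCountry/text())[1]', 'varchar(50)') AS DestinationCountry
FROM @MailXML.nodes('/*/Mailman/Letter') AS B(l);

-- Optional: only accept a root whose name matches a variable.
DECLARE @parentNodeVar NVARCHAR(100) = N'MailVehicle';
SELECT m.value('(Name/text())[1]', 'varchar(50)') AS Name
FROM @MailXML.nodes('/*[local-name() = sql:variable("@parentNodeVar")]/Mailman') AS A(m);
```

The /*/Mailman form stays specific about everything except the root name, which is in line with the general rule above; // would also match Mailman elements nested anywhere else in the document.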

Related

Parse XML data from a column in SQL Server

I'm trying to put together a report for a badge program. Some of the data for custom fields we created is stored in a single column in the table as XML. I need to return a couple of the items in there to the report and am having a hard time getting the proper syntax to parse it out.
Aside from the XML, the query itself is simple:
SELECT
Person1.firstName
,Person1.lastName
,Person1.idNumber
,Person1.idNumber2
,Person1.idNumber3
,Person1.status
,Person1.customdata
FROM
Person1
The field "customdata" is the XML field that I need to pull Title and two different dates out of. This is what the XML looks like:
<person1_7:CustomData xmlns:person1_7="http://www.badgepass.com/Person1_7">
<Title>IT Director</Title>
<Gaming_x0020_Level>Level 1</Gaming_x0020_Level>
<Gaming_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Gaming_x0020_Issue_x0020_Date>
<Gaming_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Gaming_x0020_Expire_x0020_Date>
<Betting_x0020_Level>Level 1</Betting_x0020_Level>
<Betting_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Betting_x0020_Issue_x0020_Date>
<Betting_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Betting_x0020_Expire_x0020_Date>
<BadgeType>Dual Employee</BadgeType>
<Gaming_x0020_Status>TEMP</Gaming_x0020_Status>
<Betting_x0020_Status>TEMP</Betting_x0020_Status>
</person1_7:CustomData>
I have tried a couple of different methods, following the advice from "How to query for Xml values and attributes from table in SQL Server?", and then tried declaring an XML namespace with the following query:
WITH XMLNAMESPACES ('http://www.badgepass.com/Person1_7' as X)
SELECT
Person1.firstName
,Person1.lastName
,Person1.idNumber
,Person1.idNumber2
,Person1.idNumber3
,Person1.status
,Person1.customdata.value('(/X:person1_7:customdata/X:Title)[1]', 'varchar(100)') AS Title
FROM
Person1
So far all of my results keep returning: "XQuery [Person1.customData.value()]: ")" was expected."
I'm assuming I have a syntax issue that I'm overlooking, as I've never had to manipulate XML with SQL before. Thank you in advance for any help.
Please try the following solution.
Notable points:
The .nodes() method establishes a context, so you can access any XML element right away without long XPath expressions.
Using text() in the XPath expressions is for performance reasons.
SQL
-- DDL and sample data population, start
DECLARE @person1 TABLE (firstname varchar(50), customdata xml);
INSERT INTO @person1(firstname, customdata) VALUES
('John', '<person1_7:CustomData xmlns:person1_7="http://www.badgepass.com/Person1_7">
<Title>IT Director</Title>
<Gaming_x0020_Level>Level 1</Gaming_x0020_Level>
<Gaming_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Gaming_x0020_Issue_x0020_Date>
<Gaming_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Gaming_x0020_Expire_x0020_Date>
<Betting_x0020_Level>Level 1</Betting_x0020_Level>
<Betting_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Betting_x0020_Issue_x0020_Date>
<Betting_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Betting_x0020_Expire_x0020_Date>
<BadgeType>Dual Employee</BadgeType>
<Gaming_x0020_Status>TEMP</Gaming_x0020_Status>
<Betting_x0020_Status>TEMP</Betting_x0020_Status>
</person1_7:CustomData>');
-- DDL and sample data population, end
WITH XMLNAMESPACES ('http://www.badgepass.com/Person1_7' as person1_7)
SELECT firstName
, c.value('(Title/text())[1]', 'VARCHAR(100)') AS Title
, c.value('(Gaming_x0020_Issue_x0020_Date/text())[1]', 'DATETIME') GamingIssueDate
FROM @person1
CROSS APPLY customdata.nodes('/person1_7:CustomData') AS t(c);
Output
+-----------+-------------+-------------------------+
| firstName | Title | GamingIssueDate |
+-----------+-------------+-------------------------+
| John | IT Director | 2021-02-18 12:00:00.000 |
+-----------+-------------+-------------------------+
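As for why the original attempt failed, as far as I can tell: a qualified name can carry only one prefix, so X:person1_7:customdata does not parse; XML names are case-sensitive (CustomData, not customdata); and only the root element is in the namespace, so its children are addressed without a prefix. Below is a minimal sketch of the same extraction written as direct .value() calls, with a second date added. The table variable @p, the shortened sample XML, and the alias GamingExpireDate are illustrative assumptions, not part of the original schema.

```sql
-- Self-contained sample; the XML is a shortened copy of the one in the question.
DECLARE @p TABLE (firstName varchar(50), customdata xml);
INSERT INTO @p (firstName, customdata) VALUES
('John', '<person1_7:CustomData xmlns:person1_7="http://www.badgepass.com/Person1_7">
  <Title>IT Director</Title>
  <Gaming_x0020_Issue_x0020_Date>2021-02-18T12:00:00Z</Gaming_x0020_Issue_x0020_Date>
  <Gaming_x0020_Expire_x0020_Date>2022-02-18T12:00:00Z</Gaming_x0020_Expire_x0020_Date>
</person1_7:CustomData>');

-- The prefix X is arbitrary; only the namespace URI has to match the document.
WITH XMLNAMESPACES ('http://www.badgepass.com/Person1_7' AS X)
SELECT firstName
     , customdata.value('(/X:CustomData/Title/text())[1]', 'varchar(100)') AS Title
     , customdata.value('(/X:CustomData/Gaming_x0020_Issue_x0020_Date/text())[1]', 'datetime') AS GamingIssueDate
     , customdata.value('(/X:CustomData/Gaming_x0020_Expire_x0020_Date/text())[1]', 'datetime') AS GamingExpireDate
FROM @p;
```

With more than a few columns, the CROSS APPLY ... nodes() form from the answer is probably easier to read, because the root path is written only once.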

SQL server 2008 patindex recursion

I want to find the latest instance of an expression, then keep looking to see whether there is a better match, and then choose the best match.
The cell I am looking at is a repeatedly appended log with notes followed by the username and timestamp.
Example cell contents:
Starting the investigation.
JWAYNE entered the notes above on 08/12/1976 12:01
Taking over the case. Not a lot of progress recently.
CEASTWOOD entered the notes above on 03/14/2001 09:04
No wonder this case is not progressing, the whole town is covering up some shenanigans!
CEASTWOOD entered the notes above on 03/21/2001 05:23
Star command was right, this investigation has been tossed around like a hot potato for a long time!
BLIGHTYEAR entered the notes above on 08/29/2659 08:01
I am not an expert on database normal form rules, but it is annoying that the entries are jammed together into one cell, which makes my job of isolating the notes and checking them for specific words harder. It is worse because the cell is duplicated across multiple rows until the investigation is closed, which puts the notes from future phases into the Notes column of past events. On top of that, the timestamps in the notes do not match the Timestamp column exactly, which makes a timestamp PATINDEX unreliable even with a margin of a few minutes, like this:
CaseID, Username, Notes, Phase, Timestamp
E18902, JWAYNE, Starting....08:01, E1, 03/14/2001 09:13
E18902, CEASTWOOD, Starting....08:01, E2, 03/14/2001 09:13
E18902, CEASTWOOD, Starting....08:01, E3, 03/21/2001 05:34
E18902, BLIGHTYEAR,Starting....08:01, E4, 08/29/2659 07:58
Right now I am doing a reverse on the whole string, then a PATINDEX to find the username, then a substring to select only the note for that phase of the investigation. The problem is that when the same user enters notes for multiple phases, my simple "look for the first match starting at the end of the string and moving to the top" approach picks up the wrong entry. My first thought is to search for the username and then check again to see if an entry further up is a better match (note timestamp vs. column timestamp), but I am not sure how to code that...
Do I have to get into complicated string splits, or is there a simpler solution?
Here's my suggestion. This is for one record, but you can convert it to a user-defined table-valued function, if you like.
I'm going to use the example data you had above.
declare @sourceText nvarchar(max)
, @workText nvarchar(max)
, @xml xml
set @sourceText = <your example text in your question>
set @workText = @sourceText
-- We're going to replace all the carriage returns and line feeds with
-- characters unlikely to appear in your text. (If they do appear, use some
-- other character.)
set @workText = REPLACE(@workText, char(10), '|')
set @workText = REPLACE(@workText, char(13), '|')
-- Now, we're going to turn your text into XML. Our first target is
-- the string of four "|" characters that the blank lines between entries
-- will be turned into. (If you've got 3, or 6, or blanks in between,
-- adjust accordingly.)
set @workText = REPLACE(@workText, '||||', '</line></entry><entry><line>')
-- Now we replace every other "|".
set @workText = REPLACE(@workText, '|', '</line><line>')
-- Now we construct the rest of the XML and convert the variable to an
-- actual XML variable.
set @workText = '<entry><line>' + @workText + '</line></entry>'
set @workText = REPLACE(@workText, '<line></line>','') -- Get rid of any empty nodes.
set @xml = CONVERT(xml, @workText)
We should now have an XML fragment that looks like this. (You can see it if you insert select @xml into the SQL at this point.)
<entry>
<line>Starting the investigation.</line>
<line>JWAYNE entered the notes above on 08/12/1976 12:01</line>
</entry>
<entry>
<line>Taking over the case. Not a lot of progress recently.</line>
<line>CEASTWOOD entered the notes above on 03/14/2001 09:04</line>
</entry>
<entry>
<line>No wonder this case is not progressing, the whole town is covering up some shenanigans!</line>
<line>CEASTWOOD entered the notes above on 03/21/2001 05:23</line>
</entry>
<entry>
<line>Star command was right, this investigation has been tossed around like a hot potato for a long time!</line>
<line>BLIGHTYEAR entered the notes above on 08/29/2659 08:01</line>
</entry>
We can now transform this XML into XML we like better:
set @xml = @xml.query(
'for $entry in /entry
return <entry><data>
{
for $line in $entry/line[position() < last()]
return string($line)
}
</data>
<timestamp>{ data($entry/line[last()]) }</timestamp>
</entry>
')
This gives us XML that looks like this (just one entry shown, for length reasons):
<entry>
<data>Starting the investigation.</data>
<timestamp>JWAYNE entered the notes above on 08/12/1976 12:01</timestamp>
</entry>
You can convert this back to tabular data with this query:
select EntryData = R.lines.value('data[1]', 'nvarchar(max)')
, EntryTimestamp = R.lines.value('timestamp[1]', 'nvarchar(MAX)')
from @xml.nodes('/entry') as R(lines)
... and get one row per entry, with the note text in EntryData and the "entered the notes above" line in EntryTimestamp.
And from there, you can do whatever you need to do.
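One possible next step, since the question needs the username and time of each entry to pick the right note, is to split the attribution line on its fixed wording. This is only a sketch: it assumes every entry ends with the exact phrase "entered the notes above on" followed by an mm/dd/yyyy hh:mm timestamp, as in the sample data, and the variable names are illustrative.

```sql
DECLARE @line   nvarchar(200) = N'CEASTWOOD entered the notes above on 03/21/2001 05:23';
-- No trailing space in the marker: LEN() ignores trailing spaces.
DECLARE @marker nvarchar(50)  = N' entered the notes above on';
DECLARE @pos    int           = CHARINDEX(@marker, @line);

SELECT Username  = LEFT(@line, @pos - 1)
     , EnteredAt = CONVERT(datetime,
                           LTRIM(SUBSTRING(@line, @pos + LEN(@marker), 50)),
                           101);  -- style 101 = mm/dd/yyyy
```

With Username and EnteredAt available as columns, each row's own Username and Timestamp can be compared against every entry, rather than against only the last match found in the string.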

Using Fields[0].Value to get XML from FOR XML RAW, ELEMENTS query is messed up

I have a query that uses FOR XML RAW, ELEMENTS to return a SELECT query as a structured XML document. However, when I get the result using a TSQLDataSet by using Fields[0].Value, the result is different from what I see when I run the query in SQL Server Management Studio.
What I see in the result from the TSQLDataSet:
੄customerIdфname၄governmentNumberไdebtorAddress1ไdebtorAddress2ไdebtorAddress3ไdebtorAddress4ࡄpostCodeୄcontactNameՄphonë́faxൄcustomerSinceՄtermsلactiveไcurrentBalanceلDebtorခŁ䄁ഃӤ
What I see in the result in SSMS:
<Debtor>
<customerId>C0E449E5B2C </customerId>
<name>New Customer 2 </name>
<governmentNumber> </governmentNumber>
<debtorAddress1>Address Line 1 </debtorAddress1>
<debtorAddress4>Address Line 4 </debtorAddress4>
<postCode>1234 </postCode>
<phone>1234567890 </phone>
<fax>1234567890 </fax>
<customerSince>2013-06-10T18:16:06.213</customerSince>
<terms>M </terms>
<active>true</active>
<currentBalance>0.0000</currentBalance>
</Debtor>
Is there a particular way it should be executed to get the right result?
AFAIK this is a dbExpress limitation. I know how to overcome this, but only using ADO (the returned data must be requested using a special parameterized object and a set of ADO streams). However, as a workaround you can convert the XML data to a string on the server side, either by wrapping the statement in a SELECT (subquery) or by using a simple CAST.
For example, if your statement looks like this
SELECT Foo, Bar FROM FooTable FOR XML RAW, ELEMENTS
you can rewrite it to
SELECT (SELECT Foo, Bar FROM FooTable FOR XML RAW, ELEMENTS)
or you can rewrite it to (CAST to VARCHAR or NVARCHAR)
SELECT CAST( (SELECT Foo, Bar FROM FooTable FOR XML RAW, ELEMENTS) AS VARCHAR(MAX))
and finally retrieve the result like this:
SQLDataSet1.Fields[0].AsString

SQL, Find node value in xml variable, if it exists insert additional nodes into xml variable

I've got a Stored Procedure in SQL, where I have the following declaration:
Declare @fields xml
My SP gets passed values from the front end and then gets executed. The values it gets passed look like this, depending on what the user selects from the front end. For the purpose of this example I have included only 3 IDs.
'<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>'
My question is this:
How can I find the node with the value 1000 and, if it is present (exists), insert (add) 2 additional nodes,
<ID>992</ID><ID>993</ID>
to my existing '<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>' xml.
If <ID>1000</ID> isn't present do nothing.
So, end result should be something like this if 1000 is present.
<F><ID>979</ID><ID>1000</ID><ID>989</ID><ID>992</ID><ID>993</ID></F>
If not, the result should stay:
<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>
I just can't get my head around this?
Check this:
declare @fields xml = '<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>'
, @add xml = '<ID>992</ID><ID>993</ID>'
;
if @fields.exist('/F[1]/ID[text()="1000"]') = 1
set @fields.modify('insert sql:variable("@add") as last into /F[1]');
select @fields
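If the trigger value is not fixed, the same check can take it from a variable. A minimal sketch, assuming a hypothetical @target variable that holds the ID to look for; running it with a value that is not in the list is a quick way to confirm that the XML is left unchanged in that case.

```sql
DECLARE @fields xml = '<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>'
      , @add    xml = '<ID>992</ID><ID>993</ID>'
      , @target varchar(10) = '1000';  -- hypothetical parameter

-- exist() returns 1 when at least one ID has the requested text.
IF @fields.exist('/F[1]/ID[text() = sql:variable("@target")]') = 1
    SET @fields.modify('insert sql:variable("@add") as last into (/F)[1]');

SELECT @fields;
```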

Better way in TSQL to search xml for a node that doesn't exist

We have a source XML file that has an address node, and each node is supposed to have a zip_code node beneath it in order to validate. We received a file that failed the schema validation because at least one node was missing its zip_code (there were several thousand addresses in the file).
We need to find the elements that do not have a zip code, so we can repair the file and send an audit report to the source.
--declare @x xml = bulkcolumn from openrowset(bulk 'x:\file.xml',single_blob) as s
declare @x xml = N'<addresses>
<address><external_address_id>1</external_address_id><zip_code>53207</zip_code></address>
<address><external_address_id>2</external_address_id></address>
</addresses>'
declare @t xml = (
select @x.query('for $a in .//address
return
if ($a/zip_code)
then <external_address_id />
else $a/external_address_id')
)
select x.AddressID.value('.', 'int') AddressID
from @t.nodes('./external_address_id') x(AddressID)
where x.AddressID.value('.', 'int') > 0
GO
Really, it's the where clause that bugs me. I feel like I'm depending on a cast for a null value to 0, and it works, but I'm not really sure that it should. I tried a few variations with the .exist function, but I couldn't get the correct result.
If you just want to ensure that you are selecting address elements that have a zip_code element, then adjust your XPATH to include that criteria in a predicate filter:
/addresses/address[zip_code]
If you also want to ensure that the zip_code element has a value, use a predicate filter on zip_code to select those that have text() nodes:
/addresses/address[zip_code[text()]]
EDIT:
"Actually, I'm looking for the opposite. I need to identify the nodes that don't have a zip, so we can manually correct the source data."
So, if you want to identify all of the address elements that do not have a zip_code, you can specify it in the XPATH like this:
/addresses/address[not(zip_code)]
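Plugged into the .nodes() method, that predicate returns the offending addresses directly, with no intermediate XML and no WHERE clause. A minimal sketch using the sample from the question, extended with a hypothetical third address whose zip_code is present but empty:

```sql
DECLARE @x xml = N'<addresses>
<address><external_address_id>1</external_address_id><zip_code>53207</zip_code></address>
<address><external_address_id>2</external_address_id></address>
<address><external_address_id>3</external_address_id><zip_code /></address>
</addresses>';

-- Addresses with no zip_code element at all.
SELECT a.value('(external_address_id/text())[1]', 'int') AS AddressID
FROM @x.nodes('/addresses/address[not(zip_code)]') AS t(a);

-- Addresses whose zip_code is missing or has no text.
SELECT a.value('(external_address_id/text())[1]', 'int') AS AddressID
FROM @x.nodes('/addresses/address[not(zip_code/text())]') AS t(a);
```

The first query should return only address 2; the second should also return address 3, which may be the more useful check for an audit report.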
If you just want to locate those nodes that are missing their <zip_code> element, you could use something like this:
SELECT
ADRS.ADR.value('(external_address_id)[1]', 'int') as 'ExtAdrID'
FROM
@x.nodes('/addresses/address') as ADRS(ADR)
WHERE
ADRS.ADR.exist('zip_code') = 0
It uses the built-in .exist() method of the xml data type to check for the existence of a subnode inside an XML node.