SQL server 2008 patindex recursion - sql

I want to find the latest instance of an expression, then keep looking to see whether there is a better match, and then choose the best match.
The cell I am looking at is a repeatedly appended log with notes followed by the username and timestamp.
Example cell contents:
Starting the investigation.
JWAYNE entered the notes above on 08/12/1976 12:01
Taking over the case. Not a lot of progress recently.
CEASTWOOD entered the notes above on 03/14/2001 09:04
No wonder this case is not progressing, the whole town is covering up some shenanigans!
CEASTWOOD entered the notes above on 03/21/2001 05:23
Star command was right, this investigation has been tossed around like a hot potato for a long time!
BLIGHTYEAR entered the notes above on 08/29/2659 08:01
I am not an expert on database normal form rules, but it is annoying that the entries are jammed together into one cell, which makes it hard to isolate the notes and check them for specific words. It gets worse because the cell is duplicated for multiple rows until the investigation is closed, which puts the notes from future phases into the note column of past events. On top of that, the time stamps in the notes do not exactly match the Timestamp column, which makes a timestamp PATINDEX unreliable even with a margin of a few minutes, like this:
CaseID, Username, Notes, Phase, Timestamp
E18902, JWAYNE, Starting....08:01, E1, 03/14/2001 09:13
E18902, CEASTWOOD, Starting....08:01, E2, 03/14/2001 09:13
E18902, CEASTWOOD, Starting....08:01, E3, 03/21/2001 05:34
E18902, BLIGHTYEAR,Starting....08:01, E4, 08/29/2659 07:58
Right now I reverse the whole string, use PATINDEX to find the username, and then use SUBSTRING to select only the note for that phase of the investigation. The problem is that when the same user enters notes for multiple phases, my simple "look for the first match starting at the end of the string and moving to the top" approach picks up the wrong entry. My first thought is to search for the username and then check again to see whether an entry further up is a better match (note time stamp vs. column time stamp), but I am not sure how to code that...
Do I have to get into complicated string splits, or is there a simpler solution?

Here's my suggestion. This is for one record, but you can convert it to a user-defined table-valued function, if you like.
I'm going to use the example data you had above.
declare @sourceText nvarchar(max)
, @workText nvarchar(max)
, @xml xml
set @sourceText = <your example text in your question>
set @workText = @sourceText
-- We're going to replace all the carriage returns and line feeds with
-- a character unlikely to appear in your text. (If it does appear, use
-- some other character.)
set @workText = REPLACE(@workText, char(10), '|')
set @workText = REPLACE(@workText, char(13), '|')
-- Now, we're going to turn your text into XML. Our first target is
-- the string of four "|" characters that the blank lines between entries
-- will be turned into. (If you've got 3, or 6, or blanks in between,
-- adjust accordingly.)
set @workText = REPLACE(@workText, '||||', '</line></entry><entry><line>')
-- Now we replace every other "|".
set @workText = REPLACE(@workText, '|', '</line><line>')
-- Now we construct the rest of the XML and convert the variable to an
-- actual XML variable.
set @workText = '<entry><line>' + @workText + '</line></entry>'
set @workText = REPLACE(@workText, '<line></line>','') -- Get rid of any empty nodes.
set @xml = CONVERT(xml, @workText)
We should now have an XML fragment that looks like this. (You can see it if you insert select @xml into the SQL at this point.)
<entry>
<line>Starting the investigation.</line>
<line>JWAYNE entered the notes above on 08/12/1976 12:01</line>
</entry>
<entry>
<line>Taking over the case. Not a lot of progress recently.</line>
<line>CEASTWOOD entered the notes above on 03/14/2001 09:04</line>
</entry>
<entry>
<line>No wonder this case is not progressing, the whole town is covering up some shenanigans!</line>
<line>CEASTWOOD entered the notes above on 03/21/2001 05:23</line>
</entry>
<entry>
<line>Star command was right, this investigation has been tossed around like a hot potato for a long time!</line>
<line>BLIGHTYEAR entered the notes above on 08/29/2659 08:01</line>
</entry>
We can now transform this XML into XML we like better:
set @xml = @xml.query(
'for $entry in /entry
return <entry><data>
{
for $line in $entry/line[position() < last()]
return string($line)
}
</data>
<timestamp>{ data($entry/line[last()]) }</timestamp>
</entry>
')
This gives us XML that looks like this (just one entry shown, for length reasons):
<entry>
<data>Starting the investigation.</data>
<timestamp>JWAYNE entered the notes above on 08/12/1976 12:01</timestamp>
</entry>
You can convert this back to tabular data with this query:
select EntryData = R.lines.value('data[1]', 'nvarchar(max)')
, EntryTimestamp = R.lines.value('timestamp[1]', 'nvarchar(MAX)')
from @xml.nodes('/entry') as R(lines)
... and get one row per entry, with the note text in EntryData and the "entered the notes above on" line in EntryTimestamp.
And from there, you can do whatever you need to do.
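As one possible next step toward the original question, the sketch below picks the note whose own time stamp is closest to the row's Timestamp column for a given user. It is only an illustration under assumptions: the table variable @entries stands in for the output of the query above, @username and @rowTime are hypothetical inputs taken from one row of the case table, and every signature line is assumed to start with the username followed by a space and to end with exactly "MM/DD/YYYY HH:MM" with no trailing whitespace.

```sql
-- Hypothetical stand-in for the EntryData / EntryTimestamp rows produced above.
declare @entries table (EntryData nvarchar(max), EntryTimestamp nvarchar(200));
insert into @entries (EntryData, EntryTimestamp) values
  (N'Starting the investigation.',
   N'JWAYNE entered the notes above on 08/12/1976 12:01'),
  (N'Taking over the case. Not a lot of progress recently.',
   N'CEASTWOOD entered the notes above on 03/14/2001 09:04'),
  (N'No wonder this case is not progressing, the whole town is covering up some shenanigans!',
   N'CEASTWOOD entered the notes above on 03/21/2001 05:23');

-- Hypothetical values from the row being matched (phase E3 in the question).
declare @username nvarchar(50) = N'CEASTWOOD';
declare @rowTime datetime = convert(datetime, '03/21/2001 05:34', 101);

select top (1) e.EntryData, e.NoteTime
from (
    select EntryData,
           -- Username is everything before the first space (assumption).
           UserName = left(EntryTimestamp, charindex(' ', EntryTimestamp) - 1),
           -- The last 16 characters are assumed to be 'MM/DD/YYYY HH:MM'.
           NoteTime = convert(datetime, right(EntryTimestamp, 16), 101)
    from @entries
) as e
where e.UserName = @username
-- Smallest gap between the note's time stamp and the row's time stamp wins.
order by abs(datediff(minute, e.NoteTime, @rowTime));
```

With the sample values above, the 03/21/2001 05:23 note is 11 minutes away from the row time stamp, so it is chosen over the 03/14/2001 note by the same user.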

Related

Finding every instance of XML Element in SQL output

I'm a beginner when it comes to SQL and have no experience with XML so I'm after a little bit of help.
At the moment I am looking at a single table and just using the query below
select
name,
convert(xml, convert(varbinary(max), orders)) ClientOrders
from client;
In the second column of the SQL output, I have a very lengthy bit of XML similar to the example below. I've used "..." just to skip over some of the output and give a general idea.
Name, ClientOrders
Client1, <report ... ><QueryParameter></QueryParameter Name = "#hello1"><commandtext> ...<value>Example1</value>....<value>Example2</value>...<value>Example3</value>...</commandtext></report>
Client2, <report ... ><QueryParameter></QueryParameter Name = "#hello2"><commandtext> ...<value>Example4</value>....<value>Example5</value>...<value>Example6</value>...</commandtext></report>
I have this for a lot of rows and this output is so long that it exceeds the Excel cell character limit. I'm only looking for the values Example1 through to Example6 in the example given above. Is there an SQL command I can run to get the above string between the open and close value?
I am using SSMS version 18.9.1
Cheers

Reading dynamic XML nodes in SQL Server

I have the following XML structure:
set @MailXML =
'<MailingCompany>
<Mailman>
<Name>Jamie</Name>
<Age> 24 </Age>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
<DestinationCountry> USA </DestinationCountry>
<OriginCountry> Australia </OriginCountry>
<OriginAddress> 120 St Kilda Road </OriginAddress>
</Letter>
</Mailman>
</MailingCompany>'
My SQL currently looks like this:
-- Mail Insertion
INSERT INTO mailDB.dbo.Mailman
SELECT
m.value('Name[1]','varchar(50)') as Name,
m.value('Age[1]','varchar(50)') as Age
FROM
@MailXML.nodes('/MailingCompany/Mailman') as A(m)
SET @MailPersonFK = SCOPE_IDENTITY();
-- Letter Insertion
INSERT INTO mailDB.dbo.Letter
SELECT
l.value('DestinationAddress[1]', 'varchar(50)') as DestinationAddress,
l.value('DestinationCountry[1]', 'varchar(50)') as DestinationCountry,
l.value('OriginCountry[1]', 'varchar(50)') as OriginCountry,
l.value('OriginAddress[1]', 'varchar(50)') as OriginAddress,
@MailPersonFK as MailID
FROM
@MailXML.nodes('MailingCompany/Mailman/Letter') as B(l)
I am trying to extract the Mailman and Letter data into their own respective tables. I have got that working; however, my issue is that the MailingCompany node is dynamic. Sometimes it may be MailVehicle, for example, and I still need to read the corresponding Mailman and Letter node data and insert them into their own respective tables.
So both
FROM @MailXML.nodes('/MailingCompany/Mailman') as A(t)
and
FROM @MailXML.nodes('MailingCompany/Mailman/Letter') as B(l)
Will need to be changed to allow MailingCompany to be dynamic.
I have tried to extract the parent node and concatenate it into a string to put into the .nodes function like the following:
set @DynXML = '/' + @parentNodeVar + '/Mailman'
FROM @MailXML.nodes(@DynXML) as A(t)
However I get the following error:
The argument 1 of the XML data type method "nodes" must be a string literal.
How can I overcome this dynamic XML issue?
Thank you very much in advance
Look at this reduced example:
DECLARE @xml1 XML=
N'<MailingCompany>
<Mailman>
<Name>Jamie</Name>
<Letter>
<DestinationAddress> 440 Mountain View Parade </DestinationAddress>
</Letter>
</Mailman>
</MailingCompany>';
DECLARE @xml2 XML=
N'<OtherName>
<Mailman>
<Name>Jodie</Name>
<Letter>
<DestinationAddress> This is the other address </DestinationAddress>
</Letter>
</Mailman>
</OtherName>';
SELECT @xml1.value(N'(*/Mailman/Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml1.value(N'(*/Mailman/Letter/DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
SELECT @xml2.value(N'(*/Mailman/Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml2.value(N'(*/Mailman/Letter/DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
You can replace a node's name with *.
Another trick is the deep search with // (same result as before):
SELECT @xml1.value(N'(//Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml1.value(N'(//DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
SELECT @xml2.value(N'(//Name)[1]','nvarchar(max)') AS Mailman_Name
,@xml2.value(N'(//DestinationAddress)[1]','nvarchar(max)') AS DestinationAddress
The general rule: Be as specific as possible.
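The same wildcard can be used inside .nodes(), which is what the two INSERT statements in the question rely on. The following is a minimal sketch, not a drop-in replacement: it uses a shortened copy of the question's XML with a MailVehicle root, and it only selects the values instead of inserting them. The last query shows a second option, assuming the root name is available in a variable (@parentNodeVar, the name used in the question): the path stays a string literal, and the variable is compared inside it with sql:variable().

```sql
declare @MailXML xml = N'<MailVehicle>
  <Mailman>
    <Name>Jamie</Name>
    <Age> 24 </Age>
    <Letter>
      <DestinationAddress> 440 Mountain View Parade </DestinationAddress>
      <DestinationCountry> USA </DestinationCountry>
    </Letter>
  </Mailman>
</MailVehicle>';

-- Wildcard root: matches whatever the top-level element is called.
select m.value('Name[1]', 'varchar(50)') as Name,
       m.value('Age[1]', 'varchar(50)') as Age
from @MailXML.nodes('/*/Mailman') as A(m);

select l.value('DestinationAddress[1]', 'varchar(50)') as DestinationAddress,
       l.value('DestinationCountry[1]', 'varchar(50)') as DestinationCountry
from @MailXML.nodes('/*/Mailman/Letter') as B(l);

-- Alternative: keep the path a literal and compare the root name at run time.
declare @parentNodeVar nvarchar(100) = N'MailVehicle';

select m.value('Name[1]', 'varchar(50)') as Name
from @MailXML.nodes('/*[local-name() = sql:variable("@parentNodeVar")]/Mailman') as A(m);
```

The /*/Mailman form is more specific than //Mailman because it only looks one level below the root, which fits the rule above.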

Avoid encoding of ampersand in SQL XML attribute

Imagine I have a currency table, containing e.g. this record :
- Id = 1
- Code = EUR
- Symbol = &euro;
Important to notice :
The input in our database is already properly HTML-encoded!
Now, when I use this SQL statement :
SELECT '@id' = Currency.Id
, '@code' = Currency.Code
, '@symbol' = Currency.Symbol
FROM Currency
FOR XML PATH('currency')
, ROOT('list')
, TYPE
;
...it unfortunately results in the following XML :
<list><currency id="1" code="EUR" symbol="&amp;euro;" /></list>
Notice that the Euro-symbol has been re-encoded, rendering it invalid.
How can I avoid that? How can I obtain the following XML output :
<list><currency id="1" code="EUR" symbol="&euro;" /></list>
The result that you call unfortunate, re-encoded and invalid is perfectly correct, just not what you expect. You pass in &euro;, which is a string. Within XML this is escaped as &amp;euro; and will be decoded back to &euro; when the value is read.
You must stop thinking of XML as a kind of formalized string. This is a technical issue. XML will handle this implicitly.
There are two ways:
- Go the string way: cast your XML to NVARCHAR, do any string manipulation you might want (e.g. REPLACE(myXML,'&amp;euro;','&euro;')) and cast back to XML, or
- (I'd prefer this!) hand in the € as the actual symbol and let the XML engine do the encoding.
EDIT
One more thing: SQL Server doesn't know the &euro; entity. Try with &#8364; or &#x20AC;:
SELECT '€' AS [@EuroSign] --works
,'&euro;' AS [@NamedEscapedEuro] --will be encoded
,'&#8364;' AS [@EscapedEuro] --will be encoded
FOR XML PATH('TestEuro'),ROOT('root')
SELECT --CAST('<x>'+'€'+'</x>' AS XML).value('/x[1]','nvarchar(10)') AS [@EuroSign] --not allowed!!!
--CAST('<x>'+'&euro;'+'</x>' AS XML).value('/x[1]','nvarchar(10)') AS [@NamedEscapedEuro] --not allowed, exists, but not known in SQL Server!
CAST('<x>'+'&#8364;'+'</x>' AS XML).value('/x[1]','nvarchar(10)') AS [@EscapedEuro] --works
FOR XML PATH('TestEuro'),ROOT('root')
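For illustration, here is a minimal sketch of the string route, assuming the Symbol column holds the literal text &euro; as described in the question. The table variable @Currency and its column types are made up for this example. The result is deliberately left as NVARCHAR: casting it back to XML would fail, because SQL Server does not know the named entity, as noted above.

```sql
-- Hypothetical stand-in for the Currency table from the question.
declare @Currency table (Id int, Code varchar(3), Symbol nvarchar(20));
insert into @Currency (Id, Code, Symbol) values (1, 'EUR', N'&euro;');

-- Build the XML as in the question; the ampersand is escaped to &amp; here.
declare @xml xml = (
    select c.Id     as [@id],
           c.Code   as [@code],
           c.Symbol as [@symbol]
    from @Currency as c
    for xml path('currency'), root('list'), type
);

-- Undo the escaping on the string form only; the result is text, not XML.
select replace(cast(@xml as nvarchar(max)), N'&amp;euro;', N'&euro;') as XmlText;
```

If the consumer needs real XML rather than text, the second route (storing the actual € character) avoids the problem entirely.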

SQL, Find node value in xml variable, if it exists insert additional nodes into xml variable

I've got a Stored Procedure in SQL, where I have the following declaration:
Declare @fields xml
My SP gets passed values from the front end and then gets executed. The values it gets passed looks like this depending on what the user selects from the front end. For the purpose of this example I have included only 3 ID's.
'<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>'
My question is this:
How can I find the node = 1000 and, if it is present (exists), insert (add) 2 additional nodes,
<ID>992</ID><ID>993</ID>
to my existing '<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>' xml.
If <ID>1000</ID> isn't present do nothing.
So, end result should be something like this if 1000 is present.
<F><ID>979</ID><ID>1000</ID><ID>989</ID><ID>992</ID><ID>993</ID></F>
If not, the result should stay:
<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>
I just can't get my head around this?
Check this:
declare @fields xml = '<F><ID>979</ID><ID>1000</ID><ID>989</ID></F>'
, @add xml = '<ID>992</ID><ID>993</ID>'
;
if @fields.exist('/F[1]/ID[text()="1000"]') = 1
set @fields.modify('insert sql:variable("@add") as last into /F[1]');
select @fields

Better way in TSQL to search xml for a node that doesn't exist

We have a source XML file that has an address node, and each node is supposed to have a zip_code node beneath it in order to validate. We received a file that failed the schema validation because at least one node was missing its zip_code (there were several thousand addresses in the file).
We need to find the elements that do not have a zip code, so we can repair the file and send an audit report to the source.
--declare @x xml = bulkcolumn from openrowset(bulk 'x:\file.xml',single_blob) as s
declare @x xml = N'<addresses>
<address><external_address_id>1</external_address_id><zip_code>53207</zip_code></address>
<address><external_address_id>2</external_address_id></address>
</addresses>'
declare @t xml = (
select @x.query('for $a in .//address
return
if ($a/zip_code)
then <external_address_id />
else $a/external_address_id')
)
select x.AddressID.value('.', 'int') AddressID
from @t.nodes('./external_address_id') x(AddressID)
where x.AddressID.value('.', 'int') > 0
GO
Really, it's the where clause that bugs me. I feel like I'm depending on a cast for a null value to 0, and it works, but I'm not really sure that it should. I tried a few variations with the .exist function, but I couldn't get the correct result.
If you just want to ensure that you are selecting address elements that have a zip_code element, then adjust your XPATH to include that criteria in a predicate filter:
/addresses/address[zip_code]
If you also want to ensure that the zip_code element has a value, use a predicate filter on zip_code to select those that have text() nodes:
/addresses/address[zip_code[text()]]
EDIT:
Actually, I'm looking for the opposite. I need to identify the nodes that don't have a zip, so we can manually correct the source data.
So, if you want to identify all of the address elements that do not have a zip_code, you can specify it in the XPATH like this:
/addresses/address[not(zip_code)]
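As a sketch of how that path could be used from T-SQL, the query below feeds it to .nodes() and reads the id of each address that has no zip_code child. It reuses the sample XML from the question; the aliases t and a are arbitrary.

```sql
declare @x xml = N'<addresses>
<address><external_address_id>1</external_address_id><zip_code>53207</zip_code></address>
<address><external_address_id>2</external_address_id></address>
</addresses>';

-- Only address elements without a zip_code child survive the predicate.
select a.value('(external_address_id)[1]', 'int') as AddressID
from @x.nodes('/addresses/address[not(zip_code)]') as t(a);
```

For the sample data this returns a single row with AddressID 2. To also catch addresses whose zip_code element is present but empty, the predicate could be changed to not(zip_code/text()).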
If you just want to locate those nodes that are missing their <zip_code> element, you could use something like this:
SELECT
ADRS.ADR.value('(external_address_id)[1]', 'int') as 'ExtAdrID'
FROM
@x.nodes('/addresses/address') as ADRS(ADR)
WHERE
ADRS.ADR.exist('zip_code') = 0
It uses the built-in .exist() method of the xml data type to check whether a subnode exists inside an XML node.