Query (column) that lacks a root node - sql

I am supposed to be extracting data from an XML column in SQL Server 2012. Some values in this column will have multiple nodes. Unfortunately, the XML does not have a root node, so using CROSS APPLY does not seem to work.
Simplified example:
<header><msg_type>TYPE_ONE</msg_type>
<status><status_1>aaaa</status_1><status_2>bbbb</status_2>
<node_1><customerID>1234</customerID><zipcode>11111</zipcode>...</node_1>
<node_2><customerID>1234</customerID><ordernum>12345</ordernum><data2>A</data2>...</node_2>
<node_2><customerID>1234</customerID><ordernum>34567</ordernum><data2>B</data2>...</node_2>
<node_3><customerID>1234</customerID><delivery>2014-05-05 14:00:00</delivery>...</node_3>
<node_1><customerID>ABCD</customerID><zipcode>12345</zipcode>...</node_1>
<node_2><customerID>ABCD</customerID><ordernum>123536</ordernum><data2>C</data2>...</node_2>
<node_3><customerID>ABCD</customerID><delivery>2014-05-05 16:00:00</delivery>...</node_3>
.
.
.
(... = more elements)
Here's an example using CROSS APPLY against one of the multi-node types:
select
t.InfoXML.value( '(/node_1/customerID)[1]', 'varchar(50)' ) as CustomerId
, CA.Det.value( '(/node_2/ordernum)[1]', 'varchar(20)') as OrderNumber
, CA.Det.value( '(/node_2/data2)[1]', 'varchar(5)' ) as Data2
from TableUnderTest as t
CROSS APPLY t.InfoXML.nodes( '/node_2') as CA(Det)
where t.InfoXML.value( '(/header/msg_type)[1]', 'varchar(20)') = 'TYPE_ONE'
This had results
CustomerID OrderNumber Data2
================================
1234 12345 A
1234 12345 A
1234 12345 A
My current thought is to create a temporary table and insert the XML fields (that match the WHERE clause) after wrapping the XML value in a root node, and then trying to get the data from the temporary table. My current effort to set the first part up:
Declare #Rooted Table (Rec XML);
insert into #Rooted(Rec)
(
select (convert (XML, '<root>', + convert(varchar(MAX),
t.XmlData.query('./') + '<\root>')) as Rec
from TableUnderTest t
where t.XmlData.value( '(/header/msg_type)[1]', 'varchar(20)' ) = 'TYPE_ONE'
)
Right now, the above gives a syntax error.
What I want for output is something as follows:
CustomerID ZipCode OrderNumber Data2 Delivery status2
-------------------------------------------------------------------
1234 11111 12345 A 2014-05-05 14:00:00 aaaa
1234 11111 34567 B 2014-05-05 14:00:00 aaaa
ABCD 12345 123456 D 2014-05-05 15:00:00 aaaa
What would be the best approach to take? (This is for testing, not production, so performance is not critical.) I've only been learning to write sql queries for XML for the last month, so perhaps I'm overlooking something. It appears the critical issue is the lack of a root node for the XML, but how do I work around it?
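A note on the likely cause, with a sketch. The xml data type in SQL Server accepts fragments with several top-level elements, so the missing root is probably not what breaks the query. The repeated rows more likely come from the leading "/" in the paths passed to CA.Det.value(): an absolute path is evaluated from the top of the whole XML value, not from the node_2 that nodes() returned, so every row reads the first node_2. The sketch below uses relative paths instead. The variable @x stands in for one row's XML value, the closing </header> tag is an assumption, and matching node_1 to node_2 on customerID is a guess at the intended relationship.

```sql
-- @x is a stand-in for one row's XML value (hypothetical data modelled on the question).
-- The xml type accepts a fragment with several top-level elements, so no root is added.
DECLARE @x xml = N'
<header><msg_type>TYPE_ONE</msg_type></header>
<node_1><customerID>1234</customerID><zipcode>11111</zipcode></node_1>
<node_2><customerID>1234</customerID><ordernum>12345</ordernum><data2>A</data2></node_2>
<node_2><customerID>1234</customerID><ordernum>34567</ordernum><data2>B</data2></node_2>
<node_1><customerID>ABCD</customerID><zipcode>12345</zipcode></node_1>
<node_2><customerID>ABCD</customerID><ordernum>123536</ordernum><data2>C</data2></node_2>';

SELECT
    -- Relative paths (no leading "/") are evaluated against the current node_2 / node_1.
    n2.Det.value('(customerID/text())[1]', 'varchar(50)') AS CustomerID,
    n1.Det.value('(zipcode/text())[1]',    'varchar(10)') AS ZipCode,
    n2.Det.value('(ordernum/text())[1]',   'varchar(20)') AS OrderNumber,
    n2.Det.value('(data2/text())[1]',      'varchar(5)')  AS Data2
FROM @x.nodes('/node_2') AS n2(Det)
CROSS APPLY @x.nodes('/node_1') AS n1(Det)
-- Assumed relationship: node_1 and node_2 belong together when customerID matches.
WHERE n1.Det.value('(customerID/text())[1]', 'varchar(50)')
    = n2.Det.value('(customerID/text())[1]', 'varchar(50)')
  AND @x.value('(/header/msg_type/text())[1]', 'varchar(20)') = 'TYPE_ONE';
```

Against the real table, the same shape should work with t.InfoXML in place of @x and the two nodes() calls applied to the table with CROSS APPLY.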

Subquery in where statement returns more than just null values

EDIT: Didn't realize the original post was in oracle however this question still remains.
I was attempting to answer another post as I thought I could answer it. Turns out I came to a solution but don't know why this works.
The question was how do you return rows where the subset_ID is the same but one record has data in the Parent columns and the other record(s) don't have data in the Parent columns?
The original table looks like this (unfortunately not much more to go off of):
Parent_ID | Parent_Name | Subset_Name      | Subset_ID | Address
----------+-------------+------------------+-----------+-------------
123456    | SPECIAL     | special_shop     | 9876      | 1234 road st
NULL      | NULL        | special_shop     | 9876      | 1234 road st
654321    | NOT_SPECIAL | not_special_shop | 9877      | 1258 diff st
654321    | NOT_SPECIAL | not_special_shop | 9877      | 1258 diff st
The solution would look like below:
Parent_ID | Parent_Name | Subset_Name      | Subset_ID | Address
----------+-------------+------------------+-----------+-------------
123456    | SPECIAL     | special_shop     | 9876      | 1234 road st
NULL      | NULL        | special_shop     | 9876      | 1234 road st
The solution I came up with that gives the above result is:
SELECT *
FROM #Sometable
WHERE
Subset_ID in (SELECT ST2.Subset_ID FROM #Sometable ST2 WHERE ST2.Parent_ID IS NULL)
AND
Subset_ID in (SELECT ST3.Subset_ID FROM #Sometable ST3 WHERE ST3.Parent_Name IS NULL)
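Why does this work when the subquery, run on its own, returns only this row:
Parent_ID | Parent_Name | Subset_Name      | Subset_ID | Address
----------+-------------+------------------+-----------+-------------
NULL      | NULL        | special_shop     | 9876      | 1234 road st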
Would this always work and why does it work? If not always what are the edge cases?
The subquery is basically only used to find the relevant Subset_IDs. As your subquery results show, only one Subset_ID is relevant (9876) in the source data. When used in the WHERE clause, you're saying, "return all the records from #Sometable where the Subset_ID = 9876". In the source data, there are only 2 records with Subset_ID = 9876 and they are both returned.
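As for edge cases: each IN only checks that some row with the same Subset_ID has a NULL in that column. A Subset_ID whose rows all have NULL parent columns would therefore also be returned, even though no row in that group carries parent data. A minimal sketch of a stricter version, assuming the goal is "at least one row without parent data and at least one row with it"; the 9878 row is invented to show the difference:

```sql
-- The question's rows plus a hypothetical group (9878) where no row has parent data.
CREATE TABLE #Sometable (
    Parent_ID   int          NULL,
    Parent_Name varchar(50)  NULL,
    Subset_Name varchar(50)  NOT NULL,
    Subset_ID   int          NOT NULL,
    Address     varchar(100) NOT NULL
);

INSERT INTO #Sometable (Parent_ID, Parent_Name, Subset_Name, Subset_ID, Address)
VALUES (123456, 'SPECIAL',     'special_shop',     9876, '1234 road st'),
       (NULL,   NULL,          'special_shop',     9876, '1234 road st'),
       (654321, 'NOT_SPECIAL', 'not_special_shop', 9877, '1258 diff st'),
       (654321, 'NOT_SPECIAL', 'not_special_shop', 9877, '1258 diff st'),
       (NULL,   NULL,          'orphan_shop',      9878, '77 other st');

-- Keep a Subset_ID only if it has both kinds of row:
-- one with NULL parent columns and one with a non-NULL Parent_ID.
SELECT st.*
FROM #Sometable AS st
WHERE EXISTS (SELECT 1
              FROM #Sometable AS n
              WHERE n.Subset_ID = st.Subset_ID
                AND n.Parent_ID IS NULL
                AND n.Parent_Name IS NULL)
  AND EXISTS (SELECT 1
              FROM #Sometable AS p
              WHERE p.Subset_ID = st.Subset_ID
                AND p.Parent_ID IS NOT NULL);

DROP TABLE #Sometable;
```

With this data the two-IN query from the question would also return the 9878 row, while the EXISTS version returns only the two 9876 rows.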

SQL SPLIT with New Row

I have an MS SQL database with the following data:
ID | ORD_No        | Date | User | Note
---+---------------+------+------+------
1  | 18/UT00120/ZS |      |      | ---- Saved 10/10/2020 14:08 by John Snow, rest of the note ----Saved on 11/11/2020 13:09 by Mike Kowalsky, rest of the note ---- Saved on 12/11/2020 11:00 by Barbara Smith, rest of the note
From that I want to create the following output:
ID | ORD_No | Date | User | Note
-----+--------------+------------+----------------+---------------
1 | 18/UT00120/ZS| 10/10/2020 | John Snow | rest of the note
-----+--------------+------------+----------------+---------------
2 | 18/UT00120/ZS| 11/11/2020 | Mike Kowalsky | rest of the note
-----+--------------+------------+----------------+---------------
3 | 18/UT00120/ZS| 12/11/2020 | Barbara Smith | rest of the note
Please advise how I can achieve the required output.
Thanks!
SQL Server does not have very good string processing functionality. You can do this but it is rather painful -- and not going to be flexible for all the variations on what notes might look like.
One big issue is that the built-in string_split() function does not take multi-character delimiters. The following chooses a character that is not likely to be in the notes.
Also, the leading prefix is not consistent -- sometimes there is an "on" and sometimes not. So, this doesn't attempt to extract the "rest of the string". It leaves in the prefix. You could use additional string manipulations to handle this, but I suspect the real problem is more complex.
In any case, this comes quite close to what you want:
select t.id, t.ord_no, trim(s.value), s2.value as date
from t cross apply
string_split(replace(note, '----', '~'), '~') s cross apply
(select top (1) s2.value
from string_split(s.value, ' ') s2
where try_convert(date, s2.value, 101) >= '2000-01-01'
) s2;
Here is a db<>fiddle.
Note that the date inequality is used because select try_convert(date, '') returns '1900-01-01' rather than NULL as I would expect.
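The empty-string behaviour can be checked in isolation. A small sketch, assuming SQL Server 2012 or later for TRY_CONVERT; wrapping the input in NULLIF is one alternative to the date inequality, not something the query above uses:

```sql
SELECT
    TRY_CONVERT(date, '')             AS empty_string,  -- 1900-01-01, not NULL
    TRY_CONVERT(date, NULLIF('', '')) AS with_nullif,   -- NULL, because the input is NULL
    TRY_CONVERT(date, 'rest')         AS not_a_date;    -- NULL, conversion fails quietly
```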
I think I have a solution for you, although it might not work in other scenarios. I have used SUBSTRING, CHARINDEX, STRING_SPLIT, REPLACE and CAST to get the result you want. Here is my code:
DECLARE @MyTable Table (ID INT, ORD_No VARCHAR(100),Note VARCHAR(300));
INSERT INTO @MyTable VALUES(1,'18/UT00120/ZS','Saved on 10/10/2020 14:08 by John Snow, rest of the note');
INSERT INTO @MyTable VALUES(2,'18/UT00120/ZS','Saved on 11/11/2020 07:08 by Mike Kowalsky, rest of the note');
INSERT INTO @MyTable VALUES(3,'18/UT00120/ZS','Saved on 12/11/2020 16:08 by Barbara Smith, rest of the note');
Select ID,ORD_No ,CAST(substring(Note,9,17) AS DATE) [Date],
(SELECT top 1 value FROM STRING_SPLIT(SUBSTRING(Note,29,CHARINDEX(',',Note,0)),',')) AS [USER],
RIGHT(REPLACE(SUBSTRING(Note, CHARINDEX(',', Note), LEN(Note)), '', ''), len(REPLACE(SUBSTRING(Note, CHARINDEX(',', Note), LEN(Note)), '', ''))-1) AS NOTE
FROM @MyTable
Note: This code will only work if your Note column data is always in same format as you gave in your question. Check also db-fiddle Link.

SQL Server XML Node Parsing

Fairly new to XML parsing in SQL Server. Here's what I have and what I'm trying to do.
I have a table with many rows similar to this:
+-------------------+------------------------------------+
| EDI_Assessment_ID | XML_TEXT |
+-------------------+------------------------------------+
| 12345 | text column containing XML |
| 12346 | text column containing XML |
+-------------------+------------------------------------+
The XML_Text column has a large XML text similar to this structure (I've simplified it and only pasted the relevant portions):
<Assessment>
<ADLs>
<ADL_Group>
<ADL>bathing</ADL>
<Mapped_ADL Source="Calypso">Bathing</Mapped_ADL>
<ADL_Level>Requires only equipment to complete ADL</ADL_Level>
<Mapped_ADL_Level Source="Calypso">Independent</Mapped_ADL_Level>
<ADL_Equipment>HH shower</ADL_Equipment>
<ADL_Assisted_By_Info>
<ADL_Assisted_By>No one</ADL_Assisted_By>
</ADL_Assisted_By_Info>
</ADL_Group>
<ADL_Group>
<ADL>Continence-Bowel</ADL>
<Mapped_ADL Source="Calypso">Continence</Mapped_ADL>
<ADL_Level>Independent</ADL_Level>
<Mapped_ADL_Level Source="B/A">Independent</Mapped_ADL_Level>
<ADL_Equipment />
<ADL_Assisted_By_Info>
<ADL_Assisted_By>No one</ADL_Assisted_By>
</ADL_Assisted_By_Info>
</ADL_Group>
</Assessment>
How can I parse through the XML for each row in the table to return:
The ADL (bathing, Continence-Bowel) and
the ADL_Assisted_By_Info
I'm looking for the result set to return similar to this:
+-------------------+-------------+----------------------+------------------+----------------------+
| EDI_Assessment_ID | Bathing | ADL_Assisted_By_Info | Continence-Bowel | ADL_Assisted_By_Info |
+-------------------+-------------+----------------------+------------------+----------------------+
| 12345 | Independent | No one | Independent | No one |
+-------------------+-------------+----------------------+------------------+----------------------+
These solutions rely on something which, judging by the OP's sample data, is not true: that the column of datatype text holds valid XML. The sample data is not valid XML, so these solutions will not work against the sample data as provided.
In fact, if all of the OP's data is poorly formed XML, then SQL Server is completely the wrong choice here. They should, ideally, be fixing their data first, and then changing the datatype to xml so that more bad XML can't be inserted into the database.
If, for whatever reason, they can't do that, then they will need to find a different solution. SQL Server, however, isn't the solution. You're going to need something that is very good at string manipulation and work out the values that way. If you're doing this against a large dataset, the process is probably going to slow down to a crawl.
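One way to find out which rows need fixing is TRY_CONVERT, which yields NULL rather than raising an error when the text is not well-formed XML. A minimal sketch, assuming SQL Server 2012 or later; the table variable and its two rows are made up for illustration, and a real text column may need CONVERT(varchar(max), ...) around it first:

```sql
-- Hypothetical stand-in for the real table; the second row is missing </ADLs>.
DECLARE @Assessments TABLE (EDI_Assessment_ID int, XML_Text varchar(max));

INSERT INTO @Assessments (EDI_Assessment_ID, XML_Text)
VALUES (12345, '<Assessment><ADLs><ADL_Group /></ADLs></Assessment>'),
       (12346, '<Assessment><ADLs><ADL_Group /></Assessment>');

-- Rows whose text cannot be parsed as XML.
SELECT a.EDI_Assessment_ID
FROM @Assessments AS a
WHERE a.XML_Text IS NOT NULL
  AND TRY_CONVERT(xml, a.XML_Text) IS NULL;
```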
Anyway, onto the point. Note the comment in the code. There are 2 solutions. The first, validity aside, assumes that the bathing node is always the first ADL_Group element, and that Continence-Bowel is always the second:
WITH VTE AS(
SELECT 12345 AS ID,
CONVERT(text,
'<Assessment>
<ADLs>
<ADL_Group>
<ADL>bathing</ADL>
<Mapped_ADL Source="Calypso">Bathing</Mapped_ADL>
<ADL_Level>Requires only equipment to complete ADL</ADL_Level>
<Mapped_ADL_Level Source="Calypso">Independent</Mapped_ADL_Level>
<ADL_Equipment>HH shower</ADL_Equipment>
<ADL_Assisted_By_Info>
<ADL_Assisted_By>No one</ADL_Assisted_By>
</ADL_Assisted_By_Info>
</ADL_Group>
<ADL_Group>
<ADL>Continence-Bowel</ADL>
<Mapped_ADL Source="Calypso">Continence</Mapped_ADL>
<ADL_Level>Independent</ADL_Level>
<Mapped_ADL_Level Source="B/A">Independent</Mapped_ADL_Level>
<ADL_Equipment />
<ADL_Assisted_By_Info>
<ADL_Assisted_By>No one</ADL_Assisted_By>
</ADL_Assisted_By_Info>
</ADL_Group>
</ADLs>' + --I have added this line to make the XML valid. The sample you have will NOT work, as it is not valid XML
'</Assessment>') AS XML_Text
)
SELECT V.ID,
X.XML_Type,
T.AA.value('(ADL_Group/Mapped_ADL_Level/text())[1]','varchar(30)') AS Bathing,
T.AA.value('(ADL_Group/ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info,
T.AA.value('(ADL_Group/Mapped_ADL_Level/text())[2]','varchar(30)') AS ContinenceBowel,
T.AA.value('(ADL_Group[2]/ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info
FROM VTE V
CROSS APPLY (VALUES(TRY_CONVERT(xml, V.XML_Text))) X(XML_Type)
CROSS APPLY X.XML_Type.nodes('/Assessment/ADLs') T(AA);
If, however, that isn't true and there could be other nodes in play, with different values, then you could do the following for the SELECT (CTE not included):
SELECT V.ID,
X.XML_Type,
B.AG.value('(Mapped_ADL_Level/text())[1]','varchar(30)') AS Bathing,
B.AG.value('(ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info,
CB.AG.value('(Mapped_ADL_Level/text())[1]','varchar(30)') AS ContinenceBowel,
CB.AG.value('(ADL_Assisted_By_Info/ADL_Assisted_By/text())[1]','varchar(30)') AS ADL_Assisted_By_Info
FROM VTE V
CROSS APPLY (VALUES(TRY_CONVERT(xml, V.XML_Text))) X(XML_Type)
CROSS APPLY X.XML_Type.nodes('/Assessment/ADLs/ADL_Group') B(AG)
CROSS APPLY X.XML_Type.nodes('/Assessment/ADLs/ADL_Group') CB(AG)
WHERE B.AG.value('(ADL/text())[1]','varchar(30)') = 'bathing'
AND CB.AG.value('(ADL/text())[1]','varchar(30)') = 'Continence-Bowel';

SQL Server : extract domain and params from 1 million rows into temp table

I have just over a million rows of URLs in one column. The column name is [url] and the table name is redirects.
I'm running SQL Server 2014.
I need a way to extract the sub domain for each url into a new column in a temp table.
Ideally, at the same time, select the distinct param names from the query string into another column and the param values into a third column.
My main concern is performance: not locking up the server while looping through a million rows.
I would be happy to run 3 queries to get the results if it makes more sense
Examples of the column data:
https://www.google.com/ads/ga-audiences?v=1&aip=1&t=sr&_r=4&tid=UA-9999999-1&cid=9999107657.199999834&jid=472999996&_v=j66&z=1963999907
https://track.kspring.com/livin-like-a-star#pid=370&cid=6546&sid=front
So I end up with 3 columns in a temp table
URL | Param | Qstring
------------------+-------+----------
www.google.com | v | 1
www.google.com | aip | 1
www.google.com | t | dc
www.google.com | tid | UA-1666666-1
www.google.com | jid | 472999996
track.kspring.com | pid | 370
track.kspring.com | cid | 6546
track.kspring.com | sid | front
I've been looking at some examples to extract the domain name from a string but I don't have much experience with regex or string manipulation.
This is the kind of processing at which .Net CLR functions excel. Just use Uri and parse away, from a CLR Table Value Function (so that you can output more than one column in one single call).
Grab a copy of NGrams8K and you can do this:
-- sample data
declare @table table ([url] varchar(8000));
insert @table values
('https://www.google.com/ads/ga-audiences?v=1&aip=1&t=sr&_r=4&tid=UA-9999999-1&cid=9999107657.199999834&jid=472999996&_v=j66&z=1963999907'),
('https://track.kspring.com/livin-like-a-star#pid=370&cid=6546&sid=front');
declare @delimiter varchar(20) = '%[#?;]%'; -- customizable parameter for parsing parameter values
-- solution
select
[url] = substring([url], a1.startPos, a2.aLen-a1.startPos),
[param] = substring(item, 1, charindex('=', split.item)-1),
qString = substring(item, charindex('=', split.item)+1, 8000)
from @table t
cross apply (values (charindex('//',[url])+2)) a1(startPos)
cross apply (values (charindex('/',[url],a1.startPos))) a2(aLen)
cross apply
(
select split.item
from (values (len(substring([url], a2.aLen,8000)), 1)) as l(s,d)
cross apply
( select -(l.d) union all
select ng.position
from dbo.NGrams8k(substring([url], a2.aLen,8000), l.d) as ng
where token LIKE @delimiter
) as d(p)
cross apply (values(replace(substring(substring([url], a2.aLen,8000), d.p+l.d,
isnull(nullif(patindex('%'+@delimiter+'%',
substring(substring([url], a2.aLen,8000), d.p+l.d, l.s)),0)-1, l.s+l.d)),
'&amp',''))) split(item)
where split.item like '%=%'
) split(item);
Results
url param qString
------------------- ------- ---------------------------------
www.google.com v 1
www.google.com aip 1
www.google.com t sr
www.google.com _r 4
www.google.com tid UA-9999999-1
www.google.com cid 9999107657.199999834
www.google.com jid 472999996
www.google.com _v j66
www.google.com z 1963999907
track.kspring.com pid 370
track.kspring.com cid 6546
track.kspring.com sid front
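If only the host part is needed, it can be pulled out with CHARINDEX and SUBSTRING alone, without the NGrams8K dependency. A minimal sketch under the assumption that every value contains "//"; the third sample row is invented to show a URL with no path, where CHARINDEX returns 0 and the end position falls back to the end of the string:

```sql
-- Sample data: the two URLs from the question (shortened) plus a hypothetical one with no path.
DECLARE @urls TABLE ([url] varchar(8000));

INSERT INTO @urls ([url])
VALUES ('https://www.google.com/ads/ga-audiences?v=1&aip=1'),
       ('https://track.kspring.com/livin-like-a-star#pid=370&cid=6546&sid=front'),
       ('https://example.org');

SELECT host = SUBSTRING(u.[url], s.startPos, e.endPos - s.startPos)
FROM @urls AS u
-- First character after the "//" that follows the scheme.
CROSS APPLY (VALUES (CHARINDEX('//', u.[url]) + 2)) AS s(startPos)
-- Position of the next "/", or one past the end of the string when there is none.
CROSS APPLY (VALUES (ISNULL(NULLIF(CHARINDEX('/', u.[url], s.startPos), 0),
                            LEN(u.[url]) + 1))) AS e(endPos);
```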

How select XML fields node for all rows

I have a table like this :
YEAR int,
Fields XML
My XML column has this structure for all rows but with different values:
How can I get this result:
YEAR ID NAME LASTNAME
---------------------------------------------------
2011 1000 Nima Agha
2011 1001 Begha Begha
2011 1002 Jigha Jigha
2011 1003 Aba Aba
2012 1034 AAA BBB
...
thanks
How about this:
SELECT
Year,
E.P.value('(ID)[1]', 'INT') AS 'ID',
E.P.value('(Name)[1]', 'VARCHAR(50)') AS 'Name',
E.P.value('(LastName)[1]', 'VARCHAR(50)') AS 'LastName'
FROM
dbo.YourTable
CROSS APPLY
Fields.nodes('/Employees/Person') AS E(P)
You're basically selecting Year from the base table and then extracting each <Person> node from the Fields column into an "inline XML table" called E with a single XML column called P (you can choose whatever names you like for those), which you then query again to extract the individual elements.
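A self-contained way to try this out is sketched below. The XML structure is not shown in the question, so the shape used here (an <Employees> element containing <Person> elements with <ID>, <Name> and <LastName> children) is an assumption inferred from the paths in the query above, and @YourTable is a stand-in for the real table.

```sql
-- Hypothetical stand-in for the real table; the XML shape is assumed, not taken from the question.
DECLARE @YourTable TABLE ([Year] int, Fields xml);

INSERT INTO @YourTable ([Year], Fields)
VALUES (2011, N'<Employees>
                  <Person><ID>1000</ID><Name>Nima</Name><LastName>Agha</LastName></Person>
                  <Person><ID>1001</ID><Name>Begha</Name><LastName>Begha</LastName></Person>
                </Employees>'),
       (2012, N'<Employees>
                  <Person><ID>1034</ID><Name>AAA</Name><LastName>BBB</LastName></Person>
                </Employees>');

-- One output row per <Person>; the paths are relative to the current <Person> node.
SELECT
    t.[Year],
    E.P.value('(ID/text())[1]',       'int')         AS ID,
    E.P.value('(Name/text())[1]',     'varchar(50)') AS Name,
    E.P.value('(LastName/text())[1]', 'varchar(50)') AS LastName
FROM @YourTable AS t
CROSS APPLY t.Fields.nodes('/Employees/Person') AS E(P);
```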