XML parsing using SQL Server - sql

I need to parse the XML below, but the values generated are wrong. Does anyone have an idea?
Desired results:
Code               | Pay1_515
5570, Industry1, 1 | 10
5570, Industry2, 2 | 10
Sample XML
DECLARE @XML xml = '<?xml version="1.0" encoding="utf-8"?>
<CookedData>
<Column DestinationColumnCode="Code">
<r v="5570, Industry1, 1" />
<r v="5570, Industry2, 2" />
</Column>
<Column DestinationColumnCode="Pay1_515">
<r v="10" />
<r v="10" />
</Column>
</CookedData>';
Sample code
with C as
(
select T.X.value('(Column[@DestinationColumnCode = "Code"]/r/@v)[1]', 'NVARCHAR(3000)') as Code,
T.X.value('(Column[@DestinationColumnCode = "Pay1_515"]/r/@v)[1]', 'NVARCHAR(3000)') as Pay
from @XML.nodes('/CookedData/Column') as T(X)
)
SELECT * FROM C

Try this:
with C as
(
select T.X.value('(//Column[@DestinationColumnCode = "Code"]/r/@v)[1]', 'NVARCHAR(3000)') as Code,
T.X.value('(//Column[@DestinationColumnCode = "Pay1_515"]/r/@v)[1]', 'NVARCHAR(3000)') as Pay
from @XML.nodes('/CookedData/Column') as T(X)
)
SELECT * FROM C

You need to use this:
SELECT
Code = T.X.value('@DestinationColumnCode', 'VARCHAR(30)'),
RValue= r.value('@v', 'varchar(50)')
FROM
@XML.nodes('/CookedData/Column') as T(X)
CROSS APPLY
X.nodes('r') AS T2(R)
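This produces one row per r element, with the DestinationColumnCode of its parent Column next to each v value (four rows for the sample XML).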
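To make the row shape concrete outside SQL Server, here is a minimal Python sketch using the standard library's xml.etree.ElementTree (a stand-in, not anything SQL Server provides). The outer loop plays the role of nodes('/CookedData/Column') and the inner loop plays the role of CROSS APPLY X.nodes('r'). The function name shred is only illustrative.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (XML declaration omitted).
XML = """<CookedData>
  <Column DestinationColumnCode="Code">
    <r v="5570, Industry1, 1" />
    <r v="5570, Industry2, 2" />
  </Column>
  <Column DestinationColumnCode="Pay1_515">
    <r v="10" />
    <r v="10" />
  </Column>
</CookedData>"""


def shred(xml_text):
    """Return one (column code, value) tuple per <r> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for column in root.findall("Column"):       # like nodes('/CookedData/Column')
        code = column.get("DestinationColumnCode")
        for r in column.findall("r"):            # like CROSS APPLY X.nodes('r')
            rows.append((code, r.get("v")))
    return rows


for row in shred(XML):
    print(row)
```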

This has been bugging me all morning. I wondered if I could improve on the answer given by @marc_s (to the detriment of my actual work :-/) and I have managed to get the required output. However, some caveats:
This assumes that the order of child r nodes in each Column node correlates to the other (i.e. the first r node in each should be side by side and so on). This is because there is no other way to tie them together that I can see.
It's not very generic - as it stands, it won't work for n columns without extra code and fiddling about.
You don't mind some pretty filthy code which is likely to be pretty poor performing on large amounts of data (even with some tuning - my brain has now melted and can't cope with any more fiddling on it).
I've tested with a few more rows in the XML, but I would not guarantee its efficacy in all cases; it would need some serious testing.
I couldn't be bothered to get the final column names right.
Someone, anyone, please feel free to improve this, but FWIW here it is (complete, working code):
DECLARE @XML xml = '<?xml version="1.0" encoding="utf-8"?>
<CookedData>
<Column DestinationColumnCode="Code">
<r v="5570, Industry1, 1" />
<r v="5570, Industry2, 2" />
</Column>
<Column DestinationColumnCode="Pay1_515">
<r v="10" />
<r v="10" />
</Column>
</CookedData>';
DECLARE @shredXML TABLE (Code VARCHAR(10), RValue VARCHAR(20))
DECLARE @Codes TABLE (RowNum INT, ColCode VARCHAR(10))
--Do this once at the start and use the results.
INSERT INTO @shredXML
SELECT
Code = T.X.value('@DestinationColumnCode', 'VARCHAR(30)'),
RValue= r.value('@v', 'varchar(50)')
FROM
@XML.nodes('/CookedData/Column') as T(X)
CROSS APPLY
X.nodes('r') AS T2(R)
--First get the distinct list of DestinationColumnCode values.
INSERT INTO @Codes
SELECT ROW_NUMBER() OVER (ORDER BY colcodes.Code), colcodes.Code
FROM @shredXML colcodes
GROUP BY colcodes.Code
SELECT p1.RValue, p2.RValue
FROM (
--Get all the values for the code column
SELECT ROW_NUMBER() OVER (ORDER BY Codes.RValue) AS RowNum, Codes.RValue, Codes.Code
FROM (
SELECT x.Code, x.RValue FROM @shredXML x
INNER JOIN @Codes c
ON c.ColCode = x.Code
WHERE c.RowNum = 1) AS Codes) AS p1
--Join the values column on RowNum
--(RowNum here follows the sort order of RValue, not the document order of the r nodes)
INNER JOIN (
SELECT ROW_NUMBER() OVER (ORDER BY Vals.RValue) AS RowNum, Vals.RValue, Vals.Code
FROM (
SELECT x.Code, x.RValue FROM @shredXML x
INNER JOIN @Codes c
ON c.ColCode = x.Code
WHERE c.RowNum = 2) AS Vals) AS p2
ON p1.RowNum = p2.RowNum
EDIT
Finally got SQLFiddle to play ball and run the example above
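The positional pairing this answer relies on can also be sketched outside the database. The following is a minimal Python sketch using xml.etree.ElementTree, and pair_by_position is a hypothetical name. It takes the pairing from the document order of the r nodes and works for any number of Column elements. It makes the same assumption as the answer: the i-th r of every Column belongs to the same output row.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (XML declaration omitted).
XML = """<CookedData>
  <Column DestinationColumnCode="Code">
    <r v="5570, Industry1, 1" />
    <r v="5570, Industry2, 2" />
  </Column>
  <Column DestinationColumnCode="Pay1_515">
    <r v="10" />
    <r v="10" />
  </Column>
</CookedData>"""


def pair_by_position(xml_text):
    """Return one dict per output row, pairing the i-th <r> of every <Column>."""
    root = ET.fromstring(xml_text)
    codes = []    # DestinationColumnCode of each Column, in document order
    columns = []  # list of v values per Column, in document order
    for column in root.findall("Column"):
        codes.append(column.get("DestinationColumnCode"))
        columns.append([r.get("v") for r in column.findall("r")])
    # zip(*columns) transposes columns into rows; it stops at the shortest
    # column, so Columns with unequal numbers of r nodes lose trailing values.
    return [dict(zip(codes, row)) for row in zip(*columns)]


for row in pair_by_position(XML):
    print(row)
```

Because the column names come from the document, adding a third Column needs no code change. The open question from the caveats above remains: whether position really is the right way to correlate the values.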

Related

SQL Server XML - embedding a dataset within an existing node

There are numerous examples of inserting record sets into XML, but they tend to be very basic and do not cover my specific requirement. In the following instance, I want to extract some data to XML, and within that I need a collection of nodes to represent a variable number of records. I know it can be done, as I have an example, but I do not understand how it works. I understand that my code to return the dataset might make no sense, but it is only meant to represent an example of what I am trying to achieve.
SELECT 'Node' AS [A/B],
(
SELECT x.Item
FROM (
SELECT 'Line1' AS [Item]
FROM (SELECT 1 AS x) x
UNION
SELECT 'Line2' AS [Item]
FROM (SELECT 1 AS x) x
) x
FOR XML PATH(''), ROOT('Lines'), TYPE
)
FROM (SELECT 1 AS x) x
FOR XML PATH('Demo')
This gives me the following:
<Demo>
<A>
<B>Node</B>
</A>
<Lines>
<Item>Line1</Item>
<Item>Line2</Item>
</Lines>
</Demo>
What I want is the following:
<Demo>
<A>
<B>Node</B>
<Lines>
<Item>Line1</Item>
<Item>Line2</Item>
</Lines>
</A>
</Demo>
Can anyone help or point me to the correct answer please?
Making some assumptions on the structure of your source data that you should be able to easily adjust around, the following script gives you the output you are after, even across multiple Node groups:
Query
declare @t table([Node] varchar(10),Line varchar(10));
insert into @t values
('Node1','Line1')
,('Node1','Line2')
,('Node1','Line3')
,('Node1','Line4')
,('Node2','Line1')
,('Node2','Line2')
,('Node2','Line3')
,('Node2','Line4')
;
with n as
(
select distinct [Node]
from @t
)
select n.[Node] as [B]
,(select t.Line as Item
from @t as t
where t.[Node] = n.[Node]
for xml path(''), root('Lines'), type
)
from n
for xml path('A'), root('Demo')
;
Output
<Demo>
<A>
<B>Node1</B>
<Lines>
<Item>Line1</Item>
<Item>Line2</Item>
<Item>Line3</Item>
<Item>Line4</Item>
</Lines>
</A>
<A>
<B>Node2</B>
<Lines>
<Item>Line1</Item>
<Item>Line2</Item>
<Item>Line3</Item>
<Item>Line4</Item>
</Lines>
</A>
</Demo>
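The grouping behind this query is one A element per distinct Node, each holding its own Lines list. That shape can be sketched in Python with xml.etree.ElementTree. The sketch illustrates the target structure only, not how FOR XML works internally. build_demo and ROWS are hypothetical names, and the rows copy the @t sample above.

```python
import xml.etree.ElementTree as ET

# Same (Node, Line) pairs as the @t table variable in the query above.
ROWS = [(node, f"Line{i}") for node in ("Node1", "Node2") for i in range(1, 5)]


def build_demo(rows):
    demo = ET.Element("Demo")                       # ROOT('Demo')
    lines_by_node = {}                              # Node value -> its <Lines> element
    for node, line in rows:
        if node not in lines_by_node:
            a = ET.SubElement(demo, "A")            # one <A> per distinct Node, PATH('A')
            ET.SubElement(a, "B").text = node       # the column aliased [B]
            lines_by_node[node] = ET.SubElement(a, "Lines")  # ROOT('Lines') of the subquery
        ET.SubElement(lines_by_node[node], "Item").text = line
    return demo


print(ET.tostring(build_demo(ROWS), encoding="unicode"))
```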
You need to place A as the node name of the outer query, and Demo as its root.
SELECT 'Node' AS [B],
(
SELECT x.Item
FROM (
SELECT 'Line1' AS [Item]
UNION
SELECT 'Line2' AS [Item]
) x
FOR XML PATH(''), ROOT('Lines'), TYPE
)
FOR XML PATH('A'), ROOT('Demo'), TYPE
SQL Fiddle

How do I get a value from XML column in SQL?

So, I have a table with a large chunk of data stored in XML.
The partial XML schema (down to where I need) looks like this:
<DecisionData>
<Customer>
<SalesAttemptNumber />
<SubLenderID>IN101_CNAC</SubLenderID>
<DecisionType>Decision</DecisionType>
<DealerID />
<CustomerNumber>468195994772076</CustomerNumber>
<CustomerId />
<ApplicationType>Personal</ApplicationType>
<ApplicationDate>9/16/2008 11:32:07 AM</ApplicationDate>
<Applicants>
<Applicant PersonType="Applicant">
<CustNum />
<CustomerSSN>999999999</CustomerSSN>
<CustLastName>BRAND</CustLastName>
<CustFirstName>ELIZABETH</CustFirstName>
<CustMiddleName />
<NumberOfDependants>0</NumberOfDependants>
<MaritalStatus>Single</MaritalStatus>
<DateOfBirth>1/1/1911</DateOfBirth>
<MilitaryRank />
<CurrentAddress>
<ZipCode>46617</ZipCode>
Unfortunately, I am unfamiliar with pulling from XML, and my google-fu has failed me.
select TransformedXML.value('(/DecisionData/Customer/Applicants/Applicant PersonType="Applicant"/CurrentAddress/ZipCode/node())[1]','nvarchar(max)') as zip
from XmlDecisionInputText as t
I believe my problem lies with the portion that goes Applicant PersonType="Applicant", but am unsure how to deal with it.
Thanks for any help.
The xpath in its simplest form would be:
TransformedXML.value('(//ZipCode)[1]', 'nvarchar(100)') AS zip
This will find the first ZipCode node anywhere inside your document. If there are multiple, just be specific (as much as you want but not any more):
TransformedXML.value('(/DecisionData/Customer/Applicants/Applicant[@PersonType="Applicant"]/CurrentAddress/ZipCode)[1]', 'nvarchar(100)') AS zip
DB Fiddle
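Python's ElementTree accepts the same attribute-predicate syntax, which makes it a quick way to try a path outside the database. The following is a minimal sketch. It assumes a trimmed copy of the question's document that keeps only the elements on the path.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document: only the elements on the path are kept.
XML = """<DecisionData>
  <Customer>
    <Applicants>
      <Applicant PersonType="Applicant">
        <CustLastName>BRAND</CustLastName>
        <CurrentAddress><ZipCode>46617</ZipCode></CurrentAddress>
      </Applicant>
    </Applicants>
  </Customer>
</DecisionData>"""

root = ET.fromstring(XML)  # root is <DecisionData>, so paths start below it
# [@PersonType='Applicant'] filters Applicant elements by attribute value;
# findtext() returns the text of the first match, much like (...)[1] in T-SQL.
zip_code = root.findtext(
    "Customer/Applicants/Applicant[@PersonType='Applicant']/CurrentAddress/ZipCode"
)
print(zip_code)
```

ElementTree implements only a subset of XPath, so use it to sanity-check simple paths rather than full XQuery.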
If there are MULTIPLE applicants, you can use a CROSS APPLY
Example
Select A.ID
,B.*
From XmlDecisionInputText A
Cross Apply (
Select PersonType = x.v.value('@PersonType','VARCHAR(150)')
,CustLastName = x.v.value('CustLastName[1]','VARCHAR(150)')
,CustFirstName = x.v.value('CustFirstName[1]','VARCHAR(150)')
,ZipCode = x.v.value('CurrentAddress[1]/ZipCode[1]','VARCHAR(150)')
From A.TransformedXML.nodes('DecisionData/Customer/Applicants/*') x(v)
) B
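The CROSS APPLY version returns one row per Applicant element. Here is the same idea as a Python sketch. The second applicant (PersonType "CoApplicant", DOE/JOHN, zip 12345) is invented purely for illustration, and the function name applicants is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two applicants; the second one is made up.
XML = """<DecisionData><Customer><Applicants>
  <Applicant PersonType="Applicant">
    <CustLastName>BRAND</CustLastName><CustFirstName>ELIZABETH</CustFirstName>
    <CurrentAddress><ZipCode>46617</ZipCode></CurrentAddress>
  </Applicant>
  <Applicant PersonType="CoApplicant">
    <CustLastName>DOE</CustLastName><CustFirstName>JOHN</CustFirstName>
    <CurrentAddress><ZipCode>12345</ZipCode></CurrentAddress>
  </Applicant>
</Applicants></Customer></DecisionData>"""


def applicants(xml_text):
    """One dict per child of <Applicants>, like nodes('.../Applicants/*')."""
    rows = []
    for a in ET.fromstring(xml_text).findall("Customer/Applicants/*"):
        rows.append({
            "PersonType": a.get("PersonType"),
            "CustLastName": a.findtext("CustLastName"),
            "CustFirstName": a.findtext("CustFirstName"),
            "ZipCode": a.findtext("CurrentAddress/ZipCode"),
        })
    return rows


for row in applicants(XML):
    print(row)
```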

How to insert a XML file into a SQL table

How do I insert this into a SQL table?
<ITEM id="1"
name="Swimmer Head"
mesh_name="eq_head_swim"
totalpoint="0"
type="equip"
res_sex="m"
res_level="0"
slot="head"
weight="2"
bt_price="0"
hp="4"
ap="8"
maxwt="0"
sf="0"
fr="0"
cr="0"
pr="0"
lr="0"
color="#FFFFFFFF"
desc="Part of an everyday swimming outfit" />
Also, there are a lot more lines in this XML file, so how can I do this with one .sql file?
Here is one method which will give you an EAV structure (Entity Attribute Value).
You may notice I only have to identify ONE key element ... id
I truncated a few elements and added a second item for demonstrative purposes only
Declare @XML xml = '
<ITEM id="1" name="Swimmer Head" mesh_name="eq_head_swim" totalpoint="0" type="equip" res_sex="m" res_level="0" slot="head" weight="2" bt_price="0" color="#FFFFFFFF" desc="Part of an everyday swimming outfit" />
<ITEM id="2" name="Boxer Feet" mesh_name="eq_feet_boxer" totalpoint="0" type="equip" res_sex="m" res_level="0" slot="head" weight="2" bt_price="25.00" color="#FFFFFFFF" desc="Somthing for the boxer" />
'
Select ID = r.value('@id','int')
,Item = attr.value('local-name(.)','varchar(100)')
,Value = attr.value('.','varchar(max)')
From @XML.nodes('/ITEM') as A(r)
Cross Apply A.r.nodes('./@*') AS B(attr)
Where attr.value('local-name(.)','varchar(100)') not in ('id')
This returns one row per attribute (ID, attribute name, value), which can easily be pivoted if necessary.
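The EAV shape, and the pivot back to one row per item, can be sketched in Python with xml.etree.ElementTree. This sketch makes two assumptions. First, the two ITEM elements are wrapped in a hypothetical ITEMS root, because ElementTree rejects fragments with several top-level elements (SQL Server's xml type accepts them). Second, to_eav and pivot are illustrative names.

```python
import xml.etree.ElementTree as ET

# Two trimmed ITEM elements wrapped in a hypothetical <ITEMS> root element.
XML = """<ITEMS>
  <ITEM id="1" name="Swimmer Head" slot="head" bt_price="0" />
  <ITEM id="2" name="Boxer Feet" slot="head" bt_price="25.00" />
</ITEMS>"""


def to_eav(xml_text):
    """Return one (id, attribute name, value) tuple per attribute other than id."""
    rows = []
    for item in ET.fromstring(xml_text).findall("ITEM"):
        item_id = int(item.get("id"))
        for name, value in item.attrib.items():
            if name != "id":  # same filter as: not in ('id')
                rows.append((item_id, name, value))
    return rows


def pivot(eav_rows):
    """Fold EAV rows back into one dict of attributes per id."""
    table = {}
    for item_id, name, value in eav_rows:
        table.setdefault(item_id, {})[name] = value
    return table


print(pivot(to_eav(XML)))
```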
EDIT - To load XML from a FILE
Declare @XML xml
Select @XML = BulkColumn FROM OPENROWSET(BULK 'C:\Working\SomeXMLFile.xml', SINGLE_BLOB) x;
Select ID = r.value('@id','int')
,Item = attr.value('local-name(.)','varchar(100)')
,Value = attr.value('.','varchar(max)')
From @XML.nodes('/ITEM') as A(r)
Cross Apply A.r.nodes('./@*') AS B(attr)
Where attr.value('local-name(.)','varchar(100)') not in ('id')
Re-asking Siyual's question, but more specific:
Is this one line of many that should go into a table?
And is it not nested?
In other words, would it continue with repetitions of <ITEM id= [...] desc="something" />? If the answer is yes, consider a Perl script that picks everything after an equals sign and between double quotes and concatenates the obtained bits, separating them by, say, commas, creating one line per <ITEM [...] />.
This way, you'd get a CSV file to load. Of course, you'd have to create the target table first.
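An alternative to a Perl regex is a real XML parser plus a CSV writer. This avoids breaking on values that contain commas or quotes, because Python's csv module quotes such fields. The following is a minimal Python sketch of that swapped-in approach. COLUMNS (the target table's column order), the ITEMS wrapper element and items_to_csv are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical target column order; list every attribute the table needs.
COLUMNS = ["id", "name", "slot", "weight"]

XML = """<ITEMS>
  <ITEM id="1" name="Swimmer Head" slot="head" weight="2" />
  <ITEM id="2" name="Boxer Feet" slot="head" weight="2" />
</ITEMS>"""


def items_to_csv(xml_text, columns):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)  # header row
    for item in ET.fromstring(xml_text).iter("ITEM"):
        # Missing attributes become empty fields instead of shifting columns.
        writer.writerow([item.get(col, "") for col in columns])
    return out.getvalue()


print(items_to_csv(XML, COLUMNS), end="")
```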

XML Output from SQL Server 2008

I am trying to create an XML output from SQL that has 3 nested statements but have pretty minimal experience in this area. The code I've written is below:
select
replace(replace(replace(
(
select ID as [@ID],
(select cast(Name as int) as [@Name],
(select num as [@Number]
from #tbl_new_claims_export
for xml path('Num'),root('Numbers'), type
)
from #tbl_new_claims_export
for xml path('LineItem'), type
)
from #tbl_new_claims_export
for XML PATH('Line'),ROOT('Lines')
),'><','>'+char(10)+'<'),'<Num', char(9)+'<Num'), '<Num>', char(9)+'<Num>') ;
I am trying to create an output that looks like this:
<Lines>
<Line ID ="1">
<LineItem Name ="Michael">
<Numbers>
<Num Number="24" />
</Numbers>
</LineItem>
</Line>
For each Line, I want to see the Line, Name, and Number as shown above. However, it is showing multiple Names under each Line and then repeats the Number below. Can anybody help me troubleshoot this code?
Thanks.
Without sample data containing 1:n examples and the expected output, this is like reading a crystal ball...
Anyway, this
SELECT
1 AS [Line/@ID]
,'Michael' AS [LineItem/@Name]
,24 AS [Numbers/Num/@Number]
FOR XML PATH('Lines')
will produce the elements and attributes you specify, although as siblings rather than nested:
<Lines>
<Line ID="1" />
<LineItem Name="Michael" />
<Numbers>
<Num Number="24" />
</Numbers>
</Lines>
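In the 1:n case the question describes, each Line should contain only its own LineItem and Numbers. In T-SQL, that corresponds to correlating each nested FOR XML subquery with its outer row (for example, with a WHERE on ID), which the question's subqueries do not do. Below is a Python sketch of the intended nesting. The sample rows are hypothetical: the second number for ID 1 and the row for ID 2 are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical 1:n rows: (line ID, name, number).
ROWS = [(1, "Michael", 24), (1, "Michael", 25), (2, "Anna", 7)]


def build_lines(rows):
    lines = ET.Element("Lines")
    numbers_by_id = {}  # line ID -> the <Numbers> element of that Line
    for line_id, name, number in rows:
        if line_id not in numbers_by_id:
            line = ET.SubElement(lines, "Line", ID=str(line_id))
            item = ET.SubElement(line, "LineItem", Name=name)
            numbers_by_id[line_id] = ET.SubElement(item, "Numbers")
        # Each number goes only under the Line it belongs to (the "correlation").
        ET.SubElement(numbers_by_id[line_id], "Num", Number=str(number))
    return lines


print(ET.tostring(build_lines(ROWS), encoding="unicode"))
```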
If you need further help, please specify a minimal and reduced test scenario. Best would be a fiddle or some pasteable code like
DECLARE @tbl TABLE(ID INT, col1 VARCHAR(MAX)/*more columns*/);
INSERT INTO @tbl VALUES (1,'test1')/*more values*/

Select all paths from XML whose content is like a graph

I have an XML column with an element like this:
<Root>
<Word Type="pre1" Value="A" />
<Word Type="pre1" Value="D" />
<Word Type="base" Value="B" />
<Word Type="post1" Value="C" />
<Word Type="post1" Value="E" />
<Word Type="post1" Value="F" />
</Root>
which models something like a graph, and I want to select all possible paths using XQuery in MSSQL to get a result like this:
ABC
ABE
ABF
DBC
DBE
DBF
Or something like:
<Root>
<Word Type="pre1" Value="A" />
<Word Type="pre1" Value="D" />
<Word Type="pre2" Value="G" />
<Word Type="pre2" Value="H" />
<Word Type="base" Value="B" />
<Word Type="post1" Value="C" />
<Word Type="post1" Value="E" />
<Word Type="post1" Value="F" />
</Root>
with this result:
AHBC
AHBE
AHBF
DHBC
DHBE
DHBF
AGBC
AGBE
AGBF
DGBC
DGBE
DGBF
You can use a CTE to build the unique type list and then use that in a recursive CTE to build the strings. Finally you pick out the strings generated in the last iteration.
with Types as
(
select row_number() over(order by T.N) as ID,
T.N.value('.', 'varchar(10)') as Type
from (select @XML.query('for $t in distinct-values(/Root/Word/@Type)
return <T>{$t}</T>')
) as X(T)
cross apply X.T.nodes('/T') as T(N)
),
Recu as
(
select T.Type,
T.ID,
X.N.value('@Value', 'varchar(max)') as Value
from Types as T
cross apply @XML.nodes('/Root/Word[@Type=sql:column("T.Type")]') as X(N)
where T.ID = 1
union all
select T.Type,
T.ID,
R.Value+X.N.value('@Value', 'varchar(max)') as Value
from Types as T
inner join Recu as R
on T.ID = R.ID + 1
cross apply @XML.nodes('/Root/Word[@Type=sql:column("T.Type")]') as X(N)
)
select R.Value
from Recu as R
where R.ID = (select max(T.ID) from Types as T)
order by R.Value
SQL Fiddle
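The logic of the recursive CTE can be traced step by step in a small Python sketch. The anchor member produces one-letter strings for the first Type, and each recursive step appends every Value of the next Type. The sketch takes types in first-appearance order, because the CTE's distinct-values() does not promise any particular order. all_paths is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Second sample from the question.
XML = """<Root>
  <Word Type="pre1" Value="A" /><Word Type="pre1" Value="D" />
  <Word Type="pre2" Value="G" /><Word Type="pre2" Value="H" />
  <Word Type="base" Value="B" />
  <Word Type="post1" Value="C" /><Word Type="post1" Value="E" /><Word Type="post1" Value="F" />
</Root>"""


def all_paths(xml_text):
    words = ET.fromstring(xml_text).findall("Word")
    # Distinct types in first-appearance order (the Types CTE; list index + 1 = ID).
    types = list(dict.fromkeys(w.get("Type") for w in words))
    # Anchor member: the values of the first type.
    level = [w.get("Value") for w in words if w.get("Type") == types[0]]
    # Recursive member: extend every string with each value of the next type.
    for t in types[1:]:
        level = [prefix + w.get("Value") for prefix in level for w in words if w.get("Type") == t]
    # Only the last level holds complete paths (WHERE R.ID = max ID); ORDER BY Value.
    return sorted(level)


print(all_paths(XML))
```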
Update
Here is a version that has better performance. It shreds the XML into two temp tables: one for all words and one with a single row per type. The recursive CTE is still needed, but it uses the tables instead of the XML. There is also one index on each of the temp tables that is used by the joins in the CTE.
-- Table to hold all values
create table #Values
(
Type varchar(10),
Value varchar(10)
);
-- Clustered index on Type is used in the CTE
create clustered index IX_#Values_Type on #Values(Type)
insert into #Values(Type, Value)
select T.N.value('@Type', 'varchar(10)'),
T.N.value('@Value', 'varchar(10)')
from @XML.nodes('/Root/Word') as T(N);
-- Table that holds one row for each Type
create table #Types
(
ID int identity,
Type varchar(10),
primary key (ID)
);
-- Add types by document order
-- Table-Valued Function Showplan Operator for nodes guarantees document order
insert into #Types(Type)
select T.Type
from (
select row_number() over(order by T.N) as rn,
T.N.value('@Type', 'varchar(10)') as Type
from @XML.nodes('/Root/Word') as T(N)
) as T
group by T.Type
order by min(T.rn);
-- Last level of types
declare @MaxID int;
set @MaxID = (select max(ID) from #Types);
-- Recursive CTE that builds the strings
with C as
(
select T.ID,
T.Type,
cast(V.Value as varchar(max)) as Value
from #Types as T
inner join #Values as V
on T.Type = V.Type
where T.ID = 1
union all
select T.ID,
T.Type,
C.Value + V.Value
from #Types as T
inner join C
on T.ID = C.ID + 1
inner join #Values as V
on T.Type = V.Type
)
select C.Value
from C
where C.ID = @MaxID
order by C.Value;
-- Cleanup
drop table #Types;
drop table #Values;
SQL Fiddle
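The point of the update is to shred the XML once into lookup structures and then join against those. Here is a Python sketch of that idea (not the T-SQL itself). A dict keyed by Type stands in for #Values with its clustered index, and a list in first-appearance order stands in for #Types. shred and build_paths are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# First sample from the question.
XML = """<Root>
  <Word Type="pre1" Value="A" />
  <Word Type="pre1" Value="D" />
  <Word Type="base" Value="B" />
  <Word Type="post1" Value="C" />
  <Word Type="post1" Value="E" />
  <Word Type="post1" Value="F" />
</Root>"""


def shred(xml_text):
    """Shred once: ordered list of types plus a Type -> [Value, ...] lookup."""
    types = []                          # like #Types: list position + 1 = ID
    values_by_type = defaultdict(list)  # like #Values, "indexed" on Type
    for word in ET.fromstring(xml_text).iter("Word"):
        t = word.get("Type")
        if t not in values_by_type:     # first appearance = document order
            types.append(t)
        values_by_type[t].append(word.get("Value"))
    return types, values_by_type


def build_paths(types, values_by_type):
    """Extend the strings one type at a time, looking values up by Type."""
    paths = list(values_by_type[types[0]])
    for t in types[1:]:
        paths = [p + v for p in paths for v in values_by_type[t]]
    return paths


print(build_paths(*shred(XML)))
```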
You need the cross product of these three element sets, so basically write a join without conditions:
for $pre in //Word[@Type="pre1"]
for $base in //Word[@Type="base"]
for $post in //Word[@Type="post1"]
return concat($pre/@Value, $base/@Value, $post/@Value)
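A cross product with no join condition is exactly what Python's itertools.product computes. The following sketch covers the three-factor case, using the first sample from the question. The helper name values is hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import product

# First sample from the question.
XML = """<Root>
  <Word Type="pre1" Value="A" />
  <Word Type="pre1" Value="D" />
  <Word Type="base" Value="B" />
  <Word Type="post1" Value="C" />
  <Word Type="post1" Value="E" />
  <Word Type="post1" Value="F" />
</Root>"""

root = ET.fromstring(XML)


def values(word_type):
    """All Value attributes of Word elements with the given Type, in document order."""
    return [w.get("Value") for w in root.iter("Word") if w.get("Type") == word_type]


# One factor per for clause: pre1 x base x post1, with no join condition.
paths = ["".join(p) for p in product(values("pre1"), values("base"), values("post1"))]
print(paths)
```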
For the extended version, I used two helper functions which fetch all attributes and then recursively concat the results.
It seems MSSQL doesn't allow custom XQuery functions. This code is valid for conformant XQuery 1.0 (and newer) processors.
declare function local:call($doc as node(), $prefix as xs:string) as xs:string* {
(: $doc is passed in because a function body has no context item, so a bare // is not allowed there :)
local:recursion($doc, '',
for $value in distinct-values($doc//Word/@Type[starts-with(., $prefix)])
order by $value
return $value
)
};
declare function local:recursion($doc as node(), $strings as xs:string*, $attributes as xs:string*) as xs:string* {
if (empty($attributes))
then $strings
else
for $string in $strings
for $append in $doc//Word[@Type=$attributes[1]]
return local:recursion($doc, concat($string, $append/@Value), $attributes[position() != 1])
};
for $pre in local:call(/, 'pre')
for $base in local:call(/, 'base')
for $post in local:call(/, 'post')
return concat($pre, $base, $post)
If I understand your XML correctly, all of your graphs are essentially sequences of steps, where no step may be omitted and each step may have several alternatives. (So the set of paths through the graph is essentially the Cartesian product of the various sets of alternatives.) If that's not true, what follows won't be what you want.
The easiest way to get the Cartesian product here is to use an XQuery FLWOR expression with one for clause for each factor in the Cartesian product, as illustrated in Jens Erat's initial answer.
If you don't know in advance how many factors there will be (because you don't know what sequence of 'Type' values may occur in a graph), and don't want to formulate the query afresh each time, then the simplest thing to do is to write a recursive function which takes a sequence of 'Type' values as one argument and the 'Root' element you're working on as another argument, and handles one factor at a time.
This function does that job, for your sample input:
declare function local:cartesian-product(
$doc as element(),
$types as xs:string*
) as xs:string* {
(: If we have no $types left, we are done.
Return the empty string. :)
if (empty($types)) then
''
(: Otherwise, take the first value off the
sequence of types and return the Cartesian
product of all Words with that type and
the Cartesian product of all the remaining
types. :)
else
let $t := $types[1],
$rest := $types[position() > 1]
for $val in $doc/Word[@Type = $t]/@Value
for $suffix in
local:cartesian-product($doc,$rest)
return concat($val, $suffix)
};
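For comparison, here is the same recursion as a Python sketch. An empty list of types yields a single empty suffix. Otherwise, each Value of the first type is prefixed to every suffix built from the remaining types. The sketch makes one assumption: the Type-to-values dict replaces the $doc lookup. The suffixes are computed once per call rather than once per value, which gives the same result.

```python
def cartesian_product(values_by_type, types):
    """Return every string that takes one value from each type, in order."""
    if not types:
        return [""]  # no types left: a single empty suffix
    first, rest = types[0], types[1:]
    suffixes = cartesian_product(values_by_type, rest)  # solve the remaining types
    return [v + s for v in values_by_type[first] for s in suffixes]


# Values grouped by Type for the second sample (hypothetical pre-shredded form).
VALUES = {"pre1": ["A", "D"], "pre2": ["G", "H"], "base": ["B"], "post1": ["C", "E", "F"]}
print(" ".join(cartesian_product(VALUES, ["pre1", "pre2", "base", "post1"])))
# AGBC AGBE AGBF AHBC AHBE AHBF DGBC DGBE DGBF DHBC DHBE DHBF
```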
The only remaining problem is the slightly tricky one of getting the sequence of distinct 'Type' values in document order. We could just call distinct-values($doc//Word/@Type) to get the values, but there is no guarantee they will be in document order.
Borrowing from Dimitre Novatchev's solution to a related problem, we can calculate an appropriate sequence of 'Type' values thus:
let $doc := <Root>
<Word Type="pre1" Value="A" />
<Word Type="pre1" Value="D" />
<Word Type="pre2" Value="G" />
<Word Type="pre2" Value="H" />
<Word Type="base" Value="B" />
<Word Type="post1" Value="C" />
<Word Type="post1" Value="E" />
<Word Type="post1" Value="F" />
</Root>
let $types0 := ($doc/Word/@Type),
$types := $types0[index-of($types0,.)[1]]
This returns the distinct values, in document order.
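The index-of() trick keeps an item only when its position is the first position at which its value occurs. Below is a direct Python transcription; the function name is hypothetical. Like the XQuery original, it is quadratic. list(dict.fromkeys(seq)) gives the same result in linear time.

```python
def distinct_in_document_order(seq):
    """Keep an item only where its position equals the first index of its value,
    the Python analogue of $types0[index-of($types0, .)[1]]."""
    return [item for pos, item in enumerate(seq) if seq.index(item) == pos]


types0 = ["pre1", "pre1", "pre2", "pre2", "base", "post1", "post1", "post1"]
print(distinct_in_document_order(types0))  # ['pre1', 'pre2', 'base', 'post1']
```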
Now we are ready to calculate the result you want:
return local:cartesian-product($doc, $types)
The results are returned in an order that differs slightly from the order you give; I assume you do not care about the sequence of results:
AGBC AGBE AGBF AHBC AHBE AHBF DGBC DGBE DGBF DHBC DHBE DHBF