Using TSQL to generate XML

Using TSQL to generate XML - sql

I am trying to generate some XML using TSQL and having some trouble, I was able to get to traverse 2 levels but getting stuck at 3. How it should work in the example I have posted you have to Purchase Orders, the first purchase order only has one part with one release for that part. The second Purchase order as 2 parts each part with its own release. At the end I should have 2 XML documents that traverse 3 levels, release/part/purchase order. I think I am getting confused when linking to the next stage up as well as the group by.The example only shows 2 parts but there can be many parts to one purchase order but only ever one release per part, any help would be much appreciated.
DECLARE #sdv AS TABLE
(
PID INT IDENTITY(1,1) PRIMARY KEY,
PurchaseOrder varchar(20),
PartNo int,
Release int
)
INSERT INTO #sdv values ('PURC123', 111, 1)
INSERT INTO #sdv values ('PURC333', 222, 1)
INSERT INTO #sdv values ('PURC333', 333, 1)
select
PurchaseOrder,
(
select
PartNo,
(
select
Release
from
#sdv a3
where
a3.Release = a2.Release and
a3.PartNo = a2.PartNo and
a3.PurchaseOrder = a1.PurchaseOrder
for xml path('Release'), type
)
from
#sdv a2
where
a2.PartNo = a1.PartNo
group by a2.PartNo
for xml path('part'), type
) from #sdv a1
group by PurchaseOrder
for xml path('Purchase')

I am going to shoot you an answer so you can play around with it. I am around for a little bit longer today, but will check back tomorrow.
For this example I just used attributes as I can read it a bit easier. I did my grouping using a CTE and assumed the relationship was 1 to many between a part and a release. (could be wrong there, but need info from you). Was also a bit hazy about the levels needing to go a third deep. If we are talking about iterations of parts (versions) then you could simple add a #Release to the second level. Note: If you don't want attributes just remove the #before each of the tags.
I would highly recommend being very diligent about your code format whenever doing hierarchical xml work. Its easy to get lost.
CODE
DECLARE #sdv AS TABLE
(
PID INT IDENTITY(1,1) PRIMARY KEY,
PurchaseOrder varchar(20),
PartNo int,
Release int
)
INSERT INTO #sdv values ('PURC123', 111, 1)
INSERT INTO #sdv values ('PURC333', 222, 1)
INSERT INTO #sdv values ('PURC333', 333, 1)
;WITH PO AS (
SELECT DISTINCT PurchaseOrder
FROM #sdv
)
SELECT L1.PurchaseOrder AS "#PurchseOrder",
( SELECT PartNo AS "#PartNo",
( SELECT DISTINCT L3.Release AS "#Release"
FROM #sdv AS L3
WHERE L2.PartNo = L3.PartNo
ORDER BY L3.Release ASC
FOR XML PATH('RELEASE'),TYPE
)
FROM #sdv AS L2
WHERE L2.PurchaseOrder = L1.PurchaseOrder
FOR XML PATH('PART'), TYPE
)
FROM PO AS L1
FOR XML PATH('PurchaseOrder'), ROOT('PurchaseOrders')
XML OUTPUT
<PurchaseOrders>
<PurchaseOrder PurchseOrder="PURC123">
<PART PartNo="111">
<RELEASE Release="1" />
</PART>
</PurchaseOrder>
<PurchaseOrder PurchseOrder="PURC333">
<PART PartNo="222">
<RELEASE Release="1" />
</PART>
<PART PartNo="333">
<RELEASE Release="1" />
</PART>
</PurchaseOrder>
</PurchaseOrders>

Related

SQL Server - matching attributes query

SQL Server Gurus ...
Currently using MS SQL Server 2016
I know Joe Celko and all SQL purists are squirming at the thought of using bitmasks, but I have a use case in which I need to query for all widgets that contain a set of given attributes.
Each widget may contain several hundred attributes.
The attributes of a widget are either present or not (1 = present, 0 = not
present)
One way I thought to do this is via bitmasks – the attributes to be found (a bitmask) could be ANDed with the attributes of each widget to find matches in a single operation. For example, the widgets table might be:
widets table:
widget_uid Uniqueidentifier
attributes BigInt
SELECT widget_uid
FROM widgets
WHERE ( attributes & bitmask ) = bitmask;
Problem is, using a BigInt for the attributes limits the number of attributes to 64 (a widget can have several hundred attributes), I could group the attributes in chunks of 64 bits, ie:
widets table:
widget_uid Uniqueidentifier
attributes0 BigInt -- Attributes 0-63
attributes1 BigInt -- Attributes 64-127
attributes2 BigInt -- Attributes 128-191
SELECT widget_uid
FROM widgets
WHERE ( attributes0 & bitmask0 ) = bitmask0
AND ( attributes1 & bitmask1 ) = bitmask1
AND ( attributes2 & bitmask2 ) = bitmask2
... but was wondering if anyone has come up with a solution for bit operations using bitmasks with greater than 64 bits – or if other (more efficient?) solutions would exist?
In the use case, the widgets table does contain other columns, but I am only concerned with the attributes matching portion of the query at the moment.
Any and all ideas are welcome - would be interested in knowing how others tackle this particular problem.
Thanks in advance.

We had a similar use case, on a significantly large data set. This was for an e-commerce site with products and attributes. Our case was a bit more complex than here, where we had any possible number of attributes and then values assigned to those attributes. e.g. Color - Red/Green/Blue, Size - S/M/L etc.
We found that associated tables with good indexing was the key in our case. While this may not be an option for you we found this to be the optimal solution for a dynamic data set.
I can code you up an example if you feel it will be helpful.
Edited to add example:
DROP TABLE IF EXISTS #Widgets
DROP TABLE IF EXISTS #Attributes
DROP TABLE IF EXISTS #WidgetAttributes
CREATE TABLE #Widgets (widget_UID UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED, Name NVARCHAR(255))
CREATE TABLE #Attributes (Attribute_UID UNIQUEIDENTIFIER PRIMARY KEY CLUSTERED, Name NVARCHAR(255))
CREATE TABLE #WidgetAttributes (widget_UID UNIQUEIDENTIFIER,Attribute_UID UNIQUEIDENTIFIER)
CREATE NONCLUSTERED INDEX ix_WidgetAttribute ON #WidgetAttributes (Attribute_UID) INCLUDE (widget_UID)
INSERT INTO #Widgets (widget_UID, Name) values
( '{c63bea73-2331-4698-82c9-f71845ab8601}', N'Widget 1' ),
( '{a0865b8f-606b-4273-9207-39a8a26016c4}', N'Widget 2' ),
( '{211fe27e-ab98-4b61-83a3-3d006d66db5a}', N'Widget 3' )
INSERT INTO #Attributes (Attribute_UID, Name)
VALUES
( '{99354dc0-d0b2-4919-a887-edf115eeb1bd}', N'Height' ),
( '{136bbe4c-497d-472f-a905-670e4a7805d0}', N'Width' ),
( '{f006f950-30d1-453e-8e09-4f7d140fa3cb}', N'Depth' ),
( '{0d190639-677f-4b75-8d36-1bdac00de132}', N'Colour' )
-- Set links
-- Widget 1 All attributes
-- Widget 2 Height Width
-- Widget 3 Colour
INSERT INTO #WidgetAttributes (widget_UID, Attribute_UID)
SELECT '{c63bea73-2331-4698-82c9-f71845ab8601}',Attribute_UID FROM #Attributes
UNION ALL
SELECT TOP (2) '{a0865b8f-606b-4273-9207-39a8a26016c4}',Attribute_UID FROM #Attributes WHERE Name<> 'Colour'
UNION ALL
SELECT '{211fe27e-ab98-4b61-83a3-3d006d66db5a}',Attribute_UID FROM #Attributes WHERE Name = 'Colour'
-- #SearchAttributes to hold list of attributes you are trying to find
DECLARE #SearchAttributes TABLE (Attribute_UID UNIQUEIDENTIFIER)
INSERT INTO #SearchAttributes
SELECT Attribute_UID FROM #Attributes WHERE Name<> 'Colour'
;WITH cte AS (
SELECT WA.widget_UID, COUNT(1) AttributesPresent FROM #WidgetAttributes WA
JOIN #SearchAttributes SA ON SA.Attribute_UID = WA.Attribute_UID
GROUP BY WA.widget_UID
)
SELECT cte.AttributesPresent
, W.widget_UID
, W.Name
FROM cte
JOIN #Widgets W ON W.widget_UID = cte.widget_UID
ORDER BY cte.AttributesPresent DESC
Gives an output of:
AttributesPresent widget_UID Name
----------------- ------------------------------------ ----------
3 C63BEA73-2331-4698-82C9-F71845AB8601 Widget 1
2 A0865B8F-606B-4273-9207-39A8A26016C4 Widget 2
We used an approach of counting how many attributes were present for each so we not only had the option of "exact match" but also "closest fit".

Using bitmask in databases is wrong approach. Even if you somewhow manage it to work, you will not be able to use indexes to speed up execution.
Use standard solution, this is standard situation. There is standard M:N relationship between Widgets and Attributes (both should be tables, of course). You will add another table that will assign Attributes to Widgets - you can call it WidgetAttributes.
It will have 3 columns: Id, WidgetId, AttributeId
Then you can simply for example get list of Widgets that have Attribute:
select w.*
from Widgets w
inner join WidgetAttributes wa on wa.WidgetId = w.Id
inner join Attributes a on a.Id = wa.AttributeId
where a.AttributeName='xxx'

SQL Server recursive query which is not working

I have a table with three columns
unified_id
master_id
(the third column is a primary key identity)
For an id which could be in either id column, I want to find the set of ids.
For example, if the data looks like this
X001,X123
X002,X123
X123,X123
then starting from X001 I want to return with the set of X001, X002 and X123. Likewise I want the same set starting with any member.
We can assume that the set is composed of values in the unified_id column, that is we assume that there is always a row like the third one, relating the master id to itself.
When I have just one row the follow query hits maximum recursion level, so I frankly have no idea how to solve this with recursive queries.
2BM0000023, 1TE0000111
Code:
CREATE TABLE link_unified_id
(unified_id VARCHAR(16) NOT NULL,
master_id VARCHAR(16) NOT NULL
);
insert into link_unified_id values ('X001','X123');
insert into link_unified_id values ('X002','X123');
insert into link_unified_id values ('X003','X123');
insert into link_unified_id values ('X123','X123');
insert into link_unified_id values ('X999','X999');
with cte1 ( unified_id) as
(
select
up1.unified_id
from
link_unified_id as up1
where
up1.unified_id = 'X001'
union all
select
up2.unified_id
from
link_unified_id as up2
inner join
cte1 on up2.unified_id = cte1.unified_id
)
select *
from cte1
/* hoping to see output like this:
X001
X002
X003
X123
*/

Maybe I don't get it but why not just join the master id from the selected item to the same table using that master id?
Something like this:
select lu1.unified_id
from link_unified_id lu1,
link_unified_id lu2
where lu1.master_id = lu2.master_id
and lu2.unified id = 'X001'

In my opinion, my requirement is to build an "undirected graph" and then find paths in it (not sure what the terminology is), and this the solution to this in SQL is rather procedural (e.g. Recursive CTE in presence of circular references). I have decided to actually build a graph by inserting these pairs via a Python library which seems more elegant than trying to do this in SQL.

Importing multiple xml files, filter, then output to different xml format in SSIS

I have a task that requires me to pull in a set of xml files, which are all related, then pick out a subset of records from these files, transform some columns, and export to a a single xml file using a different format.
I'm new to SSIS, and so far I've managed to first import two of the xml files (for simplicity, starting with just two files).
The first file we can call "Item", containing some basic metadata, amongst those an ID, which is used to identify related records in the second file "Milestones". I filter my "valid records" using a lookup transformation in my dataflow - now I have the valid Item ID's to fetch the records I need. I funnel these valid ID's (along with the rest of the columns from Item.xml through a Sort, then into a merge join.
The second file is structured with 2 outputs, one containing two columns (ItemID and RowID). The second containing all of the Milestone related data plus a RowID. I put these through a inner merge join, based on RowID, so I have the ItemID in with the milestone data. Then I do a full outer join merge join on both files, using ItemID.
This gives me data sort of like this:
ItemID[1] - MilestoneData[2]
ItemID[1] - MilestoneData[3]
ItemID[1] - MilestoneData[4]
ItemID[1] - MilestoneData[5]
ItemID[2] - MilestoneData[6]
ItemID[2] - MilestoneData[7]
ItemID[2] - MilestoneData[8]
I can put this data through derived column transformations to create the columns of data I actually need, but I can't see how to structure this in a relational way/normalize it into a different xml format.
The idea is to output something like:
<item id="1">
<Milestone id="2">
<Milestone />
<Milestone id="3">
<Milestone />
</item>
Can anyone point me in the right direction?
UPDATE:
A bit more detailed picture of what I have, and what I'd like to achieve:
Item.xml:
<Items>
<Item ItemID="1">
<Title>
Data
</Title>
</Item>
<Item ItemID="2">
...
</Item>
...
</Items>
Milestone.xml:
<Milestones>
<Item ItemID="2">
<MS id="3">
<MS_DATA>
Data
</MS_DATA>
</MS>
<MS id="4">
<MS_DATA>
Data
</MS_DATA>
</MS>
</Item>
<Item ItemID="3">
<MS id="5">
<MS_DATA>
Data
</MS_DATA>
</MS>
</item>
</Milestones>
The way it's presented in SSIS when I use XML source, is not entirely intuitive, meaning the Item rows and the MS rows are two seperate outputs. I had to run these through a join in order to get the Milestones that corresponds to specific Items. No problem here, then run it through a full outer join with the items, so I get a flattened table with multiple rows containing obviously the same data for an Item and with different data for the MS. Basically I get what I tried to show in my table, lots of redundant Item data, for each unique MilestoneData.
In the end it has to look similar to:
<NewItems>
<myNewItem ItemID="2">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones>
<MS_DATA_TRANSFORMED MSID="3">
data
</MS_DATA_TRANSFORMED>
<MS_DATA_TRANSFORMED MSID="4">
data
</MS_DATA_TRANSFORMED>
</MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
<myNewItem ItemID="3">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones>
<MS_DATA_TRANSFORMED MSID="5">
data
</MS_DATA_TRANSFORMED>
</MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
</myNewItem>
<myNewItem ItemID="4">
<SomeDataDirectlyFromItem>
e.g. Title
</SomeDataDirectlyFromItem>
<DataConstructedFromMultipleColumnsInItem>
<MyMilestones></MyMilestones>
</DataConstructedFromMultipleColumnsInItem>
</myNewItem>
</NewItems>

I would try to handle this using a script component with the component type transformation. As you are new to ssis, i assume you haven't used this before. So basically you
define input columns, your component will expect (i.e. column input_xml containing ItemID[1] - MilestoneData[2];...
use c# to create a logic which cuts and sticks together
define output columns your component will use to deliver the transformed row
You will face the problem that one row will probably be used two times in the end, like i.e.
ItemID[1] - MilestoneData[2]
will result in
<item id="1">
<Milestone id="2">
I have done something pretty similar using Pentaho kettle, even without using something like a script component in which you define own logic. But i guess ssis has a lack of tasks here.

How about importing the XML into relational tables ( eg in tempdb ) then using FOR XML PATH to reconstruct the XML? FOR XML PATH offers a high degree of control over how you want the XML to look. A very simple example below:
CREATE TABLE #items ( itemId INT PRIMARY KEY, title VARCHAR(50) NULL )
CREATE TABLE #milestones ( itemId INT, msId INT, msData VARCHAR(50) NOT NULL, PRIMARY KEY ( itemId, msId ) )
GO
DECLARE #itemsXML XML
SELECT #itemsXML = x.y
FROM OPENROWSET( BULK 'c:\temp\items.xml', SINGLE_CLOB ) x(y)
INSERT INTO #items ( itemId, title )
SELECT
i.c.value('#ItemID', 'INT' ),
i.c.value('(Title/text())[1]', 'VARCHAR(50)' )
FROM #itemsXML.nodes('Items/Item') i(c)
GO
DECLARE #milestoneXML XML
SELECT #milestoneXML = x.y
FROM OPENROWSET( BULK 'c:\temp\milestone.xml', SINGLE_CLOB ) x(y)
INSERT INTO #milestones ( itemId, msId, msData )
SELECT
i.c.value('#ItemID', 'INT' ),
i.c.value('(MS/#id)[1]', 'VARCHAR(50)' ) msId,
i.c.value('(MS/MS_DATA/text())[1]', 'VARCHAR(50)' ) msData
FROM #milestoneXML.nodes('Milestones/Item') i(c)
GO
SELECT
i.itemId AS "#ItemID"
FROM #items i
INNER JOIN #milestones ms ON i.itemId = ms.itemId
FOR XML PATH('myNewItem'), ROOT('NewItems'), TYPE
DROP TABLE #items
DROP TABLE #milestones

Join a table to itself

this is one on my database tables template.
Id int PK
Title nvarchar(10) unique
ParentId int
This is my question.Is there a problem if i create a relation between "Id" and "ParentId" columns?
(I mean create a relation between a table to itself)
I need some advices about problems that may occur during insert or updater or delete operations at developing step.thanks

You can perfectly join the table with it self.
You should be aware, however, that your design allows you to have multiple levels of hierarchy. Since you are using SQL Server (assuming 2005 or higher), you can have a recursive CTE get your tree structure.
Proof of concept preparation:
declare #YourTable table (id int, parentid int, title varchar(20))
insert into #YourTable values
(1,null, 'root'),
(2,1, 'something'),
(3,1, 'in the way'),
(4,1, 'she moves'),
(5,3, ''),
(6,null, 'I don''t know'),
(7,6, 'Stick around');
Query 1 - Node Levels:
with cte as (
select Id, ParentId, Title, 1 level
from #YourTable where ParentId is null
union all
select yt.Id, yt.ParentId, yt.Title, cte.level + 1
from #YourTable yt inner join cte on cte.Id = yt.ParentId
)
select cte.*
from cte
order by level, id, Title

No, you can do self join in your table, there will not be any problem. Are you talking which types of problems in insert, update, delete operation ? You can check some conditions like ParentId exists before adding new record, or you can check it any child exist while deleting parent.
You can do self join like :
select t1.Title, t2.Title as 'ParentName'
from table t1
left join table t2
on t1.ParentId = t2.Id

You've got plenty of good answers here. One other thing to consider is referential integrity. You can have a foreign key on a table that points to another column in the same table. Observe:
CREATE TABLE tempdb.dbo.t
(
Id INT NOT NULL ,
CONSTRAINT PK_t PRIMARY KEY CLUSTERED ( Id ) ,
ParentId INT NULL ,
CONSTRAINT FK_ParentId FOREIGN KEY ( ParentId ) REFERENCES tempdb.dbo.t ( Id )
)
By doing this, you ensure that you're not going to get garbage in the ParentId column.

Its called Self Join and it can be added to a table as in following example
select e1.emp_name 'manager',e2.emp_name 'employee'
from employees e1 join employees e2
on e1.emp_id=e2.emp_manager_id

I have seen this done without errors before on a table for menu hierarchy you shouldnt have any issues providing your insert / update / delete queries are well written.
For instance when you insert check a parent id exists, when you delete check you delete all children too if this action is appropriate or do not allow deletion of items that have children.

It is fine to do this (it's a not uncommon pattern). You must ensure that you are adding a child record to a parent record that actually exists etc., but there's noting different here from any other constraint.
You may want to look at recursive common table expressions:
http://msdn.microsoft.com/en-us/library/ms186243.aspx
As a way of querying an entire 'tree' of records.

This is not a problem, as this is a relationship that's common in real life. If you do not have a parent (which happens at the top level), you need to keep this field "null", only then do update and delete propagation work properly.

Build XML Off of Common Table Expression

I am using a CTE to recurse data I have stored in a recursive table. The trouble is I am trying to figure out how I can use "FOR XML" to build the desired xml output. I have a Table of Contents table I am recursing and I want to be able to use that data to generate the XML.
Here is an example of what the data is simliar to:
ID|TOC_ID|TOC_SECTION|TOC_DESCRIPTON|PARENT_ID
1|I|Chapter|My Test Chapter|-1
2|A|Section|My Test Section|1
3|1|SubSection|My SubSection|2
I want to be able to spit out the data like so:
XML Attributes:
ID = Appended values from the TOC_ID field
value = value from TOC_Section field
<FilterData>
<Filter id="I" value="Chapter">
<Description>My Test Chapter</Description>
<Filter id="I_A" value="Section">
<Description>My Test Section</Description>
<Filter id="I_A_1" value="SubSection">
<Description>My Test SubSection</Description>
</Filter>
</Filter>
</Filter>
</FilterData>
Not sure how I can take the CTE data and produce a similar format to the above. When the data is in separate tables it isn't too difficult to build this type of output.
As always appreciate the input.
Thanks,
S

You may get some mileage from Recursive Hierarchies to XML in Christian Wade's blog - it all looks mighty painful to me!

Check this out Will (Not sure you are still following)....this does have a 32 level max, but that should still work fine for my stuff...can't see going deeper than that. Found this on another forum:
CREATE TABLE tree ( id INT, name VARCHAR(5), parent_id INT )
GO
INSERT INTO tree VALUES ( 1, 'N1', NULL )
INSERT INTO tree VALUES ( 3, 'N4', 1 )
INSERT INTO tree VALUES ( 4, 'N10', 3 )
INSERT INTO tree VALUES ( 5, 'N7', 3 )
GO
CREATE FUNCTION dbo.treeList(#parent_id int)
RETURNS XML
WITH RETURNS NULL ON NULL INPUT
BEGIN RETURN
(SELECT id as "#id", name as "#name",
CASE WHEN parent_id=#parent_id
THEN dbo.treeList(id)
END
FROM dbo.tree WHERE parent_id=#parent_id
FOR XML PATH('tree'), TYPE)
END
GO
SELECT id AS "#id", name AS "#name",
CASE WHEN id=1
THEN dbo.treeList(id)
END
FROM tree
WHERE id=1
FOR XML PATH('tree'), TYPE
Now isn't that nice and simple?
Customised from the great example over on http://msdn.microsoft.com/en-us/library/ms345137.aspx

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Using TSQL to generate XML - sql

Related

SQL Server - matching attributes query

SQL Server recursive query which is not working

Importing multiple xml files, filter, then output to different xml format in SSIS

Join a table to itself

Build XML Off of Common Table Expression

Categories

Resources