Need to convert a recursive CTE query to an index friendly query - sql

After going through all the hard work of writing a recursive CTE query to meet my needs, I realize I can't use it because it doesn't work in an indexed view. So I need something else to replace the CTE below. (Yes you can use a CTE in a non-indexed view, but that's too slow for me).
The requirements:
My ultimate goal is to have a self updating indexed view (it doesn't have to be a view, but something similar)... that is, if data changes in any of the tables the view joins on, then the view needs to update itself.
The view needs to be indexed because it has to be very fast, and the data doesn't change very frequently. Unfortunately, the non-indexed view using a CTE takes 3-5 seconds to run which is way too long for my needs. I need the query to run in milliseconds. The recursive table has a few hundred thousand records in it.
As far as my research has taken me, the best solution to meet all these requirements is an indexed view, but I'm open to any solution.
The CTE can be found in the answer to my other post.
Or here it is again:
DECLARE @tbl TABLE (
Id INT
,[Name] VARCHAR(20)
,ParentId INT
)
INSERT INTO @tbl( Id, Name, ParentId )
VALUES
(1, 'Europe', NULL)
,(2, 'Asia', NULL)
,(3, 'Germany', 1)
,(4, 'UK', 1)
,(5, 'China', 2)
,(6, 'India', 2)
,(7, 'Scotland', 4)
,(8, 'Edinburgh', 7)
,(9, 'Leith', 8)
;
DECLARE @tbl2 table (id int, abbreviation varchar(10), tbl_id int)
INSERT INTO @tbl2( Id, Abbreviation, tbl_id )
VALUES
(100, 'EU', 1)
,(101, 'AS', 2)
,(102, 'DE', 3)
,(103, 'CN', 5)
;WITH abbr AS (
SELECT a.*, isnull(b.abbreviation,'') abbreviation
FROM @tbl a
left join @tbl2 b on a.Id = b.tbl_id
), abcd AS (
-- anchor
SELECT id, [Name], ParentID,
CAST(([Name]) AS VARCHAR(1000)) [Path],
cast(abbreviation as varchar(max)) abbreviation
FROM abbr
WHERE ParentId IS NULL
UNION ALL
--recursive member
SELECT t.id, t.[Name], t.ParentID,
CAST((a.path + '/' + t.Name) AS VARCHAR(1000)) [Path],
isnull(nullif(t.abbreviation,'')+',', '') + a.abbreviation
FROM abbr AS t
JOIN abcd AS a
ON t.ParentId = a.id
)
SELECT *, [Path] + ':' + abbreviation
FROM abcd

After hitting all the roadblocks with indexed views (self joins, CTEs, UDFs that access data, etc.), I propose the following as a solution.
Create support function
This assumes a maximum depth of 4 below the root (5 levels in total). Alternatively, use a CTE.
CREATE FUNCTION dbo.GetHierPath(@hier_id int) returns varchar(max)
WITH SCHEMABINDING
as
begin
return (
select FullPath =
isnull(H5.Name+'/','') +
isnull(H4.Name+'/','') +
isnull(H3.Name+'/','') +
isnull(H2.Name+'/','') +
H1.Name
+
':'
+
isnull(STUFF(
isnull(','+A1.abbreviation,'') +
isnull(','+A2.abbreviation,'') +
isnull(','+A3.abbreviation,'') +
isnull(','+A4.abbreviation,'') +
isnull(','+A5.abbreviation,''),1,1,''),'')
from dbo.HIER H1
left join dbo.ABBR A1 on A1.hier_id = H1.Id
left join dbo.HIER H2 on H1.ParentId = H2.Id
left join dbo.ABBR A2 on A2.hier_id = H2.Id
left join dbo.HIER H3 on H2.ParentId = H3.Id
left join dbo.ABBR A3 on A3.hier_id = H3.Id
left join dbo.HIER H4 on H3.ParentId = H4.Id
left join dbo.ABBR A4 on A4.hier_id = H4.Id
left join dbo.HIER H5 on H4.ParentId = H5.Id
left join dbo.ABBR A5 on A5.hier_id = H5.Id
where H1.id = @hier_id)
end
GO
Add columns to the table itself
For example, add a FullPath column. If you also need the other two columns from the CTE, split the result of dbo.GetHierPath on ':' (left part => path, right part => abbreviations).
-- index maximum key length is 900, based on your data, 400 is enough
ALTER TABLE HIER ADD FullPath VARCHAR(400)
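The split on ':' mentioned above could be sketched as follows. This is only an illustration: it assumes that names and abbreviations never contain ':' themselves, and the sample strings in the VALUES list are made up to match the format dbo.GetHierPath returns.

```sql
-- Sketch: split 'path:abbreviations' at the first ':'.
-- NULLIF(..., 0) turns "no colon found" into NULL, so LEFT/SUBSTRING
-- return NULL instead of raising an invalid-length error.
SELECT v.FullPath,
       LEFT(v.FullPath, NULLIF(CHARINDEX(':', v.FullPath), 0) - 1) AS PathOnly,
       SUBSTRING(v.FullPath,
                 NULLIF(CHARINDEX(':', v.FullPath), 0) + 1,
                 LEN(v.FullPath)) AS Abbreviations
FROM (VALUES ('Asia/China:CN,AS'),
             ('Asia/India:AS'),
             ('Europe/UK:EU')) AS v(FullPath);
-- 'Asia/China:CN,AS' -> PathOnly 'Asia/China', Abbreviations 'CN,AS'
-- 'Asia/India:AS'    -> PathOnly 'Asia/India', Abbreviations 'AS'
```

Against the real table, the same expressions would be applied to HIER.FullPath once it has been populated.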
Maintain the columns
Because of the hierarchical nature of the data, deleting a record X can affect a descendant Y and an ancestor Z, and the affected rows are quite hard to identify in either an INSTEAD OF or an AFTER trigger. So the alternative approach relies on two conditions from the question:
"if data changes in any of the tables the view joins on, then the view needs to update itself."
"the non-indexed view using a CTE takes 3-5 seconds to run which is way too long for my needs"
We maintain the data simply by recalculating the entire table on every change, which takes 3-5 seconds per update (or less, if the 5-join query turns out to be faster).
CREATE TRIGGER TG_HIER
ON HIER
AFTER INSERT, UPDATE, DELETE
AS
UPDATE HIER
SET FullPath = dbo.GetHierPath(HIER.Id)
Finally, index the new column(s) on the table itself
create index ix_hier_fullpath on HIER(FullPath)
If you intended to access the path data via the id, then it is already in the table itself without adding an additional index.
The above TSQL references these objects
Modify the table and column names to suit your schema.
CREATE TABLE dbo.HIER (Id INT Primary Key Clustered, [Name] VARCHAR(20) ,ParentId INT)
;
INSERT dbo.HIER( Id, Name, ParentId ) VALUES
(1, 'Europe', NULL)
,(2, 'Asia', NULL)
,(3, 'Germany', 1)
,(4, 'UK', 1)
,(5, 'China', 2)
,(6, 'India', 2)
,(7, 'Scotland', 4)
,(8, 'Edinburgh', 7)
,(9, 'Leith', 8)
,(10, 'Antartica', NULL)
;
CREATE TABLE dbo.ABBR (id int primary key clustered, abbreviation varchar(10), hier_id int)
;
INSERT dbo.ABBR( Id, Abbreviation, hier_id ) VALUES
(100, 'EU', 1)
,(101, 'AS', 2)
,(102, 'DE', 3)
,(103, 'CN', 5)
GO
EDIT - Possibly faster alternative
Given that all records are recalculated each time, there is no real need for a function that returns the FullPath for a single HIER.Id. The query in the support function can be used without the where H1.id = @hier_id filter at the end. Furthermore, the expression for FullPath splits easily down the middle into PathOnly and Abbreviation. Or just use the original CTE, whichever is faster.
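As a rough sketch of that alternative, the body of the trigger could run the 5-join query directly as one set-based UPDATE. It assumes the dbo.HIER and dbo.ABBR tables and the FullPath column defined above, at most one ABBR row per hier_id (otherwise the joins multiply rows and the stored value becomes arbitrary), and the same 5-level depth limit.

```sql
-- Sketch: recalculate FullPath for every row in one statement,
-- reusing the expression from dbo.GetHierPath without the WHERE filter.
UPDATE H1
SET FullPath =
      ISNULL(H5.Name + '/', '')
    + ISNULL(H4.Name + '/', '')
    + ISNULL(H3.Name + '/', '')
    + ISNULL(H2.Name + '/', '')
    + H1.Name
    + ':'
    + ISNULL(STUFF(
          ISNULL(',' + A1.abbreviation, '')
        + ISNULL(',' + A2.abbreviation, '')
        + ISNULL(',' + A3.abbreviation, '')
        + ISNULL(',' + A4.abbreviation, '')
        + ISNULL(',' + A5.abbreviation, ''), 1, 1, ''), '')
FROM dbo.HIER AS H1
LEFT JOIN dbo.ABBR AS A1 ON A1.hier_id = H1.Id
LEFT JOIN dbo.HIER AS H2 ON H1.ParentId = H2.Id
LEFT JOIN dbo.ABBR AS A2 ON A2.hier_id = H2.Id
LEFT JOIN dbo.HIER AS H3 ON H2.ParentId = H3.Id
LEFT JOIN dbo.ABBR AS A3 ON A3.hier_id = H3.Id
LEFT JOIN dbo.HIER AS H4 ON H3.ParentId = H4.Id
LEFT JOIN dbo.ABBR AS A4 ON A4.hier_id = H4.Id
LEFT JOIN dbo.HIER AS H5 ON H4.ParentId = H5.Id
LEFT JOIN dbo.ABBR AS A5 ON A5.hier_id = H5.Id;
```

Whether this beats the scalar function or the CTE on a few hundred thousand rows would need to be measured on the real data.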

Related

Use Left Join Alias in Column Select in SQL Views

I am creating a view in SQL Server in which one of the columns needs to be a comma-separated value built from a different table. Consider the tables below, for instance:
CREATE TABLE Persons
(
Id INT NOT NULL PRIMARY KEY,
Name VARCHAR (100)
)
CREATE TABLE Skills
(
Id INT NOT NULL PRIMARY KEY,
Name VARCHAR (100),
)
CREATE TABLE PersonSkillLinks
(
Id INT NOT NULL PRIMARY KEY,
SkillId INT FOREIGN KEY REFERENCES Skills(Id),
PersonId INT FOREIGN KEY REFERENCES Persons(Id),
)
Sample data
INSERT INTO Persons VALUES
(1, 'Peter'),
(2, 'Sam'),
(3, 'Chris')
INSERT INTO Skills VALUES
(1, 'Poetry'),
(2, 'Cooking'),
(3, 'Movies')
INSERT INTO PersonSkillLinks VALUES
(1, 1, 1),
(2, 2, 1),
(3, 3, 1)
What I want is one row per person, with all of that person's skills in a single comma-separated column.
While I have been able to get the results using the script below, I have a feeling that this is not the best (and certainly not the only) way to do it as far as performance goes:
CREATE VIEW vwPersonsAndTheirSkills
AS
SELECT p.Name,
ISNULL(STUFF((SELECT ', ' + s.Name FROM Skills s JOIN PersonSkillLinks psl ON s.Id = psl.SkillId WHERE psl.personId = p.Id FOR XML PATH ('')), 1, 2, ''), '') AS Skill
FROM Persons p
GO
I also tried my luck with the script below -
CREATE VIEW vwPersonsAndTheirSkills
AS
SELECT p.Name,
ISNULL(STUFF((SELECT ', ' + skill.Name FOR XML PATH ('')), 1, 2, ''), '') AS Skill
FROM persons p
LEFT JOIN
(
SELECT s.Name, psl.personid FROM Skills s
JOIN PersonSkillLinks psl ON s.Id = psl.SkillId
) skill ON skill.personId = p.Id
GO
but it does not concatenate the strings; it returns a separate row for each skill.
So, is my assumption about the first script correct? If so, what concept am I missing, and what is the most efficient way to achieve this?
I would try APPLY:
SELECT p.Name, STUFF(ss.skills, 1, 2, '') AS Skill
FROM Persons p OUTER APPLY
(SELECT ', ' + s.Name
FROM Skills s JOIN
PersonSkillLinks psl
ON s.Id = psl.SkillId
WHERE psl.personId = p.Id
FOR XML PATH ('')
) ss(skills);
Written this way, the concatenating subquery is evaluated once per person inside the APPLY, and STUFF() is applied to its finished result.
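For comparison, here is a sketch of a different technique, not the one used in the answer above: on SQL Server 2017 or later (an assumption about your version), STRING_AGG can replace the FOR XML PATH / STUFF idiom. It also shows why the second script in the question could not work: the concatenation has to aggregate over all of a person's joined rows, whereas a subquery without a FROM clause only ever sees the current row.

```sql
-- Sketch, assuming SQL Server 2017+ and the Persons / Skills /
-- PersonSkillLinks tables from the question.
SELECT p.Name,
       -- STRING_AGG ignores NULLs and returns NULL for a person with
       -- no skills, so ISNULL turns that into an empty string.
       ISNULL(STRING_AGG(s.Name, ', ') WITHIN GROUP (ORDER BY s.Name), '') AS Skill
FROM Persons AS p
LEFT JOIN PersonSkillLinks AS psl ON psl.PersonId = p.Id
LEFT JOIN Skills AS s ON s.Id = psl.SkillId
GROUP BY p.Id, p.Name;
```

With the sample data, Peter gets 'Cooking, Movies, Poetry' and the other two people get an empty string.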

Grouping on some common values

This is a hard problem to explain, but I'm trying to write a SQL query that generates a list of parent groups, where each parent group contains every group that shares at least one product with another group in it. The groups don't ALL have to share products with each other; as long as a group shares a product with one other member, it is included in the parent group.
So for example: because group 1 has {101,102,103} and group 5 has {101,104,105}, they are both part of the same parent group, since they have product 101 in common. So is group 4 {104}, because it has product 104 in common with group 5 (even though it shares no product id with group 1).
Example Data:
group_id product_id
1 101
1 102
1 103
2 101
3 103
4 104
5 101
5 104
5 105
6 105
6 106
6 107
7 110
7 111
Results:
parent_group_id group_id
1 1
1 2
1 3
1 4
1 5
1 6
2 7
There is no real limit to the number of products that could be listed under a group.
I'm not really sure how to go about tackling this. Perhaps recursion using a CTE?
Ideally I'd like to be able to do this on the fly so that I can find all linked products and query them together as a large set.
Edit:
I based the following solution on Raul's answer below. The change is to the bottomLevel CTE. In their solution, the value of the group_id matters and a grouping can be missed. For example, in the dataset below, group 2 would not be seen to have a parent group id of 1, because the groups that link 2 to 1 (5, 6 and 8) have group ids larger than 2. My solution is to use a straightforward self join on product id. This solves that problem, but the performance is brutal (I stopped it after 30 minutes) on my testing dataset of 150K rows. In production I could expect millions.
I tried tossing the bottomLevel CTE into a temp table and putting an index on it and that helps a bit with smaller datasets, but still way too slow on the full set.
Am I out of luck here?
CREATE TABLE #products
(
group_id int not null,
product_id int not null
)
INSERT INTO #products
VALUES(1, 101)
,(1, 102)
,(1, 103)
,(2, 110)
,(2, 111)
,(3, 103)
,(4, 104)
,(5, 101)
,(5, 104)
,(5, 105)
,(6, 105)
,(6, 106)
,(6, 107)
,(8,106)
,(8,111)
,(9,201)
,(10,300)
,(11,300)
,(11,301)
CREATE CLUSTERED INDEX cx_prods ON #products (group_id,product_id);
----------------------------------------------------------------
;WITH bottomLevel AS (
SELECT DISTINCT
sp.group_id as parent_group_id
,p.group_id
FROM
#products p
inner JOIN
#products sp
ON
sp.product_id = p.product_id
),
rc AS (
SELECT parent_group_id
, group_id
FROM bottomLevel
UNION ALL
SELECT b.parent_group_id
, r.group_id
FROM rc r
INNER JOIN bottomLevel b
ON r.parent_group_id = b.group_id
AND b.parent_group_id < r.parent_group_id
)
SELECT MIN(parent_group_id) as parent_group_id
, group_id
FROM rc
GROUP BY group_id
ORDER BY group_id
OPTION (MAXRECURSION 32767)
DROP TABLE #products
Marking Raul's answer as accepted because it helped me find the right direction.
But for those who may find this later, here is what I did.
The CTE method I based on Raul's answer worked, but was much too slow for my needs. I explored the new graph features in SQL Server 2017, but they don't support transitive closure yet, so no luck there. It did give me a term to search for, though: transitive closure clustering. I found the following two articles on doing it in SQL Server.
This one from Davide Mauri:
http://sqlblog.com/blogs/davide_mauri/archive/2017/11/12/lateral-thinking-transitive-closure-clustering-with-sql-server-uda-and-json.aspx
And this one from Itzik Ben-Gan:
http://www.itprotoday.com/microsoft-sql-server/t-sql-puzzle-challenge-grouping-connected-items
Both very helpful in understanding the problem, but I used Ben-Gan's solution 4.
It uses a while loop to unfold the connected nodes and removes processed edges from the temp input table as it runs.
It runs very fast on small to medium sets, and scales well. My test data of 1.2m rows runs in 2 minutes.
Here is my version of it:
First create a table to store the test data:
CREATE TABLE [dbo].[GroupsToProducts](
[group_id] [INT] NOT NULL,
[product_id] [INT] NOT NULL,
CONSTRAINT [PK_GroupsToProducts] PRIMARY KEY CLUSTERED
(
[group_id] ASC,
[product_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO GroupsToProducts
VALUES(1, 101)
,(1, 102)
,(1, 103)
,(2, 110)
,(2, 111)
,(3, 103)
,(4, 104)
,(5, 101)
,(5, 104)
,(5, 105)
,(6, 105)
,(6, 106)
,(6, 107)
,(8,106)
,(8,111)
,(9,201)
,(10,300)
,(11,300)
,(11,301)
Then run the script to generate the clusters.
CREATE TABLE #group_rels
(
from_group_id int not null,
to_group_id int not null
)
INSERT INTO #group_rels
SELECT
p.group_id AS from_group_id,
sp.group_id AS to_group_id
FROM
GroupsToProducts p
inner JOIN
GroupsToProducts sp
ON
sp.product_id = p.product_id
AND p.group_id < sp.group_id
GROUP BY
p.group_id,
sp.group_id
CREATE UNIQUE CLUSTERED INDEX idx_from_group_id_to_group_id ON #group_rels(from_group_id, to_group_id);
CREATE UNIQUE NONCLUSTERED INDEX idx_to_group_id_from_group_id ON #group_rels(to_group_id, from_group_id);
-------------------------------------------------
CREATE TABLE #G
(
group_id INT NOT NULL,
parent_group_id INT NOT NULL,
lvl INT NOT NULL,
PRIMARY KEY NONCLUSTERED (group_id),
UNIQUE CLUSTERED(lvl, group_id)
);
DECLARE @lvl AS INT = 1, @added AS INT, @from_group_id AS INT, @to_group_id AS INT;
DECLARE @CurIds AS TABLE(id INT NOT NULL);
-- gets the first relationship pair
-- will use the from_group_id as a 'root' group
SELECT TOP (1)
@from_group_id = from_group_id,
@to_group_id = to_group_id
FROM
#group_rels
ORDER BY
from_group_id,
to_group_id;
SET @added = @@ROWCOUNT;
WHILE @added > 0
BEGIN
-- inserts two rows into the output table:
-- a self pairing using from_group_id
-- AND the actual relationship pair
INSERT INTO #G
(group_id, parent_group_id, lvl)
VALUES
(@from_group_id, @from_group_id, @lvl),
(@to_group_id, @from_group_id, @lvl);
-- removes the pair from input table
DELETE FROM #group_rels
WHERE
from_group_id = @from_group_id
AND to_group_id = @to_group_id;
WHILE @added > 0
BEGIN
-- increment the lvl variable so we only look at the most recently inserted data
SET @lvl += 1;
----------------------------------------------------------------------------
-- the same basic chunk of code is done twice
-- once for group_ids in the output table that join against from_group_id and
-- once for group_ids in the output table that join against to_group_id
-- 1 - join the output table against the input table, looking for any groups that join
-- against groups that have already been found to (directly or indirectly) connect to the root group.
-- 2 - store the found group_ids in the @CurIds table variable and delete the relationship from the input table.
-- 3 - insert the group_ids in the output table using @from_group_id (the current root node id) as the parent group id
-- if any rows are added to the output table in either chunk, loop and look for any groups that may connect to them.
------------------------------------------------------------------------------
DELETE FROM @CurIds;
DELETE FROM group_rels
OUTPUT deleted.to_group_id AS id INTO @CurIds(id)
FROM
#G AS G
INNER JOIN #group_rels AS group_rels
ON G.group_id = group_rels.from_group_id
WHERE
lvl = @lvl - 1;
INSERT INTO #G
(group_id, parent_group_id, lvl)
SELECT DISTINCT
id,
@from_group_id AS parent_group_id,
@lvl AS lvl
FROM
@CurIds AS C
WHERE
NOT EXISTS
(
SELECT
*
FROM
#G AS G
WHERE
G.group_id = C.id
);
SET @added = @@ROWCOUNT;
-----------------------------------------------------------------------------------
DELETE FROM @CurIds;
DELETE FROM group_rels
OUTPUT deleted.from_group_id AS id INTO @CurIds(id)
FROM
#G AS G
INNER JOIN #group_rels AS group_rels
ON G.group_id = group_rels.to_group_id
WHERE
lvl = @lvl - 1;
INSERT INTO #G
(group_id, parent_group_id, lvl)
SELECT DISTINCT
id,
@from_group_id AS grp,
@lvl AS lvl
FROM
@CurIds AS C
WHERE
NOT EXISTS
(
SELECT
*
FROM
#G AS G
WHERE
G.group_id = C.id
);
SET @added += @@ROWCOUNT;
END;
------------------------------------------------------------------------------
-- At this point, no new rows were added, so the cluster should be complete.
-- Look for another row in the input table to use as a root group
SELECT TOP (1)
@from_group_id = from_group_id,
@to_group_id = to_group_id
FROM
#group_rels
ORDER BY
from_group_id,
to_group_id;
SET @added = @@ROWCOUNT;
END;
SELECT
parent_group_id,
group_id,
lvl
FROM #G
--ORDER BY
--parent_group_id,
--group_id,
--lvl
-------------------------------------------------
DROP TABLE #G
DROP TABLE #group_rels
Take the following statement as a head start:
CREATE TABLE products
(
group_id int not null,
product_id int not null
)
INSERT INTO products
VALUES(1, 101)
,(1, 102)
,(1, 103)
,(2, 101)
,(3, 103)
,(4, 104)
,(5, 101)
,(5, 104)
,(5, 105)
,(6, 105)
,(6, 106)
,(6, 107)
,(7, 110)
,(7, 111)
;WITH bottomLevel AS (
SELECT ISNULL(MIN(matchedGroup),group_id) as parent_group_id
, group_id
FROM products p
OUTER APPLY (
SELECT MIN(group_id) AS matchedGroup
FROM products sp
WHERE sp.group_id != p.group_id
AND sp.product_id = p.product_id
) oa
GROUP BY p.group_id
),
rc AS (
SELECT parent_group_id
, group_id
FROM bottomLevel
UNION ALL
SELECT b.parent_group_id
, r.group_id
FROM rc r
INNER JOIN bottomLevel b
ON r.parent_group_id = b.group_id
AND b.parent_group_id < r.parent_group_id
)
SELECT MIN(parent_group_id) as parent_group_id
, group_id
FROM rc
GROUP BY group_id
ORDER BY group_id
OPTION (MAXRECURSION 32767)
I first grouped by group_id, taking the smallest group_id that has a matching product, and then recursively joined each parent to any parent with a smaller id.
This solution will probably not cover every case you might encounter in production, but it should give you somewhere to start.
Also, if you have a large product table, this might run really slowly, so consider doing this data matching in C#, Spark, SSIS or any other data manipulation engine.
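If moving the work out of the database is not an option, one simple alternative to the recursive CTE is min-label propagation. Every group starts as its own parent and repeatedly takes the smallest parent id found among the groups it shares a product with, until nothing changes. This is a sketch of a different technique, not the method used in the answers here; @products and @labels are illustrative table variables loaded with the sample data from the question.

```sql
-- Sketch: connected groups via min-label propagation.
DECLARE @products TABLE (group_id INT NOT NULL, product_id INT NOT NULL);
INSERT INTO @products (group_id, product_id) VALUES
 (1,101),(1,102),(1,103),(2,101),(3,103),(4,104),
 (5,101),(5,104),(5,105),(6,105),(6,106),(6,107),(7,110),(7,111);

-- Every group starts out labelled with its own id.
DECLARE @labels TABLE (group_id INT NOT NULL PRIMARY KEY,
                       parent_group_id INT NOT NULL);
INSERT INTO @labels (group_id, parent_group_id)
SELECT DISTINCT group_id, group_id FROM @products;

DECLARE @changed INT = 1;
WHILE @changed > 0
BEGIN
    -- Pull the smallest label from any group sharing a product.
    UPDATE l
    SET parent_group_id = m.min_parent
    FROM @labels AS l
    JOIN (
        SELECT p.group_id, MIN(l2.parent_group_id) AS min_parent
        FROM @products AS p
        JOIN @products AS sp ON sp.product_id = p.product_id
        JOIN @labels AS l2 ON l2.group_id = sp.group_id
        GROUP BY p.group_id
    ) AS m ON m.group_id = l.group_id
    WHERE m.min_parent < l.parent_group_id;

    SET @changed = @@ROWCOUNT;  -- stop once a pass changes nothing
END;

SELECT parent_group_id, group_id
FROM @labels
ORDER BY group_id;
-- Groups 1-6 end up with parent_group_id 1; group 7 keeps 7.
```

Note that the parent id here is the smallest group_id in each cluster rather than a dense 1, 2, ... numbering, and that each pass repeats the self join on product_id, so on millions of rows indexed temp tables would likely be needed instead of table variables.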

T-SQL - Concatenation of names on TWO tables/orphans

I'm prepared to be crucified for asking my first question on SO and what is a potentially duplicate question, but I cannot find it for the life of me.
I have three tables, a product table, a linking table, and a child table with names. Preloaded on SQLFiddle >> if I still have your attention.
CREATE TABLE Product (iProductID int NOT NULL PRIMARY KEY
, sProductName varchar(50) NOT NULL
, iPartGroupID int NOT NULL)
INSERT INTO Product VALUES
(10001, 'Avionic Tackle', '1'),
(10002, 'Eigenspout', '2'),
(10003, 'Impulse Polycatalyst', '3'),
(10004, 'O-webbing', '2'),
(10005, 'Ultraservo', '3'),
(10006, 'Yttrium Coil', '5')
CREATE TABLE PartGroup (iPartGroupID int NOT NULL
, iChildID int NOT NULL)
INSERT INTO PartGroup VALUES
(1, 1),
(2, 2),
(3, 1),
(3, 2),
(3, 3),
(3, 4),
(4, 5),
(4, 6),
(5, 1)
CREATE TABLE PartNames (iChildID int NOT NULL PRIMARY KEY
, sPartNameText varchar(50) NOT NULL)
INSERT INTO PartNames VALUES
(1, 'Bulbcap Lube'),
(2, 'Chromium Deltaquartz'),
(3, 'Dilation Gyrosphere'),
(4, 'Fliphose'),
(5, 'G-tightener Bypass'),
(6, 'Heisenberg Shuttle')
I am trying to find out how to list all the part groups (that may or may not belong to a product), and translate their child names. That is, how do I use only the linking table and child name table to list all the translated elements of the linking table. I am trying to find orphans.
I have two queries:
SELECT P.iPartGroupID
,STUFF(
(SELECT
CONCAT(', ', PN.sPartNameText)
FROM PartGroup PG
INNER JOIN PartNames PN ON PN.iChildID = PG.iChildID
WHERE PG.iPartGroupID = P.iPartGroupID
FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(MAX)')
, 1, 2, ''
) AS [Child Elements]
FROM Product P
GROUP BY P.iPartGroupID
This lists all the part groups that belong to a product, and their child elements by name. iPartGroupID = 4 is not here.
I also have:
SELECT PG.iPartGroupID
,STUFF(
(SELECT
CONCAT(', ', PGList.iChildID)
FROM PartGroup PGList
WHERE PGList.iPartGroupID = PG.iPartGroupID
FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(MAX)')
, 1, 2, ''
) AS [Child Elements]
FROM PartGroup PG
GROUP BY PG.iPartGroupID
This lists all the part groups, and their child elements by code. iPartGroupID = 4 is covered here, but the names aren't translated.
What query can I use to list the orphan part groups (and also the orphan parts):
4 G-tightener Bypass, Heisenberg Shuttle
Ideally it is included in a list of all the other part groups, but if not, I can union the results.
Every other SO question I've looked up uses either 3 tables, or only 1 table, self joining with aliases. Does anyone have any ideas?
No XML in the part names, no particular preference for CONCAT or SELECT '+'.
I would link to other posts, but I can't without points :(
I'm not entirely sure what exactly you mean by the word "translate". And your required output seems to contradict your sample data (unless I've missed something).
Nevertheless, try this query, maybe it's what you need:
select sq.iPartGroupID, cast((
select pn.sPartNameText + ',' as [data()] from #PartNames pn
inner join #PartGroup p on pn.iChildID = p.iChildID
where p.iPartGroupID = sq.iPartGroupID
order by pn.iChildID
for xml path('')
) as varchar(max)) as [GroupList]
from (select distinct pg.iPartGroupID from #PartGroup pg) sq
left join #Product pr on sq.iPartGroupID = pr.iPartGroupID
where pr.iProductID is null;
You can get the answer you want in the following way:
SELECT pg.iPartGroupID,
CASE COUNT(pg.iPartGroupID)
WHEN 1 THEN (
SELECT pn2.sPartNameText
FROM PartNames pn2
WHERE pn2.iChildID = MIN(pg.iChildID) -- match the group's single child, not the group id
)
ELSE (
SELECT CASE ROW_NUMBER() OVER(ORDER BY(SELECT 1))
WHEN 1 THEN ''
ELSE ','
END + pn2.sPartNameText
FROM PartNames pn2
INNER JOIN PartGroup pg2
ON pg2.iChildID = pn2.iChildID
WHERE pg2.iPartGroupID = pg.iPartGroupID
FOR XML PATH('')
)
END
FROM PartGroup pg
GROUP BY
pg.iPartGroupID
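To get the orphans listed together with all the other part groups, as the question asks, one possible sketch is to drive the query from PartGroup, translate the names with the same FOR XML PATH idiom the question already uses, and add a flag. The IsOrphan column name is made up for illustration; the tables are the ones from the question.

```sql
-- Sketch: every part group with its child names, plus an orphan flag.
SELECT PG.iPartGroupID,
       STUFF(
           (SELECT ', ' + PN.sPartNameText
            FROM PartGroup AS PG2
            INNER JOIN PartNames AS PN ON PN.iChildID = PG2.iChildID
            WHERE PG2.iPartGroupID = PG.iPartGroupID
            ORDER BY PN.iChildID
            FOR XML PATH(''), TYPE
           ).value('.', 'VARCHAR(MAX)')
       , 1, 2, '') AS [Child Elements],
       -- 1 when no product references this part group
       CASE WHEN EXISTS (SELECT 1
                         FROM Product AS P
                         WHERE P.iPartGroupID = PG.iPartGroupID)
            THEN 0 ELSE 1 END AS IsOrphan
FROM PartGroup AS PG
GROUP BY PG.iPartGroupID;
```

With the sample data, part group 4 comes back as 'G-tightener Bypass, Heisenberg Shuttle' with IsOrphan = 1, and groups 1, 2, 3 and 5 have IsOrphan = 0.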

Help with recursive CTE query joining to a second table

My objective is to recurse through table tbl and, while doing so, select a country abbreviation (if one exists) from another table, tbl2, appending the abbreviations together so that they are included in the final output.
The example I'll use will come from this post
tbl2 has a Foreign Key 'tbl_id' to tbl and looks like this
INSERT INTO @tbl2( Id, Abbreviation, tbl_id )
VALUES
(100, 'EU', 1)
,(101, 'AS', 2)
,(102, 'DE', 3)
,(103, 'CN', 5)
*Note: not all the countries have abbreviations.
The trick is, I want all the countries in Asia to at least show the abbreviation of Asia which is 'AS' even if a country doesn't have an abbreviation (like India for example). If the country does have an abbreviation the result needs to look like this: China:CN,AS
I've got it partly working using a subquery, but India always returns NULL for the abbreviation. It's acting like if there isn't a full recursive path back to the abbreviation, then it returns null. Maybe the solution is to use a left outer join on the abbreviation table? I've tried for hours many different variations and the subquery is as close as I can get.
WITH abcd
AS (
-- anchor
SELECT id, [Name], ParentID,
CAST(([Name]) AS VARCHAR(1000)) AS "Path"
FROM @tbl
WHERE ParentId IS NULL
UNION ALL
--recursive member
SELECT t.id, t.[Name], t.ParentID,
CAST((a.path + '/' + t.Name + ':' +
(
select t2.abbreviation + ','
from @tbl2 t2
where t.id = t2.id
)) AS VARCHAR(1000)) AS "Path"
FROM @tbl AS t
JOIN abcd AS a
ON t.ParentId = a.id
)
SELECT * FROM abcd
btw, I'm using sql server 2005 if that matters
Try this example, which will give you the output (1 sample row)
id | Name | ParentID | Path | abbreviation | (No column name)
5 | China | 2 | Asia/China | CN,AS | Asia/China:CN,AS
The TSQL being
DECLARE @tbl TABLE (
Id INT
,[Name] VARCHAR(20)
,ParentId INT
)
INSERT INTO @tbl( Id, Name, ParentId )
VALUES
(1, 'Europe', NULL)
,(2, 'Asia', NULL)
,(3, 'Germany', 1)
,(4, 'UK', 1)
,(5, 'China', 2)
,(6, 'India', 2)
,(7, 'Scotland', 4)
,(8, 'Edinburgh', 7)
,(9, 'Leith', 8)
;
DECLARE @tbl2 table (id int, abbreviation varchar(10), tbl_id int)
INSERT INTO @tbl2( Id, Abbreviation, tbl_id )
VALUES
(100, 'EU', 1)
,(101, 'AS', 2)
,(102, 'DE', 3)
,(103, 'CN', 5)
;WITH abbr AS (
SELECT a.*, isnull(b.abbreviation,'') abbreviation
FROM @tbl a
left join @tbl2 b on a.Id = b.tbl_id
), abcd AS (
-- anchor
SELECT id, [Name], ParentID,
CAST(([Name]) AS VARCHAR(1000)) [Path],
cast(abbreviation as varchar(max)) abbreviation
FROM abbr
WHERE ParentId IS NULL
UNION ALL
--recursive member
SELECT t.id, t.[Name], t.ParentID,
CAST((a.path + '/' + t.Name) AS VARCHAR(1000)) [Path],
isnull(nullif(t.abbreviation,'')+',', '') + a.abbreviation
FROM abbr AS t
JOIN abcd AS a
ON t.ParentId = a.id
)
SELECT *, [Path] + ':' + abbreviation
FROM abcd
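The piece that makes India come out as 'AS' instead of NULL is the isnull(nullif(...)+',', '') expression in the recursive member. A small standalone sketch of just that idiom, with 'AS' standing in for the parent's accumulated abbreviation and a made-up two-row VALUES list (this form of VALUES needs SQL Server 2008 or later):

```sql
-- Sketch: prepend "abbr," only when the row has an abbreviation.
-- NULLIF turns '' into NULL, NULL + ',' stays NULL, and ISNULL then
-- replaces that NULL with '' so the parent's value is kept as-is.
SELECT v.abbr,
       ISNULL(NULLIF(v.abbr, '') + ',', '') + 'AS' AS result
FROM (VALUES ('CN'), ('')) AS v(abbr);
-- 'CN' -> 'CN,AS'
-- ''   -> 'AS'
```

This relies on the default CONCAT_NULL_YIELDS_NULL ON behaviour, under which concatenating NULL with a string yields NULL.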

SQL - Ordering by multiple criteria

I have a table of categories. Each category can either be a root level category (parent is NULL), or have a parent which is a root level category. There can't be more than one level of nesting.
I have the following table structure:
Categories Table Structure http://img16.imageshack.us/img16/8569/categoriesi.png
Is there any way I could use a query which produced the following output:
Free Stuff
Hardware
Movies
  CatA
  CatB
  CatC
Software
  Apples
  CatD
  CatE
So the results are ordered by top level category, then after each top level category, subcategories of that category are listed?
It's not really ordering by Parent or Name, but a combo of the two. I'm using SQL Server.
It seems to me that you are looking to flatten and order your hierarchy. The cheapest way to get this ordering is to store an additional column in the table that holds the full path.
So for example:
Name | Full Path
Free Stuff | Free Stuff
aa2 | Free Stuff - aa2
Once you store the full path, you can order on it.
If you only have a depth of one, you can generate a string to this effect with a single subquery (and order on it), but that does not extend easily to deeper hierarchies.
Another option is to move everything into a temp table and calculate the full path there, on demand, but that is fairly expensive.
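For the depth-of-one case, the ordering can be sketched without storing anything, here with a self join rather than a subquery. The @categories table variable and its ID / Name / Parent columns are assumptions based on the question's description, and the sketch relies on root names being unique.

```sql
-- Sketch: order roots by name, each followed by its own children.
DECLARE @categories TABLE (ID INT NOT NULL PRIMARY KEY,
                           Name VARCHAR(50) NOT NULL,
                           Parent INT NULL);
INSERT INTO @categories (ID, Name, Parent) VALUES
 (1,'Hardware',NULL),(2,'Software',NULL),(3,'Movies',NULL),(4,'Free Stuff',NULL),
 (5,'CatA',3),(6,'CatB',3),(7,'CatC',3),(8,'CatD',2),(9,'CatE',2),(12,'Apples',2);

SELECT c.Name
FROM @categories AS c
LEFT JOIN @categories AS p ON p.ID = c.Parent
ORDER BY COALESCE(p.Name, c.Name),                      -- sort key: the root's name
         CASE WHEN c.Parent IS NULL THEN 0 ELSE 1 END,  -- root before its children
         c.Name;                                        -- children alphabetically
-- Free Stuff, Hardware, Movies, CatA, CatB, CatC, Software, Apples, CatD, CatE
```

Like the other computed sort keys in this thread, this cannot use an index for the sort, so it suits small category tables best.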
You could make the table look at itself, ordering by the parent Name then the child Name.
select categories.Name AS DisplayName
from categories LEFT OUTER JOIN
categories AS parentTable ON categories.Parent = parentTable.ID
order by parentTable.Name, DisplayName
Ok, here we go :
with foo as
(
select 1 as id, null as parent, 'CatA' as cat from dual
union select 2, null, 'CatB' from dual
union select 3, null, 'CatC' from dual
union select 4, 1, 'SubCatA_1' from dual
union select 5, 1, 'SubCatA_2' from dual
union select 6, 2, 'SubCatB_1' from dual
union select 7, 2, 'SubCatB_2' from dual
)
select child.cat
from foo parent right outer join foo child on parent.id = child.parent
order by case when parent.id is not null then parent.cat else child.cat end,
case when parent.id is not null then 1 else 0 end
Result :
CatA
SubCatA_1
SubCatA_2
CatB
SubCatB_1
SubCatB_2
CatC
Edit - Solution changed, inspired by van's ORDER BY. Much simpler that way.
Not entirely sure of your questions but it sounds like PARTITION BY might be useful for you. There's a good introductory post on PARTITION BY here.
Here is a complete working example using a recursive common table expression.
DECLARE @categories TABLE
(
ID INT NOT NULL,
[Name] VARCHAR(50),
Parent INT NULL
);
INSERT INTO @categories VALUES (4, 'Free Stuff', NULL);
INSERT INTO @categories VALUES (1, 'Hardware', NULL);
INSERT INTO @categories VALUES (3, 'Movies', NULL);
INSERT INTO @categories VALUES (2, 'Software', NULL);
INSERT INTO @categories VALUES (10, 'a', 0);
INSERT INTO @categories VALUES (12, 'apples', 2);
INSERT INTO @categories VALUES (8, 'catD', 2);
INSERT INTO @categories VALUES (9, 'catE', 2);
INSERT INTO @categories VALUES (5, 'catA', 3);
INSERT INTO @categories VALUES (6, 'catB', 3);
INSERT INTO @categories VALUES (7, 'catC', 3);
INSERT INTO @categories VALUES (11, 'aa2', 4);
WITH categories(ID, Name, Parent, HierarchicalName)
AS
(
SELECT
c.ID
, c.[Name]
, c.Parent
, CAST(c.[Name] AS VARCHAR(200)) AS HierarchicalName
FROM @categories c
WHERE c.Parent IS NULL
UNION ALL
SELECT
c.ID
, c.[Name]
, c.Parent
, CAST(pc.HierarchicalName + c.[Name] AS VARCHAR(200))
FROM @categories c
JOIN categories pc ON c.Parent = pc.ID
)
SELECT c.*
FROM categories c
ORDER BY c.HierarchicalName
SELECT
ID,
Name,
Parent,
RIGHT(
'000000000000000' +
CASE WHEN Parent IS NULL
THEN CONVERT(VARCHAR, Id)
ELSE CONVERT(VARCHAR, Parent)
END, 15
)
+ '_' + CASE WHEN Parent IS NULL THEN '0' ELSE '1' END
+ '_' + Name
FROM
categories
ORDER BY
4
The long padding accounts for the fact that SQL Server's INT data type ranges from -2,147,483,648 through 2,147,483,647.
You can ORDER BY the expression directly, no need to use ORDER BY 4. It was just to show what it is sorting on.
It is worth noting that this expression cannot use any index. This means sorting a large table will be slow.