Finding breadcrumbs for nested sets - sql

I'm using nested sets (aka modified preorder tree traversal) to store a list of groups, and I'm trying to find a quick way to generate breadcrumbs (as a string, not a table) for ALL of the groups at once. My data is also stored using the adjacency list model (there are triggers to keep the two in sync).
So for example:
ID  Name    ParentId  Left  Right
0   Node A  0         1     12
1   Node B  0         2     5
2   Node C  1         3     4
3   Node D  0         6     11
4   Node E  3         7     8
5   Node F  3         9     10
Which represents the tree:
Node A
    Node B
        Node C
    Node D
        Node E
        Node F
I would like to be able to have a user-defined function that returns a table:
ID Breadcrumb
0 Node A
1 Node A > Node B
2 Node A > Node B > Node C
3 Node A > Node D
4 Node A > Node D > Node E
5 Node A > Node D > Node F
To make this slightly more complicated (though it's sort of out of the scope of the question), I also have user restrictions that need to be respected. So for example, if I only have access to id=3, when I run the query I should get:
ID Breadcrumb
3 Node D
4 Node D > Node E
5 Node D > Node F
I do have a user-defined function that takes a userid as a parameter and returns a table with the ids of all groups that are valid, so as long as the query includes
WHERE group.id IN (SELECT id FROM dbo.getUserGroups(@userid))
somewhere, it will work.
I have an existing scalar function that can do this, but it does not scale to any reasonable number of groups (it takes more than 10 seconds on 2000 groups). It takes a groupid and a userid as parameters and returns an nvarchar. It finds the given group's parents (one query to grab the left/right values, another to find the parents), restricts the list to the groups the user has access to (using the same WHERE clause as above, so yet another query), and then uses a cursor to go through each group and append it to a string, before finally returning that value.
I need a method to do this that runs quickly (e.g. <= 1s), on the fly.
This is on SQL Server 2005.
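To make the goal concrete, here is a sketch (in Python, not T-SQL) of why all breadcrumbs can come out of one ordered pass over the nested-set columns: walk the rows by Left, keep a stack of the ancestors whose Right has not closed yet, and join their names. The `allowed` set is an assumption standing in for the output of dbo.getUserGroups; ancestors outside it are skipped, which yields the "Node D > Node E" style output. The sample uses Node F at Left 9, Right 10, which is what the tree drawing implies.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Row = Tuple[int, str, int, int]  # (id, name, left, right)


def breadcrumbs(rows: Iterable[Row],
                allowed: Optional[Set[int]] = None,
                sep: str = " > ") -> Dict[int, str]:
    """Build every breadcrumb in a single pass over the rows sorted by Left."""
    result: Dict[int, str] = {}
    # Ancestors that are still "open": (right, name, visible_to_user).
    stack = []
    for node_id, name, left, right in sorted(rows, key=lambda r: r[2]):
        # An ancestor whose Right is below this node's Left has already closed.
        while stack and stack[-1][0] < left:
            stack.pop()
        visible = allowed is None or node_id in allowed
        if visible:
            # Hidden ancestors are skipped, so the trail starts at the
            # highest group the user is allowed to see.
            trail = [n for _, n, v in stack if v] + [name]
            result[node_id] = sep.join(trail)
        stack.append((right, name, visible))
    return result


SAMPLE = [
    (0, "Node A", 1, 12), (1, "Node B", 2, 5), (2, "Node C", 3, 4),
    (3, "Node D", 6, 11), (4, "Node E", 7, 8), (5, "Node F", 9, 10),
]

if __name__ == "__main__":
    for node_id, crumb in breadcrumbs(SAMPLE, allowed={3, 4, 5}).items():
        print(node_id, crumb)
```

The same loop could run client-side over a single SELECT ... ORDER BY Left, which avoids a per-group query entirely.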

Here's the SQL that worked for me to get the "breadcrumb" path from any point in the tree. Hope it helps.
SELECT ancestor.id, ancestor.title, ancestor.alias
FROM `categories` child, `categories` ancestor
WHERE child.lft >= ancestor.lft AND child.lft <= ancestor.rgt
AND child.id = MY_CURRENT_ID
ORDER BY ancestor.lft
Kath
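A quick way to check Kath's query is to run it against the sample tree in SQLite from Python. This sketch assumes a `categories` table with `id`, `name`, `lft` and `rgt` (the `title`/`alias` columns are replaced by a single hypothetical `name` column) and binds MY_CURRENT_ID as a parameter instead of pasting it into the SQL.

```python
import sqlite3

# Sample tree from the question (Node F at Left 9, Right 10).
ROWS = [
    (0, "Node A", 1, 12), (1, "Node B", 2, 5), (2, "Node C", 3, 4),
    (3, "Node D", 6, 11), (4, "Node E", 7, 8), (5, "Node F", 9, 10),
]

# Kath's query, with MY_CURRENT_ID bound as a parameter.
ANCESTORS_SQL = """
SELECT ancestor.id, ancestor.name
FROM categories AS child, categories AS ancestor
WHERE child.lft >= ancestor.lft AND child.lft <= ancestor.rgt
  AND child.id = ?
ORDER BY ancestor.lft
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories "
                 "(id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)")
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?, ?)", ROWS)
    return conn


def breadcrumb(conn: sqlite3.Connection, node_id: int, sep: str = " > ") -> str:
    # ORDER BY ancestor.lft returns the root first and the node itself last.
    return sep.join(name for _, name in conn.execute(ANCESTORS_SQL, (node_id,)))


if __name__ == "__main__":
    print(breadcrumb(make_db(), 5))
```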

Ok. This is for MySQL, not SQL Server 2005. It uses a GROUP_CONCAT with a subquery.
This should return the full breadcrumb as single column.
SELECT
  (SELECT GROUP_CONCAT(parent.name ORDER BY parent.`Left` SEPARATOR ' > ')
   FROM category parent
   WHERE node.`Left` >= parent.`Left`
     AND node.`Right` <= parent.`Right`
  ) AS breadcrumb
FROM category node
ORDER BY node.`Left`

If you can, use a path field (I think I've also heard it referred to as a lineage), like:
ID  Name    ParentId  Left  Right  Path
0   Node A  0         1     12     0,
1   Node B  0         2     5      0,1,
2   Node C  1         3     4      0,1,2,
3   Node D  0         6     11     0,3,
4   Node E  3         7     8      0,3,4,
5   Node F  3         9     10     0,3,5,
To get just node D and everything under it (T-SQL):
DECLARE @path varchar(255);
SELECT @path = Path FROM Nodes WHERE ID = 3;
SELECT * FROM Nodes WHERE Path LIKE @path + '%';
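A minimal sketch of the path idea in SQLite (via Python's sqlite3), assuming a hypothetical Nodes table with just ID, Name and Path. SQLite concatenates strings with `||` where T-SQL uses `+`. Because each Path lists the ancestors root-first, the breadcrumb falls straight out of it, and the trailing comma on every Path is what stops the pattern `0,3,%` from also matching a node whose path starts with `0,31,`.

```python
import sqlite3

# Hypothetical Nodes table holding only the Path (lineage) column.
NODES = [
    (0, "Node A", "0,"), (1, "Node B", "0,1,"), (2, "Node C", "0,1,2,"),
    (3, "Node D", "0,3,"), (4, "Node E", "0,3,4,"), (5, "Node F", "0,3,5,"),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Nodes (ID INTEGER PRIMARY KEY, Name TEXT, Path TEXT)")
    conn.executemany("INSERT INTO Nodes VALUES (?, ?, ?)", NODES)
    return conn


def subtree_ids(conn: sqlite3.Connection, node_id: int) -> list:
    # Every descendant's Path starts with the root's Path.
    sql = """SELECT n.ID FROM Nodes AS n
             JOIN Nodes AS root ON root.ID = ?
             WHERE n.Path LIKE root.Path || '%'
             ORDER BY n.Path"""
    return [row[0] for row in conn.execute(sql, (node_id,))]


def breadcrumb(conn: sqlite3.Connection, node_id: int, sep: str = " > ") -> str:
    # The Path already lists the ancestors root-first; map the IDs to names.
    (path,) = conn.execute("SELECT Path FROM Nodes WHERE ID = ?", (node_id,)).fetchone()
    names = dict(conn.execute("SELECT ID, Name FROM Nodes"))
    return sep.join(names[int(i)] for i in path.rstrip(",").split(","))
```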

I modified Kathy's statement to get breadcrumbs for every element:
SELECT
  GROUP_CONCAT(
    ancestor.name
    ORDER BY ancestor.lft ASC
    SEPARATOR ' > '
  ) AS breadcrumb,
  child.*
FROM `categories` child
JOIN `categories` ancestor
  ON child.lft >= ancestor.lft
  AND child.lft <= ancestor.rgt
GROUP BY child.id
ORDER BY child.lft
Feel free to add a WHERE condition before the GROUP BY, e.g. to show only the subtree under Node D (Left 6, Right 11):
WHERE ancestor.lft BETWEEN 6 AND 11
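Older SQLite versions do not accept ORDER BY inside group_concat, so this sketch of Kathy's every-element query leaves the concatenation to Python: the join returns one (child, ancestor) pair per row, sorted by child.lft and then ancestor.lft, and itertools.groupby joins each child's names. The optional lo/hi arguments play the role of the `WHERE ancestor.lft BETWEEN 6 AND 11` filter. Table and column names are assumptions matching the sample tree.

```python
import sqlite3
from itertools import groupby

ROWS = [
    (0, "Node A", 1, 12), (1, "Node B", 2, 5), (2, "Node C", 3, 4),
    (3, "Node D", 6, 11), (4, "Node E", 7, 8), (5, "Node F", 9, 10),
]

# Kathy's join without GROUP_CONCAT: one row per (child, ancestor) pair,
# sorted so each child's ancestors arrive contiguous and root-first.
PAIRS_SQL = """
SELECT child.id, ancestor.name
FROM categories AS child
JOIN categories AS ancestor
  ON child.lft >= ancestor.lft AND child.lft <= ancestor.rgt
WHERE (? IS NULL OR ancestor.lft BETWEEN ? AND ?)
ORDER BY child.lft, ancestor.lft
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories "
                 "(id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)")
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?, ?)", ROWS)
    return conn


def all_breadcrumbs(conn, lo=None, hi=None, sep=" > "):
    # With lo=None the filter is skipped; otherwise only ancestors whose
    # lft lies in [lo, hi] (i.e. inside that subtree) are kept.
    pairs = conn.execute(PAIRS_SQL, (lo, lo, hi))
    return {child_id: sep.join(name for _, name in group)
            for child_id, group in groupby(pairs, key=lambda p: p[0])}
```

Nodes outside the subtree drop out entirely because the inner join finds no qualifying ancestor for them.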

What I ended up doing is making a large join that simply ties this table to itself, over and over for every level.
First I populate a table #topLevelGroups with just the 1st level groups (if you only have one root you can skip this step), and then #userGroups with the groups that user can see.
SELECT groupid,
(level1
+ CASE WHEN level2 IS NOT NULL THEN ' > ' + level2 ELSE '' END
+ CASE WHEN level3 IS NOT NULL THEN ' > ' + level3 ELSE '' END
)as [breadcrumb]
FROM (
SELECT g3.*
,g1.name as level1
,g2.name as level2
,g3.name as level3
FROM #topLevelGroups g1
INNER JOIN #userGroups g2 ON g2.parentid = g1.groupid and g2.groupid <> g1.groupid
INNER JOIN #userGroups g3 ON g3.parentid = g2.groupid
UNION
SELECT g2.*
,g1.name as level1
,g2.name as level2
,NULL as level3
FROM #topLevelGroups g1
INNER JOIN #userGroups g2 ON g2.parentid = g1.groupid and g2.groupid <> g1.groupid
UNION
SELECT g1.*
,g1.name as level1
,NULL as level2
,NULL as level3
FROM #topLevelGroups g1
) a
ORDER BY [breadcrumb]
This is a pretty big hack, and it is obviously limited to a fixed number of levels (for my app, there is a reasonable limit I can pick). The problem is that every extra level adds another UNION branch with one more join than the previous one, so the total number of joins grows quadratically with the depth and the query gets much slower.
Doing it in code is most certainly easier, but for me that is simply not always an option - there are times when I need this available directly from a SQL query.
I'm accepting this as the answer, since it's what I ended up doing and it may work for other people -- however, if someone can come up with a more efficient method I'll change it to them.
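A variant of the fixed-depth idea (not the accepted answer's UNION query) walks upward instead: one LEFT JOIN per extra level starting from each node, so the join count grows linearly with the depth. It is still capped at three levels. This SQLite sketch assumes a hypothetical user_groups table whose root has a NULL parentid (in the question the root points at itself). In T-SQL, `+` replaces `||`, and with CONCAT_NULL_YIELDS_NULL on (the default) NULL + ' > ' is also NULL, so the same COALESCE trick applies.

```python
import sqlite3

# Adjacency-list copy of the sample tree; the root's parent is NULL here.
GROUPS = [
    (0, "Node A", None), (1, "Node B", 0), (2, "Node C", 1),
    (3, "Node D", 0), (4, "Node E", 3), (5, "Node F", 3),
]

# Walk up from each node; a missing ancestor yields NULL, NULL || ' > '
# stays NULL, and COALESCE turns it into an empty string.
THREE_LEVEL_SQL = """
SELECT n.groupid,
       COALESCE(p2.name || ' > ', '') || COALESCE(p1.name || ' > ', '') || n.name
         AS breadcrumb
FROM user_groups AS n
LEFT JOIN user_groups AS p1 ON p1.groupid = n.parentid
LEFT JOIN user_groups AS p2 ON p2.groupid = p1.parentid
ORDER BY breadcrumb
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_groups "
                 "(groupid INTEGER PRIMARY KEY, name TEXT, parentid INTEGER)")
    conn.executemany("INSERT INTO user_groups VALUES (?, ?, ?)", GROUPS)
    return conn


if __name__ == "__main__":
    for groupid, crumb in make_db().execute(THREE_LEVEL_SQL):
        print(groupid, crumb)
```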

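No SQL Server-specific code, but are you simply looking for this (where @left and @right are the current node's Left and Right values)?
SELECT * FROM [table] WHERE [left] < @left AND [right] > @right ORDER BY [left]
This returns only the ancestors; use <= and >= to include the current node itself.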

Related

Find all descendants of root node using CTE

I need to show how many descendants the parent node has, but the query below shows it the other way around.
WITH AssemblyList AS
(
SELECT
T.PKAssembly,
T.FKParentAssembly,
T.AssemblyNumber,
0 AS AssemblyLevel
FROM CT_Assemblies T WHERE T.FKParentAssembly = T.PKAssembly AND T.FKCustomer = 12 AND T.FKState = 1
UNION ALL
SELECT A.PKAssembly,
A.FKParentAssembly,
A.AssemblyNumber,
AL.AssemblyLevel + 1
FROM CT_Assemblies A
INNER JOIN AssemblyList AS AL ON A.FKParentAssembly = AL.PKAssembly
WHERE A.FKParentAssembly != A.PKAssembly AND A.FKCustomer = 12 AND A.FKState = 1
)
SELECT * FROM AssemblyList
In my table structure, PKAssembly is the child id and FKParentAssembly is the parent id. If both are the same for a record, it is the root (or a single standalone node).
The result looks like this:
PKAssembly FKParentAssembly AssemblyNumber AssemblyLevel
500 500 A11111 0
507 506 A77777 6
Here the first record should show 6 and the second one should show 0, but it's coming out the opposite way.
What should I change in the query to make it work?
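No answer is included here. One possible approach, assuming "how many descendants" means the total number of nodes below each assembly, is to start a walk at every row and count what each walk reaches. This SQLite sketch uses hypothetical data; the self-parent check plays the same role as `A.FKParentAssembly != A.PKAssembly` in the question, and the FKCustomer/FKState filters are left out.

```python
import sqlite3

# Hypothetical assemblies: 500 is a root (its own parent), 501 and 503
# hang off 500, and 502 hangs off 501.
ASSEMBLIES = [(500, 500), (501, 500), (502, 501), (503, 500)]

# Each anchor row starts a walk at one assembly; the recursive part adds
# every node below it. Excluding self-parented rows stops the root from
# joining to itself forever.
DESCENDANTS_SQL = """
WITH RECURSIVE walk(anc_id, node_id) AS (
    SELECT PKAssembly, PKAssembly FROM CT_Assemblies
    UNION ALL
    SELECT walk.anc_id, a.PKAssembly
    FROM CT_Assemblies AS a
    JOIN walk ON a.FKParentAssembly = walk.node_id
    WHERE a.FKParentAssembly <> a.PKAssembly
)
SELECT anc_id, COUNT(*) - 1 AS descendants  -- minus the node itself
FROM walk
GROUP BY anc_id
ORDER BY anc_id
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CT_Assemblies "
                 "(PKAssembly INTEGER PRIMARY KEY, FKParentAssembly INTEGER)")
    conn.executemany("INSERT INTO CT_Assemblies VALUES (?, ?)", ASSEMBLIES)
    return conn


if __name__ == "__main__":
    for anc_id, count in make_db().execute(DESCENDANTS_SQL):
        print(anc_id, count)
```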

Calculating relative frequencies in SQL

I am working on a tag recommendation system that takes metadata strings (e.g. text descriptions) of an object, and splits it into 1-, 2- and 3-grams.
The data for this system is kept in 3 tables:
1. The "object" table (e.g. what is being described),
2. The "token" table, filled with all 1-, 2- and 3-grams found (examples below), and
3. The "mapping" table, which maintains associations between (1) and (2), as well as a frequency count for these occurrences.
I am therefore able to construct a table via a LEFT JOIN, that looks somewhat like this:
SELECT mapping.object_id, mapping.token_id, mapping.freq, token.token_size, token.token
FROM mapping LEFT JOIN
token
ON (mapping.token_id = token.id)
WHERE mapping.object_id = 1;
object_id  token_id  freq  token_size  token
---------  --------  ----  ----------  -------------
1          1         1     2           'a big'
1          2         1     1           'a'
1          3         1     1           'big'
1          4         2     3           'a big slice'
1          5         1     1           'slice'
1          6         3     2           'big slice'
Now I'd like to be able to get the relative probability of each term within the context of a single object ID, so that I can sort them by probability and see which terms are most probable (e.g. ORDER BY rel_prob DESC LIMIT 25).
For each row, I'm envisioning the addition of a column which gives the result of freq/sum of all freqs for that given token_size. In the case of 'a big', for instance, that would be 1/(1+3) = 0.25. For 'a', that's 1/3 = 0.333, etc.
I can't, for the life of me, figure out how to do this. Any help is greatly appreciated!
If I understood your problem, here's the query you need
select
m.object_id, m.token_id, m.freq,
t.token_size, t.token,
cast(m.freq as decimal(29, 10)) / sum(m.freq) over (partition by t.token_size, m.object_id) as rel_prob
from mapping as m
left outer join token as t on m.token_id = t.id
where m.object_id = 1;
hope that helps
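The window-function answer can be tried in SQLite (3.25 or later) with the sample rows. One SQLite-specific change: CAST(... AS decimal(29, 10)) only gives numeric affinity there, so 1 stays an integer and the division would truncate; this sketch casts to REAL instead. The results match the question's arithmetic: 'a big' is 1/(1+3) = 0.25 and 'a' is 1/3.

```python
import sqlite3

TOKENS = [(1, 2, "a big"), (2, 1, "a"), (3, 1, "big"),
          (4, 3, "a big slice"), (5, 1, "slice"), (6, 2, "big slice")]
MAPPING = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 3)]  # (token_id, freq)

REL_PROB_SQL = """
SELECT t.token,
       CAST(m.freq AS REAL)
         / SUM(m.freq) OVER (PARTITION BY m.object_id, t.token_size) AS rel_prob
FROM mapping AS m
LEFT OUTER JOIN token AS t ON m.token_id = t.id
WHERE m.object_id = ?
ORDER BY rel_prob DESC, t.token
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE token (id INTEGER PRIMARY KEY, token_size INTEGER, token TEXT);
        CREATE TABLE mapping (object_id INTEGER, token_id INTEGER, freq INTEGER);
    """)
    conn.executemany("INSERT INTO token VALUES (?, ?, ?)", TOKENS)
    # Every sample row belongs to object 1.
    conn.executemany("INSERT INTO mapping VALUES (1, ?, ?)", MAPPING)
    return conn


def relative_frequencies(conn, object_id=1):
    return conn.execute(REL_PROB_SQL, (object_id,)).fetchall()
```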

Finding contiguous regions in a sorted MS Access query

I am a long-time fan of Stack Overflow, but I've come across a problem that I haven't seen addressed yet and need some expert help.
I have a query that is sorted chronologically by a date-time compound key (unique, never deleted) and contains several pieces of data. I want to know whether there is a way to find the start (or end) of a region where a value changes, i.e.:
DateTime someVal1 someVal2 someVal3 target
1 3 4 A
1 2 4 A
1 3 4 A
1 2 4 B
1 2 5 B
1 2 5 A
and I want my query to return rows 1, 4 and 6: it should find the change in column 5 from A to B and then from B back to A. I have tried the find-duplicates method and using Min and Max in the Totals property, but that gives me the overall first and last rows instead of the local min and max. Has anyone solved a similar problem?
I didn't see any purpose for the someVal1, someVal2, and someVal3 fields, so I left them out. I used an autonumber as the primary key instead of your date/time field; but this approach should also work with your date/time primary key. This is the data in my version of your table.
pkey_field target
1 A
2 A
3 A
4 B
5 B
6 A
I used a correlated subquery to find the previous pkey_field value for each row.
SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m;
Then put that in a subquery which I joined to another copy of the base table.
SELECT
sub.pkey_field,
sub.target,
sub.prev_pkey_field,
prev.target AS prev_target
FROM
(SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m) AS sub
LEFT JOIN YourTable AS prev
ON sub.prev_pkey_field = prev.pkey_field
WHERE
sub.prev_pkey_field Is Null
OR prev.target <> sub.target;
This is the output from that final query.
pkey_field target prev_pkey_field prev_target
1 A
4 B 3 A
6 A 5 B
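The final two-step query also runs in SQLite, which makes it easy to check against the sample data from Python. The only addition is an ORDER BY so the rows come back in key order; table and column names are the ones used in the answer.

```python
import sqlite3

REGION_STARTS_SQL = """
SELECT
  sub.pkey_field,
  sub.target,
  sub.prev_pkey_field,
  prev.target AS prev_target
FROM
  (SELECT
     m.pkey_field,
     m.target,
     (SELECT Max(pkey_field)
      FROM YourTable
      WHERE pkey_field < m.pkey_field)
     AS prev_pkey_field
   FROM YourTable AS m) AS sub
LEFT JOIN YourTable AS prev
  ON sub.prev_pkey_field = prev.pkey_field
WHERE
  sub.prev_pkey_field Is Null
  OR prev.target <> sub.target
ORDER BY sub.pkey_field
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (pkey_field INTEGER PRIMARY KEY, target TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?)",
                     [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"), (6, "A")])
    return conn


def region_starts(conn):
    # Keep just (key, target) for each row that starts a new region.
    return [(row[0], row[1]) for row in conn.execute(REGION_STARTS_SQL)]
```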
Here is a first attempt:
SELECT t1.Row, t1.target
FROM t1 WHERE (((t1.target)<>NZ((SELECT TOP 1 t2.target FROM t1 AS t2 WHERE t2.DateTimeId<t1.DateTimeId ORDER BY t2.DateTimeId DESC),"X")));

How to Order By within an Order By in SQL?

I'm trying to create a SQL statement which will recreate the hierarchical container/folder/test structure in SilkCentral Test Manager.
Test containers have no ParentID
Test folders contain a ParentID and IsLeaf = 0
Tests contain a ParentID and IsLeaf = 1
This Query results in all of the test containers, folders, and tests:
SELECT "NodeID", "ParentID", "Name", "IsLeaf", "OrderNumber"
FROM "Silk"."TM_TestPlanNodes" AS TPN
WHERE PROJECTID = 36
ORDER BY "ParentID", "OrderNumber", "IsLeaf"
Here are some of the Results:
NodeID  ParentID  Name                          IsLeaf  OrderNumber
65408             Installation and Upgrades     0       0
65445             Connectivity                  0       1
65448             Focus                         0       2
65409             GINA / PLAP                   0       3
65446             Graphical User Interface      0       4
71038             Login Properties              0       5
65449             Miscellaneous                 0       6
70636             Net Firewall                  0       7
70998             Software Updates              0       8
65447             Third-party Services          0       9
70805             SilkTest Automated Tests      0       10
68812   65408     0. Setup                      0       0
65454   65408     1. Installations & Upgrades   0       1
65450   65408     Typical/Custom Installation   0       2
I would like this ordering instead:
The ParentID is sorted, but if there exists a Node with the ParentID=thePreviousNode'sID, then that is chosen next. If there are multiple of those nodes, they should be ordered by IsLeaf and then, OrderNumber.
How to accomplish this? I'm very limited in what I can do, because I think very complicated syntax will end up throwing errors in Silk. I was going to try a nested SELECT statement:
SELECT "NodeID", "ParentID", "Name","IsLeaf"
FROM "Silk"."TM_TestPlanNodes"
WHERE PROJECTID = '36' AND ParentID LIKE (
SELECT ParentID
FROM "Silk"."TM_TestPlanNodes"
WHERE NAME = 'Installation and Upgrades')
But this is getting this error: "Could not execute report query: Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression."
This is why I'm fiddling with Order By.
You can use a recursive CTE to create a hidden column and order by that column. It should be something like:
WITH cte (NodeID, ParentID, Name, IsLeaf, [Order])
AS
(
  -- Anchor: only the top-level containers (no ParentID)
  SELECT NodeID, ParentID, Name, IsLeaf, CAST(NodeID AS nvarchar(4000))
  FROM "Silk"."TM_TestPlanNodes"
  WHERE PROJECTID = 36 AND ParentID IS NULL
  UNION ALL
  -- Recursive part: append each child's ID to its parent's path;
  -- both parts must produce the same type, hence nvarchar(4000)
  SELECT leftNode.NodeID, leftNode.ParentID, leftNode.Name, leftNode.IsLeaf,
         CAST(cte.[Order] + CAST(leftNode.NodeID AS nvarchar(10)) AS nvarchar(4000))
  FROM "Silk"."TM_TestPlanNodes" AS leftNode
  INNER JOIN cte ON cte.NodeID = leftNode.ParentID
  WHERE leftNode.PROJECTID = 36
)
SELECT NodeID, ParentID, Name, IsLeaf FROM cte
ORDER BY [Order]
This was written in Notepad, so it may contain some errors, but the idea is to build an [Order] column that, for example, for 65530 would be 654086554569530 (the grandparent, the parent and the node).
EDIT:
This only works if the IDs are all 5 characters long, but from here you can make the proper tweaks.
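A sketch of those tweaks in SQLite, using a few rows from the question: instead of raw NodeIDs, each level appends a fixed-width key made of IsLeaf, the zero-padded OrderNumber and the NodeID (as a tie-breaker). The text sort then works whatever the ID lengths, every parent sorts before its children, and siblings follow the IsLeaf-then-OrderNumber order the question asks for. The widths (6 and 10 digits) are assumptions; widen them if the numbers can be larger. SilkCentral's own SQL dialect is not exercised here.

```python
import sqlite3

# A few rows from the question's result set (ParentID NULL = test container).
NODES = [
    (65408, None, "Installation and Upgrades", 0, 0),
    (65445, None, "Connectivity", 0, 1),
    (68812, 65408, "0. Setup", 0, 0),
    (65454, 65408, "1. Installations & Upgrades", 0, 1),
    (65450, 65408, "Typical/Custom Installation", 0, 2),
]

# Each level contributes a fixed-width key, so a parent's path is a strict
# prefix of its children's paths and therefore sorts first.
ORDERED_SQL = """
WITH RECURSIVE cte(NodeID, Name, sort_path) AS (
    SELECT NodeID, Name, printf('%d%06d%010d/', IsLeaf, OrderNumber, NodeID)
    FROM TM_TestPlanNodes
    WHERE ParentID IS NULL
    UNION ALL
    SELECT n.NodeID, n.Name,
           cte.sort_path || printf('%d%06d%010d/', n.IsLeaf, n.OrderNumber, n.NodeID)
    FROM TM_TestPlanNodes AS n
    JOIN cte ON n.ParentID = cte.NodeID
)
SELECT NodeID, Name FROM cte ORDER BY sort_path
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TM_TestPlanNodes (NodeID INTEGER PRIMARY KEY, "
                 "ParentID INTEGER, Name TEXT, IsLeaf INTEGER, OrderNumber INTEGER)")
    conn.executemany("INSERT INTO TM_TestPlanNodes VALUES (?, ?, ?, ?, ?)", NODES)
    return conn


def ordered_ids(conn):
    return [node_id for node_id, _ in conn.execute(ORDERED_SQL)]
```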
Although it might not be a PERFECT fit, a prior solution of mine is very close: a nested hierarchical representation of parent-child records in a self-joined list, with proper ordering built in. You may have to tweak it a bit for your table, but here's a link to that prior solution.
To clarify, here is the menu problem from that solution and its data:
id | parentid | name
1 | 0 | CatOne
2 | 0 | CatTwo
3 | 0 | CatThree
4 | 1 | SubCatOne
5 | 1 | SubCatOne2
6 | 3 | SubCatThree
Desired output
CatOne 1
--SubCatOne 4
--SubCatOne2 5
CatTwo 2
CatThree 3
--SubCatThree 6
The FIRST CASE pre-groups all the like IDs based on the parent. When the parent ID is 0, the row IS the top-most level, so we keep its own ID; for any children under it, we use their PARENT ID instead, so all members of the same family are pre-grouped together.
The purpose of the SECOND CASE is to force the entry that represents the actual TOP-LEVEL menu item to the top of its group, regardless of the child entries.
Say you have a table where IDs are already established, and you now add a new item with ID = 7 for "New Top Level" and want to move IDs 2 and 3 into the new top-level section. If you ran the query with just the first CASE, your records would be returned as:
ID Parent Name (natural order from the table)
2 7 CatTwo
3 7 CatThree
7 0 New Category. (we want THIS one in FIRST POSITION of the group)
As you can see, this would be a bad representation of the sub-grouping order: the top-level item is actually in the 3rd position. To bring it to the front, we sub-group and say: if the Parent ID of the record = 0, sort it with priority 1; anything else gets priority 2. That would produce:
ID Parent Name SubPrioritySort
7 0 New Category. 1
2 7 CatTwo 2
3 7 CatThree 2
Since you are not actually returning these CASE values in your result query, you wouldn't otherwise see them... but for grins, add them as columns to your query to see the impact. Hopefully this clarifies the answer for you.
In your question, you would obviously be able to add your sort order column to the basis of this query.
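The linked query is not reproduced in this thread, so here is a reconstruction from the description, assuming the two CASE expressions are ORDER BY keys, run in SQLite on the menu data above. It handles exactly two levels, which is all the menu example needs; a trailing `id` keeps siblings in a stable order, and the label CASE only adds the "--" prefix for display.

```python
import sqlite3

MENU = [(1, 0, "CatOne"), (2, 0, "CatTwo"), (3, 0, "CatThree"),
        (4, 1, "SubCatOne"), (5, 1, "SubCatOne2"), (6, 3, "SubCatThree")]

MENU_SQL = """
SELECT CASE WHEN parentid = 0 THEN name ELSE '--' || name END AS label, id
FROM menu
ORDER BY
    CASE WHEN parentid = 0 THEN id ELSE parentid END,  -- first CASE: family key
    CASE WHEN parentid = 0 THEN 1 ELSE 2 END,          -- second CASE: parent first
    id                                                 -- siblings in id order
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE menu (id INTEGER PRIMARY KEY, parentid INTEGER, name TEXT)")
    conn.executemany("INSERT INTO menu VALUES (?, ?, ?)", MENU)
    return conn


if __name__ == "__main__":
    for label, menu_id in make_db().execute(MENU_SQL):
        print(label, menu_id)
```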

How to fetch categories and sub-categories in a single query in sql? (mysql)

I would like to know if it's possible to extract the categories and sub-categories in a single DB fetch.
My DB table is something similar to that shown below
table
cat_id parent_id
1 0
2 1
3 2
4 3
5 3
6 1
For example, when the input is 3, the row 3 itself, all the rows with parent_id 3, and all the ancestors of row 3 should be fetched:
output
cat_id parent_id
3 2 -> The row 3 itself
4 3 -> Row with parent as 3
5 3 -> Row with parent as 3
2 1 -> 2 is the parent of row 3
1 0 -> 1 is the parent of row 2
Can this be done using stored procedures and loops? If so, will it be a single DB fetch or multiple? Or are there any other better methods?
Thanks!!!
If you're asking whether MySQL has recursive queries, the answer is "NO" (MySQL 8.0 later added WITH RECURSIVE).
But there is a very good approach to handle it.
Create a helper table (say, CatHierarchy):
CatHierarchy:
SuperId  ChildId  Distance
-------  -------  --------
1        1        0
1        2        1
2        2        0
This redundant data makes it easy to select any hierarchy with 1 query and to support any hierarchy with 2 inserts (deletion is also performed in 1 query with the help of ON DELETE CASCADE).
So what does this mean? You track every path in the hierarchy. Each Cat node must add a reference to itself (distance 0), and then the links to all of its ancestors are duplicated as redundant rows.
To select a category with its subcategories, just write:
SELECT c.* from Category c inner join CatHierarchy ch ON ch.ChildId=c.cat_id
WHERE ch.SuperId = :someSpecifiedRootOfCat
someSpecifiedRootOfCat is the parameter that specifies the root category.
THAT'S ALL!
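A sketch of the helper-table approach in SQLite, answering the question's example (input 3). `add_category` is a hypothetical helper showing the two CatHierarchy inserts per new node (a distance-0 self row, then one row per ancestor of the parent); nodes must be added parent-first. The fetch query then returns the row itself, its direct children (Distance 1) and all of its ancestors in a single statement.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Category (cat_id INTEGER PRIMARY KEY, parent_id INTEGER);
        CREATE TABLE CatHierarchy (SuperId INTEGER, ChildId INTEGER, Distance INTEGER);
    """)
    return conn


def add_category(conn, cat_id, parent_id):
    conn.execute("INSERT INTO Category VALUES (?, ?)", (cat_id, parent_id))
    # Insert 1: the node references itself at distance 0.
    conn.execute("INSERT INTO CatHierarchy VALUES (?, ?, 0)", (cat_id, cat_id))
    # Insert 2: copy every ancestor link of the parent, one step further away.
    conn.execute("INSERT INTO CatHierarchy (SuperId, ChildId, Distance) "
                 "SELECT SuperId, ?, Distance + 1 FROM CatHierarchy WHERE ChildId = ?",
                 (cat_id, parent_id))


FETCH_SQL = """
SELECT c.cat_id, c.parent_id
FROM Category AS c JOIN CatHierarchy AS ch ON ch.ChildId = c.cat_id
WHERE ch.SuperId = :cat AND ch.Distance <= 1   -- the row itself + direct children
UNION
SELECT c.cat_id, c.parent_id
FROM Category AS c JOIN CatHierarchy AS ch ON ch.SuperId = c.cat_id
WHERE ch.ChildId = :cat AND ch.Distance > 0    -- every ancestor
"""


def fetch(conn, cat_id):
    return set(conn.execute(FETCH_SQL, {"cat": cat_id}))


conn = make_db()
for cat_id, parent_id in [(1, 0), (2, 1), (3, 2), (4, 3), (5, 3), (6, 1)]:
    add_category(conn, cat_id, parent_id)
```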
There's a really good article about this on Sitepoint; look especially at Modified Preorder Tree Traversal.
It's tricky. I assume you want to display categories, kind of like a folder view? Three fields: MainID, ParentID, Name... Apply to your table, and it should work like a charm. I think it's called a recursive query?
WITH CATEGORYVIEW (catid, parentid, categoryname) AS
(
SELECT catid, ParentID, cast(categoryname as varchar(255))
FROM [CATEGORIES]
WHERE isnull(ParentID,0) = 0
UNION ALL
SELECT C.catid, C.ParentID, cast(CATEGORYVIEW.categoryname+'/'+C.categoryname as varchar(255))
FROM [CATEGORIES] C
JOIN CATEGORYVIEW ON CATEGORYVIEW.catID = C.ParentID
)
SELECT * FROM CATEGORYVIEW ORDER BY CATEGORYNAME
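The same recursive CTE works in SQLite (WITH RECURSIVE), and since SQL Server 2005 introduced recursive CTEs, it is also a candidate for the breadcrumb question at the top of this page. This sketch loads the sample tree into a CATEGORIES table; IDs start at 1 here because the anchor treats a ParentID of 0 (or NULL) as "no parent". SQLite does not need the varchar(255) casts that SQL Server uses to make the anchor and recursive column types match.

```python
import sqlite3

# Sample tree with IDs shifted to start at 1 (0/NULL means "no parent").
CATEGORIES = [
    (1, None, "Node A"), (2, 1, "Node B"), (3, 2, "Node C"),
    (4, 1, "Node D"), (5, 4, "Node E"), (6, 4, "Node F"),
]

# Anchor on the roots, then append each child's name to its parent's path.
CATEGORYVIEW_SQL = """
WITH RECURSIVE CATEGORYVIEW(catid, parentid, categoryname) AS (
    SELECT catid, ParentID, categoryname
    FROM CATEGORIES
    WHERE IFNULL(ParentID, 0) = 0
    UNION ALL
    SELECT C.catid, C.ParentID, CATEGORYVIEW.categoryname || '/' || C.categoryname
    FROM CATEGORIES AS C
    JOIN CATEGORYVIEW ON CATEGORYVIEW.catid = C.ParentID
)
SELECT catid, categoryname FROM CATEGORYVIEW ORDER BY categoryname
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CATEGORIES "
                 "(catid INTEGER PRIMARY KEY, ParentID INTEGER, categoryname TEXT)")
    conn.executemany("INSERT INTO CATEGORIES VALUES (?, ?, ?)", CATEGORIES)
    return conn


if __name__ == "__main__":
    for catid, path in make_db().execute(CATEGORYVIEW_SQL):
        print(catid, path)
```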