SQL Recursive Tables - sql

I have the following tables: groups, which contains hierarchically ordered groups, and group_member, which stores which groups a user belongs to.
groups
---------
id
parent_id
name
group_member
---------
id
group_id
user_id
groups:
ID  PARENT_ID  NAME
---------------------------
1   NULL       Cerebra
2   1          CATS
3   2          CATS 2.0
4   1          Cerepedia
5   4          Cerepedia 2.0
6   1          CMS
group_member:
ID  GROUP_ID  USER_ID
---------------------------
1   1         3
2   1         4
3   1         5
4   2         7
5   2         6
6   4         6
7   5         12
8   4         9
9   1         10
I want to retrieve the visible groups for a given user, that is to say the groups a user belongs to and the children of those groups. For example, with the above data:
USER VISIBLE_GROUPS
9 4, 5
3 1,2,4,5,6
12 5
I am getting these values using recursion and several database queries. But I would like to know if it is possible to do this with a single SQL query to improve my app performance. I am using MySQL.

Two things come to mind:
1 - You can repeatedly outer-join the table to itself to recursively walk up your tree, as in:
SELECT *
FROM
MY_GROUPS MG1
,MY_GROUPS MG2
,MY_GROUPS MG3
,MY_GROUPS MG4
,MY_GROUPS MG5
,MY_GROUP_MEMBERS MGM
WHERE MG1.PARENT_ID = MG2.UNIQID (+)
AND MG1.UNIQID = MGM.GROUP_ID (+)
AND MG2.PARENT_ID = MG3.UNIQID (+)
AND MG3.PARENT_ID = MG4.UNIQID (+)
AND MG4.PARENT_ID = MG5.UNIQID (+)
AND MGM.USER_ID = 9
That's gonna give you results like this:
UNIQID PARENT_ID NAME UNIQID_1 PARENT_ID_1 NAME_1 UNIQID_2 PARENT_ID_2 NAME_2 UNIQID_3 PARENT_ID_3 NAME_3 UNIQID_4 PARENT_ID_4 NAME_4 UNIQID_5 GROUP_ID USER_ID
4 2 Cerepedia 2 1 CATS 1 null Cerebra null null null null null null 8 4 9
The limit here is that you must add a new join for each "level" you want to walk up the tree. If your tree has fewer than, say, 20 levels, then you could probably get away with it by creating a view that shows 20 levels from every user.
2 - The only other approach that I know of is to create a recursive database function, and call that from code. You'll still have some lookup overhead that way (i.e., your # of queries will still be equal to the # of levels you are walking on the tree), but overall it should be faster since it's all taking place within the database.
I'm not sure about MySql, but in Oracle, such a function would be similar to this one (you'll have to change the table and field names; I'm just copying something I did in the past):
CREATE OR REPLACE FUNCTION GoUpLevel(WO_ID INTEGER, UPLEVEL INTEGER) RETURN INTEGER
IS
BEGIN
DECLARE
iResult INTEGER;
iParent INTEGER;
BEGIN
IF UPLEVEL <= 0 THEN
iResult := WO_ID;
ELSE
SELECT PARENT_ID
INTO iParent
FROM WOTREE
WHERE ID = WO_ID;
iResult := GoUpLevel(iParent,UPLEVEL-1); --recursive
END IF;
RETURN iResult;
EXCEPTION WHEN NO_DATA_FOUND THEN
RETURN NULL;
END;
END GoUpLevel;
/

Joe Celko's books "SQL for Smarties" and "Trees and Hierarchies in SQL for Smarties" describe methods that avoid recursion entirely by using nested sets. That complicates updating, but makes other queries (that would normally need recursion) comparatively straightforward. There are some examples in this article written by Joe back in 1996.
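To make the nested-set idea concrete, here is a minimal sketch using Python's sqlite3 module (chosen only so it runs as-is). The table name groups_nested, the lft/rgt column names and the numbering are my own assumptions: I numbered the question's tree by hand with a depth-first walk, so a node's descendants are exactly the rows whose lft falls between its lft and rgt.

```python
import sqlite3

# (id, name, lft, rgt): hand-numbered by a depth-first walk of the question's tree.
# The lft/rgt columns are an assumption; they are not part of the original schema.
NESTED_GROUPS = [
    (1, "Cerebra", 1, 12), (2, "CATS", 2, 5), (3, "CATS 2.0", 3, 4),
    (4, "Cerepedia", 6, 9), (5, "Cerepedia 2.0", 7, 8), (6, "CMS", 10, 11),
]
# (group_id, user_id) pairs from the question's group_member table
MEMBERS = [(1, 3), (1, 4), (1, 5), (2, 7), (2, 6), (4, 6), (5, 12), (4, 9), (1, 10)]

VISIBLE_SQL = """
SELECT DISTINCT d.id
FROM group_member m
JOIN groups_nested g ON g.id = m.group_id             -- groups the user belongs to
JOIN groups_nested d ON d.lft BETWEEN g.lft AND g.rgt -- each such group plus its descendants
WHERE m.user_id = ?
ORDER BY d.id
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE groups_nested (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)")
    conn.execute("CREATE TABLE group_member (group_id INTEGER, user_id INTEGER)")
    conn.executemany("INSERT INTO groups_nested VALUES (?, ?, ?, ?)", NESTED_GROUPS)
    conn.executemany("INSERT INTO group_member VALUES (?, ?)", MEMBERS)
    conn.commit()
    return conn

def visible_groups(conn, user_id):
    """Return the ids of all groups visible to user_id, without any recursion."""
    return [row[0] for row in conn.execute(VISIBLE_SQL, (user_id,))]
```

The read side is a plain join. The cost is on the write side: inserting or moving a group means renumbering lft/rgt for part of the tree.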

I don't think that this can be accomplished without using recursion. You can accomplish it with a single stored procedure in MySQL, but recursion is not allowed in stored procedures by default. This article has information about how to enable recursion. I'm not certain how much impact this would have on performance versus the multiple-query approach. MySQL may do some optimization of stored procedures, but otherwise I would expect the performance to be similar.

Didn't know if you had a Users table, so I get the list via the User_ID's stored in the Group_Member table...
SELECT GroupUsers.User_ID,
(
SELECT
STUFF((SELECT ',' +
Cast(Group_ID As Varchar(10))
FROM Group_Member Member (nolock)
WHERE Member.User_ID=GroupUsers.User_ID
FOR XML PATH('')),1,1,'')
) As Groups
FROM (SELECT User_ID FROM Group_Member GROUP BY User_ID) GroupUsers
That returns:
User_ID Groups
3 1
4 1
5 1
6 2,4
7 2
9 4
10 1
12 5
Which seems right according to the data in your table. But doesn't match up with your expected value list (e.g. User 9 is only in one group in your table data but you show it in the results as belonging to two)
EDIT: Dang. Just noticed that you're using MySQL. My solution was for SQL Server. Sorry.
-- Kevin Fairchild

A similar question has already been raised.
Here is my answer (a bit edited):
I am not sure I understand your question correctly, but this could work: "My take on trees in SQL".
The linked post describes a method of storing a tree in a database (PostgreSQL in that case), but the method is clear enough that it can easily be adapted to any database.
With this method you can easily update all the nodes that depend on a modified node K with about N simple SELECT queries, where N is the distance of K from the root node.
Good Luck!

I don't remember which SO question I found the link under, but this article on sitepoint.com (second page) shows another way of storing hierarchical trees in a table that makes it easy to find all child nodes, or the path to the top, things like that. Good explanation with example code.
PS. Newish to StackOverflow, is the above ok as an answer, or should it really have been a comment on the question since it's just a pointer to a different solution (not exactly answering the question itself)?

There's no way to do this in the SQL standard, but you can usually find vendor-specific extensions, e.g., CONNECT BY in Oracle.
UPDATE: As the comments point out, this was added in SQL 99.
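For reference, a minimal sketch of what the SQL:1999 recursive WITH looks like for the original question. It runs against SQLite through Python's sqlite3 module purely so it can be tried as-is; MySQL 8.0 and later accept the same WITH RECURSIVE form, while older MySQL versions do not. The helper names (make_db, visible_groups) are made up for the example.

```python
import sqlite3

GROUPS = [(1, None, "Cerebra"), (2, 1, "CATS"), (3, 2, "CATS 2.0"),
          (4, 1, "Cerepedia"), (5, 4, "Cerepedia 2.0"), (6, 1, "CMS")]
MEMBERS = [(1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 2, 7), (5, 2, 6),
           (6, 4, 6), (7, 5, 12), (8, 4, 9), (9, 1, 10)]

# "groups" is quoted because GROUPS is a keyword in SQLite; newer MySQL versions
# treat it as a reserved word as well, so it may need backticks there.
VISIBLE_GROUPS_SQL = """
WITH RECURSIVE visible(id) AS (
    -- anchor: groups the user is a direct member of
    SELECT group_id FROM group_member WHERE user_id = ?
    UNION
    -- recursive step: children of groups already found
    SELECT g.id FROM "groups" g JOIN visible v ON g.parent_id = v.id
)
SELECT id FROM visible ORDER BY id
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "groups" (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)')
    conn.execute("CREATE TABLE group_member (id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER)")
    conn.executemany('INSERT INTO "groups" VALUES (?, ?, ?)', GROUPS)
    conn.executemany("INSERT INTO group_member VALUES (?, ?, ?)", MEMBERS)
    conn.commit()
    return conn

def visible_groups(conn, user_id):
    """Single query: the user's groups plus all of their descendants."""
    return [row[0] for row in conn.execute(VISIBLE_GROUPS_SQL, (user_id,))]
```

With the question's data this gives 4 and 5 for user 9 and just 5 for user 12. For user 3 it returns all six groups, including 3 (CATS 2.0, a child of group 2), which the question's sample output leaves out.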

Related

Is there a more efficient way of selecting the data than multiple intersects SQL

I have data in my PostgreSQL database in the format below
answer_id question_id country_id answer
1 1 1 7
2 1 2 7
3 1 3 5
4 2 1 3
5 2 2 2
6 2 3 2
What I am trying to do is get all countries which have a certain answer for a certain question, and there can be multiple question~answer combinations.
For example, I may need all countries which have 7 as the answer to question 1 (2 values), but then, along with the first condition, I also add that the answer to question 2 is 2, and now it drops from 2 values (countries with ids 1 and 2) to only 1 (country id 2).
Now I have managed to do it with intersect as follows...
select country_id from answer_table where question_id = 1 and answer = 7
intersect
select country_id from answer_table where question_id = 2 and answer = 2
Problem is that I need to be able to do this dynamically, meaning that one time I may select only 1 question~answer pair, but other times I may want more (3, 5, 7 or whatever) which affects the number of selects (and in turn intersects).
I mean this above works and I do have a capability to use a query builder so it really isn't a big deal to generate, but I don't believe that it is the most efficient nor the smartest way.
Therefore, my question is basically is there a more efficient or smarter way of doing these selects/intersects dynamically (like function which takes arrays of data or whatever?)?
Thank You and have a good one!
p.s. I found this stack thread, but there they use fixed 5 queries at all times.
I don't know if it is more efficient, but you might find it more generalizable:
select country_id from answer_table
where (question_id, answer) in ( (1, 7), (2, 2) )
group by country_id
having count(distinct (question_id, answer) ) = 2;
You can actually replace the in list and "2" with array functions to pass in array values.
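As a rough sketch of the "generate it dynamically" part, here is the same GROUP BY / HAVING idea driven by a list of pairs. It uses Python's sqlite3 module only to stay runnable, so the row-value IN list is spelled out as OR conditions instead of the PostgreSQL syntax above. The function name countries_matching is made up, and it assumes each country has at most one answer per question.

```python
import sqlite3

ANSWERS = [(1, 1, 1, 7), (2, 1, 2, 7), (3, 1, 3, 5),
           (4, 2, 1, 3), (5, 2, 2, 2), (6, 2, 3, 2)]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE answer_table (answer_id INTEGER PRIMARY KEY, "
                 "question_id INTEGER, country_id INTEGER, answer INTEGER)")
    conn.executemany("INSERT INTO answer_table VALUES (?, ?, ?, ?)", ANSWERS)
    conn.commit()
    return conn

def countries_matching(conn, pairs):
    """Return country_ids that satisfy every (question_id, answer) pair."""
    pairs = sorted(set(pairs))
    if not pairs:
        return []
    # One placeholder group per pair; only the number of groups varies.
    condition = " OR ".join(["(question_id = ? AND answer = ?)"] * len(pairs))
    sql = (
        "SELECT country_id FROM answer_table "
        f"WHERE {condition} "
        "GROUP BY country_id "
        # a country qualifies only if it matched every requested pair
        "HAVING COUNT(DISTINCT question_id) = ? "
        "ORDER BY country_id"
    )
    params = [value for pair in pairs for value in pair] + [len(pairs)]
    return [row[0] for row in conn.execute(sql, params)]
```

Only the number of placeholders changes with the input, so the values themselves are always bound as parameters rather than spliced into the SQL text.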

Get list of dependent objects via SQL query or function

I have two tables. One is for Task and second is dependency table for the tasks.
I want a query to give me all the tasks (recursively) based on a particular id.
The first table holds the tasks:
ID TASK
1 Abc
2 Def
3 Ghi
4 Jkl
5 Mno
6 Pqr
The second one is for getting dependent tasks
ID DEPENDENT_ON
2 1
3 1
4 2
4 6
5 2
6 5
Is it possible to write a sql query to get a list of all the tasks (recursive) which are dependent on a particular task.
Example.
I want to check all tasks dependent on ID=1.
Expected output (which is 2 and 3):
2.Def
3.Ghi
Furthermore query should also give output of these two dependent tasks and so on.
Final output should be:
2.Def -- level one
3.Ghi -- level one
4.Jkl -- Dependent on task 2
5.Mno -- Dependent on task 2
6.Pqr -- Dependent on task 5
Formatting is not important. Just output is required
I need to join two tables and then do a recursive search.
You must OUTER JOIN the second table (which you didn't name, so I have called it TASK_TREE) through DEPENDENT_ON to the parent ID. Outer join because task 1 is the top of the tree and depends on no task. Then use Oracle's hierarchical query syntax to walk the tree:
select t.id, t.task, tt.dependent_on, level
from tasks t
left outer join task_tree tt on tt.id = t.id
connect by prior t.id = tt.dependent_on
start with t.id = 1
/
I have included the level so you can see how the tree unfurls. The Oracle SQL documentation covers hierarchical queries in depth. If you don't want to use Oracle's proprietary hierarchical syntax, Oracle has supported the recursive WITH clause since 11gR2.
Incidentally, your posted data contains an error. Task 4 depends on both 2 and 6. In a hierarchy, each child node must depend on a single parent node; otherwise you'll get all sorts of weird results.
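For comparison, here is a sketch of the recursive WITH variant. It is run through Python's sqlite3 module rather than Oracle, so treat the exact syntax as an assumption to adapt (Oracle, for one, does not use the RECURSIVE keyword). The table name task_tree is the same invented name as above. Using UNION instead of UNION ALL drops duplicates, so task 4 is listed once even though it is reachable through both 2 and 6.

```python
import sqlite3

TASKS = [(1, "Abc"), (2, "Def"), (3, "Ghi"), (4, "Jkl"), (5, "Mno"), (6, "Pqr")]
# (id, dependent_on) pairs from the question
TASK_TREE = [(2, 1), (3, 1), (4, 2), (4, 6), (5, 2), (6, 5)]

DEPENDENTS_SQL = """
WITH RECURSIVE dependents(id) AS (
    -- anchor: tasks that depend directly on the given task
    SELECT id FROM task_tree WHERE dependent_on = ?
    UNION
    -- recursive step: tasks that depend on anything found so far
    SELECT tt.id FROM task_tree tt JOIN dependents d ON tt.dependent_on = d.id
)
SELECT t.id, t.task FROM tasks t JOIN dependents d ON d.id = t.id ORDER BY t.id
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, task TEXT)")
    conn.execute("CREATE TABLE task_tree (id INTEGER, dependent_on INTEGER)")
    conn.executemany("INSERT INTO tasks VALUES (?, ?)", TASKS)
    conn.executemany("INSERT INTO task_tree VALUES (?, ?)", TASK_TREE)
    conn.commit()
    return conn

def dependent_tasks(conn, task_id):
    """All tasks that depend on task_id, directly or transitively."""
    return conn.execute(DEPENDENTS_SQL, (task_id,)).fetchall()
```

For task 1 this returns tasks 2 through 6, matching the expected output in the question.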

How does the ORDER BY clause work if two values are equal?

This is my NEWSPAPER table.
National News A 1
Sports D 1
Editorials A 12
Business E 1
Weather C 2
Television B 7
Births F 7
Classified F 8
Modern Life B 1
Comics C 4
Movies B 4
Bridge B 2
Obituaries F 6
Doctor Is In F 6
When I run this query
select feature,section,page from NEWSPAPER
where section = 'F'
order by page;
It gives this output
Doctor Is In F 6
Obituaries F 6
Births F 7
Classified F 8
But in Kevin Loney's Oracle 10g Complete Reference the output is like this
Obituaries F 6
Doctor Is In F 6
Births F 7
Classified F 8
Please help me understand why this happens.
If you need reliable, reproducible ordering to occur when two values in your ORDER BY clause's first column are the same, you should always provide another, secondary column to also order on. While you might be able to assume that they will sort themselves based on order entered (almost always the case to my knowledge, but be aware that the SQL standard does not specify any form of default ordering) or index, you never should (unless it is specifically documented as such for the engine you are using--and even then I'd personally never rely on that).
Your query, if you wanted alphabetical sorting by feature within each page, should be:
SELECT feature,section,page FROM NEWSPAPER
WHERE section = 'F'
ORDER BY page, feature;
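A small runnable sketch of the tie-breaker, using SQLite through Python's sqlite3 module instead of Oracle (the helper names are made up). With feature as the second sort key, the two page-6 rows come back in the same order every time.

```python
import sqlite3

NEWSPAPER = [
    ("National News", "A", 1), ("Sports", "D", 1), ("Editorials", "A", 12),
    ("Business", "E", 1), ("Weather", "C", 2), ("Television", "B", 7),
    ("Births", "F", 7), ("Classified", "F", 8), ("Modern Life", "B", 1),
    ("Comics", "C", 4), ("Movies", "B", 4), ("Bridge", "B", 2),
    ("Obituaries", "F", 6), ("Doctor Is In", "F", 6),
]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE newspaper (feature TEXT, section TEXT, page INTEGER)")
    conn.executemany("INSERT INTO newspaper VALUES (?, ?, ?)", NEWSPAPER)
    conn.commit()
    return conn

def section_features(conn, section):
    """Rows of one section, ordered by page and then by feature to break ties."""
    sql = ("SELECT feature, section, page FROM newspaper "
           "WHERE section = ? ORDER BY page, feature")
    return conn.execute(sql, (section,)).fetchall()
```

Calling section_features(make_db(), "F") lists Doctor Is In before Obituaries, then Births and Classified, regardless of insertion order.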
In relational databases, tables are sets and are unordered. The order by clause is used primarily for output purposes (and a few other cases such as a subquery containing rownum).
This is a good place to start. The SQL standard does not specify what has to happen when the keys on an order by are the same. And this is for good reason. Different techniques can be used for sorting. Some might be stable (preserving original order). Some methods might not be.
Focus on whether the same rows are in the sets, not their ordering. By the way, I would consider this an unfortunate example. The book should not have ambiguous sorts in its examples.
When you use the SELECT statement to query data from a table, the order in which rows appear in the result set may not be what you expected.
In some cases, the rows that appear in the result set are in the order that they are stored in the table physically. However, in case the query optimizer uses an index to process the query, the rows will appear as they are stored in the index key order. For this reason, the order of rows in the result set is undetermined or unpredictable.
The query optimizer is a built-in software component in the database system that determines the most efficient way for an SQL statement to query the requested data.

A simple SQL Select query to crawl all connected people in a social graph?

What is the shortest or fastest SQL select query or SQL procedure to crawl a social graph? Imagine we have this table:
UId FriendId
1 2
2 1
2 4
1 3
5 7
7 5
7 8
5 9
9 7
We have two subsets of people here. I'm talking about a SQL query or procedure which, if we pass:
Uid = 4 return the result set rows with uid : {1, 2, 3}
or if
Uid = 9 return the result set rows with uid : {5, 7, 8}
Sorry for my poor english.
So you want to get all friends of someone, including n-th degree friends? I don't think it is possible without recursion.
How you can do that is explained here:
https://inviqa.com/blog/graphs-database-sql-meets-social-network
If you are storing your values in an adjacency list, the easiest way I've found to crawl it is to translate it into a graphing language and query that. For example, if you were working in PHP, you could use the Image_GraphViz package. Or, if you want to use AJAX, you might consider cytoscapeweb. Both work well.
In either case, you'd SELECT * FROM mytable and feed all the records into the graph package as nodes. This means outputting them in dot or GraphML (or other graphing language). Then you can easily query them.
If you don't wish to translate the dataset, consider storing it as nested sets. Nested sets, though a bit of a pain to maintain, are much better than adjacency lists for the kind of queries you are looking to do.
If you are storing your values in an adjacency list, and you want n-th degree you can simply recursively INNER JOIN the UID's. For example:
SELECT t1.uid, t2.uid, t3.uid
FROM mytable t1
INNER JOIN mytable t2 ON t2.uid = t1.friendid
INNER JOIN mytable t3 ON t3.uid = t2.friendid
This query is like a DFS with a fixed depth.
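If the database supports recursive WITH, the depth does not have to be fixed. The following is a sketch only, run against SQLite through Python's sqlite3 module. The table name friends is an assumption (the question does not name the table), and every row is treated as a two-way friendship, which is what the expected results in the question imply.

```python
import sqlite3

FRIENDS = [(1, 2), (2, 1), (2, 4), (1, 3), (5, 7), (7, 5), (7, 8), (5, 9), (9, 7)]

CONNECTED_SQL = """
WITH RECURSIVE
edges(a, b) AS (
    -- treat every friendship as two-way
    SELECT uid, friendid FROM friends
    UNION
    SELECT friendid, uid FROM friends
),
reach(uid) AS (
    -- anchor: the starting person
    SELECT ?
    UNION
    -- recursive step: friends of anyone reached so far (UNION stops on cycles)
    SELECT e.b FROM edges e JOIN reach r ON e.a = r.uid
)
SELECT uid FROM reach WHERE uid <> ? ORDER BY uid
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friends (uid INTEGER, friendid INTEGER)")
    conn.executemany("INSERT INTO friends VALUES (?, ?)", FRIENDS)
    conn.commit()
    return conn

def connected_people(conn, uid):
    """Everyone in the same connected group as uid, excluding uid itself."""
    return [row[0] for row in conn.execute(CONNECTED_SQL, (uid, uid))]
```

With the question's data, connected_people returns 1, 2, 3 for Uid 4 and 5, 7, 8 for Uid 9.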

SQL statement to switch values

I have several tables where a field is used for priority (1 to 5). The problem is that some projects have been using 5 as the highest priority and others 1, and I am going to harmonize this.
My easy option is to create a temp table and copy the data over and switch as this table:
1 -> 5
2 -> 4
3 -> 3
4 -> 2
5 -> 1
I'm not that good with SQL, but it feels like there should be an easy way to switch those values directly with a statement. However, I am concerned about what happens when there is a huge amount of data: if something goes wrong halfway, the data will be in a mess.
Should I just go with my temp table solution, or do you have a nice way of doing this straight in SQL? (Oracle 10g is being used)
Many thanks!
Simply update the table like this; a temp table is not needed because you are just reversing the priority:
update table_2
set priority = 6-priority;
You can use a CASE expression:
update table_2
set priority = case PRIORITY
when 5 then 1
when 4 then 2
when 3 then 3
when 2 then 4
when 1 then 5
else PRIORITY
end;
Edit: tekBlues' solution is much better, but I'll leave this here for cases where the maths isn't as neat.
To be sure that no 'mess' results if the update goes awry, use a transaction, building on tekBlues' solution (+1 for this):
START TRANSACTION;
update table_2
set priority = 6-priority;
...
COMMIT;
This is especially useful if you want to update multiple tables in one go. A single statement is handled atomically on its own, as hainstech correctly pointed out in his comment.
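To illustrate the all-or-nothing behaviour, here is a minimal sketch using SQLite through Python's sqlite3 module, not Oracle. The table names and the flip_priorities helper are made up for the example, and the table names passed in are assumed to be trusted. If any of the updates fails, the ones already done in the same transaction are rolled back.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    for table in ("table_1", "table_2"):  # hypothetical table names
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, priority INTEGER)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)",
                         [(i, i) for i in range(1, 6)])
    conn.commit()
    return conn

def flip_priorities(conn, tables):
    """Reverse the 1..5 priority scale in every listed table, all or nothing."""
    with conn:  # commits if the block succeeds, rolls back if a statement raises
        for table in tables:
            # table names are interpolated, so they must come from trusted code
            conn.execute(f"UPDATE {table} SET priority = 6 - priority")

def priorities(conn, table):
    return [row[0] for row in conn.execute(f"SELECT priority FROM {table} ORDER BY id")]
```

For example, flip_priorities(conn, ["table_1", "table_2"]) reverses both tables, while a call that includes a table that does not exist raises sqlite3.OperationalError and leaves every table as it was.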