PostgreSQL check parent of parent in query - sql

Okay, I'll try to be as precise as possible. I have a database with comments that have a parent_comment_id. I want to select a number of comments from this table and a matching number of answers to these comments. Here is my problem: if I arbitrarily select comments, I might grab some that don't have any answers. However, if I select answers first, I might grab answers that are themselves answers to another answer. So what I want to do is grab answers that are immediate answers to a top-level comment (one whose parent_comment_id IS NULL) and then get those comments based on the parent_comment_id. How would I go about this in a query?
Assume this layout of a database
comment_id parent_comment_id
1 NULL
2 NULL
3 1
4 3
5 1
6 3
7 1
8 4
9 NULL
10 NULL
...
Now, I would select, let's say, 2 answers and the corresponding comments. If I just select the first 2 answers, I would get comments 3 and 4, but if I trace those back, I would only get comment 1, because comment 4 is an answer to an answer. Instead, I would want to find only comments whose parent comment has no parent comment, which would be comments 3 and 5 in this example.
I haven't really tried anything beyond realizing that it doesn't work without somehow recursively getting parents, which I have no clue how to do in SQL queries.

I solved it with a pretty easy solution. Might be horrible practice, but works for my use-case ^^
For the given example it would be something like this:
select distinct on (c.parent_comment_id) c.comment_id, c.parent_comment_id, c.text
from comments c
where c.parent_comment_id is not null
and (select p.parent_comment_id from comments p where p.comment_id = c.parent_comment_id) is null

I would want to find only comments whose parent comment has no parent comment.
I think that you just want a self-join and some filtering:
select c.*
from comments c
inner join comments pc on pc.comment_id = c.parent_comment_id
where pc.parent_comment_id is null
You could also use exists:
select c.*
from comments c
where exists (
select 1
from comments pc
where pc.comment_id = c.parent_comment_id and pc.parent_comment_id is null
)
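To check the self-join against the sample rows from the question, here is a minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL, and a comments table with only the two columns shown in the question; the query itself is plain SQL and should behave the same in both.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, parent_comment_id INTEGER)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(1, None), (2, None), (3, 1), (4, 3), (5, 1),
     (6, 3), (7, 1), (8, 4), (9, None), (10, None)],
)

# Self-join: keep only comments whose parent is itself a top-level comment.
rows = conn.execute(
    """
    SELECT c.comment_id, c.parent_comment_id
    FROM comments c
    INNER JOIN comments pc ON pc.comment_id = c.parent_comment_id
    WHERE pc.parent_comment_id IS NULL
    ORDER BY c.comment_id
    """
).fetchall()

print(rows)  # [(3, 1), (5, 1), (7, 1)]
```

Comments 4, 6 and 8 drop out because their parents (3 and 4) have a parent of their own. Adding a LIMIT would then pick however many answers you need.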

Related

Nested SELECT vs JOIN performance [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 months ago.
I have the following two tables in PostgreSQL database (simplified for the sake of example):
article
id   summary
1    Article 1
2    Article 2
3    Article 3
...  ...
event
id   article_id  eventtype_id  comment
108  1           4             Comment 1
109  2           8             Comment 2
110  3           4             Comment 3
...  ...
I would like to select only 1 event with eventtype_id=4 for each article. The result should look like this:
article_id  article_summary  event_comment
1           Article 1        Comment 1
2           Article 2
3           Article 3        Comment 3
...
Which of these 2 queries (Query 1 or Query 2) runs faster? Do they return the same result?
Query1:
SELECT
a.id AS article_id,
a.summary AS article_summary,
evnt.comment AS event_comment
FROM
article a
LEFT JOIN
event evnt ON evnt.article_id = a.id AND evnt.eventtype_id = 4;
Query2:
SELECT
a.id AS article_id,
a.summary AS article_summary,
(
SELECT
evnt.comment
FROM
event evnt
WHERE
evnt.article_id = a.id AND
evnt.eventtype_id = 4
LIMIT 1
) AS event_comment
FROM
article a;
Apart from the fact that the queries are not the same, as you said in the comments, correlated subqueries are generally considered a poor choice from a performance point of view, because they are, at least conceptually, evaluated once for each row of the outer query. They are usually helpful only in some particular situations. However, even in those situations, it is good practice to use them in an EXISTS clause if possible, so that the subquery returns true as soon as it finds a matching row.
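To see why the two queries are not the same, here is a small sketch. It assumes SQLite (through Python's sqlite3 module) instead of PostgreSQL, and it adds one hypothetical event row (id 111) that is not in the question, so that article 1 has two events with eventtype_id = 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, summary TEXT);
    CREATE TABLE event (id INTEGER PRIMARY KEY, article_id INTEGER,
                        eventtype_id INTEGER, comment TEXT);
    INSERT INTO article VALUES (1, 'Article 1'), (2, 'Article 2'), (3, 'Article 3');
    INSERT INTO event VALUES
        (108, 1, 4, 'Comment 1'),
        (109, 2, 8, 'Comment 2'),
        (110, 3, 4, 'Comment 3'),
        (111, 1, 4, 'Comment 4');  -- hypothetical extra row, not in the question
""")

# Query 1: LEFT JOIN returns one row per matching event.
query1_rows = conn.execute("""
    SELECT a.id, a.summary, evnt.comment
    FROM article a
    LEFT JOIN event evnt ON evnt.article_id = a.id AND evnt.eventtype_id = 4
    ORDER BY a.id
""").fetchall()

# Query 2: the correlated subquery with LIMIT 1 returns one row per article.
query2_rows = conn.execute("""
    SELECT a.id, a.summary,
           (SELECT evnt.comment
            FROM event evnt
            WHERE evnt.article_id = a.id AND evnt.eventtype_id = 4
            LIMIT 1)
    FROM article a
    ORDER BY a.id
""").fetchall()

print(len(query1_rows), len(query2_rows))  # 4 3
```

With the original three events both queries return the same three rows; they only diverge once an article has more than one event of type 4. Note that LIMIT 1 without an ORDER BY leaves it unspecified which of the matching comments Query 2 returns.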

SQL Server 2014 - Query to skip rows

I have a table that has the following fields (Institution, Auditor, QuestionID, Comment). There are 5 different questions (1, 2, 3, 4, 5) and an Auditor will have made a comment/response on each question.
I need to return only those rows where an auditor has made a comment on at least one question. If no comment has been made on any question, those rows need to be skipped. So, for example, if Auditor 1 has made a comment on Q1 for Institution 1, then we need to see all 5 rows for Auditor 1 for that institution. If another Auditor has not yet made any comments on any question for an institution, that Auditor's records for that institution need to be skipped.
In the image above we should return the following data and skip the one shaded green, because there are no comments entered for Institution B by Auditor A2 for any question.
I am confused how I can do this. Maybe keep a count? Any help will be highly appreciated. Thanks.
This sounds like exists to me. It is a little hard to follow your description of the data. But based on your example, I think:
select t.*
from t
where exists (select 1
from t t2
where t2.auditor = t.auditor and
t2.institution = t.institution and
t2.comment is not null
);
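A quick way to try this out, as a sketch only: the image from the question is not available, so the rows below are made-up sample data, and SQLite (via Python's sqlite3) stands in for SQL Server. The table name t and the column names follow the answer and the question's field list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (institution TEXT, auditor TEXT, questionid INTEGER, comment TEXT)"
)

# Made-up sample: auditor A1 commented only on question 1 for institution A;
# auditor A2 has not commented on anything for institution B.
sample = []
for q in range(1, 6):
    sample.append(("A", "A1", q, "looks fine" if q == 1 else None))
    sample.append(("B", "A2", q, None))
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", sample)

# Keep every row of an (auditor, institution) pair that has at least one comment.
kept = conn.execute("""
    SELECT t.institution, t.auditor, t.questionid, t.comment
    FROM t
    WHERE EXISTS (SELECT 1
                  FROM t t2
                  WHERE t2.auditor = t.auditor
                    AND t2.institution = t.institution
                    AND t2.comment IS NOT NULL)
    ORDER BY t.institution, t.auditor, t.questionid
""").fetchall()

for row in kept:
    print(row)
```

All five rows for A1 at institution A come back, including the four without a comment, and the A2 rows are skipped. If empty strings are stored instead of NULL for "no comment", the inner condition would need to test for that as well.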

Without using conjunctions in conditions of selection operators

Let's say there is a table called ITEM and it contains 3 attributes (name, id, price):
name id price
Apple 1 3
Orange 1 3
Banana 2 4
Cherry 3 5
Mango 1 3
How should I write a query that uses a constants selection operator to select those items that have the same price and the same id? The first thing that comes to my mind is to use a rename operator to rename id to id' and price to price', then union it with the ITEM table, but since I need to check two conditions (price=price' and id=id'), how can I select them without using the conjunction operator in relational algebra?
Thank you.
I'm not quite sure but for me, it would be something like this in relational calculus:
and then in SQL:
SELECT name FROM ITEM i WHERE EXISTS (
SELECT 1 FROM ITEM u
WHERE u.name != i.name
AND u.price = i.price
AND u.id = i.id
)
But still, I think your assumption is right, you can still do it by renaming. I do believe it is a bit longer than what I did above.
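For what it's worth, the SQL version can be checked against the ITEM rows from the question. This is only a sketch, assuming SQLite through Python's sqlite3 module; it exercises the SQL form of the answer (which does use AND) and says nothing about the relational algebra formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEM (name TEXT, id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO ITEM VALUES (?, ?, ?)",
    [("Apple", 1, 3), ("Orange", 1, 3), ("Banana", 2, 4),
     ("Cherry", 3, 5), ("Mango", 1, 3)],
)

# Items for which another item with the same id and the same price exists.
names = [
    row[0]
    for row in conn.execute("""
        SELECT i.name
        FROM ITEM i
        WHERE EXISTS (SELECT 1
                      FROM ITEM u
                      WHERE u.name != i.name
                        AND u.price = i.price
                        AND u.id = i.id)
        ORDER BY i.name
    """)
]

print(names)  # ['Apple', 'Mango', 'Orange']
```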

Oracle SQL How to find missing value for different users

I have 2 tables
One table with questions
ID Description
== ===========
1 Some Question
2 Some Question
3 Some Question
4 Some Question
And another one with the answers of every user to each question
ID_USER ID_QUESTION ANSWER
======= =========== =========
1 2 a
1 1 b
1 3 d
2 1 e
2 4 a
3 4 c
3 2 a
As you can see, it is possible that a user does not answer a question, and this is my problem.
I am currently trying to find which questions a user did not answer.
I'd like to have something like this
ID_USER ID_MISSING_QUESTION
======= ===================
1 4
2 3
2 2
3 1
3 3
I can easily find the missing questions for a single user, but I can't do that for every user since they are quite numerous.
Thanks Ayoye
Quick and Dirty:
SELECT TB_USER.ID, TB_QUESTION.ID AS "Q_ID" FROM TB_USER, TB_QUESTION
minus
SELECT ID_USER, ID_QUESTION FROM tb_answer
Sql Fiddle Demo here.
I think You are looking for something like this:
SELECT
u.id_user,
q.id_question
FROM
questions q
CROSS JOIN users u
LEFT JOIN answers a ON (a.id_question = q.id_question and a.id_user = u.id_user)
WHERE
a.answer IS NULL
First you should create the set of every question for every user, and then try to join it with your answers. Then filter out all results for which an answer was found. :)
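Here is the cross join plus anti-join run against the data from the question, as a rough sketch. Assumptions: SQLite (via Python's sqlite3) instead of Oracle, the table names questions and answers, and, since the question shows no users table, the list of users is derived from the distinct ID_USER values in the answers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE answers (id_user INTEGER, id_question INTEGER, answer TEXT);
    INSERT INTO questions VALUES
        (1, 'Some Question'), (2, 'Some Question'),
        (3, 'Some Question'), (4, 'Some Question');
    INSERT INTO answers VALUES
        (1, 2, 'a'), (1, 1, 'b'), (1, 3, 'd'),
        (2, 1, 'e'), (2, 4, 'a'),
        (3, 4, 'c'), (3, 2, 'a');
""")

# Build every (user, question) pair, then keep the pairs with no matching answer.
missing = conn.execute("""
    SELECT u.id_user, q.id AS id_missing_question
    FROM (SELECT DISTINCT id_user FROM answers) u
    CROSS JOIN questions q
    LEFT JOIN answers a
           ON a.id_question = q.id AND a.id_user = u.id_user
    WHERE a.id_user IS NULL
    ORDER BY u.id_user, q.id
""").fetchall()

print(missing)  # [(1, 4), (2, 2), (2, 3), (3, 1), (3, 3)]
```

The filter tests the join key (a.id_user IS NULL) rather than a.answer, so a stored answer that happens to be NULL would not be reported as missing. A user who has answered nothing at all would not show up here, because the user list comes from the answers table; with a real users table that problem goes away.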
You should post the SQL statement(s) you tried before expecting a full answer, else someone might think you want to let others write all the code for you...
Nevertheless, instead of a plain JOIN, use a FULL OUTER JOIN, or a LEFT OUTER JOIN or RIGHT OUTER JOIN depending on the table ordering in your SQL statement (which you have not posted yet), and filter with IS NULL.

SQL Recursive Tables

I have the following tables, the groups table which contains hierarchically ordered groups and group_member which stores which groups a user belongs to.
groups
---------
id
parent_id
name
group_member
---------
id
group_id
user_id
ID PARENT_ID NAME
---------------------------
1 NULL Cerebra
2 1 CATS
3 2 CATS 2.0
4 1 Cerepedia
5 4 Cerepedia 2.0
6 1 CMS
ID GROUP_ID USER_ID
---------------------------
1 1 3
2 1 4
3 1 5
4 2 7
5 2 6
6 4 6
7 5 12
8 4 9
9 1 10
I want to retrieve the visible groups for a given user. That is to say, the groups a user belongs to and the children of these groups. For example, with the above data:
USER VISIBLE_GROUPS
9 4, 5
3 1,2,4,5,6
12 5
I am getting these values using recursion and several database queries. But I would like to know if it is possible to do this with a single SQL query to improve my app performance. I am using MySQL.
Two things come to mind:
1 - You can repeatedly outer-join the table to itself to recursively walk up your tree, as in:
SELECT *
FROM
MY_GROUPS MG1
,MY_GROUPS MG2
,MY_GROUPS MG3
,MY_GROUPS MG4
,MY_GROUPS MG5
,MY_GROUP_MEMBERS MGM
WHERE MG1.PARENT_ID = MG2.UNIQID (+)
AND MG1.UNIQID = MGM.GROUP_ID (+)
AND MG2.PARENT_ID = MG3.UNIQID (+)
AND MG3.PARENT_ID = MG4.UNIQID (+)
AND MG4.PARENT_ID = MG5.UNIQID (+)
AND MGM.USER_ID = 9
That's gonna give you results like this:
UNIQID PARENT_ID NAME UNIQID_1 PARENT_ID_1 NAME_1 UNIQID_2 PARENT_ID_2 NAME_2 UNIQID_3 PARENT_ID_3 NAME_3 UNIQID_4 PARENT_ID_4 NAME_4 UNIQID_5 GROUP_ID USER_ID
4 2 Cerepedia 2 1 CATS 1 null Cerebra null null null null null null 8 4 9
The limit here is that you must add a new join for each "level" you want to walk up the tree. If your tree has less than, say, 20 levels, then you could probably get away with it by creating a view that showed 20 levels from every user.
2 - The only other approach that I know of is to create a recursive database function, and call that from code. You'll still have some lookup overhead that way (i.e., your # of queries will still be equal to the # of levels you are walking on the tree), but overall it should be faster since it's all taking place within the database.
I'm not sure about MySql, but in Oracle, such a function would be similar to this one (you'll have to change the table and field names; I'm just copying something I did in the past):
CREATE OR REPLACE FUNCTION GoUpLevel(WO_ID INTEGER, UPLEVEL INTEGER) RETURN INTEGER
IS
BEGIN
DECLARE
iResult INTEGER;
iParent INTEGER;
BEGIN
IF UPLEVEL <= 0 THEN
iResult := WO_ID;
ELSE
SELECT PARENT_ID
INTO iParent
FROM WOTREE
WHERE ID = WO_ID;
iResult := GoUpLevel(iParent,UPLEVEL-1); --recursive
END IF;
RETURN iResult;
EXCEPTION WHEN NO_DATA_FOUND THEN
RETURN NULL;
END;
END GoUpLevel;
/
Joe Celko's books "SQL for Smarties" and "Trees and Hierarchies in SQL for Smarties" describe methods that avoid recursion entirely, by using nested sets. That complicates the updating, but makes other queries (that would normally need recursion) comparatively straightforward. There are some examples in this article written by Joe back in 1996.
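To give an idea of what the nested set approach looks like for the groups in the question, here is a minimal sketch. Everything about the storage is an assumption: the lft/rgt columns and their values were worked out by hand from the parent_id tree, the table is called grp to stay clear of the GROUPS keyword, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- lft/rgt are hypothetical nested-set bounds derived from the question's tree.
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
    CREATE TABLE group_member (id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER);
    INSERT INTO grp VALUES
        (1, 'Cerebra',        1, 12),
        (2, 'CATS',           2,  5),
        (3, 'CATS 2.0',       3,  4),
        (4, 'Cerepedia',      6,  9),
        (5, 'Cerepedia 2.0',  7,  8),
        (6, 'CMS',           10, 11);
    INSERT INTO group_member VALUES
        (1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 2, 7), (5, 2, 6),
        (6, 4, 6), (7, 5, 12), (8, 4, 9), (9, 1, 10);
""")

def visible_groups(user_id):
    """Groups the user belongs to plus all their descendants, without recursion."""
    rows = conn.execute("""
        SELECT DISTINCT d.id
        FROM group_member gm
        JOIN grp g ON g.id = gm.group_id
        JOIN grp d ON d.lft BETWEEN g.lft AND g.rgt  -- d lies inside g's interval
        WHERE gm.user_id = ?
        ORDER BY d.id
    """, (user_id,)).fetchall()
    return [r[0] for r in rows]

print(visible_groups(9))   # [4, 5]
print(visible_groups(12))  # [5]
```

The read side is a single join, but as the answer says, every insert or move in the tree means renumbering lft/rgt for part of the table.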
I don't think that this can be accomplished without using recursion. You can accomplish it with a single stored procedure in MySQL, but recursion is not allowed in stored procedures by default. This article has information about how to enable recursion. I'm not certain how much impact this would have on performance versus the multiple-query approach. MySQL may do some optimization of stored procedures, but otherwise I would expect the performance to be similar.
Didn't know if you had a Users table, so I get the list via the User_ID's stored in the Group_Member table...
SELECT GroupUsers.User_ID,
(
SELECT
STUFF((SELECT ',' +
Cast(Group_ID As Varchar(10))
FROM Group_Member Member (nolock)
WHERE Member.User_ID=GroupUsers.User_ID
FOR XML PATH('')),1,1,'')
) As Groups
FROM (SELECT User_ID FROM Group_Member GROUP BY User_ID) GroupUsers
That returns:
User_ID Groups
3 1
4 1
5 1
6 2,4
7 2
9 4
10 1
12 5
Which seems right according to the data in your table. But doesn't match up with your expected value list (e.g. User 9 is only in one group in your table data but you show it in the results as belonging to two)
EDIT: Dang. Just noticed that you're using MySQL. My solution was for SQL Server. Sorry.
-- Kevin Fairchild
A similar question has already been raised.
Here is my answer (a bit edited):
I am not sure I understand your question correctly, but this could work: "My take on trees in SQL".
The linked post describes a method of storing a tree in a database (PostgreSQL in that case), but the method is clear enough that it can easily be adapted to any database.
With this method you can easily update all the nodes that depend on a modified node K with about N simple SELECT queries, where N is the distance of K from the root node.
Good Luck!
I don't remember which SO question I found the link under, but this article on sitepoint.com (second page) shows another way of storing hierarchical trees in a table that makes it easy to find all child nodes, or the path to the top, things like that. Good explanation with example code.
PS. Newish to StackOverflow, is the above ok as an answer, or should it really have been a comment on the question since it's just a pointer to a different solution (not exactly answering the question itself)?
There's no way to do this in the SQL standard, but you can usually find vendor-specific extensions, e.g., CONNECT BY in Oracle.
UPDATE: As the comments point out, this was added in SQL 99.
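The SQL:1999 feature the update refers to is the recursive common table expression (WITH RECURSIVE). Below is a minimal sketch of how it could answer the question on a database that implements it. Assumptions: SQLite (via Python's sqlite3) as a stand-in, and the groups table renamed to grp to stay clear of the GROUPS keyword; the data is the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    CREATE TABLE group_member (id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER);
    INSERT INTO grp VALUES
        (1, NULL, 'Cerebra'), (2, 1, 'CATS'), (3, 2, 'CATS 2.0'),
        (4, 1, 'Cerepedia'), (5, 4, 'Cerepedia 2.0'), (6, 1, 'CMS');
    INSERT INTO group_member VALUES
        (1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 2, 7), (5, 2, 6),
        (6, 4, 6), (7, 5, 12), (8, 4, 9), (9, 1, 10);
""")

def visible_groups(user_id):
    """Groups the user is a member of, plus all of their descendants."""
    rows = conn.execute("""
        WITH RECURSIVE visible(id) AS (
            -- anchor: groups the user belongs to directly
            SELECT gm.group_id FROM group_member gm WHERE gm.user_id = ?
            UNION
            -- recursive step: children of groups already found
            SELECT g.id FROM grp g JOIN visible v ON g.parent_id = v.id
        )
        SELECT id FROM visible ORDER BY id
    """, (user_id,)).fetchall()
    return [r[0] for r in rows]

print(visible_groups(9))   # [4, 5]
print(visible_groups(12))  # [5]
print(visible_groups(3))   # [1, 2, 3, 4, 5, 6]
```

For user 3 this also returns group 3, since CATS 2.0 is a child of CATS and therefore a descendant of Cerebra; the expected list in the question leaves it out. UNION (rather than UNION ALL) removes duplicates when a user belongs to both a group and one of its descendants.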