How do I get data on a hierarchical structure?

In my PostgreSQL database I have a table called answers, which stores how users respond to certain questions. I also have an organizations table, which stores the hierarchical relationships between organizations.
PostgreSQL version: PostgreSQL 11.4 (on Debian)
The answers table has this structure:
| employee | tree_organization_id | question_id | question_text | option_id | option_text |
|----------|----------------------|-------------|-------------------------------|-----------|-------------|
| Alex | \1 | 1 | What is your favourite color? | 1 | Red |
| Mark | \1\2\3 | 1 | What is your favourite color? | 3 | Brown |
| Lily | \1\2\4 | 1 | What is your favourite color? | 2 | Yellow |
| Grace | \1\2\4 | 1 | What is your favourite color? | 1 | Red |
| Evie | \5 | 1 | What is your favourite color? | 1 | Red |
| Bob | \5\6 | 1 | What is your favourite color? | 2 | Yellow |
| Mark | \5\7 | 1 | What is your favourite color? | 3 | Brown |
The organizations table has this structure:
| organization_id | organization_name | parent_organization_id | tree_organization_id | organization_rang |
|-----------------|-------------------|------------------------|----------------------|-------------------|
| 1 | Alphabet | | \1 | 1 |
| 2 | Google | 1 | \1\2 | 2 |
| 3 | Calico | 1 | \1\3 | 2 |
| 4 | Youtube | 2 | \1\2\4 | 3 |
| 5 | Nest | 2 | \1\2\5 | 3 |
| 6 | Facebook | | \6 | 1 |
| 7 | Whatsapp | 5 | \6\7 | 2 |
| 8 | Instagram | 5 | \6\8 | 2 |
Let's say the input is a specific organization_id value, for example 4 (Youtube). I need to show the number of people who answered the question in this organization and in each of its parents.
In other words, I'm trying to get a result like this:
| organization_id | organization_name | tree_organization_id | total |
|-----------------|-------------------|----------------------|-----------------|
| 1 | Alphabet | \1 | 3 | <- Alex, Lily, Grace
| 2 | Google | \1\2 | 2 | <- Lily, Grace
| 4 | Youtube | \1\2\4 | 2 | <- Lily, Grace
I tried this code, but it calculates the totals for the parent organizations incorrectly.
select
organizations.organization_id,
organizations.organization_name,
organizations.tree_organization_id,
calculation.total
from
organizations
join lateral (
select
count(*) as total
from
answers
where
tree_organization_id like concat('%', '\', organizations.organization_id, '%')
and
question_id = 1
) calculation on 1 = 1
where
organization_id in (4);
I also tried the following code. It finds the parents of the organization, but how do I calculate the values in the total column correctly?
with recursive organizations_hierarchy as (
select
organizations.organization_id,
organizations.organization_name,
organizations.parent_organization_id,
organizations.tree_organization_id,
organizations.organization_rang
from
organizations
where
organizations.organization_id in (4)
union all
select
a.organization_id,
a.organization_name,
a.parent_organization_id,
a.tree_organization_id,
a.organization_rang
from
organizations a
inner join
organizations_hierarchy b
on
a.organization_id = b.parent_organization_id
)
select
organizations_hierarchy.organization_id,
organizations_hierarchy.organization_name,
organizations_hierarchy.tree_organization_id
from
organizations_hierarchy
order by
organizations_hierarchy.organization_rang;

I finally solved my problem. Below is the working code that I use:
select
x.organization_id,
x.organization_name,
x.tree_organization_id,
calculation.total
from (
with recursive organizations_hierarchy as (
select
organizations.organization_id,
organizations.organization_name,
organizations.organization_rang,
organizations.parent_organization_id,
organizations.tree_organization_id
from
organizations
where
organizations.organization_id in (4)
union
select
a.organization_id,
a.organization_name,
a.organization_rang,
a.parent_organization_id,
a.tree_organization_id
from
organizations a
inner join
organizations_hierarchy b
on
a.organization_id = b.parent_organization_id
)
select
a.*,
(
select
array_agg(b.tree_organization_id order by b.organization_rang)
from
organizations_hierarchy b
where
B.tree_organization_id like CONCAT('%', '\', a.organization_id, '%')
) as hierarchy
from
organizations_hierarchy a
order by
a.organization_rang,
a.organization_id
) x
join lateral (
select
count(*) as total
from
answers
where
tree_organization_id = any(x.hierarchy)
and
question_id = 1
) calculation on 1 = 1;
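As a cross-check of the idea, here is a minimal sketch that reproduces the expected totals with Python's built-in sqlite3 module instead of PostgreSQL. It is not the query above: it assumes organization_rang is the depth of each node, so that within the ancestor chain every row with an equal or greater rang is the node itself or one of its descendants. The sample rows are copied from the question (organizations trimmed to the Alphabet branch); the names chain and QUERY are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (
    organization_id INTEGER PRIMARY KEY,
    organization_name TEXT,
    parent_organization_id INTEGER,
    tree_organization_id TEXT,
    organization_rang INTEGER
);
CREATE TABLE answers (
    employee TEXT,
    tree_organization_id TEXT,
    question_id INTEGER
);
""")

# Sample rows from the question; raw strings keep the backslashes literal.
conn.executemany("INSERT INTO organizations VALUES (?, ?, ?, ?, ?)", [
    (1, "Alphabet", None, r"\1", 1),
    (2, "Google", 1, r"\1\2", 2),
    (3, "Calico", 1, r"\1\3", 2),
    (4, "Youtube", 2, r"\1\2\4", 3),
    (5, "Nest", 2, r"\1\2\5", 3),
])
conn.executemany("INSERT INTO answers VALUES (?, ?, ?)", [
    ("Alex", r"\1", 1), ("Mark", r"\1\2\3", 1),
    ("Lily", r"\1\2\4", 1), ("Grace", r"\1\2\4", 1),
    ("Evie", r"\5", 1),
])

QUERY = """
WITH RECURSIVE chain AS (
    -- start at the requested organization ...
    SELECT organization_id, organization_name, parent_organization_id,
           tree_organization_id, organization_rang
    FROM organizations
    WHERE organization_id = ?
    UNION ALL
    -- ... and walk up through its parents
    SELECT o.organization_id, o.organization_name, o.parent_organization_id,
           o.tree_organization_id, o.organization_rang
    FROM organizations o
    JOIN chain c ON o.organization_id = c.parent_organization_id
)
SELECT c.organization_id, c.organization_name, c.tree_organization_id,
       -- count answers attached to this node or to a deeper node of the chain
       (SELECT count(*)
          FROM chain d
          JOIN answers a ON a.tree_organization_id = d.tree_organization_id
         WHERE d.organization_rang >= c.organization_rang
           AND a.question_id = ?) AS total
FROM chain c
ORDER BY c.organization_rang
"""

rows = conn.execute(QUERY, (4, 1)).fetchall()
for row in rows:
    print(row)
```

Comparing on organization_rang avoids building a LIKE pattern from the path, which matters because a backslash is the default LIKE escape character in PostgreSQL.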

Related

Returning grouped junction table results in a JSON-formatted string

Say I have a table of people
+----+------+
| id | name |
+----+------+
| 1 | John |
| 2 | Mary |
| 3 | Jane |
+----+------+
And various tables for clothing of various types, e.g. a table of shoes
+----+----------+--------------------+---------+
| id | brand | name | type |
+----+----------+--------------------+---------+
| 1 | Converse | High tops | sneaker |
| 2 | Clarks | Tilden cap Oxfords | dress |
| 3 | Nike | Air Zoom | running |
+----+----------+--------------------+---------+
And then I have a junction table, where I’m storing all the clothing that each person has:
+--------+--------+-------+-------+
| person | shirts | pants | shoes |
+--------+--------+-------+-------+
| 1 | 3 | | |
| 1 | 4 | | |
| 1 | | 3 | |
| 1 | | | 5 |
| 2 | | 2 | |
| 2 | | | 2 |
| 2 | 3 | | |
...
I need a query that compiles this junction table into a return like so:
+----+------+--------------------+
| id | name | clothing items |
+----+------+--------------------+
| 1 | John | [JSON in a string] |
| 2 | Mary | [JSON in a string] |
| 3 | Jane | [JSON in a string] |
+----+------+--------------------+
Where the [JSON in a string] for each row should look like this:
{
"shirts":[3,4],
"pants":[3],
"shoes":[5]
}
How do I go about constructing this query in SQLite?
Use SQLite's JSON Functions to aggregate in the junction table and do a LEFT join of people to that resultset:
WITH cte AS (
SELECT person,
json_object(
'shirts', json('[' || GROUP_CONCAT(shirts) || ']'),
'pants', json('[' || GROUP_CONCAT(pants) || ']'),
'shoes', json('[' || GROUP_CONCAT(shoes) || ']')
) clothing_items
FROM junction
GROUP BY person
)
SELECT p.id, p.name, c.clothing_items
FROM people p LEFT JOIN cte c
ON c.person = p.id;
I use GROUP_CONCAT() instead of json_group_array() to remove nulls.
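To try the approach end to end, here is a small sketch using Python's sqlite3 module (it assumes the bundled SQLite has the JSON functions available). Two things are my own additions and not part of the answer: a hypothetical junction row giving Jane a single shirt, and a COALESCE around GROUP_CONCAT so that a category with no items comes back as [] instead of null.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE junction (person INTEGER, shirts INTEGER, pants INTEGER, shoes INTEGER);
INSERT INTO people VALUES (1, 'John'), (2, 'Mary'), (3, 'Jane');
INSERT INTO junction VALUES
    (1, 3, NULL, NULL), (1, 4, NULL, NULL), (1, NULL, 3, NULL), (1, NULL, NULL, 5),
    (2, NULL, 2, NULL), (2, NULL, NULL, 2), (2, 3, NULL, NULL),
    (3, 1, NULL, NULL);  -- hypothetical row: Jane owns only one shirt
""")

rows = conn.execute("""
WITH cte AS (
    SELECT person,
           json_object(
               -- GROUP_CONCAT skips NULLs; COALESCE turns "no items" into []
               'shirts', json('[' || COALESCE(GROUP_CONCAT(shirts), '') || ']'),
               'pants',  json('[' || COALESCE(GROUP_CONCAT(pants), '')  || ']'),
               'shoes',  json('[' || COALESCE(GROUP_CONCAT(shoes), '')  || ']')
           ) AS clothing_items
    FROM junction
    GROUP BY person
)
SELECT p.id, p.name, c.clothing_items
FROM people p LEFT JOIN cte c ON c.person = p.id
ORDER BY p.id
""").fetchall()

# Decode the JSON strings so they are easy to inspect from Python.
clothing = {name: json.loads(items) for _id, name, items in rows}
for name, items in clothing.items():
    print(name, items)
```

Without the COALESCE, GROUP_CONCAT returns NULL for a group that has no non-null values, so Jane's pants and shoes would be null in the JSON.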

Summarize count of multi table in single SQL query

I have three tables with the details below:
Table 1: worklog
+-----------+------------+-------------+
| worklogid | technician | description |
+-----------+------------+-------------+
| 1 | john | some text |
+-----------+------------+-------------+
| 2 | jack | some text |
+-----------+------------+-------------+
| 3 | john | some text |
+-----------+------------+-------------+
| 4 | jenifer | some text |
+-----------+------------+-------------+
Table 2: task
+--------+-------+-------------+
| taskid | owner | description |
+--------+-------+-------------+
| 1 | john | some text |
+--------+-------+-------------+
| 2 | john | some text |
+--------+-------+-------------+
| 3 | john | some text |
+--------+-------+-------------+
| 4 | jack | some text |
+--------+-------+-------------+
Table 3: request
+-----------+------------+-----------+-------------+
| requestid | technician | title | description |
+-----------+------------+-----------+-------------+
| 1 | john | some text | ... |
+-----------+------------+-----------+-------------+
| 2 | sara | some text | ... |
+-----------+------------+-----------+-------------+
| 3 | john | some text | ... |
+-----------+------------+-----------+-------------+
| 4 | jack | some text | ... |
+-----------+------------+-----------+-------------+
Now I need a SQL query that produces this result:
+------------+------------------+---------------+------------------+
| technician | count(worklogid) | count(taskid) | count(requestid) |
+------------+------------------+---------------+------------------+
| john | 2 | 3 | 2 |
+------------+------------------+---------------+------------------+
| jack | 1 | 1 | 1 |
+------------+------------------+---------------+------------------+
| jenifer | 1 | 0 | 0 |
+------------+------------------+---------------+------------------+
| sara | 0 | 0 | 1 |
+------------+------------------+---------------+------------------+
What should I do?
One method is to just use union all and aggregation:
select technician, sum(is_workid), sum(is_taskid), sum(is_requestid)
from ((select technician, 1 as is_workid, 0 as is_taskid, 0 as is_requestid
from worklog
) union all
(select owner, 0, 1, 0
from task
) union all
(select technician, 0, 0, 1
from request
)
) t
group by technician;
In Postgres, you can also aggregate before joining:
select *
from (select technician, count(*) as num_workid
from worklog
group by technician
) w full join
(select owner as technician, count(*) as num_task
from task
group by owner
) t
using (technician) full join
(select technician, count(*) as num_request
from request
group by technician
) r
using (technician);
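With a full join, I find that a using clause is simpler than on clauses, but the column name needs to be the same in all the tables. Note that a technician missing from one of the tables gets NULL rather than 0 for that count; wrap the column in coalesce(..., 0) if you need zeros.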
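The union all method can be checked against the sample data from the question with a short sketch like the one below. It uses Python's sqlite3 module with an in-memory database purely for convenience; the description and title columns are left out because they do not affect the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE worklog (worklogid INTEGER, technician TEXT);
CREATE TABLE task    (taskid INTEGER, owner TEXT);
CREATE TABLE request (requestid INTEGER, technician TEXT);
INSERT INTO worklog VALUES (1, 'john'), (2, 'jack'), (3, 'john'), (4, 'jenifer');
INSERT INTO task    VALUES (1, 'john'), (2, 'john'), (3, 'john'), (4, 'jack');
INSERT INTO request VALUES (1, 'john'), (2, 'sara'), (3, 'john'), (4, 'jack');
""")

result = conn.execute("""
SELECT technician, sum(is_worklog), sum(is_task), sum(is_request)
FROM (
    -- each source row contributes a 1 in exactly one of the three columns
    SELECT technician, 1 AS is_worklog, 0 AS is_task, 0 AS is_request FROM worklog
    UNION ALL
    SELECT owner, 0, 1, 0 FROM task
    UNION ALL
    SELECT technician, 0, 0, 1 FROM request
) AS t
GROUP BY technician
""").fetchall()

# Map technician -> (worklog count, task count, request count).
counts = {tech: (w, t, r) for tech, w, t, r in result}
for tech, triple in sorted(counts.items()):
    print(tech, triple)
```

Because every technician appears in the unioned set, missing combinations sum to 0 directly, with no coalesce needed.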

Merge columns on two left joins

I have 3 tables as shown:
Video
+----+--------+-----------+
| id | name | videoSize |
+----+--------+-----------+
| 1 | video1 | 1MB |
| 2 | video2 | 2MB |
| 3 | video3 | 3MB |
+----+--------+-----------+
Survey
+----+---------+-----------+
| id | name | questions |
+----+---------+-----------+
| 1 | survey1 | 1 |
| 2 | survey2 | 2 |
| 3 | survey3 | 3 |
+----+---------+-----------+
Sequence
+----+---------+-----------+----------+
| id | videoId | surveyId | sequence |
+----+---------+-----------+----------+
| 1 | null | 1 | 1 |
| 2 | 2 | null | 2 |
| 3 | null | 3 | 3 |
+----+---------+-----------+----------+
I would like to query Sequence and join on both of video and survey tables and merge common columns without specifying the column names (in this case name) like this:
Query Result:
+----+---------+-----------+----------+---------+-----------+-----------+
| id | videoId | surveyId | sequence | name | videoSize | questions |
+----+---------+-----------+----------+---------+-----------+-----------+
| 1 | null | 1 | 1 | survey1 | null | 1 |
| 2 | 2 | null | 2 | video2 | 2MB | null |
| 3 | null | 3 | 3 | survey3 | null | 3 |
+----+---------+-----------+----------+---------+-----------+-----------+
Is this possible?
BTW the below sql doesn't work as it doesn't merge on the name field:
SELECT * FROM "Sequence"
LEFT JOIN "Survey" ON "Survey"."id" = "Sequence"."surveyId"
LEFT JOIN "Video" ON "Video"."id" = "Sequence"."videoId"
This query will show what you want:
select
s.*,
coalesce(y.name, v.name) as name, -- picks the right column
v.videoSize,
y.questions
from sequence s
left join survey y on y.id = s.surveyId
left join video v on v.id = s.videoId
However, standard SQL requires you to name the columns you want; the only exception is *, as used above.
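A quick way to see the coalesce() behaviour is to load the three sample tables into an in-memory SQLite database from Python. This is only a sketch: SQLite treats unquoted identifiers case-insensitively, whereas in PostgreSQL columns created as "videoId" or "surveyId" would need the same double quotes in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video    (id INTEGER, name TEXT, videoSize TEXT);
CREATE TABLE survey   (id INTEGER, name TEXT, questions INTEGER);
CREATE TABLE sequence (id INTEGER, videoId INTEGER, surveyId INTEGER, sequence INTEGER);
INSERT INTO video    VALUES (1, 'video1', '1MB'), (2, 'video2', '2MB'), (3, 'video3', '3MB');
INSERT INTO survey   VALUES (1, 'survey1', 1), (2, 'survey2', 2), (3, 'survey3', 3);
INSERT INTO sequence VALUES (1, NULL, 1, 1), (2, 2, NULL, 2), (3, NULL, 3, 3);
""")

merged = conn.execute("""
SELECT s.*,
       coalesce(y.name, v.name) AS name,  -- first non-NULL of the two names
       v.videoSize,
       y.questions
FROM sequence s
LEFT JOIN survey y ON y.id = s.surveyId
LEFT JOIN video  v ON v.id = s.videoId
ORDER BY s.id
""").fetchall()

for row in merged:
    print(row)
```

Each sequence row matches at most one of the two joined tables, so coalesce() simply picks whichever name is present.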

Is there an easier way to find the row with a max value?

I have a schema where these two tables exist (among others)
participation
+------+--------+------------------+
| movie| person | role |
+------+--------+------------------+
| 1 | 1 | "Regisseur" |
| 1 | 1 | "Schauspieler" |
| 1 | 2 | "Schauspielerin" |
| 2 | 3 | "Regisseur" |
| 3 | 4 | "Regisseur" |
| 3 | 5 | "Schauspieler" |
| 3 | 6 | "Schauspieler" |
| 4 | 7 | "Schauspielerin" |
| 4 | 8 | "Schauspieler" |
| 5 | 1 | "Schauspieler" |
| 5 | 8 | "Schauspieler" |
| 5 | 14 | "Schauspieler" |
+------+--------+------------------+
movie
+----+------------------------------+------+-----+
| id | title | year | fsk |
+----+------------------------------+------+-----+
| 1 | "Die Bruecke am Fluss" | 1995 | 12 |
| 2 | "101 Dalmatiner" | 1961 | 0 |
| 3 | "Vernetzt - Johnny Mnemonic" | 1995 | 16 |
| 4 | "Waehrend Du schliefst..." | 1995 | 6 |
| 5 | "Casper" | 1995 | 6 |
| 6 | "French Kiss" | 1995 | 6 |
| 7 | "Stadtgespraech" | 1995 | 12 |
| 8 | "Apollo 13" | 1995 | 6 |
| 9 | "Schlafes Bruder" | 1995 | 12 |
| 10 | "Assassins - Die Killer" | 1995 | 16 |
| 11 | "Braveheart" | 1995 | 16 |
| 12 | "Das Netz" | 1995 | 12 |
| 13 | "Free Willy 2" | 1995 | 6 |
+----+------------------------------+------+-----+
I want to get the movie with the highest number of people that participated. I figured out an SQL statement that actually does this, but looks super complicated. It looks like this:
SELECT title
FROM movie.movie
JOIN (SELECT *
FROM (SELECT Max(count_person) AS max_count_person
FROM (SELECT movie,
Count(person) AS count_person
FROM movie.participation
GROUP BY movie) AS countPersons) AS
maxCountPersons
JOIN (SELECT movie,
Count(person) AS count_person
FROM movie.participation
GROUP BY movie) AS countPersons
ON maxCountPersons.max_count_person =
countPersons.count_person)
AS maxPersonsmovie
ON maxPersonsmovie.movie = movie.id
The main problem is, that I can't find an easier way to select the row with the highest value. If I simply could make a selection on the inner table and pick the row with the highest value on count_person without losing the information about the movie itself, this would look so much simpler. Is there a way to simplify this, or is this really the easiest way to do this?
Here is a way without subqueries:
SELECT m.title
FROM movie.movie m JOIN
movie.participation p
ON m.id = p.movie
GROUP BY m.title
ORDER BY COUNT(*) DESC
FETCH FIRST 1 ROW ONLY;
You can use LIMIT 1 instead of FETCH, if you prefer.
Note: In the event of ties, this only returns one value. That seems consistent with your question.
You can use the rank() window function to do this; unlike the query above, it returns every movie that ties for first place.
SELECT title
FROM (SELECT m.title,rank() over(order by count(p.person) desc) as rnk
FROM movie.movie m
LEFT JOIN movie.participation p ON m.id=p.movie
GROUP BY m.title
) t
WHERE rnk=1;
Alternatively, with a scalar subquery:
SELECT title
FROM movie.movie
WHERE id = (SELECT movie
FROM movie.participation
GROUP BY movie
ORDER BY count(*) DESC
LIMIT 1);
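The practical difference between these answers is how they treat ties, and the sample data happens to contain one: movies 1, 3 and 5 each have three participation rows. The sketch below shows this with Python's sqlite3 module (window functions need SQLite 3.25 or later). It drops the movie. schema prefix, loads only the first six movies, and computes the counts in an inner query before ranking, which is a restructuring of the rank() answer rather than a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE participation (movie INTEGER, person INTEGER, role TEXT);
INSERT INTO movie VALUES
    (1, 'Die Bruecke am Fluss'), (2, '101 Dalmatiner'),
    (3, 'Vernetzt - Johnny Mnemonic'), (4, 'Waehrend Du schliefst...'),
    (5, 'Casper'), (6, 'French Kiss');
INSERT INTO participation VALUES
    (1, 1, 'Regisseur'), (1, 1, 'Schauspieler'), (1, 2, 'Schauspielerin'),
    (2, 3, 'Regisseur'),
    (3, 4, 'Regisseur'), (3, 5, 'Schauspieler'), (3, 6, 'Schauspieler'),
    (4, 7, 'Schauspielerin'), (4, 8, 'Schauspieler'),
    (5, 1, 'Schauspieler'), (5, 8, 'Schauspieler'), (5, 14, 'Schauspieler');
""")

# rank() gives every movie with the top count the same rank, so ties survive.
ranked = conn.execute("""
SELECT title
FROM (SELECT title, rank() OVER (ORDER BY n DESC) AS rnk
      FROM (SELECT m.title AS title, count(p.person) AS n
            FROM movie m
            LEFT JOIN participation p ON m.id = p.movie
            GROUP BY m.title) AS counted) AS t
WHERE rnk = 1
""").fetchall()
top_titles = {title for (title,) in ranked}

# LIMIT 1 keeps a single row; which tied movie wins is not defined.
single = conn.execute("""
SELECT title
FROM movie
WHERE id = (SELECT movie
            FROM participation
            GROUP BY movie
            ORDER BY count(*) DESC
            LIMIT 1)
""").fetchall()

print(sorted(top_titles))
print(single)
```

Note that all of these count participation rows, so a person with two roles in the same movie is counted twice; count(DISTINCT person) would count people instead.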

SQL compare multiple rows or partitions to find matches

The database I'm working on is DB2 and I have a problem similar to the following scenario:
Table Structure
-------------------------------
| Teacher Seating Arrangement |
-------------------------------
| PK | seat_argmt_id |
| | teacher_id |
-------------------------------
-----------------------------
| Seating Arrangement |
-----------------------------
|PK FK | seat_argmt_id |
|PK | Row_num |
|PK | seat_num |
|PK | child_name |
-----------------------------
Table Data
------------------------------
| Teacher Seating Arrangement|
------------------------------
| seat_argmt_id | teacher_id |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 2 |
------------------------------
---------------------------------------------------
| Seating Arrangement |
---------------------------------------------------
| seat_argmt_id | row_num | seat_num | child_name |
| 1 | 1 | 1 | Abe |
| 1 | 1 | 2 | Bob |
| 1 | 1 | 3 | Cat |
| | | | |
| 2 | 1 | 1 | Abe |
| 2 | 1 | 2 | Bob |
| 2 | 1 | 3 | Cat |
| | | | |
| 3 | 1 | 1 | Abe |
| 3 | 1 | 2 | Cat |
| 3 | 1 | 3 | Bob |
| | | | |
| 4 | 1 | 1 | Abe |
| 4 | 1 | 2 | Bob |
| 4 | 1 | 3 | Cat |
| 4 | 2 | 2 | Dan |
---------------------------------------------------
I want to see where there are duplicate seating arrangements for a teacher. And by duplicates I mean where the row_num, seat_num, and child_name are the same among different seat_argmt_id for one teacher_id. So with the data provided above, only seat id 1 and 2 are what I would want to pull back, as they are duplicates on everything but the seat id. If all the children on the 2nd table are exact (sans the primary & foreign key, which is seat_argmt_id in this case), I want to see that.
My initial thought was to do a count(*) group by row#, seat#, and child. Everything with a count of > 1 would mean it's a dupe and = 1 would mean it's unique. That logic only works if you are comparing single rows though. I need to compare multiple rows. I cannot figure out a way to do it via SQL. The solution I have involves going outside of SQL and works (probably). I'm just wondering if there is a way to do it in DB2.
Does this do what you want?
select d.teacher_id, sa.row_num, sa.seat_num, sa.child_name
from seatingarrangement sa join
data d
on sa.seat_argmt_id = d.seat_argmt_id
group by d.teacher_id, sa.row_num, sa.seat_num, sa.child_name
having count(*) > 1;
EDIT:
If you want to find two arrangements that are the same:
select sa1.seat_argmt_id, sa2.seat_argmt_id
from seatingarrangement sa1 join
seatingarrangement sa2
on sa1.seat_argmt_id < sa2.seat_argmt_id and
sa1.row_num = sa2.row_num and
sa1.seat_num = sa2.seat_num and
sa1.child_name = sa2.child_name
group by sa1.seat_argmt_id, sa2.seat_argmt_id
having count(*) = (select count(*) from seatingarrangement sa where sa.seat_argmt_id = sa1.seat_argmt_id) and
count(*) = (select count(*) from seatingarrangement sa where sa.seat_argmt_id = sa2.seat_argmt_id);
This finds the matches between two arrangements and then verifies that the counts are correct.
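To check the second query against the sample data, here is a small sketch using Python's sqlite3 module rather than DB2. The table name seatingarrangement follows the answer; the teacher table is left out, so this version compares all arrangements regardless of teacher, which is an assumption you would tighten by also joining on teacher_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE seatingarrangement (
    seat_argmt_id INTEGER, row_num INTEGER, seat_num INTEGER, child_name TEXT
);
INSERT INTO seatingarrangement VALUES
    (1, 1, 1, 'Abe'), (1, 1, 2, 'Bob'), (1, 1, 3, 'Cat'),
    (2, 1, 1, 'Abe'), (2, 1, 2, 'Bob'), (2, 1, 3, 'Cat'),
    (3, 1, 1, 'Abe'), (3, 1, 2, 'Cat'), (3, 1, 3, 'Bob'),
    (4, 1, 1, 'Abe'), (4, 1, 2, 'Bob'), (4, 1, 3, 'Cat'), (4, 2, 2, 'Dan');
""")

duplicates = conn.execute("""
SELECT sa1.seat_argmt_id, sa2.seat_argmt_id
FROM seatingarrangement sa1
JOIN seatingarrangement sa2
  ON sa1.seat_argmt_id < sa2.seat_argmt_id   -- each pair only once
 AND sa1.row_num = sa2.row_num
 AND sa1.seat_num = sa2.seat_num
 AND sa1.child_name = sa2.child_name
GROUP BY sa1.seat_argmt_id, sa2.seat_argmt_id
-- the number of matching seats must equal the size of BOTH arrangements
HAVING count(*) = (SELECT count(*) FROM seatingarrangement sa
                   WHERE sa.seat_argmt_id = sa1.seat_argmt_id)
   AND count(*) = (SELECT count(*) FROM seatingarrangement sa
                   WHERE sa.seat_argmt_id = sa2.seat_argmt_id)
""").fetchall()

print(duplicates)
```

Arrangement 4 matches arrangement 1 on three seats but has a fourth seat of its own, so the second count check rules that pair out.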