SQL Server recursive query with associated table - sql

I have a typical parent/child relationship table to represent folders. My challenge is using it in conjunction with another table.
The folder table is like this:
+--+----+--------+
|id|name|parentid|
+--+----+--------+
|1 |a |null |
+--+----+--------+
|2 |b |1 |
+--+----+--------+
|3 |c1 |2 |
+--+----+--------+
|4 |c2 |2 |
+--+----+--------+
The association table is like this:
+--+--------+
|id|folderid|
+--+--------+
|66|2 |
+--+--------+
|77|3 |
+--+--------+
so that the row with association.id = 66 relates to folder.id = 2.
What I need to do is find the association.id of the first ancestor (starting with the folder itself) that has a record in the association table. Using the example data above, given folder.id of 3 I expect to find 77; given folder.id of 2 or 4 I expect to find 66; any other folder.id value would find null.
Finding folder ancestry can be done with a common table expression like this:
WITH [recurse] (id,name,parentid,lvl) AS
(
select a.id,a.name,a.parentid,0 FROM folder AS a
WHERE a.id='4'
UNION ALL
select r.id,r.name,r.parentid,lvl+1 FROM folder as r
INNER JOIN [recurse] ON recurse.parentid = r.id
)
SELECT * from [recurse] ORDER BY lvl DESC
yielding the results:
+--+----+--------+---+
|id|name|parentid|lvl|
+--+----+--------+---+
|1 |a | |2 |
+--+----+--------+---+
|2 |b |1 |1 |
+--+----+--------+---+
|4 |c2 |2 |0 |
+--+----+--------+---+
To include the association.id I've tried using a LEFT JOIN in the recursive portion of the CTE, but this is not allowed by SQL Server.
What workaround do I have for this?
Or better yet, is there a way to query directly for the particular association.id (e.g., without walking through the results of the CTE query that I have been attempting)?

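SELECT r.id, r.name, r.parentid, r.lvl, a.folderid, a.id AS associationid
FROM [recurse] r
LEFT JOIN [association] a
ON r.id = a.folderid
WHERE a.folderid IS NOT NULL
ORDER BY r.lvl ASC
Used as the final SELECT after your CTE, this returns only the ancestors that have a row in the association table, nearest first (lvl 0 is the starting folder). Take the first row, for example with SELECT TOP 1, to get the association.id of the nearest such ancestor. Note that sorting by lvl DESC would put the most distant ancestor on top instead.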
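To check the idea against the sample data, here is a minimal sketch that runs the CTE plus the join in an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here: it spells the CTE `WITH RECURSIVE` and uses `LIMIT 1` where SQL Server would use `TOP 1`. The helper name `nearest_association` is made up for this illustration.

```python
import sqlite3

def nearest_association(conn, folder_id):
    """Return the association id of the nearest ancestor (the folder itself counts), or None."""
    row = conn.execute(
        """
        WITH RECURSIVE recurse(id, parentid, lvl) AS (
            -- anchor: the folder we start from
            SELECT f.id, f.parentid, 0 FROM folder AS f WHERE f.id = ?
            UNION ALL
            -- step: climb to the parent of the previous row
            SELECT f.id, f.parentid, r.lvl + 1
            FROM folder AS f
            JOIN recurse AS r ON r.parentid = f.id
        )
        SELECT a.id
        FROM recurse AS r
        JOIN association AS a ON a.folderid = r.id
        ORDER BY r.lvl          -- nearest ancestor first
        LIMIT 1
        """,
        (folder_id,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE folder (id INTEGER PRIMARY KEY, name TEXT, parentid INTEGER);
    CREATE TABLE association (id INTEGER PRIMARY KEY, folderid INTEGER);
    INSERT INTO folder VALUES (1,'a',NULL),(2,'b',1),(3,'c1',2),(4,'c2',2);
    INSERT INTO association VALUES (66,2),(77,3);
""")

results = {fid: nearest_association(conn, fid) for fid in (1, 2, 3, 4)}
print(results)
```

The join to the association table happens outside the recursive part, so no outer join is needed inside the CTE itself.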

Related

Conditional count of rows where at least one peer qualifies

Background
I'm a novice SQL user. Using PostgreSQL 13 on Windows 10 locally, I have a table t:
+--+---------+-------+
|id|treatment|outcome|
+--+---------+-------+
|a |1 |0 |
|a |1 |1 |
|b |0 |1 |
|c |1 |0 |
|c |0 |1 |
|c |1 |1 |
+--+---------+-------+
The Problem
I didn't explain myself well initially, so I've rewritten the goal.
Desired result:
+-----------------------+-----+
|ever treated |count|
+-----------------------+-----+
|0 |1 |
|1 |3 |
+-----------------------+-----+
First, identify the ids that have ever been treated. Being "ever treated" means having any row with treatment = 1.
Second, count rows with outcome = 1 for each of those two groups. From my original table, the ids who are "ever treated" have a total of 3 rows with outcome = 1, and the "never treated", so to speak, have 1 row with outcome = 1.
What I've tried
I can get much of the way there, I think, with something like this:
select treatment, count(outcome)
from t
group by treatment;
But that only gets me this result:
+---------+-----+
|treatment|count|
+---------+-----+
|0 |2 |
|1 |4 |
+---------+-----+
For the updated question:
SELECT ever_treated, sum(outcome_ct) AS count
FROM (
SELECT id
, max(treatment) AS ever_treated
, count(*) FILTER (WHERE outcome = 1) AS outcome_ct
FROM t
GROUP BY 1
) sub
GROUP BY 1;
ever_treated | count
--------------+-------
0 | 1
1 | 3
Read:
For those who got no treatment at all (all treatment = 0), we see 1 x outcome = 1.
For those who got any treatment (at least one treatment = 1), we see 3 x outcome = 1.
Would be simpler and faster with proper boolean values instead of integer.
(Answer to updated question)
Here is easy-to-follow subquery logic that works with integers:
select subq.ever_treated, sum(subq.count) as count
from (select id, max(treatment) as ever_treated, count(*) as count
from t where outcome = 1
group by id) as subq
group by subq.ever_treated;
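As a quick sanity check of the two-level aggregation, the sketch below loads the question's table into an in-memory SQLite database from Python. SQLite is an assumption standing in for PostgreSQL, and `SUM(CASE ...)` is swapped in for `count(*) FILTER (...)` so that it also runs on older SQLite builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id TEXT, treatment INTEGER, outcome INTEGER);
    INSERT INTO t VALUES
        ('a',1,0),('a',1,1),('b',0,1),('c',1,0),('c',0,1),('c',1,1);
""")

rows = conn.execute("""
    SELECT ever_treated, SUM(outcome_ct) AS count
    FROM (
        -- inner level: one row per id
        SELECT id,
               MAX(treatment) AS ever_treated,
               SUM(CASE WHEN outcome = 1 THEN 1 ELSE 0 END) AS outcome_ct
        FROM t
        GROUP BY id
    ) AS sub
    -- outer level: one row per ever_treated group
    GROUP BY ever_treated
    ORDER BY ever_treated
""").fetchall()
print(rows)
```

One design point worth noting: this version computes `MAX(treatment)` over all rows of an id. A variant that filters on `outcome = 1` before grouping only looks at treatment on the outcome rows, so an id whose only treated rows have outcome = 0 would land in the "never treated" group. On the sample data both approaches agree.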

In SQL, query a table by transposing column results

Background
Forgive the title of this question, as I'm not really sure how to describe what I'm trying to do.
I have a SQL table, d, that looks like this:
+--+---+------------+------------+
|id|sex|event_type_1|event_type_2|
+--+---+------------+------------+
|a |m |1 |1 |
|b |f |0 |1 |
|c |f |1 |0 |
|d |m |0 |1 |
+--+---+------------+------------+
The Problem
I'm trying to write a query that yields the following summary of counts of event_type_1 and event_type_2 cut (grouped?) by sex:
+-------------+-----+-----+
| | m | f |
+-------------+-----+-----+
|event_type_1 | 1 | 1 |
+-------------+-----+-----+
|event_type_2 | 2 | 1 |
+-------------+-----+-----+
The thing is, this seems to involve some kind of transposition of the 2 event_type columns into rows of the query result that I'm not familiar with as a novice SQL user.
What I've tried
I've so far come up with the following query:
SELECT event_type_1, event_type_2, count(sex)
FROM d
group by event_type_1, event_type_2
But that only gives me this:
+------------+------------+-----+
|event_type_1|event_type_2|count|
+------------+------------+-----+
|1 |1 |1 |
|1 |0 |1 |
|0 |1 |2 |
+------------+------------+-----+
You can use a lateral join to unpivot the data. Then use conditional aggregation to calculate m and f:
select v.which,
count(*) filter (where d.sex = 'm') as m,
count(*) filter (where d.sex = 'f') as f
from d cross join lateral
(values (d.event_type_1, 'event_type_1'),
(d.event_type_2, 'event_type_2')
) v(val, which)
where v.val = 1
group by v.which;
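For engines without lateral joins, the same unpivot-then-aggregate idea can be sketched with `UNION ALL`. The example below is an illustration only: it uses an in-memory SQLite database from Python rather than the asker's database, and `SUM(CASE ...)` in place of `count(*) filter (...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE d (id TEXT, sex TEXT, event_type_1 INTEGER, event_type_2 INTEGER);
    INSERT INTO d VALUES ('a','m',1,1),('b','f',0,1),('c','f',1,0),('d','m',0,1);
""")

rows = conn.execute("""
    SELECT which,
           SUM(CASE WHEN sex = 'm' THEN 1 ELSE 0 END) AS m,
           SUM(CASE WHEN sex = 'f' THEN 1 ELSE 0 END) AS f
    FROM (
        -- unpivot: one row per (person, event column)
        SELECT sex, 'event_type_1' AS which, event_type_1 AS val FROM d
        UNION ALL
        SELECT sex, 'event_type_2' AS which, event_type_2 AS val FROM d
    ) AS v
    WHERE val = 1
    GROUP BY which
    ORDER BY which
""").fetchall()
print(rows)
```

The trade-off is that the table is scanned once per unpivoted column, whereas the lateral form reads it once.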

select categories where their parent is type 1

I have this table categories
|catId |catName|catParentID|catType|
-------------------------------------
|1 |cat1 |null |6 |
|2 |cat2 |null |9 |
|3 |cat3 |1 |6 |
|4 |cat4 |2 |9 |
|5 |cat5 |1 |6 |
|6 |cat6 |3 |8 |
The parents are in the same table as the subcategories; they simply have no parent of their own (catParentID is null).
I need to get all the subcategories whose parent's type is 6.
The output for the example above should look like this:
cat3
cat5
Given your data structure, this seems to work:
select c.*
from categories c
where c.catParentID is not null and -- has a parent
c.catType = 6;
However, that might not be a general solution. So you can use a self-join:
select c.*
from categories c join
categories cp
on c.catParentID = cp.catID
where cp.catType = 6;
SELECT *
FROM categories
WHERE cattype = 6
AND catparentid IS NOT NULL
The simplest way is:
SELECT * FROM categories WHERE catParentId ='1' AND catType ='6'
Try this... (Based on your desired output)
SELECT t1.*
FROM tablename t1
LEFT JOIN tablename t2 ON t1.catparentid = t2.catid
WHERE t2.cattype = 6
AND t2.catparentid IS NULL
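The answers above are not interchangeable on the sample data, which a small experiment makes visible. The sketch below (an in-memory SQLite database driven from Python, used here purely for illustration) runs the plain self-join and the self-join restricted to top-level parents side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        catId INTEGER PRIMARY KEY, catName TEXT, catParentID INTEGER, catType INTEGER
    );
    INSERT INTO categories VALUES
        (1,'cat1',NULL,6),(2,'cat2',NULL,9),(3,'cat3',1,6),
        (4,'cat4',2,9),(5,'cat5',1,6),(6,'cat6',3,8);
""")

# Any category whose direct parent has type 6, at whatever depth.
any_parent = [r[0] for r in conn.execute("""
    SELECT c.catName
    FROM categories AS c
    JOIN categories AS cp ON c.catParentID = cp.catId
    WHERE cp.catType = 6
    ORDER BY c.catId
""")]

# Only categories whose parent has type 6 AND is itself top-level.
top_level_parent = [r[0] for r in conn.execute("""
    SELECT c.catName
    FROM categories AS c
    JOIN categories AS cp ON c.catParentID = cp.catId
    WHERE cp.catType = 6
      AND cp.catParentID IS NULL
    ORDER BY c.catId
""")]

print(any_parent, top_level_parent)
```

The plain self-join also returns cat6, because its parent cat3 has type 6. If the desired output is strictly cat3 and cat5, the extra `cp.catParentID IS NULL` condition (or filtering on the child's own type) is what produces it.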

Return rows only if matches all list values

Let's say I have a table customers:
-----------------
|id|name|country|
|1 |Joe |Mexico |
|2 |Mary|USA |
|3 |Jim |France |
-----------------
And a table languages:
-------------
|id|language|
|1 |English |
|2 |Spanish |
|3 |French |
-------------
And a table cust_lang:
------------------
|id|custId|langId|
|1 |1 |1 |
|2 |1 |2 |
|3 |2 |1 |
|4 |3 |3 |
------------------
Given a list: ["English", "Spanish", "Portuguese"]
Using a WHERE IN for the list, it will still return customers with ids 1,2 because they match "English" and "Spanish".
However, the results should be 0 rows returned since no customer matches ALL three terms.
I only want the customer ids to return if it matches the cust_lang table.
For instance, Given a list: ["English", "Spanish"]
I would want the results to be customer Id 1, since he alone speaks both languages.
EDIT: @GordonLinoff - That works!!
Now to make it more complex, what's wrong with this additional related query:
Let's assume I also have a table degrees:
-----------
|id|degree|
|1 |PHD |
|2 |BA |
|3 |MD |
-----------
A corresponding join table cust_deg:
------------------
|id|custId|degId |
|1 |1 |1 |
|2 |1 |2 |
|3 |2 |1 |
|4 |3 |3 |
------------------
The following query does not work. However, it is two of the same queries combined. The results should be only rows that match both lists, instead of the one list.
SELECT * FROM customers C
WHERE C.id IN (
SELECT CL.langId FROM cust_lang CL
JOIN languages L on CL.langId = L.id
WHERE L.language IN ("English", "Spanish")
GROUP BY CL.langID
HAVING COUNT(*) = 2)
AND C.id IN (
SELECT CD.custId FROM cust_deg CD
JOIN degrees D ON CD.degID = D.id
WHERE D.degree IN ("PHD", "BA")
GROUP BY CD.custId HAVING COUNT(*) = 2));
EDIT2: I think I fixed it. I accidentally had an extra select statement in there.
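The fixed query was not posted, so the following is only a guess at a working version. In the query as shown, the first subquery selects and groups by langId rather than custId, so C.id is being compared against language ids. A sketch with both subqueries keyed on custId, run against the sample data in an in-memory SQLite database from Python (single-quoted string literals are used, since that is the portable spelling):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE languages (id INTEGER PRIMARY KEY, language TEXT);
    CREATE TABLE cust_lang (id INTEGER PRIMARY KEY, custId INTEGER, langId INTEGER);
    CREATE TABLE degrees   (id INTEGER PRIMARY KEY, degree TEXT);
    CREATE TABLE cust_deg  (id INTEGER PRIMARY KEY, custId INTEGER, degId INTEGER);
    INSERT INTO customers VALUES (1,'Joe','Mexico'),(2,'Mary','USA'),(3,'Jim','France');
    INSERT INTO languages VALUES (1,'English'),(2,'Spanish'),(3,'French');
    INSERT INTO cust_lang VALUES (1,1,1),(2,1,2),(3,2,1),(4,3,3);
    INSERT INTO degrees   VALUES (1,'PHD'),(2,'BA'),(3,'MD');
    INSERT INTO cust_deg  VALUES (1,1,1),(2,1,2),(3,2,1),(4,3,3);
""")

rows = conn.execute("""
    SELECT c.id, c.name
    FROM customers AS c
    WHERE c.id IN (
        -- customers who speak both listed languages
        SELECT cl.custId
        FROM cust_lang AS cl
        JOIN languages AS l ON cl.langId = l.id
        WHERE l.language IN ('English', 'Spanish')
        GROUP BY cl.custId
        HAVING COUNT(*) = 2
    )
    AND c.id IN (
        -- customers who hold both listed degrees
        SELECT cd.custId
        FROM cust_deg AS cd
        JOIN degrees AS d ON cd.degId = d.id
        WHERE d.degree IN ('PHD', 'BA')
        GROUP BY cd.custId
        HAVING COUNT(*) = 2
    )
""").fetchall()
print(rows)
```

Only Joe satisfies both lists in the sample data.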
You can do this with group by and having:
select cl.custid
from cust_lang cl join
languages l
on cl.langid = l.id
where l.language in ('English', 'Spanish', 'Portuguese')
group by cl.custid
having count(*) = 3;
If, for example, you only wanted to check for two languages, then you need only change your WHERE ... IN and HAVING conditions, e.g.:
where l.language in ('English', 'Spanish')
and
having count(*) = 2
This is pretty much Gordon's answer but it has the benefit of being a little more flexible on the language list and it doesn't require any change to the having clause.
with my_languages as (
select id as langId from languages
where language in ('English', 'Spanish')
)
select cl.custId
from cust_lang as cl inner join my_languages as l on l.langId = cl.langId
group by cl.custId
having count(*) = (select count(*) from my_languages)
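When the list comes from application code, the list length can be passed as a parameter so the HAVING clause never has to be edited by hand. Below is a minimal sketch of that idea using Python's sqlite3 module and the sample tables; the function name `customers_speaking_all` is hypothetical, and `COUNT(DISTINCT ...)` is used as a guard in case cust_lang ever contains duplicate rows.

```python
import sqlite3

def customers_speaking_all(conn, wanted):
    """Return ids of customers who speak every language in `wanted`."""
    wanted = sorted(set(wanted))  # duplicates would break the count comparison
    if not wanted:
        return []  # assumption: an empty list matches nobody
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT cl.custId
        FROM cust_lang AS cl
        JOIN languages AS l ON cl.langId = l.id
        WHERE l.language IN ({placeholders})
        GROUP BY cl.custId
        HAVING COUNT(DISTINCT l.language) = ?
        ORDER BY cl.custId
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (id INTEGER PRIMARY KEY, language TEXT);
    CREATE TABLE cust_lang (id INTEGER PRIMARY KEY, custId INTEGER, langId INTEGER);
    INSERT INTO languages VALUES (1,'English'),(2,'Spanish'),(3,'French');
    INSERT INTO cust_lang VALUES (1,1,1),(2,1,2),(3,2,1),(4,3,3);
""")

both = customers_speaking_all(conn, ["English", "Spanish"])
three = customers_speaking_all(conn, ["English", "Spanish", "Portuguese"])
print(both, three)
```

Comparing against the length of the requested list (rather than the number of matching rows in languages) is what makes the three-language request return nothing: Portuguese is not in the table, so no customer can reach a count of 3.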

count and distinct over multiple columns

I have a database table containing two costs. I want to find the distinct costs over these two columns. I also want to count how many times each of these costs appears. The table may look like
|id|cost1|cost2|
|1 |50 |60 |
|2 |20 |50 |
|3 |50 |70 |
|4 |20 |30 |
|5 |50 |60 |
In this case I want a result that is distinct over both columns and count the number of times that appears. So the result I would like is
|distinctCost|count|
|20 |2 |
|30 |1 |
|50 |4 |
|60 |2 |
|70 |1 |
and ideally ordered
|distinctCost|count|
|50 |4 |
|60 |2 |
|20 |2 |
|70 |1 |
|30 |1 |
I can get the distinct over two columns by doing something like
select DISTINCT c FROM (SELECT cost1 AS c FROM my_costs UNION SELECT cost2 AS c FROM my_costs);
and I can get the count for each column by doing
select cost1, count(*)
from my_costs
group by cost1
order by count(*) desc;
My problem is how can I get the count for both columns? I am stuck on how to do the count over each individual column and then add it up.
Any pointers would be appreciated.
I am using Oracle DB.
Thanks
By combining your two queries:
select cost, count(*)
from
(
SELECT id, cost1 AS cost FROM my_costs
UNION ALL
SELECT id, cost2 AS cost FROM my_costs
) v
group by cost
order by count(*) desc;
(If a row has equal cost1 and cost2 and you want to count it once rather than twice, change the UNION ALL to a UNION.)
You can use the unpivot statement :
select *
from
(
SELECT cost , count(*) as num_of_costs
FROM my_costs
UNPIVOT
(
cost
FOR cost_num IN (cost1,cost2)
)
group by cost
)
order by num_of_costs desc;
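The UNION ALL approach can be checked against the sample table with a short script. This sketch uses an in-memory SQLite database from Python as a stand-in for Oracle; the secondary sort on cost is an addition here, only to make the order of ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_costs (id INTEGER PRIMARY KEY, cost1 INTEGER, cost2 INTEGER);
    INSERT INTO my_costs VALUES (1,50,60),(2,20,50),(3,50,70),(4,20,30),(5,50,60);
""")

rows = conn.execute("""
    SELECT cost, COUNT(*) AS n
    FROM (
        -- stack both columns into a single column
        SELECT cost1 AS cost FROM my_costs
        UNION ALL
        SELECT cost2 AS cost FROM my_costs
    ) AS v
    GROUP BY cost
    ORDER BY n DESC, cost DESC
""").fetchall()
print(rows)
```

UNION ALL matters here: a plain UNION without the id column would collapse duplicate costs before they are counted.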