WHERE clause does not find column after a CTE? - sql

I'm new to CTEs and subqueries in SQL.
I have 3 tables:
categories (category_code, category)
countries (country_code, country, continent)
businesses (business, year_founded, category_code, country_code)
The goal is to look at the oldest businesses in the world. I used a CTE:
WITH bus_cat_cont AS (
SELECT business, year_founded, category, country,
continent
FROM businesses AS b
INNER JOIN categories AS c1
ON b.category_code = c1.category_code
INNER JOIN countries AS c2
ON b.country_code = c2.country_code
)
SELECT continent,
category,
COUNT(business) AS n
FROM bus_cat_cont
WHERE n > 5
GROUP BY continent, category
ORDER BY n DESC;
The code works without WHERE n > 5. But after adding that, I get the error:
column "n" does not exist
I realized there is a much easier way to get the output I want without a CTE.
But I'm wondering: Why do I get this error?

This would work:
WITH bus_cat_cont AS (
SELECT business, year_founded, category, country, continent
FROM businesses AS b
JOIN categories AS c1 ON b.category_code = c1.category_code
JOIN countries AS c2 ON b.country_code = c2.country_code
)
SELECT continent, category, count(business) AS n
FROM bus_cat_cont
-- WHERE n > 5 -- wrong
GROUP BY continent, category
HAVING count(business) > 5 -- right
ORDER BY n DESC;
The output column name "n" is not visible (yet) in the WHERE or HAVING clause. Consider the sequence of events in an SQL query:
Best way to get result count before LIMIT was applied
For the record, the result has no obvious connection to your declared goal to "look at oldest businesses in the world". year_founded is unused in the query.
You get the most common continent/category combinations among businesses.
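To illustrate why WHERE n > 5 fails while HAVING works, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and the CTE is replaced by a plain, already-joined table. SQLite is not PostgreSQL and words its error differently, but it likewise rejects the aggregate alias in WHERE and accepts the aggregate in HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bus_cat_cont (business TEXT, continent TEXT, category TEXT)")

# Hypothetical data: 6 cafes in Europe, 2 banks in Asia.
rows = [(f"cafe{i}", "Europe", "Cafes") for i in range(6)]
rows += [(f"bank{i}", "Asia", "Banking") for i in range(2)]
conn.executemany("INSERT INTO bus_cat_cont VALUES (?, ?, ?)", rows)

# WHERE filters individual rows before grouping, so the aggregate alias n is rejected.
try:
    conn.execute(
        "SELECT continent, category, COUNT(business) AS n "
        "FROM bus_cat_cont WHERE n > 5 GROUP BY continent, category"
    ).fetchall()
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# HAVING filters whole groups after aggregation, so the aggregate is allowed here.
having_rows = conn.execute(
    "SELECT continent, category, COUNT(business) AS n "
    "FROM bus_cat_cont GROUP BY continent, category "
    "HAVING COUNT(business) > 5 ORDER BY n DESC"
).fetchall()

print(where_failed, having_rows)
```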
Aside, probably better:
SELECT co.continent, ca.category, n
FROM (
SELECT category_code, country_code, count(*) AS n
FROM businesses
GROUP BY 1, 2
HAVING count(*) > 5
) b
JOIN categories ca USING (category_code)
JOIN countries co USING (country_code)
ORDER BY n DESC;
There is really no need for a CTE.
Aggregate first, join later. See:
Query with LEFT JOIN not returning rows for count of 0
Besides being faster, this is also safer. While category_code and country_code should be defined UNIQUE, the same may not be true for continent and category. (You may want to output the codes additionally to disambiguate.)
count(*) is implemented separately and is slightly faster. It is equivalent to count(business) as long as business is defined NOT NULL.
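The difference between count(*) and count(business) only shows up when the column can be NULL. A small sketch with Python's sqlite3 module, using a deliberately nullable business column and made-up rows (both are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# business is nullable here on purpose, unlike the NOT NULL case discussed above.
conn.execute("CREATE TABLE businesses (business TEXT, category_code TEXT)")
conn.executemany(
    "INSERT INTO businesses VALUES (?, ?)",
    [("Acme", "CAT1"), (None, "CAT1"), ("Bolt", "CAT1")],  # one row has a NULL name
)

# count(*) counts rows; count(business) counts only rows where business is not NULL.
star_count, column_count = conn.execute(
    "SELECT count(*), count(business) FROM businesses"
).fetchone()

print(star_count, column_count)
```

With a NOT NULL constraint on business the two counts can never differ, which is why the answer treats them as interchangeable in that case.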

Related

SQL subquery and column naming issue

Dear all,
this is correct code
SELECT MAX(inflation_rate) AS max_inf
FROM (
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
USING (code)
WHERE year = 2015) AS subquery
GROUP BY continent;
this is incorrect code
SELECT MAX(subquery.economies.inflation_rate) AS max_inf
FROM
(SELECT countries.name, countries.continent, economies.inflation_rate
FROM countries
INNER JOIN economies
ON countries.code = economies.code
WHERE economies.year = 2015) AS subquery
GROUP BY subquery.countries.continent;
Why is the second one not allowed?
SELECT
MAX(subquery.economies.inflation_rate) AS max_inf -- 3
FROM (
SELECT
countries.name, -- 1
countries.continent,
economies.inflation_rate
FROM ...) AS subquery -- 2
GROUP BY
subquery.countries.continent; -- 3
You are using a subquery (2). This subquery returns three columns: name, continent, inflation_rate (1). Only these names are known outside the subquery, and nothing else. So the outer query knows nothing about where the columns came from. The source table or schema is irrelevant.
So for the outer query the only relevant information is the name of the subquery and its column names (3):
SELECT
MAX(subquery.inflation_rate) AS max_inf -- change
FROM (
SELECT
countries.name,
countries.continent,
economies.inflation_rate
FROM ...) AS subquery
GROUP BY
subquery.continent; -- change
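The same behaviour can be reproduced in a small sketch with Python's sqlite3 module. The rows and country names are invented for illustration, and SQLite stands in for whatever database the question uses: the table-qualified names are rejected outside the subquery, while the subquery-qualified names work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (code TEXT, name TEXT, continent TEXT);
CREATE TABLE economies (code TEXT, year INTEGER, inflation_rate REAL);
INSERT INTO countries VALUES ('AAA', 'Aland', 'Europe'),
                             ('BBB', 'Bland', 'Europe'),
                             ('CCC', 'Cland', 'Asia');
INSERT INTO economies VALUES ('AAA', 2015, 1.5), ('BBB', 2015, 3.0), ('CCC', 2015, 2.0);
""")

inner = """(SELECT countries.name, countries.continent, economies.inflation_rate
            FROM countries INNER JOIN economies ON countries.code = economies.code
            WHERE economies.year = 2015) AS subquery"""

# The outer query cannot see the inner table names, so this reference fails.
try:
    conn.execute(
        f"SELECT MAX(subquery.economies.inflation_rate) FROM {inner} "
        "GROUP BY subquery.countries.continent"
    ).fetchall()
    inner_names_visible = True
except sqlite3.OperationalError:
    inner_names_visible = False

# Only the subquery alias and its output column names are visible outside.
result = conn.execute(
    f"SELECT subquery.continent, MAX(subquery.inflation_rate) AS max_inf FROM {inner} "
    "GROUP BY subquery.continent ORDER BY subquery.continent"
).fetchall()

print(inner_names_visible, result)
```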
Since I am assuming this is postgresql you could simplify this and get rid of the subquery.
SELECT continent
, max(inflation_rate) as max_inf
FROM countries
INNER JOIN economies USING (code)
WHERE year = 2015
group by continent
You don't need to write subquery.countries.continent: you have a subquery and gave it an alias, so subquery.continent is enough.
SELECT MAX(subquery.inflation_rate) AS max_inf FROM
(SELECT countries.name, countries.continent, economies.inflation_rate
FROM countries INNER JOIN economies
ON countries.code = economies.code
WHERE economies.year = 2015) AS subquery
GROUP BY subquery.continent

Can't figure out whether I need to NEST or JOIN or something else?

I have 3 tables - continents, country info & flights.
I want to run a query that ranks the continents in DESC order by counting the number of countries within them that have 0 flights booked historically, but shows the continent's NAME instead of its id.
Continents table - cont_id (P-KEY), name varchar, notes varchar
Country table - cntry_id (P-KEY), name varchar, abbreviation varchar
Flights info table - cntry_id int, cont_id int, flights float, date date
I'm doing this in Metabase and so far I've managed to get it to do everything but show the continent's name; it only goes as far as showing its id. I have tried nesting the main query and tried using a join instead, but neither has worked.
SELECT "public"."flights_info"."cont_id", count(*) AS "count"
FROM "public"."flights_info"
WHERE "public"."flights_info"."flights" <= 0
GROUP BY "public"."flights_info"."cont_id"
ORDER BY "count" DESC
I'm successfully getting the cont_id, I just need a line of code that will make it run a lookup from the continents table and give me names that match the id's (I only want the names to show not the ID's)
So the simple answer is to take your query, reformat it a little, and then add an INNER JOIN to the continent table. Something like this would probably work:
SELECT
c."name" AS continent_name,
COUNT(*) AS "count"
FROM
"public".flights_info f
INNER JOIN "public".continents c ON c.cont_id = f.cont_id
WHERE
f.flights <= 0
GROUP BY
c."name"
ORDER BY
2 DESC;
However, I'm not convinced your original query is correct. You said you wanted to count the number of countries in each continent with no flights booked historically, and I don't think this is what you are counting at all. Instead you are counting the number of rows for each continent with a flights value of 0 or less than zero. Now maybe this is actually how your database works, and if so then cool, the query above should get you onto the right track.
However, if this database works anything like I think it should do then you would need a very different query, e.g. this one:
SELECT
c."name" AS continent_name,
COUNT(DISTINCT cn.cntry_id) AS "count"
FROM
"public".continents c
INNER JOIN "public".country cn ON cn.cont_id = c.cont_id
LEFT JOIN "public".flights_info f ON f.cont_id = c.cont_id AND f.cntry_id = cn.cntry_id
WHERE
COALESCE(f.flights, 0) <= 0
GROUP BY
c."name"
ORDER BY
2 DESC;
How does this work? Well it starts off with the continent table, and then links this to countries, to get a list of the countries in each continent. Then it performs a LEFT JOIN to the flights table, so it will get hits even if there's no flight data. Finally it counts up the number of countries where there was a flights value of 0 or less, or where there's no flights data at all.
Even this probably isn't correct, as if you had two rows for a country (I'm going to assume the flights table has a row for each continent, country, date), where one had a flights = 0 and one had a flights = 10, then this would still report that country as having no flights. But now I'm getting too far away from the original question I feel...
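One way to handle the multiple-rows caveat is to total the flights per country first and only then count countries per continent. The following is a sketch under stated assumptions, run with Python's sqlite3 module: the country table is given a cont_id column (the question's schema does not list one, but the query above presumes it), and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE continents (cont_id INTEGER PRIMARY KEY, name TEXT);
-- Assumption: country carries a cont_id column, as the query above presumes.
CREATE TABLE country (cntry_id INTEGER PRIMARY KEY, name TEXT, cont_id INTEGER);
CREATE TABLE flights_info (cntry_id INTEGER, cont_id INTEGER, flights REAL, date TEXT);
INSERT INTO continents VALUES (1, 'Europe'), (2, 'Asia');
INSERT INTO country VALUES (10, 'A', 1), (11, 'B', 1), (20, 'C', 2);
-- Country 10 has a zero row and a non-zero row, 11 has no rows, 20 has only zeros.
INSERT INTO flights_info VALUES (10, 1, 0, '2020-01-01'), (10, 1, 10, '2020-01-02'),
                                (20, 2, 0, '2020-01-01');
""")

query = """
SELECT c.name AS continent_name, COUNT(*) AS n
FROM (
    -- One row per country whose total flights are zero (or that has no rows at all).
    SELECT cn.cntry_id, cn.cont_id
    FROM country cn
    LEFT JOIN flights_info f ON f.cntry_id = cn.cntry_id
    GROUP BY cn.cntry_id, cn.cont_id
    HAVING COALESCE(SUM(f.flights), 0) <= 0
) z
JOIN continents c ON c.cont_id = z.cont_id
GROUP BY c.name
ORDER BY n DESC, c.name
"""
zero_flight_counts = conn.execute(query).fetchall()
print(zero_flight_counts)
```

Country 10 is no longer counted, because its flights sum to 10. Note that a continent with no zero-flight countries is simply absent from this result.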
You can use join or sub-query to do this
Using JOIN
You just need to join with the Continents table on the cont_id column and fetch the continent's name.
select Continents.name, count(*) AS "count"
FROM flights_info flights_info
join Continents Continents
on Continents.cont_id = flights_info.cont_id
WHERE flights_info.flights <= 0
GROUP BY flights_info.cont_id,Continents.name
ORDER BY "count" DESC
Using Sub-Query
You can write another query that gives you the continent's name by matching cont_id.
select (select Continents.name from Continents Continents where Continents.cont_id =
flights_info.cont_id ) Continent_name, count(*) AS "count"
FROM flights_info flights_info
WHERE flights_info.flights <= 0
GROUP BY flights_info.cont_id
ORDER BY "count" DESC

Subtracting values of columns from two different tables

I would like to take values from one table column and subtract those values from another column from another table.
I was able to achieve this by joining those tables and then subtracting both columns from each other.
Data from first table:
SELECT max_participants FROM courses ORDER BY id;
Data from second table:
SELECT COUNT(id) FROM participations GROUP BY course_id ORDER BY course_id;
Here is some code:
SELECT max_participants - participations AS free_places FROM
(
SELECT max_participants, COUNT(participations.id) AS participations
FROM courses
INNER JOIN participations ON participations.course_id = courses.id
GROUP BY courses.max_participants, participations.course_id
ORDER BY participations.course_id
) AS course_places;
In general it works, but I was wondering whether there is a way to make it simpler, or whether my approach is incorrect and this code will fail under some conditions. Maybe it needs to be optimized.
I've read that you should not rely on the natural order of a result set in databases, and that raised my doubts.
If you want the values per course, I would recommend:
SELECT c.id, (c.max_participants - COUNT(p.id)) AS free_places
FROM courses c LEFT JOIN
participations p
ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY 1;
Note the LEFT JOIN to be sure all courses are included, even those with no participants.
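A small sketch of why the join type matters here, run with Python's sqlite3 module on made-up data (two courses, one of them empty). The same query is executed once with INNER JOIN and once with LEFT JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (id INTEGER PRIMARY KEY, max_participants INTEGER);
CREATE TABLE participations (id INTEGER PRIMARY KEY, course_id INTEGER);
INSERT INTO courses VALUES (1, 10), (2, 5);
INSERT INTO participations (course_id) VALUES (1), (1), (1);  -- course 2 is empty
""")

template = """
SELECT c.id, c.max_participants - COUNT(p.id) AS free_places
FROM courses c {join} participations p ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY c.id
"""
# INNER JOIN drops course 2 entirely; LEFT JOIN keeps it with COUNT(p.id) = 0.
inner_rows = conn.execute(template.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(template.format(join="LEFT JOIN")).fetchall()

print(inner_rows, left_rows)
```

COUNT(p.id) rather than COUNT(*) is what makes the empty course report all of its places as free, since the unmatched row has a NULL p.id.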
The overall number is a little trickier. One method is to use the above as a subquery. Alternatively, you can pre-aggregate each table:
select c.max_participants - p.num_participants
from (select sum(max_participants) as max_participants from courses) c cross join
(select count(*) as num_participants from participations) p;

Returning the Min() of a Count()

I am studying for an SQL test and the previous year has the final question:
Name the student who has studied the least number of papers. How many
papers have they studied?
So far, this is the select query that I have created:
select min(Full_Name), min(Amount)
from (select st.ST_F_Name & ' ' & st.ST_L_Name as Full_Name, count(*) as Amount
from (student_course as sc
inner join students as st
on st.ST_ID=sc.SC_ST_ID)
group by st.ST_F_Name & ' ' & st.ST_L_Name)
This works perfectly for returning the result I want but I'm not sure if this is the way I should be doing this query? I feel like calling min() on the Full_Name could potentially backfire on me under certain circumstances. Is there a better way to be doing this? (this is in MS Access for unknown reasons)
If there are several such students and you want only one of them, this is probably the simplest:
select st.ST_F_Name, st.ST_L_Name, count(*) as Amount
from student_course as sc
inner join students as st
on st.ST_ID=sc.SC_ST_ID
group by st.ST_ID
order by Amount ASC LIMIT 1
However, if you want to find all such students, you need a different approach. We use a WITH clause to simplify things; it defines a CTE (Common Table Expression) that computes the number of courses per student. Then we select the students whose number equals the minimum in that CTE:
with per_student as (
select st.ST_F_Name, st.ST_L_Name, count(*) as Amount
from student_course as sc
inner join students as st
on st.ST_ID=sc.SC_ST_ID
group by st.ST_ID
)
select * from per_student
where amount = (select min(amount) from per_student)
But the real trick in that question is that there might be students that didn't take ANY courses. But with approaches presented so far you'll never see them. You want something like this:
with per_student as (
select st.ST_F_Name, st.ST_L_Name, count(sc.SC_ST_ID) as Amount
from student_course as sc
right outer join students as st
on st.ST_ID=sc.SC_ST_ID
group by st.ST_ID
)
select * from per_student
where amount = (select min(amount) from per_student)
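As a runnable sketch of this last idea, here is an equivalent query in SQLite via Python's sqlite3 module. The data and the SC_COURSE column are hypothetical, the students table is placed on the left so a LEFT JOIN can be used, and the name columns are added to GROUP BY for portability. Other dialects may need adjustments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (ST_ID INTEGER PRIMARY KEY, ST_F_Name TEXT, ST_L_Name TEXT);
CREATE TABLE student_course (SC_ST_ID INTEGER, SC_COURSE TEXT);  -- SC_COURSE is hypothetical
INSERT INTO students VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Poe');
INSERT INTO student_course VALUES (1, 'P1'), (1, 'P2'), (2, 'P1');  -- student 3 has none
""")

query = """
WITH per_student AS (
    -- COUNT(sc.SC_ST_ID) ignores the NULL produced for students without courses.
    SELECT st.ST_ID, st.ST_F_Name, st.ST_L_Name, COUNT(sc.SC_ST_ID) AS Amount
    FROM students AS st
    LEFT JOIN student_course AS sc ON st.ST_ID = sc.SC_ST_ID
    GROUP BY st.ST_ID, st.ST_F_Name, st.ST_L_Name
)
SELECT ST_F_Name, ST_L_Name, Amount
FROM per_student
WHERE Amount = (SELECT MIN(Amount) FROM per_student)
"""
least = conn.execute(query).fetchall()
print(least)
```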
You can order by count(*) to get the student with the least # of papers:
i.e.
select * from students where st_id in (
select top 1 sc_st_id
from student_course
group by sc_st_id
order by count(*)
)
if you also need the # of papers studied, then join a derived table containing the min count:
select * from students s
inner join (
select top 1 sc_st_id, count(*) as amount
from student_course
group by sc_st_id
order by count(*)
) t on t.sc_st_id = s.st_id

Using group by and having clause

Using the following schema:
Supplier (sid, name, status, city)
Part (pid, name, color, weight, city)
Project (jid, name, city)
Supplies (sid, pid, jid, quantity)
1. Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
2. Get supplier numbers and names for suppliers of the same part to at least two different projects.
These were my answers:
1.
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
2.
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr, Part p
WHERE s.sid = su.sid AND su.pid = p.pid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid)>=2
Can anyone confirm if I wrote this correctly? I'm a little confused as to how the Group By and Having clause works
The semantics of Having
To better understand having, you need to see it from a theoretical point of view.
A group by is a query that takes a table and summarizes it into another table. You summarize the original table by grouping the original table into subsets (based upon the attributes that you specify in the group by). Each of these groups will yield one tuple.
The Having is simply equivalent to a WHERE clause after the group by has executed and before the select part of the query is computed.
Let's say your query is:
select a, b, count(*)
from Table
where c > 100
group by a, b
having count(*) > 10;
The evaluation of this query can be seen as the following steps:
Perform the WHERE, eliminating rows that do not satisfy it.
Group the table into subsets based upon the values of a and b (each tuple in each subset has the same values of a and b).
Eliminate subsets that do not satisfy the HAVING condition
Process each subset outputting the values as indicated in the SELECT part of the query. This creates one output tuple per subset left after step 3.
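The four steps above can be mimicked in plain Python to make the order concrete. This is only a conceptual sketch, not how a database engine actually executes the query; the rows are made up, and the HAVING threshold is lowered from 10 to 1 so a tiny data set is enough.

```python
from collections import defaultdict

# Hypothetical rows of Table with columns a, b, c.
table = [
    {"a": 1, "b": "x", "c": 150},
    {"a": 1, "b": "x", "c": 200},
    {"a": 1, "b": "x", "c": 50},   # removed by WHERE
    {"a": 2, "b": "y", "c": 300},  # its group is removed by HAVING
]
MIN_GROUP_SIZE = 1  # stands in for the 10 of the example query

# Step 1: WHERE c > 100 filters individual rows.
filtered = [row for row in table if row["c"] > 100]

# Step 2: GROUP BY a, b splits the remaining rows into subsets.
groups = defaultdict(list)
for row in filtered:
    groups[(row["a"], row["b"])].append(row)

# Step 3: HAVING count(*) > MIN_GROUP_SIZE filters whole subsets.
kept = {key: rows for key, rows in groups.items() if len(rows) > MIN_GROUP_SIZE}

# Step 4: SELECT a, b, count(*) emits one tuple per remaining subset.
result = [(a, b, len(rows)) for (a, b), rows in kept.items()]
print(result)
```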
You can extend this to any complex query, where Table can be any expression that returns a table (a cross product, a join, a UNION, etc.).
In fact, having is syntactic sugar and does not extend the power of SQL. Any given query:
SELECT list
FROM table
GROUP BY attrList
HAVING condition;
can be rewritten as:
SELECT list from (
SELECT listatt
FROM table
GROUP BY attrList) as Name
WHERE condition;
The listatt is a list that includes the GROUP BY attributes and the expressions used in list and condition. It might be necessary to name some expressions in this list (with AS). For instance, the example query above can be rewritten as:
select a, b, count
from (select a, b, count(*) as count
from Table
where c > 100
group by a, b) as someName
where count > 10;
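The equivalence of the two forms can be checked on a small data set. A sketch with Python's sqlite3 module, using a hypothetical table t, made-up rows, a threshold of 1 instead of 10, and the alias cnt in place of count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "x", 150), (1, "x", 200), (1, "x", 50), (2, "y", 300)],
)

# Form 1: filter the groups with HAVING.
with_having = conn.execute("""
    SELECT a, b, count(*) FROM t
    WHERE c > 100
    GROUP BY a, b
    HAVING count(*) > 1
""").fetchall()

# Form 2: name the aggregate in a subquery, then filter it with an outer WHERE.
with_subquery = conn.execute("""
    SELECT a, b, cnt FROM (
        SELECT a, b, count(*) AS cnt FROM t
        WHERE c > 100
        GROUP BY a, b
    ) AS someName
    WHERE cnt > 1
""").fetchall()

print(with_having, with_subquery)
```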
The solution you need
Your solution seems to be correct:
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
You join the three tables, then use sid as a grouping attribute (name is functionally dependent on it, so it does not affect the number of groups, but you must include it, otherwise it cannot be part of the select part of the statement). Then you remove the groups that do not satisfy your condition: the number of distinct pr.jid values must be >= 2, which is what you wanted originally.
Best solution to your problem
I personally prefer a simpler cleaner solution:
You only need to group Supplies (sid, pid, jid, quantity) by sid to
find the sid of those that supply at least two projects.
Then join it to the Supplier table to get the supplier name.
SELECT sid, name from
(SELECT sid from Supplies
GROUP BY sid
HAVING count(DISTINCT jid) >= 2
) AS T1
NATURAL JOIN
Supplier;
It will also be faster to execute, because the join is only done when needed, not all the time.
--dmg
We cannot use a WHERE clause with aggregate functions like count(), min(), sum() etc., so the HAVING clause came into existence to overcome this problem in SQL. For an example of the HAVING clause, see this link:
http://www.sqlfundamental.com/having-clause.php
First of all, you should use the JOIN syntax rather than FROM table1, table2, and you should always limit the grouping to as few fields as you need.
Although I haven't tested it, your first query seems fine to me, but could be rewritten as:
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT su.sid
FROM Supplies su
GROUP BY su.sid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
Or simplified as:
SELECT sid, name
FROM Supplier s
WHERE (
SELECT COUNT(DISTINCT su.jid)
FROM Supplies su
WHERE su.sid = s.sid
) > 1
However, your second query seems wrong to me, because you should also GROUP BY pid.
SELECT DISTINCT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT su.sid
FROM Supplies su
GROUP BY su.sid, su.pid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
As you may have noticed in the query above, I used the INNER JOIN syntax to perform the filtering, however it can be also written as:
SELECT s.sid, s.name
FROM Supplier s
WHERE EXISTS (
SELECT 1
FROM Supplies su
WHERE su.sid = s.sid
GROUP BY su.sid, su.pid
HAVING COUNT(DISTINCT su.jid) > 1
)
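To see how the two questions differ, here is a sketch with Python's sqlite3 module and made-up data: supplier S1 delivers the same part to two projects, while S2 delivers a different part to each project. Grouping by sid alone answers the first question; grouping by sid and pid answers the second. DISTINCT is added to the second query in this sketch so a supplier with several qualifying parts is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier (sid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Supplies (sid TEXT, pid TEXT, jid TEXT, quantity INTEGER);
INSERT INTO Supplier VALUES ('S1', 'Smith'), ('S2', 'Jones');
-- S1: the same part P1 to projects J1 and J2. S2: different parts to J1 and J2.
INSERT INTO Supplies VALUES ('S1', 'P1', 'J1', 100), ('S1', 'P1', 'J2', 200),
                            ('S2', 'P1', 'J1', 300), ('S2', 'P2', 'J2', 400);
""")

# Question 1: any parts, at least two different projects per supplier.
any_part = conn.execute("""
    SELECT s.sid, s.name FROM Supplier s
    JOIN (SELECT sid FROM Supplies
          GROUP BY sid
          HAVING COUNT(DISTINCT jid) > 1) g ON g.sid = s.sid
    ORDER BY s.sid
""").fetchall()

# Question 2: the same part, at least two different projects per supplier and part.
same_part = conn.execute("""
    SELECT DISTINCT s.sid, s.name FROM Supplier s
    JOIN (SELECT sid FROM Supplies
          GROUP BY sid, pid
          HAVING COUNT(DISTINCT jid) > 1) g ON g.sid = s.sid
    ORDER BY s.sid
""").fetchall()

print(any_part, same_part)
```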
What type of SQL database are you using (MSSQL, Oracle etc.)?
I believe what you have written is correct.
You could also write the first query like this:
SELECT s.sid, s.name
FROM Supplier s
WHERE (SELECT COUNT(DISTINCT pr.jid)
FROM Supplies su, Project pr
WHERE su.sid = s.sid
AND pr.jid = su.jid) >= 2
It's a little more readable, and less mind-bending than trying to do it with GROUP BY. Performance may differ though.
1. Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
SELECT S.SID, S.NAME
FROM SUPPLIES SP
JOIN SUPPLIER S
ON SP.SID = S.SID
WHERE PID IN
(SELECT PID FROM SUPPLIES GROUP BY PID HAVING COUNT(DISTINCT JID) >= 2)
I am not clear about your second question.