SQL subquery and column naming issue

Dear all,
This is the correct code:
SELECT MAX(inflation_rate) AS max_inf
FROM (
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
USING (code)
WHERE year = 2015) AS subquery
GROUP BY continent;
This is the incorrect code:
SELECT MAX(subquery.economies.inflation_rate) AS max_inf
FROM
(SELECT countries.name, countries.continent, economies.inflation_rate
FROM countries
INNER JOIN economies
ON countries.code = economies.code
WHERE economies.year = 2015) AS subquery
GROUP BY subquery.countries.continent;
Why is the second one not allowed?

SELECT
MAX(subquery.economies.inflation_rate) AS max_inf -- 3
FROM (
SELECT
countries.name, -- 1
countries.continent,
economies.inflation_rate
FROM ...) AS subquery -- 2
GROUP BY
subquery.countries.continent; -- 3
You are using a subquery (2). This subquery returns three columns: name, continent, inflation_rate (1). Only these names are known outside the subquery, and nothing else. The outer query knows nothing about where the columns came from; the source table or schema is irrelevant.
So, for the outer query, the only relevant information is the name of the subquery and its column names (3):
SELECT
MAX(subquery.inflation_rate) AS max_inf -- change
FROM (
SELECT
countries.name,
countries.continent,
economies.inflation_rate
FROM ...) AS subquery
GROUP BY
subquery.continent; -- change
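The scoping rule can be checked with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for the asker's database (an assumption; the original presumably runs on PostgreSQL, where the error message differs), and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT, continent TEXT);
    CREATE TABLE economies (code TEXT, year INTEGER, inflation_rate REAL);
    INSERT INTO countries VALUES
        ('A', 'Aland', 'Europe'), ('B', 'Bland', 'Europe'), ('C', 'Cland', 'Asia');
    INSERT INTO economies VALUES
        ('A', 2015, 1.5), ('B', 2015, 3.0), ('C', 2015, 2.0), ('C', 2010, 9.9);
""")

inner = """
    SELECT countries.name, countries.continent, economies.inflation_rate
    FROM countries
    INNER JOIN economies ON countries.code = economies.code
    WHERE economies.year = 2015
"""

# The outer query refers only to the subquery alias and its column names.
good = f"""
    SELECT subquery.continent, MAX(subquery.inflation_rate) AS max_inf
    FROM ({inner}) AS subquery
    GROUP BY subquery.continent
    ORDER BY subquery.continent
"""
good_rows = conn.execute(good).fetchall()

# The outer query tries to reach through the alias to the base tables.
bad = f"""
    SELECT MAX(subquery.economies.inflation_rate) AS max_inf
    FROM ({inner}) AS subquery
    GROUP BY subquery.countries.continent
"""
try:
    conn.execute(bad)
    bad_failed = False
except sqlite3.OperationalError:
    bad_failed = True

print(good_rows, bad_failed)
```

The first query runs because subquery.continent and subquery.inflation_rate are exactly the names the derived table exposes; the second fails because economies and countries do not exist in the outer scope.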

Assuming this is PostgreSQL, you could simplify the query and get rid of the subquery altogether:
SELECT continent
, max(inflation_rate) as max_inf
FROM countries
INNER JOIN economies USING (code)
WHERE year = 2015
group by continent

You don't need to write subquery.countries.continent: the subquery has its own alias, so subquery.continent is enough.
SELECT MAX(subquery.inflation_rate) AS max_inf FROM
(SELECT countries.name, countries.continent, economies.inflation_rate
FROM countries INNER JOIN economies
ON countries.code = economies.code
WHERE economies.year = 2015) AS subquery
GROUP BY subquery.continent

Related

Can anyone explain how the WHERE clause works in the query that is not commented out, and how it returns the same result as the commented-out query without a GROUP BY?

/*
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
*/
SELECT name AS country,
-- Subquery
(SELECT count(*)
FROM cities
WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;
I expected the subquery to return the count of all cities across all countries, and in that case I expected the lower query to raise an error. Instead it gives me the same result as the upper query, that is, the count of cities for each country.
The query you are referring to as a subquery is actually a classic example of a correlated subquery.
SELECT name AS country,
-- Subquery
(SELECT count(*)
FROM cities
WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;
A correlated subquery is a subquery that is evaluated once for each row of the outer table. You can recognize one by its WHERE condition, which references a column from the outer table:
countries.code = cities.country_code
Here countries is the outer table and cities is the inner table.
If you want the total count of all cities across all countries instead, just remove that condition.
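A minimal sketch of that behaviour, using Python's built-in sqlite3 module with invented sample rows (both are assumptions for illustration, not part of the original question). It runs the correlated subquery, the JOIN/GROUP BY form, and the uncorrelated variant side by side.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE cities (name TEXT, country_code TEXT);
    INSERT INTO countries VALUES ('X', 'Xland'), ('Y', 'Yland');
    INSERT INTO cities VALUES ('x1', 'X'), ('x2', 'X'), ('x3', 'X'), ('y1', 'Y');
""")

# Correlated: the inner WHERE references countries.code from the outer row,
# so the count is recomputed for every country.
correlated = conn.execute("""
    SELECT name AS country,
           (SELECT COUNT(*)
            FROM cities
            WHERE countries.code = cities.country_code) AS cities_num
    FROM countries
    ORDER BY cities_num DESC, country
""").fetchall()

# Equivalent JOIN + GROUP BY form (for countries that have at least one city).
joined = conn.execute("""
    SELECT countries.name AS country, COUNT(*) AS cities_num
    FROM cities
    INNER JOIN countries ON countries.code = cities.country_code
    GROUP BY countries.name
    ORDER BY cities_num DESC, country
""").fetchall()

# Uncorrelated: without the condition every row gets the grand total.
uncorrelated = conn.execute("""
    SELECT name AS country,
           (SELECT COUNT(*) FROM cities) AS cities_num
    FROM countries
    ORDER BY country
""").fetchall()

print(correlated, joined, uncorrelated)
```

One difference worth keeping in mind: the INNER JOIN form drops countries without any city, while the correlated subquery reports them with a count of 0.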

WHERE clause does not find column after a CTE?

I'm new to CTEs and subqueries in SQL.
I have 3 tables:
categories (category_code, category)
countries (country_code, country, continent)
businesses (business, year_founded, category_code, country_code)
Goal is to look at oldest businesses in the world. I used a CTE:
WITH bus_cat_cont AS (
SELECT business, year_founded, category, country,
continent
FROM businesses AS b
INNER JOIN categories AS c1
ON b.category_code = c1.category_code
INNER JOIN countries AS c2
ON b.country_code = c2.country_code
)
SELECT continent,
category,
COUNT(business) AS n
FROM bus_cat_cont
WHERE n > 5
GROUP BY continent, category
ORDER BY n DESC;
The code works without WHERE n > 5. But after adding that, I get the error:
column "n" does not exist
I realized there is a much easier way to get the output I want without a CTE.
But I'm wondering: Why do I get this error?
This would work:
WITH bus_cat_cont AS (
SELECT business, year_founded, category, country, continent
FROM businesses AS b
JOIN categories AS c1 ON b.category_code = c1.category_code
JOIN countries AS c2 ON b.country_code = c2.country_code
)
SELECT continent, category, count(business) AS n
FROM bus_cat_cont
-- WHERE n > 5 -- wrong
GROUP BY continent, category
HAVING count(business) > 5 -- right
ORDER BY n DESC;
The output column name "n" is not visible (yet) in the WHERE or HAVING clause: both are evaluated before the SELECT list assigns its aliases. For the sequence of events in an SQL query, see:
Best way to get result count before LIMIT was applied
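The difference between filtering rows (WHERE) and filtering groups (HAVING) can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module, a flat hypothetical table standing in for the CTE result, made-up rows, and a threshold of 1 instead of 5 to keep the data small.

```python
import sqlite3

# In-memory SQLite database; the table mimics the CTE output (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bus_cat_cont (business TEXT, category TEXT, continent TEXT);
    INSERT INTO bus_cat_cont VALUES
        ('b1', 'Brewery', 'Europe'), ('b2', 'Brewery', 'Europe'), ('b3', 'Brewery', 'Europe'),
        ('h1', 'Hotel', 'Asia'), ('h2', 'Hotel', 'Asia'),
        ('b4', 'Brewery', 'Asia');
""")

# HAVING runs after GROUP BY, so it can test the aggregate.
with_having = conn.execute("""
    SELECT continent, category, COUNT(business) AS n
    FROM bus_cat_cont
    GROUP BY continent, category
    HAVING COUNT(business) > 1
    ORDER BY n DESC
""").fetchall()

# WHERE runs before grouping, so an aggregate is not available there.
# SQLite rejects this too, although with a different message than PostgreSQL.
try:
    conn.execute("""
        SELECT continent, category, COUNT(business) AS n
        FROM bus_cat_cont
        WHERE n > 1
        GROUP BY continent, category
    """)
    where_failed = False
except sqlite3.Error:
    where_failed = True

print(with_having, where_failed)
```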
For the record, the result has no obvious connection to your declared goal to "look at oldest businesses in the world". year_founded is unused in the query.
You get the most common continent/category combinations among businesses.
Aside, probably better:
SELECT co.continent, ca.category, n
FROM (
SELECT category_code, country_code, count(*) AS n
FROM businesses
GROUP BY 1, 2
HAVING count(*) > 5
) b
JOIN categories ca USING (category_code)
JOIN countries co USING (country_code)
ORDER BY n DESC;
There is really no need for a CTE.
Aggregate first, join later. See:
Query with LEFT JOIN not returning rows for count of 0
Besides being faster, this is also safer. While category_code and country_code should be defined UNIQUE, the same may not be true for continent and category. (You may want to output the codes additionally to disambiguate.)
count(*) is implemented separately and slightly faster - and equivalent while business is defined NOT NULL.

SQL dividing a count from one table by a number from a different table

I am struggling with taking a Count() from one table and dividing it by a corresponding number from a different table in Microsoft SQL Server.
Here is a fictional example of what I'm trying to do.
Let's say I have a table of orders. One column in it is the state.
I have a second table that has a column for states and a second column for each state's population.
I'd like to find the orders per population for each state, but I have struggled to get my query right.
Here is what I have so far:
SELECT Orders.State, Count(*)/
(SELECT StatePopulations.Population FROM Orders INNER JOIN StatePopulations
on Orders.State = StatePopulations.State
WHERE Orders.state = StatePopulations.State )
FROM Orders INNER JOIN StatePopulations
ON Orders.state = StatePopulations.State
GROUP BY Orders.state
So far I'm contending with an error that says my subquery is returning multiple results for each state, but I'm new to SQL and don't know how to overcome it.
If you really want a correlated subquery, then this should do it...
(You don't need to join both tables in either the inner or the outer query; the correlation in the inner query's WHERE clause does the 'join'.)
SELECT
Orders.state,
COUNT(*) / (SELECT population FROM StatePopulations WHERE state = Orders.state)
FROM
Orders
GROUP BY
Orders.state
Personally, I'd just join them and use MAX()...
SELECT
Orders.state,
COUNT(*) / MAX(StatePopulations.population)
FROM
Orders
INNER JOIN
StatePopulations
ON StatePopulations.state = Orders.state
GROUP BY
Orders.state
Or aggregate your orders before you join...
SELECT
Orders.state,
Orders.order_count / StatePopulations.population
FROM
(
SELECT
Orders.state,
COUNT(*) AS order_count
FROM
Orders
GROUP BY
Orders.state
)
Orders
INNER JOIN
StatePopulations
ON StatePopulations.state = Orders.state
(Please forgive typos and smelling pistakes, I'm doing this on a phone.)
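A small sketch of the "aggregate first, then join" variant, run through Python's built-in sqlite3 module with invented rows (an assumption; the question targets SQL Server). It also illustrates a point the answers above do not mention: if both the count and the population are integers, the division is an integer division, in SQLite as in SQL Server, so a multiplication by 1.0 (or a cast) is needed to get a fractional ratio.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (id INTEGER, State TEXT);
    CREATE TABLE StatePopulations (State TEXT, Population INTEGER);
    INSERT INTO Orders VALUES (1, 'CA'), (2, 'CA'), (3, 'CA'), (4, 'NY');
    INSERT INTO StatePopulations VALUES ('CA', 1000), ('NY', 500);
""")

rows = conn.execute("""
    SELECT o.State,
           o.order_count / p.Population       AS int_ratio,   -- integer division
           1.0 * o.order_count / p.Population AS real_ratio   -- forced to real
    FROM (SELECT State, COUNT(*) AS order_count
          FROM Orders
          GROUP BY State) AS o
    INNER JOIN StatePopulations AS p ON p.State = o.State
    ORDER BY o.State
""").fetchall()

for state, int_ratio, real_ratio in rows:
    print(state, int_ratio, real_ratio)
```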

Query all columns of table1 left join and count of the table2

I couldn't get this query working :
DOESN'T WORK
select
Region.*, count(secteur.*) count
from
Region
left join
secteur on secteur.region_id = Region.id
The solution I found is below, but is there a better solution using joins? Or does this approach not affect performance? I ask because I have a very large dataset of about 500K rows.
WORKS BUT AFRAID OF PERFORMANCE ISSUES
select
Region.*,
(select count(*)
from Secteur
where Secteur.Region_id = region.id) count
from
Region
I would suggest:
select region.*, count(secteur.region_id) as count
from region left join secteur on region.id = secteur.region_id
group by region.id, region.field2, region.field3....
Note that count(table.field) will ignore nulls, whereas count(*) will include them.
Alternatively, left join on a subquery and use coalesce to avoid nulls:
select region.*, coalesce(t.c, 0) as count
from region left join
(select region_id, count(*) as c from secteur group by region_id) t on region.id = t.region_id
I'd join region on an aggregate query of secteur:
SELECT r.*, COALESCE(s.cnt, 0)
FROM region r
LEFT JOIN (SELECT region_id, COUNT(*) AS cnt
FROM secteur
GROUP BY region_id) s ON s.region_id = r.id
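The difference between count(*), count(column) and the pre-aggregated LEFT JOIN shows up only for regions without any secteur, so a tiny sketch helps. It uses Python's built-in sqlite3 module and invented rows (assumptions for illustration), with one region that has two secteurs and one that has none.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE secteur (id INTEGER PRIMARY KEY, region_id INTEGER);
    INSERT INTO region VALUES (1, 'North'), (2, 'South');
    INSERT INTO secteur VALUES (10, 1), (11, 1);   -- 'South' has no secteur
""")

# count(*) counts the NULL-extended row of the LEFT JOIN as one row.
count_star = conn.execute("""
    SELECT r.id, COUNT(*)
    FROM region r LEFT JOIN secteur s ON s.region_id = r.id
    GROUP BY r.id ORDER BY r.id
""").fetchall()

# count(column) skips NULLs, so the unmatched region gets 0.
count_column = conn.execute("""
    SELECT r.id, COUNT(s.region_id)
    FROM region r LEFT JOIN secteur s ON s.region_id = r.id
    GROUP BY r.id ORDER BY r.id
""").fetchall()

# Aggregate secteur first, then LEFT JOIN and replace the missing count with 0.
pre_aggregated = conn.execute("""
    SELECT r.id, COALESCE(s.cnt, 0)
    FROM region r
    LEFT JOIN (SELECT region_id, COUNT(*) AS cnt
               FROM secteur
               GROUP BY region_id) s ON s.region_id = r.id
    ORDER BY r.id
""").fetchall()

print(count_star, count_column, pre_aggregated)
```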
I would go with this query:
select r.*,
(select count(*)
from Secteur s
where s.Region_id = r.id
) as num_secteurs
from Region r;
Then fix the performance problem by adding an index on Secteur(region_id):
create index idx_secteur_region on secteur(region_id);
You made two mistakes.
First: you tried to apply COUNT() to only one of the joined tables (the second one). That does not work, because COUNT(), like any aggregate function, is computed over the whole set of joined rows, not over the part contributed by one table.
In your first query you can replace secteur.* with a plain asterisk, as in Region.region_id, count(*) AS count, and do not forget to add Region.region_id to the GROUP BY clause.
Second: besides the aggregate function, your select list contains other columns (select Region.*), but you did not list them in GROUP BY. Every column in the SELECT list that is not wrapped in an aggregate function must appear in the GROUP BY clause.
Addendum: no, GROUP BY Region.* will not work; you have to list the columns in GROUP BY by their actual names.
So the correct form looks like this:
SELECT
Region.col1
, Region.col2
, count(*) count
from Region
left join
secteur on secteur.region_id = Region.id
GROUP BY Region.col1, Region.col2
Or, if you don't want to type every column name, use a window function:
SELECT
Region.*
, count( * ) OVER (PARTITION BY region_id) AS count -- note: returns one row per joined row, not one per region
from Region
left join
secteur on secteur.region_id = Region.id

SUM a column count from two tables

I have this simple unioned query in SQL Server 2014 where I am getting counts of rows from each table, and then trying to add a TOTAL row at the bottom that will SUM the counts from both tables. I believe the problem is the LEFT OUTER JOIN on the last union seems to be only summing the totals from the first table
SELECT A.TEST_CODE, B.DIVISION, COUNT(*)
FROM ALL_USERS B, SIGMA_TEST A
WHERE B.DOMID = A.DOMID
GROUP BY A.TEST_CODE, B.DIVISION
UNION
SELECT E.TEST_CODE, F.DIVISION, COUNT(*)
FROM BETA_TEST E, ALL_USERS F
WHERE E.DOMID = F.DOMID
GROUP BY E.TEST_CODE, F.DIVISION
UNION
SELECT 'TOTAL', '', COUNT(*)
FROM (SIGMA_TEST A LEFT OUTER JOIN BETA_TEST E ON A.DOMID
= E.DOMID )
Here is a sample of the results I am getting:
I would expect the TOTAL row to display a result of 6 (2+1+3=6)
I would like to avoid using a Common Table Expression (CTE) if possible. Thanks in advance!
Since you are counting users with matching DOMIDs in the first two statements, the final statement also needs to include the ALL_USERS table. The final statement should be:
SELECT 'TOTAL', '', COUNT(*)
FROM ALL_USERS G LEFT OUTER JOIN
SIGMA_TEST H ON G.DOMID = H.DOMID
LEFT OUTER JOIN BETA_TEST I ON I.DOMID = G.DOMID
WHERE (H.TEST_CODE IS NOT NULL OR I.TEST_CODE IS NOT NULL)
I would consider doing a UNION ALL first then COUNT:
SELECT COALESCE(TEST_CODE, 'TOTAL'),
DIVISION,
COUNT(*)
FROM (
SELECT A.TEST_CODE, B.DIVISION
FROM ALL_USERS B
INNER JOIN SIGMA_TEST A ON B.DOMID = A.DOMID
UNION ALL
SELECT E.TEST_CODE, F.DIVISION
FROM BETA_TEST E
INNER JOIN ALL_USERS F ON E.DOMID = F.DOMID ) AS T
GROUP BY GROUPING SETS ((TEST_CODE, DIVISION ), ())
Using GROUPING SETS you can easily get the total, so there is no need to add a third subquery.
Note: I assume you want just one count per (TEST_CODE, DIVISION). Otherwise you have to group on the source table as well, as in @Gareth's answer.
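The "UNION ALL first, then count" idea can be tried out with the sketch below. It runs in Python's built-in sqlite3 module, which has no GROUPING SETS, so the grand-total row is emulated with one extra UNION ALL branch over the same derived table; that is a swapped-in technique, not the SQL Server query from the answer. The sample rows are invented so that the groups count 2, 1 and 3.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ALL_USERS (DOMID INTEGER, DIVISION TEXT);
    CREATE TABLE SIGMA_TEST (DOMID INTEGER, TEST_CODE TEXT);
    CREATE TABLE BETA_TEST (DOMID INTEGER, TEST_CODE TEXT);
    INSERT INTO ALL_USERS VALUES (1, 'D1'), (2, 'D1'), (3, 'D2'), (4, 'D1');
    INSERT INTO SIGMA_TEST VALUES (1, 'S'), (2, 'S'), (3, 'S');
    INSERT INTO BETA_TEST VALUES (1, 'B'), (2, 'B'), (4, 'B');
""")

# Rows of both test tables, each joined to its user's division.
combined = """
    SELECT A.TEST_CODE, B.DIVISION
    FROM ALL_USERS B
    INNER JOIN SIGMA_TEST A ON B.DOMID = A.DOMID
    UNION ALL
    SELECT E.TEST_CODE, F.DIVISION
    FROM BETA_TEST E
    INNER JOIN ALL_USERS F ON E.DOMID = F.DOMID
"""

# Per-group counts plus one TOTAL row counted over the same combined rows.
query = f"""
    SELECT TEST_CODE, DIVISION, COUNT(*)
    FROM ({combined}) AS T
    GROUP BY TEST_CODE, DIVISION
    UNION ALL
    SELECT 'TOTAL', '', COUNT(*)
    FROM ({combined}) AS T
"""
# Sort in Python, since the row order of a UNION ALL is not guaranteed.
rows = sorted(conn.execute(query).fetchall())
print(rows)
```

Because the TOTAL branch counts the same joined rows as the grouped branch, the total always equals the sum of the group counts, which is what the original LEFT OUTER JOIN between the two test tables failed to guarantee.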
I think you can achieve this with a single query. It seems your test tables have similar structures, so you can union them together and join to ALL_USERS, finally, you can use GROUPING SETS to get the total
SELECT ISNULL(T.TEST_CODE, 'TOTAL') AS TEST_CODE,
ISNULL(U.DIVISION, '') AS DIVISION,
COUNT(*)
FROM ALL_USERS AS U
INNER JOIN
( SELECT DOMID, TEST_CODE, 'SIGMA' AS SOURCETABLE
FROM SIGMA_TEST
UNION ALL
SELECT DOMID, TEST_CODE, 'BETA' AS SOURCETABLE
FROM BETA_TEST
) AS T
ON T.DOMID = U.DOMID
GROUP BY GROUPING SETS ((T.TEST_CODE, U.DIVISION, T.SOURCETABLE), ());
As an aside, the implicit join syntax you are using was replaced over a quarter of a century ago in ANSI 92. It is not wrong, but there seems to be little reason to continue to use it, especially when you are mixing and matching with explicit outer joins and implicit inner joins. Anyone else that might read your SQL will certainly appreciate consistency.