I'm a beginner at SQL and this is the question I have been asked to solve:
Say that a big city is defined as a place of type city with a population of at
least 100,000. Write an SQL query that returns the scheme (state_name,no_big_city,big_city_population) ordered by state_name, listing those states which have either (a) at least five big cities or (b) at least one million people living in big cities. The column state_name is the name of the state, no_big_city is the number of big cities in the state, and big_city_population is the number of people living in big cities in the state.
Now, as far as I can see, the following query returns correct results:
SELECT state.name AS state_name
, COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
, SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM state
JOIN place
ON state.code = place.state_code
GROUP BY state_name
HAVING
COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 OR
SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;
However, the two aggregate functions used in the code appear twice. MY question: is there any way of making this code duplication disappear preserving functionality?
To be clear, I have already tried using the alias, but I just get a "column does not exist" error.
The manual clarifies:
An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses;
there you must write out the expression instead.
Bold emphasis mine.
You can avoid typing long expressions repeatedly with a subquery or CTE:
SELECT state_name, no_big_city, big_city_population
FROM (
SELECT s.name AS state_name
, COUNT(*) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
, SUM(population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
FROM state s
JOIN place p ON s.code = p.state_code
GROUP BY s.name -- can be input column name as well, best schema-qualified to avoid ambiguity
) sub
WHERE no_big_city >= 5
OR big_city_population >= 1000000
ORDER BY state_name;
While being at it, I simplified with the aggregate FILTER clause (Postgres 9.4+):
How can I simplify this game statistics query?
However, I suggest this simpler and faster query to begin with:
SELECT s.state_name, p.no_big_city, p.big_city_population
FROM state s
JOIN (
SELECT state_code AS code -- alias just to simplify join
, count(*) AS no_big_city
, sum(population) AS big_city_population
FROM place
WHERE type = 'city'
AND population >= 100000
GROUP BY 1 -- can be ordinal number referencing position in SELECT list
HAVING count(*) >= 5 OR sum(population) >= 1000000 -- simple expressions now
) p USING (code)
ORDER BY 1; -- can also be ordinal number
I am demonstrating another option to reference expressions in GROUP BY and ORDER BY. Only use that if it doesn't impair readability and maintainability.
Not sure if this is a comment or an answer, since it is more preference based as opposed to technical, but I'll post it anyway
What I usually do when I need to reference calculated columns (usually a LOT at the same time) is I put my calculated columns within a derived table and then reference the calculated columns using its alias outside of the derived table. This syntax should be ANSI-SQL correct, but I am not familiar with PostGRES
select * from (
SELECT STATE.NAME AS state_name
,COUNT(CASE WHEN place.type = 'city'
AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
,SUM(CASE WHEN place.type = 'city'
AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM STATE
INNER JOIN place
ON STATE.code = place.state_code
GROUP BY state_name
) sub
where no_big_city >= 5
and big_city_population >=100000
--HAVING COUNT(CASE WHEN place.type = 'city'
-- AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5
-- OR SUM(CASE WHEN place.type = 'city'
-- AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;
The nice thing about this approach is, although you are adding complication via a subquery/derived table, the formula is kept in one place, so any changes only have to happen once. I do not know if this will perform worse than simply repeating the calcuation in the group-by, but I can't imagine it would be that much worse.
SELECT clause is what you want to select from the filtred by WHERE clause table(s).
GROUP BY is a condition how to group filtered records to use in aggregation functions in the SELECT. So alias cannot be there.
But you can wrap your filtered records and select from them. Something like that:
SELECT state_name, no_big_city, big_city_population
FROM
(
SELECT
state.name AS state_name,
COUNT(1) no_big_city,
MAX(place.population) max_city_population,
SUM(place.population) AS big_city_population
FROM state JOIN place ON state.code = place.state_code
WHERE
place.type = 'city' AND
place.population >= 100000
GROUP BY state.name
)
WHERE
no_big_city >= 5 OR
max_city_population > 1000000
ORDER BY state_name
Also, moving conditions
place.type = 'city' AND
place.population >= 100000
out of CASE to WHERE will perform better. "No city" or "small city records will not be processed. especially if there is an index on place.type column.
Related
I'm trying to run this code for an assignment for a class I've got. The "x" at the end of my subquery keeps on giving me errors and I can't wrap my head around why this is.
The goal of this assignment is to count (by age group) the number of reports that Carditis was a symptom after receiving a COVID shot.
Thanks in advance
Select agegroup, sum(case when died= 'Y' then 1 else 0 end) as Deaths
From (Select *,
Case
when age<=2 then 'infant'
when age<18 then 'juvenile'
when age<35 then 'adult'
when age<65 then 'old adult'
when age>=65 then 'senior'
else 'unknown' end as agegroup
from dbo.symptoms as s
join dbo.vaersvax as v on s.vaers_id=v.vaers_id
join dbo.patient as p on s.vaers_id=p.vaers_id
where v.vax_type='COVID19' and OneVax='Y' and symptom='Carditis'
) as x
Group By agegroup
Order By avg(age)
As #Schmocken already said, you can't perform a SELECT FROM a subquery that returns more than one column with the same name. As I suppose from your external query, this would do the job for you:
Select agegroup, sum(case when died= 'Y' then 1 else 0 end) as Deaths
From (Select died, age,
Case
when age<=2 then 'infant'
when age<18 then 'juvenile'
when age<35 then 'adult'
when age<65 then 'old adult'
when age>=65 then 'senior'
else 'unknown' end as agegroup
from dbo.symptoms as s
join dbo.vaersvax as v on s.vaers_id=v.vaers_id
join dbo.patient as p on s.vaers_id=p.vaers_id
where v.vax_type='COVID19' and OneVax='Y' and symptom='Carditis'
) as x
Group By agegroup
Order By avg(age)
By using Select * you have specified the same column name to be returned more than once.
As an example, you are returning both s.vaers_id and v.vaers_id, which are the same. This is not allowed; a subquery must return a unique set of column names.
You could return s.* successfully, but not all columns from all tables.
I am trying to run a case count on SQL but I want the results without the 0 how do I do that?
Select ClubName,
ClubType,
Country,
Concat(Count(case when GameResult like 'w%' then 1 else NULL end), ' ','wins'),
Count(Case when GameResult like 'l%' then 1 end) AS Losses
from ClubDim,CountryDim,GamesFact
where ClubDim.CountryID = CountryDim.CountryID
And ClubDim.ClubID = GamesFact.ClubID
GROUP BY ClubName,ClubType,Country,GameResult
Having ClubType = 'Professional'
That's the code and I am getting a lot of zeros and my target is to count losses and wins in two separate columns
It's obviously not possible to test without sample data however you should be using sum not count here. Your having criteria should be part of the where clause as you are not filtering on an aggregate. I would also recommend using proper join syntax which has been standard since 1992! However this should give you expected results, I suspect.
Select ClubName,
ClubType,
Country,
Concat(sum(case when GameResult like 'w%' then 1 else 0 end), ' ','wins'),
Sum(Case when GameResult like 'l%' then 1 else 0 end) AS Losses
from ClubDim,CountryDim,GamesFact
where ClubDim.CountryID = CountryDim.CountryID and ClubDim.ClubID = GamesFact.ClubID
and ClubType = 'Professional'
GROUP BY ClubName,ClubType,Country,GameResult
The issue is that you have put gameResult in the group by.
More importantly, you need to learn proper, explicit, standard, readable JOIN syntax. Doing a Cartesian product and filtering in the WHERE clause is just awkward. Not using proper syntax is outdated:
select cl.ClubName, cl.ClubType, co.Country,
Concat(Count(case when g.GameResult like 'w%' then 1 else NULL end), ' ', 'wins'),
Count(Case when g.GameResult like 'l%' then 1 end) AS Losses
from ClubDim cl join
CountryDim co
on cl.CountryID = co.CountryID join
GamesFact g
on cl.ClubID = g.ClubID
where cl.ClubType = 'Professional'
group by cl.ClubName, cl.ClubType, co.Country;
In addition:
Note the use of table aliases. These should be abbreviations for the table names.
I did my best to qualify all columns, so it is clear what tables they come from. Of course, your question doesn't provide this information, so this is just guessing.
Filter before aggregating by using a WHERE clause rather than filtering after aggregating with a HAVING. Use HAVING when you want to filter on the results of aggregations (such as COUNT(*)).
You can try below query -
SELECT ClubName,
ClubType,
Country,
CASE WHEN Wins > 0 THEN CONCAT(Wins , ' wins') ELSE NULL END WINS,
loses
FROM (SELECT ClubName,
ClubType,
Country,
Count(case when GameResult like 'w%' then 1 else NULL end) AS Wins,
Count(Case when GameResult like 'l%' then 1 end) AS Losses
FROM ClubDim CLD
JOIN CountryDim COD ON CLD.CountryID = COD.CountryID
JOIN GamesFact GF ON CLD.ClubID = GF.ClubID
where ClubType = 'Professional'
GROUP BY ClubName,ClubType,Country) X;
I have a query and i want to have two HAVING conditions
The first condition is where sum is more than 6000 (Which i have
done)
The second condition is where the COUNT(1) CNT is more than 1 (Which
i need help in)
SELECT SYSDATE,
CUSTOMER.CIF_NO,
CUSTOMER.LONG_NAME_ENG,
TRANSTYPE.short_desc_Eng,
LOCATION.LONG_DESC_ENG ,
COUNT(1) CNT,
SUM(TRANS.AMOUNT) SM
FROM TRANS, CUSTOMER, TRANSTYPE, LOCATION
WHERE TRANS.TRS_AC_CIF = CUSTOMER.CIF_NO
AND TRANS.BRANCH_CODE = LOCATION.BRANCH_CODE
AND TRANS.COMP_CODE = LOCATION.COMP_CODE
AND TRANSTYPE.COMP_CODE = TRANS.COMP_CODE
AND TRANSTYPE.TYPE IN ( 'D' , 'T' )
AND TRANSTYPE.CODE = TRANS.TRX_TYPE
AND TRANS.STATUS = 'P'
AND TRANS.TRS_TYPE = 'R'
AND TRANS.CB_IND = 'C'
GROUP BY CUSTOMER.CIF_NO ,CUSTOMER.LONG_NAME_ENG,
TRANSTYPE.short_desc_Eng, LOCATION.LONG_DESC_ENG
HAVING SUM(TRANS.AMOUNT) > 6000
---------------------------
second having here
----------------------------
ORDER BY CUSTOMER.CIF_NO, CUSTOMER.LONG_NAME_ENG, LOCATION.LONG_DESC_ENG
More than one HAVING clause can not be specified within a SELECT statement, e.g. it's a violation. But add your needed condition such as
HAVING SUM(TRANS.AMOUNT) > 6000 AND COUNT(1) > 1
OR
HAVING SUM(TRANS.AMOUNT) > 6000 OR COUNT(1) > 1
as long as
a GROUP BY clause is present with the SQL statement
aggregations take place within the HAVING clause
P.S. Convert your query syntax to the syntax with explicit JOIN clauses among tables rather than old-style comma-seperated JOINs, and use aliases for the table names
So I'm trying to construct a query in BigQuery that I'm struggling with for a final part.
As of now I have:
SELECT
UNIQUE(Name) as SubscriptionName,
ID,
Interval,
COUNT(mantaSubscriptionIdmetadata) AS SubsPurchased,
SUM(RevenueGenerated) as RevenueGenerated
FROM (
SELECT
mantaSubscriptionIdmetadata,
planIdmetadata,
INTEGER(Amount) as RevenueGenerated
FROM
[sample_internal_data.charge0209]
WHERE
revenueSourcemetadata = 'new'
AND
Status = 'Paid'
GROUP BY
mantaSubscriptionIdmetadata,
planIdmetadata,
RevenueGenerated
)a
JOIN (
SELECT
id,
Name,
Interval
FROM
[sample_internal_data.subplans]
WHERE
id in ('150017','150030','150033','150019')
GROUP BY
id,
Name,
Interval )b
ON
a.planIdmetadata = b.id
GROUP BY
ID,
Interval,
Name
ORDER BY
Interval ASC
The resulting query looks like this
Which is exactly what I'm looking for up to that point.
Now what I'm stuck on this. There is another column I need to add called SalesRepName. The resulting field will either be null or not null. If its null it means it was sold online. If its not null, it means it was sold via telephone. What I want to do is create two additional columns where it says how many were sold via telesales and via online. The sum total of the two columns will always equal the SubsPurchased total.
Can anyone help?
You can include case statements within aggregate functions. Here you could choose sum(case when SalesRepName is null then 1 else 0 end) as online and sum(case when SalesRepName is not null then 1 else 0 end) as telesales.
count(case when SalesRepName is null then 1 end) as online would give the same result. Using sum in these situations is simply my personal preference.
Note that omitting the else clause is equivalent to setting else null, and null isn't counted by count. This can be very useful in combination with exact_count_distinct, which has no equivalent in terms of sum.
Try below:
it assumes your SalesRepName field is in [sample_internal_data.charge0209] table
and then it uses "tiny version" of SUM(CASE ... WHEN ...) which works when you need 0 or 1 as a result to be SUM'ed
SUM(SalesRepName IS NULL) AS onlinesales,
SUM(NOT SalesRepName IS NULL) AS telsales
SELECT
UNIQUE(Name) AS SubscriptionName,
ID,
Interval,
COUNT(mantaSubscriptionIdmetadata) AS SubsPurchased,
SUM(RevenueGenerated) AS RevenueGenerated,
SUM(SalesRepName IS NULL) AS onlinesales,
SUM(NOT SalesRepName IS NULL) AS telesales
FROM (
SELECT SalesRepName, mantaSubscriptionIdmetadata, planIdmetadata, INTEGER(Amount) AS RevenueGenerated
FROM [sample_internal_data.charge0209]
WHERE revenueSourcemetadata = 'new'
AND Status = 'Paid'
GROUP BY mantaSubscriptionIdmetadata, planIdmetadata, RevenueGenerated
)a
JOIN (
SELECT id, Name, Interval
FROM [sample_internal_data.subplans]
WHERE id IN ('150017','150030','150033','150019')
GROUP BY id, Name, Interval
)b
ON a.planIdmetadata = b.id
GROUP BY ID, Interval, Name
ORDER BY Interval ASC
I'll try to describe as best I can, but it's hard for me to wrap my whole head around this problem let alone describe it....
I am trying to select multiple results in one query to display the current status of a database. I have the first column as one type of record, and the second column as a sub-category of the first column. The subcategory is then linked to more records underneath that, distinguished by status, forming several more columns. I need to display every main-category/subcategory combination, and then the count of how many of each sub-status there are beneath that subcategory in the subsequent columns. I've got it so that I can display the unique combinations, but I'm not sure how to nest the select statements so that I can select the count of a completely different table from the main query. My problem lies in that to display the main category and sub category, I can pull from one table, but I need to count from a different table. Any ideas on the matter would be greatly appreciated
Here's what I have. The count statements would be replaced with the count of each status:
SELECT wave_num "WAVE NUMBER",
int_tasktype "INT / TaskType",
COUNT (1) total,
COUNT (1) "LOCKED/DISABLED",
COUNT (1) released,
COUNT (1) "PARTIALLY ASSEMBLED",
COUNT (1) assembled
FROM (SELECT DISTINCT
(t.invn_need_type || ' / ' || s.code_desc) int_tasktype,
t.task_genrtn_ref_nbr wave_num
FROM sys_code s, task_hdr t
WHERE t.task_genrtn_ref_nbr IN
(SELECT ship_wave_nbr
FROM ship_wave_parm
WHERE TRUNC (create_date_time) LIKE SYSDATE - 7)
AND s.code_type = '590'
AND s.rec_type = 'S'
AND s.code_id = t.task_type),
ship_wave_parm swp
GROUP BY wave_num, int_tasktype
ORDER BY wave_num
Image here: http://i.imgur.com/JX334.png
Guessing a bit,both regarding your problem and Oracle (which I've - unfortunately - never used), hopefully it will give you some ideas. Sorry for completely messing up the way you write SQL, SELECT ... FROM (SELECT ... WHERE ... IN (SELECT ...)) simply confuses me, so I have to restructure:
with tmp(int_tasktype, wave_num) as
(select distinct (t.invn_need_type || ' / ' || s.code_desc), t.task_genrtn_ref_nbr
from sys_code s
join task_hdr t
on s.code_id = t.task_type
where s.code_type = '590'
and s.rec_type = 'S'
and exists(select 1 from ship_wave_parm p
where t.task_genrtn_ref_nbr = p.ship_wave_nbr
and trunc(p.create_date_time) = sysdate - 7))
select t.wave_num "WAVE NUMBER", t.int_tasktype "INT / TaskType",
count(*) TOTAL,
sum(case when sst.sub_status = 'LOCKED' then 1 end) "LOCKED/DISABLED",
sum(case when sst.sub_status = 'RELEASED' then 1 end) RELEASED,
sum(case when sst.sub_status = 'PARTIAL' then 1 end) "PARTIALLY ASSEMBLED",
sum(case when sst.sub_status = 'ASSEMBLED' then 1 end) ASSEMBLED
from tmp t
join sub_status_table sst
on t.wave_num = sst.wave_num
group by t.wave_num, t.int_tasktype
order by t.wave_num
As you notice, I don't know anything about the table with the substatuses.
You can use inner join, grouping and count to get your result:
suppose tables are as follow :
cat (1)--->(n) subcat (1)----->(n) subcat_detail.
so the query would be :
select cat.title cat_title ,subcat.title subcat_title ,count(*) as cnt from
cat inner join sub_cat on cat.id=subcat.cat_id
inner join subcat_detail on subcat.ID=am.subcat_detail_id
group by cat.title,subcat.title
Generally when you need different counts, you need to use the CASE statment.
select count(*) as total
, case when field1 = "test' then 1 else 0 end as testcount
, case when field2 = 'yes' then 1 else 0 endas field2count
FROM table1