I have a table of data and I want to group it, adding a new column whose value is derived from a few case statements. This is my data:
And this is my current sql:
select a.Action,st.State,ym.Year,sum(RatingCount) as LevelCount
from ActionTable a
left join StateTable st on a.ID = st.ActionID
left join YearMetrics ym on a.Name = ym.NameCategory and st.Name = ym.CategoryName
group by a.name,st.name,ym.Year,ym.Level
These are the case statements (not all of them) on which the logic should be based:
case when level = 'high' and levelcount >= 1 then 'High'
when level = 'medium' and levelcount > 3 then 'High'
else 'Low'
end as Level
So, for example, in the case of Oregon (lines 20, 21, 22) I want to group the data on Action, State and Year, with a new column named Level derived from the case statements. For line 20, because no case statement matches the data in the table, the result should be:
Non-Travel Oregon 2020 Low
The lines 21,22 should be:
Non-Travel Oregon 2021 High
because, according to the case statements, there is one levelcount >=1 and Level is High. In the case of line 19 the result should be :
Non-Travel Nevada null null
What I have tried includes:
Partitions
CLR object to include the logic in a c# assembly
Stuff function
Group by case statements
I have not managed to obtain the desired result using any of the techniques.
This is the expected result:
Any help would be appreciated.
This appears to be the logic that you are describing:
select a.Action, st.State, ym.Year,
sum(RatingCount) as LevelCount,
(case when level = 'high' and sum(RatingCount) >= 1 then 'High'
when level = 'medium' and sum(RatingCount) > 3 then 'High'
when level = 'medium' then 'Low'
end) as Level
from ActionTable a left join
StateTable st
on a.ID = st.ActionID left join
YearMetrics ym
on a.Name = ym.NameCategory and st.Name = ym.CategoryName
group by a.name, st.name, ym.Year, ym.Level;
As far as I can tell, the stated expected results are not compatible with the rules you've given us for deriving them. It also doesn't help that your data is the output of your existing query rather than the raw data. As a result, it feels like we're guessing a bit here ...
The query I've given below doesn't return what you say you want, but it's close and I think agrees with your explanation.
WITH subquery AS
(
select a.Action,st.State,ym.Year,ym.Level,sum(RatingCount) as LevelCount
from ActionTable a
left join StateTable st on a.ID = st.ActionID
left join YearMetrics ym on a.Name = ym.NameCategory and st.Name = ym.CategoryName
group by a.name,st.name,ym.Year,ym.Level
) --This is just your original code with ym.Level added to the SELECT clause.
SELECT
s.Action,
s.State,
s.Year,
CASE WHEN s.Level = 'high' AND s.LevelCount >=1 THEN 'High'
WHEN s.Level = 'medium' AND s.LevelCount >0 THEN 'High'
WHEN s.Level IS NULL THEN NULL --If you don't do this, NULLs become 'Low'
ELSE 'Low'
END AS NewLevel
FROM
subquery s
GROUP BY
s.Action,
s.State,
s.Year,
CASE WHEN s.Level = 'high' AND s.LevelCount >=1 THEN 'High'
WHEN s.Level = 'medium' AND s.LevelCount >0 THEN 'High'
WHEN s.Level IS NULL THEN NULL
ELSE 'Low'
END
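To make the two-step shape concrete (aggregate per level first, then classify), here is a minimal sketch run against SQLite from Python. The table `metrics`, its columns and its rows are hypothetical stand-ins for the joined tables in the question, and the thresholds are the ones from the question's case statements. Collapsing to one row per Action/State/Year with MIN is an assumption of this sketch; it relies on 'High' sorting before 'Low'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, already-joined data standing in for the three tables.
    CREATE TABLE metrics (action TEXT, state TEXT, year INTEGER,
                          level TEXT, rating_count INTEGER);
    INSERT INTO metrics VALUES
        ('Non-Travel', 'Oregon', 2020, 'low',    2),
        ('Non-Travel', 'Oregon', 2021, 'high',   1),
        ('Non-Travel', 'Oregon', 2021, 'medium', 1),
        ('Non-Travel', 'Nevada', NULL, NULL,     NULL);
""")

rows = conn.execute("""
    WITH per_level AS (          -- step 1: total per level
        SELECT action, state, year, level,
               SUM(rating_count) AS level_count
        FROM metrics
        GROUP BY action, state, year, level
    ),
    classified AS (              -- step 2: apply the case logic to the totals
        SELECT action, state, year,
               CASE WHEN level IS NULL THEN NULL
                    WHEN level = 'high'   AND level_count >= 1 THEN 'High'
                    WHEN level = 'medium' AND level_count > 3  THEN 'High'
                    ELSE 'Low'
               END AS new_level
        FROM per_level
    )
    -- step 3: one row per group; MIN picks 'High' over 'Low', NULL stays NULL
    SELECT action, state, year, MIN(new_level) AS new_level
    FROM classified
    GROUP BY action, state, year
    ORDER BY state, year
""").fetchall()

for row in rows:
    print(row)
```

With this sample data the Oregon 2021 rows collapse into a single 'High' row and the Nevada row keeps NULL for both Year and Level.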
I'm trying to run this code for an assignment for a class I've got. The "x" at the end of my subquery keeps on giving me errors and I can't wrap my head around why this is.
The goal of this assignment is to count (by age group) the number of reports that Carditis was a symptom after receiving a COVID shot.
Thanks in advance
Select agegroup, sum(case when died= 'Y' then 1 else 0 end) as Deaths
From (Select *,
Case
when age<=2 then 'infant'
when age<18 then 'juvenile'
when age<35 then 'adult'
when age<65 then 'old adult'
when age>=65 then 'senior'
else 'unknown' end as agegroup
from dbo.symptoms as s
join dbo.vaersvax as v on s.vaers_id=v.vaers_id
join dbo.patient as p on s.vaers_id=p.vaers_id
where v.vax_type='COVID19' and OneVax='Y' and symptom='Carditis'
) as x
Group By agegroup
Order By avg(age)
As @Schmocken already said, you can't SELECT from a subquery that returns more than one column with the same name. Judging from your outer query, this should do the job for you:
Select agegroup, sum(case when died= 'Y' then 1 else 0 end) as Deaths
From (Select died, age,
Case
when age<=2 then 'infant'
when age<18 then 'juvenile'
when age<35 then 'adult'
when age<65 then 'old adult'
when age>=65 then 'senior'
else 'unknown' end as agegroup
from dbo.symptoms as s
join dbo.vaersvax as v on s.vaers_id=v.vaers_id
join dbo.patient as p on s.vaers_id=p.vaers_id
where v.vax_type='COVID19' and OneVax='Y' and symptom='Carditis'
) as x
Group By agegroup
Order By avg(age)
By using Select * you have specified the same column name to be returned more than once.
As an example, you are returning both s.vaers_id and v.vaers_id, which are the same. This is not allowed; a subquery must return a unique set of column names.
You could return s.* successfully, but not all columns from all tables.
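As a rough illustration of the fix, the sketch below selects only the columns the outer query needs from the derived table and then groups on the computed age bucket. It runs on SQLite from Python with hypothetical, trimmed-down `patient` and `symptoms` tables and made-up rows; note that SQLite does not enforce the unique-column-name rule the way SQL Server does, so this shows the shape of the query rather than reproducing the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the tables in the question.
    CREATE TABLE patient  (vaers_id INTEGER, age INTEGER, died TEXT);
    CREATE TABLE symptoms (vaers_id INTEGER, symptom TEXT);
    INSERT INTO patient VALUES
        (1, 1, 'N'), (2, 16, 'N'), (3, 40, 'Y'), (4, 70, 'Y'), (5, 72, 'N');
    INSERT INTO symptoms VALUES
        (1, 'Carditis'), (2, 'Carditis'), (3, 'Carditis'),
        (4, 'Carditis'), (5, 'Carditis');
""")

rows = conn.execute("""
    SELECT agegroup,
           SUM(CASE WHEN died = 'Y' THEN 1 ELSE 0 END) AS deaths
    FROM (SELECT p.died, p.age,          -- only the columns needed outside
                 CASE WHEN p.age <= 2 THEN 'infant'
                      WHEN p.age < 18 THEN 'juvenile'
                      WHEN p.age < 35 THEN 'adult'
                      WHEN p.age < 65 THEN 'old adult'
                      WHEN p.age >= 65 THEN 'senior'
                      ELSE 'unknown'
                 END AS agegroup
          FROM symptoms AS s
          JOIN patient AS p ON s.vaers_id = p.vaers_id
          WHERE s.symptom = 'Carditis') AS x
    GROUP BY agegroup
    ORDER BY AVG(age)
""").fetchall()

for row in rows:
    print(row)
```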
I have SQL Query like this in SSMS
select distinct (b.TransactionNumber),
(case when b.Amount > 0 then c.total else 0 end) as 'Total Sales',
(case when b.TenderID = 1 then b.Amount else 0 end) as 'Cash',
(case when b.TenderID = 20 then b.Amount else 0 end) as 'Gift Certificates'
from [Transaction] c
inner join TenderEntry b on c.TransactionNumber = b.TransactionNumber
but the output is(see image for reference)
This should be the expected output(see image for reference)
I would expect one row per transaction number, especially given your use of select distinct:
select t.TransactionNumber, t.total as total_sales,
sum(case when te.TenderID = 1 then te.Amount else 0 end) as Cash,
sum(case when te.TenderID = 20 then te.Amount else 0 end) as Gift_Certificates
from [Transaction] t join
TenderEntry te
on te.TransactionNumber = t.TransactionNumber
group by t.TransactionNumber, t.total;
This produces one row per transaction.
Note the changes to the query:
The table aliases are meaningful (i.e. abbreviations of table names) rather than arbitrary letters.
The column aliases do not use single quotes. Only use single quotes for string and date constants.
The column aliases have been simplified so they do not need to be escaped.
It occurs to me that you might want to "list" the cash and gifts in the two columns. This would look like:
select TransactionNumber,
max(case when seqnum = 1 then total end) as total_sales,
sum(case when tenderId = 1 then amount end) as cash,
sum(case when tenderId = 20 then amount end) as Gift_Certificates
from (select t.TransactionNumber, t.total, te.amount, te.TenderID,
row_number() over (partition by t.TransactionNumber, te.TenderId order by te.amount) as seqnum
from [Transaction] t join
TenderEntry te
on te.TransactionNumber = t.TransactionNumber
where te.tenderid in (1, 20)
) x
group by TransactionNumber, seqnum;
This is only a partial answer, but I put it here because I cannot fit it into comments well enough.
It is likely that you do not want the DISTINCT component in the select. SELECT DISTINCT finds all unique rows. So if you have 3 rows which all have some differences, it will show all three. If, on the other hand, there were two identical rows (e.g., they paid for a $100 item with two $50 vouchers), it would just ignore one of them.
Instead, you probably need to become familiar with 'GROUP BY' which allows you to find totals etc across multiple rows.
For example (although not tested)
select
(b.TransactionNumber),
(case when b.Amount > 0 then c.total else 0 end) as 'Total Sales',
SUM(case when b.TenderID = 1 then b.Amount else 0 end) as 'Cash',
SUM(case when b.TenderID = 20 then b.Amount else 0 end) as 'Gift Certificates'
from [Transaction] c
inner join TenderEntry b on c.TransactionNumber = b.TransactionNumber
GROUP BY b.TransactionNumber, (case when b.Amount > 0 then c.total else 0 end)
In the above, I have removed the DISTINCT, added 'SUM' for the two transaction values (cash/certificates) and the GROUP BY across the TransactionNumber and Total sales (as that seems to be common across the transaction).
For the above data, what that would produce is 1 line of data for TransactionNumber = 1, with sales = 250 (as all the lines have that), and totals for cash and gift certificates (150 and 100 respectively if my maths is correct).
However, that is not your desired answer: you want this transaction to go over two lines. Firstly, are you sure of that?
If you are, then we need another criterion by which to group the rows, and you will need to specify it, e.g., why does this transaction's report need to go over two lines rather than just one?
Other notes
'Transaction' is a reserved keyword in SQL Server. You are allowed to use it as a table name (escaped as [Transaction]), but I would consider renaming the table.
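To see the difference between DISTINCT and GROUP BY on this kind of data, here is a small sketch on SQLite from Python. The rows are hypothetical (one transaction with a total of 250, paid with 150 in cash and two 50 gift certificates), chosen to match the totals discussed above; the column layout of the two tables is assumed from the question's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: one sale of 250 paid with cash and two vouchers.
    CREATE TABLE [Transaction] (TransactionNumber INTEGER, Total INTEGER);
    CREATE TABLE TenderEntry (TransactionNumber INTEGER, TenderID INTEGER,
                              Amount INTEGER);
    INSERT INTO [Transaction] VALUES (1, 250);
    INSERT INTO TenderEntry VALUES (1, 1, 150), (1, 20, 50), (1, 20, 50);
""")

# DISTINCT only removes identical rows, so the second 50 voucher vanishes.
distinct_rows = conn.execute("""
    SELECT DISTINCT te.TransactionNumber,
           CASE WHEN te.TenderID = 1  THEN te.Amount ELSE 0 END AS cash,
           CASE WHEN te.TenderID = 20 THEN te.Amount ELSE 0 END AS gift
    FROM [Transaction] t
    JOIN TenderEntry te ON te.TransactionNumber = t.TransactionNumber
""").fetchall()

# GROUP BY with conditional sums yields one row per transaction.
grouped_rows = conn.execute("""
    SELECT t.TransactionNumber, t.Total AS total_sales,
           SUM(CASE WHEN te.TenderID = 1  THEN te.Amount ELSE 0 END) AS cash,
           SUM(CASE WHEN te.TenderID = 20 THEN te.Amount ELSE 0 END) AS gift
    FROM [Transaction] t
    JOIN TenderEntry te ON te.TransactionNumber = t.TransactionNumber
    GROUP BY t.TransactionNumber, t.Total
""").fetchall()

print(sorted(distinct_rows))
print(grouped_rows)
```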
I'm a beginner at SQL and this is the question I have been asked to solve:
Say that a big city is defined as a place of type city with a population of at least 100,000. Write an SQL query that returns the scheme (state_name,no_big_city,big_city_population) ordered by state_name, listing those states which have either (a) at least five big cities or (b) at least one million people living in big cities. The column state_name is the name of the state, no_big_city is the number of big cities in the state, and big_city_population is the number of people living in big cities in the state.
Now, as far as I can see, the following query returns correct results:
SELECT state.name AS state_name
, COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
, SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM state
JOIN place
ON state.code = place.state_code
GROUP BY state_name
HAVING
COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 OR
SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;
However, the two aggregate functions used in the code appear twice. My question: is there any way of making this code duplication disappear while preserving functionality?
To be clear, I have already tried using the alias, but I just get a "column does not exist" error.
The manual clarifies:
An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses;
there you must write out the expression instead.
Bold emphasis mine.
You can avoid typing long expressions repeatedly with a subquery or CTE:
SELECT state_name, no_big_city, big_city_population
FROM (
SELECT s.name AS state_name
, COUNT(*) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
, SUM(population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
FROM state s
JOIN place p ON s.code = p.state_code
GROUP BY s.name -- can be input column name as well, best schema-qualified to avoid ambiguity
) sub
WHERE no_big_city >= 5
OR big_city_population >= 1000000
ORDER BY state_name;
While being at it, I simplified with the aggregate FILTER clause (Postgres 9.4+):
How can I simplify this game statistics query?
However, I suggest this simpler and faster query to begin with:
SELECT s.name AS state_name, p.no_big_city, p.big_city_population
FROM state s
JOIN (
SELECT state_code AS code -- alias just to simplify join
, count(*) AS no_big_city
, sum(population) AS big_city_population
FROM place
WHERE type = 'city'
AND population >= 100000
GROUP BY 1 -- can be ordinal number referencing position in SELECT list
HAVING count(*) >= 5 OR sum(population) >= 1000000 -- simple expressions now
) p USING (code)
ORDER BY 1; -- can also be ordinal number
I am demonstrating another option to reference expressions in GROUP BY and ORDER BY. Only use that if it doesn't impair readability and maintainability.
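As a quick check that the rewrite is equivalent, the sketch below runs both the repeated-expression HAVING form and a derived-table form on SQLite from Python and compares the results. The states, places and populations are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (code TEXT, name TEXT);
    CREATE TABLE place (state_code TEXT, name TEXT, type TEXT,
                        population INTEGER);
    -- Invented sample data.
    INSERT INTO state VALUES ('AA', 'Alpha'), ('BB', 'Beta'), ('CC', 'Gamma');
    INSERT INTO place VALUES
        ('AA', 'A1', 'city', 1200000),
        ('CC', 'C1', 'city', 200000),
        ('CC', 'C2', 'city', 200000),
        ('CC', 'C3', 'town', 500000),
        ('CC', 'C4', 'city', 50000);
""")
# Beta gets five big cities of 150,000 each.
conn.executemany(
    "INSERT INTO place VALUES ('BB', ?, 'city', 150000)",
    [(f"B{i}",) for i in range(1, 6)],
)

BIG = "p.type = 'city' AND p.population >= 100000"

# Original form: each aggregate is written out twice.
repeated = conn.execute(f"""
    SELECT s.name AS state_name,
           COUNT(CASE WHEN {BIG} THEN 1 END) AS no_big_city,
           SUM(CASE WHEN {BIG} THEN p.population END) AS big_city_population
    FROM state s JOIN place p ON s.code = p.state_code
    GROUP BY s.name
    HAVING COUNT(CASE WHEN {BIG} THEN 1 END) >= 5
        OR SUM(CASE WHEN {BIG} THEN p.population END) >= 1000000
    ORDER BY state_name
""").fetchall()

# Derived-table form: filter first, then refer to the aliases outside.
derived = conn.execute(f"""
    SELECT state_name, no_big_city, big_city_population
    FROM (SELECT s.name AS state_name,
                 COUNT(*) AS no_big_city,
                 SUM(p.population) AS big_city_population
          FROM state s JOIN place p ON s.code = p.state_code
          WHERE {BIG}
          GROUP BY s.name) sub
    WHERE no_big_city >= 5 OR big_city_population >= 1000000
    ORDER BY state_name
""").fetchall()

print(derived)
```

Building the condition into the SQL with an f-string is only a convenience for this sketch; the string is a fixed constant, not user input.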
Not sure if this is a comment or an answer, since it is more preference based as opposed to technical, but I'll post it anyway
What I usually do when I need to reference calculated columns (usually a LOT at the same time) is put the calculated columns in a derived table and then reference them by alias outside of the derived table. This syntax should be ANSI SQL compliant, but I am not familiar with PostgreSQL.
select * from (
SELECT STATE.NAME AS state_name
,COUNT(CASE WHEN place.type = 'city'
AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
,SUM(CASE WHEN place.type = 'city'
AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM STATE
INNER JOIN place
ON STATE.code = place.state_code
GROUP BY state_name
) sub
where no_big_city >= 5
or big_city_population >= 1000000
--HAVING COUNT(CASE WHEN place.type = 'city'
-- AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5
-- OR SUM(CASE WHEN place.type = 'city'
-- AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;
The nice thing about this approach is that, although you are adding complication via a subquery/derived table, the formula is kept in one place, so any changes only have to happen once. I do not know if this will perform worse than simply repeating the calculation in the HAVING clause, but I can't imagine it would be that much worse.
The SELECT clause defines what you want to select from the table(s) after the WHERE clause has filtered them.
GROUP BY defines how the filtered records are grouped for the aggregate functions in the SELECT, and HAVING is evaluated before the SELECT list is produced, so an alias defined in the SELECT cannot be used there.
But you can wrap your filtered records in a subquery and select from that. Something like this:
SELECT state_name, no_big_city, big_city_population
FROM
(
SELECT
state.name AS state_name,
COUNT(1) no_big_city,
MAX(place.population) max_city_population,
SUM(place.population) AS big_city_population
FROM state JOIN place ON state.code = place.state_code
WHERE
place.type = 'city' AND
place.population >= 100000
GROUP BY state.name
) sub
WHERE
no_big_city >= 5 OR
big_city_population >= 1000000
ORDER BY state_name
Also, moving conditions
place.type = 'city' AND
place.population >= 100000
out of the CASE expressions and into WHERE will perform better: records for non-cities and small cities will not be processed at all, especially if there is an index on the place.type column.
I have a single table in the following format:
STATE SURVEY_ANSWER
NC high
NC moderate
WA high
FL low
NC high
I am looking for a single query that will get me the following result:
STATE HIGH MODERATE LOW
NC 2 1 0
WA 1 0 0
FL 0 0 1
Unfortunately, these are the results I am getting:
STATE HIGH MODERATE LOW
NC 3 1 1
WA 3 1 1
FL 3 1 1
Here is the code I am using:
Select mytable.STATE,
(SELECT COUNT(*) FROM mytable WHERE mytable.survey_answer = 'low' and state = mytable.state) AS low,
(SELECT COUNT(*) FROM mytable WHERE mytable.survey_answer = 'moderate' and state = mytable.state) AS moderate,
(SELECT COUNT(*) FROM mytable WHERE mytable.survey_answer = 'high' and state = mytable.state) AS high
FROM mytable
GROUP BY mytable.state;
While this and other forums have been very helpful I am unable to figure out what I am doing wrong. PLEASE NOTE: I am using Access so CASE WHEN solutions do not work. Thank you for any advice.
It looks like this may be an issue caused by not using table aliases. Because you are doing sub-queries on the same table that the outer SELECT is using and not giving the outer table an alias, both of the conditions in the WHERE of the sub-query are only using data in the sub-query.
In other words, when you write:
SELECT COUNT(*) FROM mytable WHERE mytable.survey_answer = 'low' and state = mytable.state
It doesn't know anything about the outer query.
Try this:
SELECT t1.STATE,
(SELECT COUNT(*) FROM mytable t2 WHERE t2.state = t1.state AND t2.survey_answer = 'low') AS low,
(SELECT COUNT(*) FROM mytable t3 WHERE t3.state = t1.state AND t3.survey_answer = 'moderate') AS moderate,
(SELECT COUNT(*) FROM mytable t4 WHERE t4.state = t1.state AND t4.survey_answer = 'high') AS high
FROM mytable t1
GROUP BY t1.state
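The effect of the missing alias can be reproduced with the sample rows from the question. The sketch below uses SQLite from Python rather than Access (an assumption of this example; name resolution in the subquery behaves the same way here), and only counts the 'high' answers to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (state TEXT, survey_answer TEXT);
    INSERT INTO mytable VALUES
        ('NC', 'high'), ('NC', 'moderate'), ('WA', 'high'),
        ('FL', 'low'),  ('NC', 'high');
""")

# Without aliases, "state = mytable.state" compares the inner table with
# itself, so the subquery counts every 'high' row regardless of state.
unaliased = conn.execute("""
    SELECT mytable.state,
           (SELECT COUNT(*) FROM mytable
            WHERE mytable.survey_answer = 'high'
              AND state = mytable.state) AS high
    FROM mytable
    GROUP BY mytable.state
    ORDER BY mytable.state
""").fetchall()

# With aliases, t2.state = t1.state correlates the subquery to the outer row.
aliased = conn.execute("""
    SELECT t1.state,
           (SELECT COUNT(*) FROM mytable t2
            WHERE t2.state = t1.state
              AND t2.survey_answer = 'high') AS high
    FROM mytable t1
    GROUP BY t1.state
    ORDER BY t1.state
""").fetchall()

print(unaliased)
print(aliased)
```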
Aiias's answer explains why your current query is not working, but I thought I'd point out that your assumption that you can't use CASE WHEN solutions is only partly right. Yes, you can't use CASE WHEN, but that doesn't mean you need correlated subqueries. You could simply use:
SELECT mytable.STATE,
SUM(IIF(mytable.survey_answer = 'low', 1, 0)) AS low,
SUM(IIF(mytable.survey_answer = 'moderate', 1, 0)) AS moderate,
SUM(IIF(mytable.survey_answer = 'high', 1, 0)) AS high
FROM mytable
GROUP BY mytable.state;
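For reference, here is the same conditional-aggregation idea run against the question's sample rows. This sketch uses SQLite from Python and writes the condition with CASE, since that is portable; in Access the IIF form above plays the same role. A single pass over the table produces all three counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (state TEXT, survey_answer TEXT);
    INSERT INTO mytable VALUES
        ('NC', 'high'), ('NC', 'moderate'), ('WA', 'high'),
        ('FL', 'low'),  ('NC', 'high');
""")

# Each SUM adds 1 for rows matching its answer and 0 otherwise.
pivot = conn.execute("""
    SELECT state,
           SUM(CASE WHEN survey_answer = 'high'     THEN 1 ELSE 0 END) AS high,
           SUM(CASE WHEN survey_answer = 'moderate' THEN 1 ELSE 0 END) AS moderate,
           SUM(CASE WHEN survey_answer = 'low'      THEN 1 ELSE 0 END) AS low
    FROM mytable
    GROUP BY state
    ORDER BY state
""").fetchall()

for row in pivot:
    print(row)
```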
I want to select *, and not have to type out all individual columns, but I also want to include a custom column with a case statement. I tried the following:
select *, (case when PRI_VAL = 1 then 'High'
when PRI_VAL = 2 then 'Med'
when PRI_VAL = 3 then 'Low'
end) as PRIORITY
from MYTABLE;
But it is complaining that
ORA-00923: FROM keyword not found where expected
Add an alias for mytable like this:
select t.*, (case when PRI_VAL = 1 then 'High'
when PRI_VAL = 2 then 'Med'
when PRI_VAL = 3 then 'Low'
end) as PRIORITY
from MYTABLE t;
This is not dependent on any specific Oracle version, not sure about other databases.
As IronGoofy says, add the table alias.
On a different note, be aware that there is a handy simple case syntax that would be suitable for your situation:
select t.*,
case PRI_VAL
when 1 then 'High'
when 2 then 'Med'
when 3 then 'Low'
end as PRIORITY
from MYTABLE t;
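The two CASE forms are interchangeable for equality tests like this one. The sketch below checks that on SQLite from Python, with a hypothetical two-column `mytable` (the real table's other columns are unknown); SQLite happens to accept a bare `*` next to other select items, so the alias is kept only to mirror the Oracle query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; only PRI_VAL comes from the question.
    CREATE TABLE mytable (id INTEGER, pri_val INTEGER);
    INSERT INTO mytable VALUES (1, 1), (2, 2), (3, 3), (4, 4);
""")

# Simple CASE: compares one expression against a list of values.
simple = conn.execute("""
    SELECT t.*,
           CASE pri_val WHEN 1 THEN 'High'
                        WHEN 2 THEN 'Med'
                        WHEN 3 THEN 'Low'
           END AS priority
    FROM mytable t
    ORDER BY t.id
""").fetchall()

# Searched CASE: each branch carries its own condition.
searched = conn.execute("""
    SELECT t.*,
           CASE WHEN pri_val = 1 THEN 'High'
                WHEN pri_val = 2 THEN 'Med'
                WHEN pri_val = 3 THEN 'Low'
           END AS priority
    FROM mytable t
    ORDER BY t.id
""").fetchall()

print(simple)
```

Without an ELSE branch, a value that matches no WHEN (here 4) yields NULL in both forms.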
Do it like this:
select e.*,
case deptno
when 30 then 'High'
when 20 then 'Medi'
when 10 then 'Low'
else 'Very Low'
end as priority
from emp e order by deptno desc;