Summing the results of CASE queries in SQL

I think this is a relatively straightforward question, but I have spent the afternoon looking for an answer and haven't found one yet. So...
I have a view with a country column and a number column. I want to relabel every country whose number is less than 10 as 'Other' and then sum the 'Other' rows into one value.
For example:
AR 10
AT 7
AU 11
BB 2
BE 23
BY 1
CL 2
I used CASE as follows:
select country = case
when number < 10 then 'Other'
else country
end,
number
from ...
This replaces the country with 'Other' wherever the number is less than 10, but I can't work out how to sum those rows. I want to end up with a table/view that looks like this:
AR 10
AU 11
BE 23
Other 12
Any help is greatly appreciated.

Just group by your CASE expression:
select
country = case
when number < 10 then 'Other'
else country
end,
sum(number)
from ...
group by
case
when number < 10 then 'Other'
else country
end
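A minimal, runnable sketch of the GROUP BY CASE approach, using Python's built-in sqlite3 module and the sample rows from the question. The table name stats and the output aliases label and total are hypothetical; SQLite does not support SQL Server's country = CASE ... alias form, so the sketch uses CASE ... END AS label.

```python
import sqlite3

# Sample data from the question; the table name "stats" is hypothetical.
ROWS = [("AR", 10), ("AT", 7), ("AU", 11), ("BB", 2),
        ("BE", 23), ("BY", 1), ("CL", 2)]


def other_totals(rows):
    """Relabel countries with number < 10 as 'Other' and sum per label."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stats (country TEXT, number INTEGER)")
    con.executemany("INSERT INTO stats VALUES (?, ?)", rows)
    # The GROUP BY repeats the CASE expression from the SELECT list,
    # so every row relabelled 'Other' lands in the same group.
    result = con.execute(
        """
        SELECT CASE WHEN number < 10 THEN 'Other' ELSE country END AS label,
               SUM(number) AS total
        FROM stats
        GROUP BY CASE WHEN number < 10 THEN 'Other' ELSE country END
        ORDER BY 1
        """
    ).fetchall()
    con.close()
    return result


print(other_totals(ROWS))
# [('AR', 10), ('AU', 11), ('BE', 23), ('Other', 12)]
```

The 'Other' total is 7 + 2 + 1 + 2 = 12, matching the expected output in the question.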

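Alternatively, select the two groups separately and combine them with UNION ALL (plain UNION would also discard repeated country/number rows):
select country, number
from your_table
where number >= 10
union all
select 'Other' as country, sum(number)
from your_table
where number < 10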
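A minimal sketch of the UNION ALL version, run with Python's built-in sqlite3 module against the sample rows from the question; your_table is the answer's placeholder name. The two branches cannot overlap (number >= 10 versus number < 10), so UNION ALL simply concatenates them.

```python
import sqlite3

# Sample data from the question.
ROWS = [("AR", 10), ("AT", 7), ("AU", 11), ("BB", 2),
        ("BE", 23), ("BY", 1), ("CL", 2)]


def union_totals(rows):
    """Keep rows with number >= 10 as-is and append one summed 'Other' row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (country TEXT, number INTEGER)")
    con.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    # UNION ALL appends the second branch without a duplicate-removal step;
    # ORDER BY 1 sorts the combined result by its first column.
    result = con.execute(
        """
        SELECT country, number FROM your_table WHERE number >= 10
        UNION ALL
        SELECT 'Other', SUM(number) FROM your_table WHERE number < 10
        ORDER BY 1
        """
    ).fetchall()
    con.close()
    return result


print(union_totals(ROWS))
# [('AR', 10), ('AU', 11), ('BE', 23), ('Other', 12)]
```

Unlike the GROUP BY versions, this does not aggregate a country that appears in several rows; it passes those rows through unchanged.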

You can wrap it in a derived table to avoid repeating the CASE expression.
select country,
sum(number) as number
from (
select case
when number < 10 then 'Other'
else country
end as country,
number
from ...
) t
group by country
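A small sketch of the derived-table version in SQLite via Python's sqlite3, again with the question's sample rows; the table name stats is hypothetical. The inner query applies the CASE once and the outer query groups by the derived country column.

```python
import sqlite3

# Sample data from the question; the table name "stats" is hypothetical.
ROWS = [("AR", 10), ("AT", 7), ("AU", 11), ("BB", 2),
        ("BE", 23), ("BY", 1), ("CL", 2)]


def derived_table_totals(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE stats (country TEXT, number INTEGER)")
    con.executemany("INSERT INTO stats VALUES (?, ?)", rows)
    # The inner query relabels each row once; the outer query groups by
    # the relabelled column, so the CASE expression is written only once.
    result = con.execute(
        """
        SELECT country, SUM(number) AS number
        FROM (
            SELECT CASE WHEN number < 10 THEN 'Other' ELSE country END AS country,
                   number
            FROM stats
        ) AS t
        GROUP BY country
        ORDER BY country
        """
    ).fetchall()
    con.close()
    return result


print(derived_table_totals(ROWS))
# [('AR', 10), ('AU', 11), ('BE', 23), ('Other', 12)]
```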

You need a GROUP BY:
select country = case
when number < 10 then 'Other'
else country
end,
sum(number)
from ...
group by
case when number < 10 then 'Other' else country end

You were very close. You just can't rely on country = case ...: that alias form is SQL Server syntax, and MySQL, for example, parses it as an equality comparison instead. Try:
Select
Case when number < 10 then 'Other'
Else country end as finalcountry,
Sum(number) as total_number
From
YourTable
Group by
finalcountry
MySQL (and PostgreSQL) accept a select-list alias such as finalcountry in the GROUP BY. SQL Server does not, so there you have to copy the CASE expression into the GROUP BY clause as well.

Related

Is There a Way to Automate the Conversion of SQL Rows to Columns Using CASE?

I was playing with the usa_names dataset on BigQuery. To visualize the top 10 names between 1910 and 2020, I had to GROUP BY year and create a new column for each of the 10 names using CASE.
The thing is, I would like to visualize the top 100, and I want to know if there is a way to automate the CASE so that I don't have to write a WHEN ... THEN clause for each name to create its column.
I first used the following query to get the top 10 names:
SELECT
name,
SUM(number) AS total
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
total DESC
LIMIT
10
Then I used the following code to turn each name's rows into a column:
SELECT
year,
SUM(CASE WHEN name = 'James' THEN number ELSE 0 END) AS James,
SUM(CASE WHEN name = 'John' THEN number ELSE 0 END) AS John,
SUM(CASE WHEN name = 'Robert' THEN number ELSE 0 END) AS Robert,
SUM(CASE WHEN name = 'Michael' THEN number ELSE 0 END) AS Michael,
SUM(CASE WHEN name = 'William' THEN number ELSE 0 END) AS William,
SUM(CASE WHEN name = 'Mary' THEN number ELSE 0 END) AS Mary,
SUM(CASE WHEN name = 'Richard' THEN number ELSE 0 END) AS Richard,
SUM(CASE WHEN name = 'Joseph' THEN number ELSE 0 END) AS Joseph,
SUM(CASE WHEN name = 'Charles' THEN number ELSE 0 END) AS Charles,
SUM(CASE WHEN name = 'Thomas' THEN number ELSE 0 END) AS Thomas
FROM
bigquery-public-data.usa_names.usa_1910_current
GROUP BY
year
ORDER BY
year
I want to achieve the same result without first pulling out the names and typing them into the CASE expressions by hand.
Also, none of this would be needed if there is a way to visualize the data directly without converting the names from rows to columns.
Thanks.
You need to combine two capabilities:
rows to columns: the PIVOT operator
scripting: to automate the query that finds the top 10 names
declare top_names default ((
select concat("'", string_agg(name, "','"), "'")
from (
-- your query in question
SELECT
name
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
SUM(number) DESC
LIMIT
10
)));
select top_names;
The output is:
'James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'
The PIVOT query you will need is:
SELECT * FROM
(select year, name, sum(number) number
from bigquery-public-data.usa_names.usa_1910_current
group by year, name
)
PIVOT(SUM(number) FOR name IN ('James','John','Robert','Michael','William','Mary','David','Richard','Joseph','Charles'
))
which produces the same layout as your second query.
To put the two together in one script, you will need something like:
execute immediate concat(
"""
SELECT * FROM
(select year, name, sum(number) number
from bigquery-public-data.usa_names.usa_1910_current
group by year, name
)
PIVOT(SUM(number) FOR name IN (
""",
top_names,
"))");
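SQLite has neither PIVOT nor BigQuery scripting, so the sketch below swaps in client-side SQL generation to show the same idea: query the top-n names first, then build one SUM(CASE ...) column per name instead of typing them by hand. It uses Python's built-in sqlite3; the table names and the sample rows are hypothetical stand-ins for usa_1910_current.

```python
import sqlite3

# Hypothetical sample rows standing in for usa_1910_current: (year, name, number).
ROWS = [(1910, "James", 5), (1910, "Mary", 6), (1910, "John", 1),
        (1911, "James", 4), (1911, "Mary", 2), (1911, "John", 3)]


def pivot_top_names(con, n):
    """Return (header, rows) with one SUM(CASE ...) column per top-n name."""
    top = [name for (name,) in con.execute(
        "SELECT name FROM names GROUP BY name "
        "ORDER BY SUM(number) DESC, name LIMIT ?", (n,))]
    if not top:
        return ["year"], []
    # One conditional-aggregation column per name. The name is bound as a
    # parameter for the comparison and double-quoted (with " doubled) for
    # the column alias, so no WHEN ... THEN branch is written by hand.
    cols = ",\n".join(
        'SUM(CASE WHEN name = ? THEN number ELSE 0 END) AS "{}"'.format(
            name.replace('"', '""'))
        for name in top)
    sql = f"SELECT year,\n{cols}\nFROM names GROUP BY year ORDER BY year"
    cur = con.execute(sql, top)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE names (year INTEGER, name TEXT, number INTEGER)")
con.executemany("INSERT INTO names VALUES (?, ?, ?)", ROWS)
print(pivot_top_names(con, 2))
# (['year', 'James', 'Mary'], [(1910, 5, 6), (1911, 4, 2)])
```

Passing n = 100 would produce the wider table from the question; the EXECUTE IMMEDIATE script above does the equivalent string building inside BigQuery.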
You shouldn't need to create a column for each name. Your first query is sufficient (you would just need to change the limit to 100). Based on the question's tags, I'm assuming you're using Tableau, so it would be as simple as choosing your desired visualisation (say, a bar chart) and placing the names on one axis and the totals on the other.
Based on your follow-up comment, it would look like this:
SELECT
name,
year,
SUM(number) AS total
From bigquery-public-data.usa_names.usa_1910_current
WHERE name IN
(
SELECT name
FROM
(
SELECT
name,
SUM(number) AS total
FROM
bigquery-public-data.usa_names.usa_1910_current
WHERE
year BETWEEN 1910 AND 2020
GROUP BY
name
ORDER BY
total DESC
LIMIT
100
))
GROUP BY name, year
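A small sketch of the long-format query (one row per name and year, filtered to the top-n names by an IN subquery with LIMIT), run in SQLite through Python's sqlite3. The names table and its rows are hypothetical; the year BETWEEN filter is left out because the sample only covers 1910 and 1911.

```python
import sqlite3

# Hypothetical sample rows: (year, name, number).
ROWS = [(1910, "James", 5), (1910, "Mary", 6), (1910, "John", 1),
        (1911, "James", 4), (1911, "Mary", 2), (1911, "John", 3)]


def top_names_by_year(con, n):
    """Long format: one (name, year, total) row per top-n name and year."""
    return con.execute(
        """
        SELECT name, year, SUM(number) AS total
        FROM names
        WHERE name IN (
            SELECT name
            FROM names
            GROUP BY name
            ORDER BY SUM(number) DESC, name
            LIMIT ?
        )
        GROUP BY name, year
        ORDER BY name, year
        """,
        (n,),
    ).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE names (year INTEGER, name TEXT, number INTEGER)")
con.executemany("INSERT INTO names VALUES (?, ?, ?)", ROWS)
print(top_names_by_year(con, 2))
# [('James', 1910, 5), ('James', 1911, 4), ('Mary', 1910, 6), ('Mary', 1911, 2)]
```

Because the result stays in rows, a tool like Tableau can put name on colour or an axis without any pivoting.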
You could also look into using calculated fields within Tableau on the raw data to achieve the desired visualisation.

Proportion request in SQL

There is a table of accidents, and I need to output the share of accidents with reason 2 out of all accidents. I wrote this code, but I cannot make it work:
select ((select count("ID") from "DTP" where "REASON"=2)/count("REASON"))
from "DTP"
group by "ID"
Something like this (not tested):
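select id,
-- 1.0 * avoids integer division (e.g. 2 / 4 = 0) in databases such as PostgreSQL and SQL Server
1.0 * count(case reason when 2 then 1 end) / count(*) as proportion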
from your_table
-- where ... (if you need to filter, for example by date)
group by id
;
count(*) counts all the rows in a group (that is, all the rows for each separate id). The case expression returns 1 when the reason is 2 and it returns null otherwise; count counts only non-null values, so it will count the rows where the reason is 2.
You can use avg():
select "ID",
avg(case when "REASON" = 2 then 1.0 else 0 end) as proportion
from "DTP"
group by "ID"
This produces the ratio for each ID, following your sample query. If you only want one row for all the data, then:
select avg(case when "REASON" = 2 then 1.0 else 0 end) as proportion
from "DTP";
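A quick check of both proportion formulas on a tiny hypothetical "DTP" table, using Python's built-in sqlite3. It also shows the integer-division pitfall: in SQLite (as in PostgreSQL and SQL Server) INTEGER / INTEGER truncates, so the count ratio needs a 1.0 factor.

```python
import sqlite3

# Hypothetical accidents ("ID", "REASON"): reason 2 appears in 2 of 4 rows.
ROWS = [(1, 2), (2, 1), (3, 2), (4, 3)]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "DTP" ("ID" INTEGER, "REASON" INTEGER)')
con.executemany('INSERT INTO "DTP" VALUES (?, ?)', ROWS)

# COUNT of a CASE that is NULL unless REASON = 2, divided by all rows;
# 1.0 * turns the division into real (floating-point) division.
share_count = con.execute(
    'SELECT 1.0 * COUNT(CASE "REASON" WHEN 2 THEN 1 END) / COUNT(*) FROM "DTP"'
).fetchone()[0]

# AVG over a 1.0 / 0 flag gives the same fraction directly.
share_avg = con.execute(
    'SELECT AVG(CASE WHEN "REASON" = 2 THEN 1.0 ELSE 0 END) FROM "DTP"'
).fetchone()[0]

# Without the 1.0 the division is done on integers: 2 / 4 -> 0.
truncated = con.execute(
    'SELECT COUNT(CASE "REASON" WHEN 2 THEN 1 END) / COUNT(*) FROM "DTP"'
).fetchone()[0]

print(share_count, share_avg, truncated)  # 0.5 0.5 0
```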

Multiple Word Count in SQL

I have a list of words I need to find in a specific column, "description of what happened",
which holds anything up to 500 or more characters. The script below does work.
However, how do I replace the 1, 2, 3 in the Name column with the actual word I am looking for, with the total next to it?
I just can't get it to display; it's probably something simple.
select GROUPING_ID ( Amoxicillin ,Atorvastatin ) as Name ,count(*) as Total
from ( select case when [description_of_what_happened] like '%Amoxicillin%'
then 1 else 0 end as Amoxicillin ,
case when [description_of_what_happened] like '%Atorvastatin%'
then 1 else 0 end as Atorvastatin
FROM "NAME OF TABLE") t
group by grouping sets (() ,(Amoxicillin),(Atorvastatin))
having coalesce (Amoxicillin,1) != 0 and coalesce (Atorvastatin,1) != 0
order by grouping_id (Amoxicillin,Atorvastatin)
Row 3 is the total; I need rows 1 and 2 to show the name of the product.
The current result is as below:
Name Total
1 7
2 9
3 4112
You can use strings instead of flags:
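select coalesce(Amoxicillin, Atorvastatin, 'Total') as Name,
count(*) as Total
from (select (case when [description_of_what_happened] like '%Amoxicillin%'
then 'Amoxicillin'
end) as Amoxicillin,
(case when [description_of_what_happened] like '%Atorvastatin%'
then 'Atorvastatin'
end
) as Atorvastatin
from "NAME OF TABLE"
) t
where Amoxicillin is not null or Atorvastatin is not null
group by grouping sets ((), (Amoxicillin), (Atorvastatin))
-- drop the NULL group each grouping set produces for rows matching only the other drug
having (grouping(Amoxicillin) = 1 or Amoxicillin is not null)
and (grouping(Atorvastatin) = 1 or Atorvastatin is not null)
order by name;
Note that I also moved the logic in the having to the where: the where drops rows that mention neither drug, so Total counts only rows that mention at least one. The having now only removes the extra NULL groups, which coalesce would otherwise also label 'Total'.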
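SQLite has no GROUPING SETS, so this sketch swaps in a different technique that also scales to a longer word list: keep the search words in a table, LEFT JOIN it to the descriptions with LIKE, and append a 'Total' row with UNION ALL. The incidents and terms tables and their rows are hypothetical; here Total counts every row, as in the question's output. Note that SQLite's LIKE is case-insensitive for ASCII, and % or _ inside a search word would act as wildcards.

```python
import sqlite3

# Hypothetical incident descriptions and search words.
DESCRIPTIONS = [
    "Patient given Amoxicillin",
    "Atorvastatin dose missed",
    "Amoxicillin and Atorvastatin mixed up",
    "Fall in corridor",
    "Second Amoxicillin course",
]
TERMS = ["Amoxicillin", "Atorvastatin"]


def term_counts(descriptions, terms):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE incidents (description TEXT)")
    con.execute("CREATE TABLE terms (term TEXT)")
    con.executemany("INSERT INTO incidents VALUES (?)", [(d,) for d in descriptions])
    con.executemany("INSERT INTO terms VALUES (?)", [(t,) for t in terms])
    # One row per word (the LEFT JOIN keeps words with no matches at 0),
    # plus a 'Total' row counting every incident.
    result = con.execute(
        """
        SELECT t.term AS name, COUNT(i.description) AS total
        FROM terms AS t
        LEFT JOIN incidents AS i
               ON i.description LIKE '%' || t.term || '%'
        GROUP BY t.term
        UNION ALL
        SELECT 'Total', COUNT(*) FROM incidents
        ORDER BY name
        """
    ).fetchall()
    con.close()
    return result


print(term_counts(DESCRIPTIONS, TERMS))
# [('Amoxicillin', 3), ('Atorvastatin', 2), ('Total', 5)]
```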

Sum distinct records in a table with duplicates in Teradata

I have a table that has some duplicates. I can count the distinct records to get the total volume. When I try to sum the rows where the CompTIA code is B92 using DISTINCT, it still counts the dupes.
Here is the query:
select
a.repair_week_period,
count(distinct a.notif_id) as Total_Volume,
sum(distinct case when a.header_comptia_cd = 'B92' then 1 else 0 end) as B92_Sum
FROM artemis_biz_app.aca_service_event a
where a.Sales_Org_Cd = '8210'
and a.notif_creation_dt >= current_date - 180
group by 1
order by 1
;
Is there a way to SUM only the distinct records for B92?
I also tried joining the table to itself on the distinct notification IDs, but I still get the wrong sums.
Thanks!
Your B92_Sum currently returns either 0 or 1, which is definitely not a sum: SUM(DISTINCT ...) only ever sees the distinct values 0 and 1.
To sum distinct values you need something like
sum(distinct case when a.header_comptia_cd = 'B92' then column_to_sum else 0 end)
If this column_to_sum is actually the notif_id, what you want is a conditional count, not a sum.
Otherwise DISTINCT might remove too many values (two different rows with the same amount are added only once), and then you probably need a derived table that removes duplicates before aggregation:
select
repair_week_period,
--no more distinct needed
count(a.notif_id) as Total_Volume,
sum(case when a.header_comptia_cd = 'B92' then column_to_sum else 0 end) as B92_Sum
FROM
(
select repair_week_period,
notif_id,
header_comptia_cd,
column_to_sum
from artemis_biz_app.aca_service_event
where Sales_Org_Cd = '8210'
and notif_creation_dt >= current_date - 180
-- only one row per notif_id; ??? is whichever column decides which duplicate to keep
qualify row_number() over (partition by notif_id order by ???) = 1
) a
group by 1
order by 1
;
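A sketch of the dedupe-before-aggregate idea in SQLite through Python's sqlite3. SQLite has no QUALIFY, so ROW_NUMBER() is computed in a derived table and filtered in the outer WHERE (window functions need SQLite 3.25 or later). The table, its rows, and ORDER BY rowid (an arbitrary stand-in for the answer's ???) are hypothetical.

```python
import sqlite3

# Hypothetical rows: (notif_id, repair_week_period, header_comptia_cd, column_to_sum).
# Notifications 1 and 3 are duplicated, and both carry an amount of 5.
ROWS = [(1, "W1", "B92", 5), (1, "W1", "B92", 5), (2, "W1", "A10", 3),
        (3, "W1", "B92", 5), (3, "W1", "B92", 5)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE service_event (notif_id INTEGER, repair_week_period TEXT,"
            " header_comptia_cd TEXT, column_to_sum INTEGER)")
con.executemany("INSERT INTO service_event VALUES (?, ?, ?, ?)", ROWS)

# Keep one row per notif_id, then aggregate without DISTINCT.
deduped = con.execute(
    """
    SELECT repair_week_period,
           COUNT(notif_id) AS total_volume,
           SUM(CASE WHEN header_comptia_cd = 'B92' THEN column_to_sum ELSE 0 END) AS b92_sum
    FROM (
        SELECT repair_week_period, notif_id, header_comptia_cd, column_to_sum,
               ROW_NUMBER() OVER (PARTITION BY notif_id ORDER BY rowid) AS rn
        FROM service_event
    ) AS a
    WHERE rn = 1
    GROUP BY repair_week_period
    ORDER BY repair_week_period
    """
).fetchall()

# For comparison: SUM(DISTINCT ...) collapses the two different
# notifications that share the amount 5 into a single 5.
distinct_sum = con.execute(
    "SELECT SUM(DISTINCT CASE WHEN header_comptia_cd = 'B92' THEN column_to_sum ELSE 0 END)"
    " FROM service_event"
).fetchone()[0]

print(deduped, distinct_sum)  # [('W1', 3, 10)] 5
```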
@dnoeth It seems the solution to my problem was not to SUM the data but to count the distinct notification IDs.
This is how I resolved my problem:
count(distinct case when a.header_comptia_cd = 'B92' then a.notif_id else NULL end) as B92_Sum
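A small SQLite check, via Python's sqlite3, of why the original SUM(DISTINCT ...) returns 1 while the COUNT(DISTINCT CASE ... THEN notif_id ...) fix counts each B92 notification once. The table and rows are hypothetical.

```python
import sqlite3

# Hypothetical rows (notif_id, header_comptia_cd) with duplicated notifications.
ROWS = [(1, "B92"), (1, "B92"), (2, "A10"), (3, "B92"), (3, "B92"), (3, "B92")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE service_event (notif_id INTEGER, header_comptia_cd TEXT)")
con.executemany("INSERT INTO service_event VALUES (?, ?)", ROWS)

total_volume, b92_sum_distinct, b92_count = con.execute(
    """
    SELECT COUNT(DISTINCT notif_id),
           -- original attempt: only the distinct values 0 and 1 survive
           SUM(DISTINCT CASE WHEN header_comptia_cd = 'B92' THEN 1 ELSE 0 END),
           -- fix: the CASE is NULL for non-B92 rows, which COUNT ignores
           COUNT(DISTINCT CASE WHEN header_comptia_cd = 'B92' THEN notif_id ELSE NULL END)
    FROM service_event
    """
).fetchone()

print(total_volume, b92_sum_distinct, b92_count)  # 3 1 2
```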

I want a CASE statement that counts more than 1 as 1

Please help me solve the query below.
The column in question holds Y and N, and I want N to show 0 and Y to show 1.
I want to aggregate the number of times each machine was visited and, if it is 1 or more, show 1. The client requirement is whether a machine has been visited, regardless of the number of times.
Select MachineNo,
[Date_of_Visit],
Month([Date_of_Visit])[Month],
Year([Date_of_Visit])[Year],
sum(case when [Visited] = 'Y' then 1 else 0 end)[No of Visits]
FROM [MachineVisit]
Group by [Date_of_Visit],
[MachineNo]
Use MAX instead of SUM:
Max(case when [Visited] = 'Y' then 1 else 0 end)[Visits]
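A minimal SQLite sketch, via Python's sqlite3, contrasting MAX and SUM over the same 1/0 flag: MAX yields 1 if any row in the group is 'Y', whatever the number of visits. The MachineVisit rows are hypothetical.

```python
import sqlite3

# Hypothetical visits (MachineNo, Date_of_Visit, Visited).
ROWS = [("M1", "2015-01-30", "Y"), ("M1", "2015-01-30", "Y"),
        ("M2", "2015-01-30", "N"),
        ("M3", "2015-01-30", "Y"), ("M3", "2015-01-30", "N")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MachineVisit (MachineNo TEXT, Date_of_Visit TEXT, Visited TEXT)")
con.executemany("INSERT INTO MachineVisit VALUES (?, ?, ?)", ROWS)

result = con.execute(
    """
    SELECT MachineNo,
           Date_of_Visit,
           -- 1 if any row in the group is 'Y', else 0
           MAX(CASE WHEN Visited = 'Y' THEN 1 ELSE 0 END) AS Visits,
           -- the original SUM, kept for comparison
           SUM(CASE WHEN Visited = 'Y' THEN 1 ELSE 0 END) AS No_of_Visits
    FROM MachineVisit
    GROUP BY MachineNo, Date_of_Visit
    ORDER BY MachineNo
    """
).fetchall()

print(result)
# [('M1', '2015-01-30', 1, 2), ('M2', '2015-01-30', 0, 0), ('M3', '2015-01-30', 1, 1)]
```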
This is a little confusing as written, but I am assuming you want something like:
select sum(case when [visited] > 1 then 1 else null end) visits
from visit_table
where visit_date = '2015-01-30'
This assumes your table stores one IP address's visits for a single day as a single entry.
If you have a table with an entry for every single page visit, then you would probably need:
select count(distinct ip_address)
from (
select ip_address
from visit_table
where visit_date = '2015-01-30'
group by ip_address
having count(1) >1
) x
EDIT:
Well then, the simplest way should be:
select count(1)
from visit_table
where [visited]='Y'
and visit_date = '2015-01-30'
;
That should work, although I don't have the table in front of me, so if it doesn't, please post the full query.