SQL - Counting column categories - sql

I am new to SQL, but I am trying to create a report where I can count the food purchases made by students and categorized by the Race/Ethnicity and choice of food every year.
For Example:
                        2012                                 2013
Student   Vegetarian   Meat         Unknown       Vegetarian   Meat   Unknown
---------------------------------------------------------------------------
Black     5 Purchases  4 Purchases  3 Purchases   5 Purchases  etc
White     4 Purchases  3 Purchases  3 Purchases   etc          etc
Asian     4 Purchases  1 Purchases  6 Purchases   etc          etc
While I was able to get the races in one column, I am lost on how to get the count for the different food categories. All I know is that pur_type is classified as "Veg", "Me" and "UKt" for Vegetarian, Meat and Unknown purchases.
Here is what I tried to do:
select
DATEPART(YYYY,pur_date) AS Year
,count (*) Citations
, race
, sex
, ethnicity
, GETDATE() as YTD
,COUNT(CASE WHEN pur_type = 'Veg' THEN 'Vegetarian'
WHEN pur_type = 'Me' THEN 'Meat'
WHEN pur_type = 'UKt' THEN 'Unknown'
ELSE ''
End as "Type"
, CASE
WHEN race='W' and sex='M' and ethnicity <> 'H' THEN 'Caucasian(Male)'
WHEN race='W' and sex='F' and ethnicity <> 'H' THEN 'Caucasian(Female)'
WHEN race='B' and sex='M' and ethnicity <> 'H' THEN 'African-American(Male)'
WHEN race='B' and sex='F' and ethnicity <> 'H' THEN 'African-American(Female)'
--WHEN race='A' and sex='M' and ethnicity <> 'H' THEN 'Pacific Islander(Male)'
--WHEN race='A' and sex='F' and ethnicity <> 'H' THEN 'Pacific Islander(Female)'
WHEN race='A' and sex='M' and ethnicity <> 'H' THEN 'Asian(Male)'
WHEN race='A' and sex='F' and ethnicity <> 'H' THEN 'Asian(Female)'
ELSE ''
END as 'Race'
from purchase
join purcharge
on purchase.purchaseid = purcharge.purchaseid
where pur_date between #startDate and dateadd(DD, +1, GETDATE())
and pur_type in ('Veg', 'Me', 'UKt')
GROUP BY DATEPART(YYYY,pur_date), race, sex, ethnicity , pur_type
ORDER BY 'Year','Race/Sex'
Looking for guidance. I am thinking maybe a sub query would work, maybe more case statements, but I am not able to have a count of how many food categories were purchased by students in each ethnic group.
Also, I am trying to get these results into an SSRS report, so it may look slightly different from a plain SQL query.

You could look into the PIVOT statement but if it was me, I'd do more CASE...WHEN and then GROUP BY.
So, instead of
COUNT(CASE WHEN pur_type = 'Veg' THEN 'Vegetarian' (etc) AS Type
I'd do
SUM(CASE WHEN pur_type = 'Veg' THEN 1 ELSE 0 END) AS Vegetarian
etc.
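A minimal sketch of that SUM(CASE ...) idea, run through Python's built-in sqlite3 module so it is self-contained. The single-table layout and the sample rows are made up for illustration (the real schema is not shown in the question); only the pur_type codes come from the post, and SQLite's strftime stands in for DATEPART.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified table: one row per purchase.
    CREATE TABLE purchase (pur_date TEXT, race TEXT, pur_type TEXT);
    INSERT INTO purchase VALUES
        ('2012-03-01', 'B', 'Veg'),
        ('2012-04-11', 'B', 'Me'),
        ('2012-05-20', 'W', 'Veg'),
        ('2013-01-15', 'B', 'Veg'),
        ('2013-02-02', 'W', 'UKt'),
        ('2013-06-30', 'W', 'UKt');
""")

# One output column per food category; each SUM adds 1 only for matching rows.
pivot_rows = conn.execute("""
    SELECT strftime('%Y', pur_date) AS yr,
           race,
           SUM(CASE WHEN pur_type = 'Veg' THEN 1 ELSE 0 END) AS vegetarian,
           SUM(CASE WHEN pur_type = 'Me'  THEN 1 ELSE 0 END) AS meat,
           SUM(CASE WHEN pur_type = 'UKt' THEN 1 ELSE 0 END) AS unknown
    FROM purchase
    GROUP BY yr, race
    ORDER BY yr, race
""").fetchall()

for row in pivot_rows:
    print(row)
```

Here the year stays a row value. To get one column per year and category, as in the sample layout, the same pattern could be repeated with the year added to each CASE condition.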

This isn't really the best way to create a report, but if you need to use SQL (as opposed to having another layer of code to generate the report from a bunch of subsets of data), you could use subqueries. Without seeing the table structure, it is hard to write the query, but it could be something like this:
Select distinct
p.race
, (select count(*) from purchase p2 where p2.pur_type = 'Veg' and p2.race = p.race) as Vegetables
, (select count(*) from purchase p2 where p2.pur_type = 'Me' and p2.race = p.race) as Meat
, (select count(*) from purchase p2 where p2.pur_type = 'UKt' and p2.race = p.race) as Unknown
From purchase p
Like I said, this isn't ideal, but it would get you the distinct subsets of data.

I don't know what your data looks like and haven't got time to build sample data, but I think you've actually just got a typo or two.
SELECT
DATEPART(YYYY,pur_date) AS [Year]
,count (*) Citations
, race
, sex
, ethnicity
, GETDATE() as YTD
, COUNT(*) AS [PurchaseCount]
, CASE
WHEN pur_type = 'Veg' THEN 'Vegetarian'
WHEN pur_type = 'Me' THEN 'Meat'
WHEN pur_type = 'UKt' THEN 'Unknown'
ELSE ''
End as [Type]
, CASE
WHEN race='W' and sex='M' and ethnicity <> 'H' THEN 'Caucasian(Male)'
WHEN race='W' and sex='F' and ethnicity <> 'H' THEN 'Caucasian(Female)'
WHEN race='B' and sex='M' and ethnicity <> 'H' THEN 'African-American(Male)'
WHEN race='B' and sex='F' and ethnicity <> 'H' THEN 'African-American(Female)'
--WHEN race='A' and sex='M' and ethnicity <> 'H' THEN 'Pacific Islander(Male)'
--WHEN race='A' and sex='F' and ethnicity <> 'H' THEN 'Pacific Islander(Female)'
WHEN race='A' and sex='M' and ethnicity <> 'H' THEN 'Asian(Male)'
WHEN race='A' and sex='F' and ethnicity <> 'H' THEN 'Asian(Female)'
ELSE ''
END as [Race]
FROM purchase
JOIN purcharge
ON purchase.purchaseid = purcharge.purchaseid
WHERE pur_date BETWEEN #startDate AND dateadd(DD, +1, GETDATE())
AND pur_type in ('Veg', 'Me', 'UKt')
GROUP BY DATEPART(YYYY,pur_date), race, sex, ethnicity , pur_type
ORDER BY [Year],[Race], [Type]
This will not give exactly what you asked for, as it includes both race and gender in your [Race] CASE statement, but maybe that's what you really wanted; I don't know.
Anyway, all I really changed was the count(*) as [PurchaseCount], the following line's CASE statement and the GROUP BY clause. COUNT(*) will just give you the count of records that fall within your GROUP BY clause, in this case a count of records for every [Year], [Race] and [Type].

Related

SQL query to find Date, TransactionAmount_US, TransactionAmount_UK

We have 2 tables:
Transaction (AccountId,Date,TransactionAmount)
Master(Aid,Country)
We need to find Date, TotalTAmt_US, Total_TAmt_UK
My solution:
Select
Date,
CASE WHEN Country in ('US') THEN SUM(TransAmt) ELSE '0'END AS TotalTAmt_US,
CASE WHEN Country in ('UK') THEN SUM(TransAmt) ELSE '0'END AS TotalTAmt_UK
FROM
(
SELECT
T.Date As Date,
M.Country As Country,
SUM(T.TransAmt) As TransAmt,
FROM
Transaction T JOIN Master M On T.Aid = M.Aid
WHERE Country in ('US','UK')
group by Date,Country
) As T1
group by Date;
Is this right?
Can we use Country in CASE WHEN without pulling it as I do not want to pull it and then group by it.
Advice please.
Thanks.
You have to list in the GROUP BY clause every non-aggregated column that you use in the SELECT list.
So just add your CASE expressions to the grouping.
Select
Date,
CASE WHEN Country in ('US') THEN TransAmt ELSE 0 END AS TotalTAmt_US,
CASE WHEN Country in ('UK') THEN TransAmt ELSE 0 END AS TotalTAmt_UK
FROM
(
SELECT
T.Date As Date,
M.Country As Country,
SUM(T.TransAmt) As TransAmt
FROM
Transaction T JOIN Master M On T.Aid = M.Aid
WHERE Country in ('US','UK')
GROUP BY Date,Country
) As T1
GROUP BY Date, CASE WHEN Country in ('US') THEN TransAmt ELSE 0 END,
CASE WHEN Country in ('UK') THEN TransAmt ELSE 0 END
Additionally, remove the SUM() function from your CASE expressions. If you put them in the GROUP BY you can get the error:
ORA-00934: Group function is not allowed here.
Finally, remove the quotes around the zeros in the ELSE branches. Otherwise you can get another error about inconsistent data types.
I believe you wanted to do a conditional aggregation, getting the sum of all US-based transactions for a day and the sum of all UK-based transactions for the same day. For that, you have to move the CASE into the sum(), adding the amount only if the country is the one you are looking for, and zero otherwise.
Also your subquery isn't necessary.
SELECT t.date,
sum(CASE m.country
WHEN 'US' THEN
t.transamt
ELSE
0
END) totaltamt_us,
sum(CASE m.country
WHEN 'UK' THEN
t.transamt
ELSE
0
END) totaltamt_uk
FROM transaction t
INNER JOIN master m
ON m.aid = t.accountid
WHERE m.country IN ('US',
'UK')
GROUP BY t.date;
If you don't insist on having the different sums in different columns, but would also accept rows, it can be as simple as:
SELECT m.country,
t.date,
sum(t.transamt) totaltamt
FROM transaction t
INNER JOIN master m
ON m.aid = t.accountid
WHERE m.country IN ('US',
'UK')
GROUP BY m.country,
t.date;
I think you want conditional aggregation:
SELECT Date,
SUM(CASE WHEN Country in ('US') THEN TransAmt ELSE 0 END) AS TotalTAmt_US,
SUM(CASE WHEN Country in ('UK') THEN TransAmt ELSE 0 END) AS TotalTAmt_UK
FROM Transaction T JOIN
Master M
On T.Aid = M.Aid
WHERE Country in ('US', 'UK')
GROUP BY Date;
Notes:
You do not need two levels of aggregation.
Numbers should not be enclosed in single quotes, so 0, not '0'.
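As a quick way to check the single-level conditional aggregation, here is a sketch using Python's built-in sqlite3 module. The table and column names follow the question's schema (Transaction(AccountId, Date, TransactionAmount) and Master(Aid, Country)); the rows, including the 'FR' account, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "Transaction" is quoted because TRANSACTION is an SQL keyword.
    CREATE TABLE "Transaction" (AccountId INTEGER, Date TEXT, TransactionAmount REAL);
    CREATE TABLE Master (Aid INTEGER, Country TEXT);
    INSERT INTO Master VALUES (1, 'US'), (2, 'UK'), (3, 'FR');
    INSERT INTO "Transaction" VALUES
        (1, '2020-01-01', 100.0),
        (1, '2020-01-01',  50.0),
        (2, '2020-01-01',  30.0),
        (3, '2020-01-01', 999.0),
        (2, '2020-01-02',  70.0);
""")

# The CASE sits inside SUM, so only one GROUP BY level is needed
# and Country never has to appear in the SELECT list.
totals = conn.execute("""
    SELECT t.Date,
           SUM(CASE WHEN m.Country = 'US' THEN t.TransactionAmount ELSE 0 END) AS total_us,
           SUM(CASE WHEN m.Country = 'UK' THEN t.TransactionAmount ELSE 0 END) AS total_uk
    FROM "Transaction" AS t
    JOIN Master AS m ON m.Aid = t.AccountId
    WHERE m.Country IN ('US', 'UK')
    GROUP BY t.Date
    ORDER BY t.Date
""").fetchall()

for row in totals:
    print(row)
```

The 'FR' row is filtered out by the WHERE clause, and a date with no US transactions still appears with a US total of zero.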

Division between data in rows - SQL

The data in my table looks like this:
date, app, country, sales
2017-01-01,XYZ,US,10000
2017-01-01,XYZ,GB,2000
2017-01-02,XYZ,US,30000
2017-01-02,XYZ,GB,1000
I need to find, for each app on a daily basis, the ratio of US sales to GB sales, so ideally the result would look like this:
date, app, ratio
2017-01-01,XYZ,10000/2000 = 5
2017-01-02,XYZ,30000/1000 = 30
I'm currently dumping everything into a csv and doing my calculations offline in Python but I wanted to move everything onto the SQL side. One option would be to aggregate each country into a subquery, join and then divide, such as
select d1_us.date, d1_us.app, d1_us.sales / d1_gb.sales from
(select date, app, sales from table where date between '2017-01-01' and '2017-01-10' and country = 'US') as d1_us
join
(select date, app, sales from table where date between '2017-01-01' and '2017-01-10' and country = 'GB') as d1_gb
on d1_us.app = d1_gb.app and d1_us.date = d1_gb.date
Is there a less messy way to go about doing this?
You can use the ratio of SUM(CASE WHEN) and GROUP BY in your query to do this without requiring a subquery.
SELECT DATE,
APP,
SUM(CASE WHEN COUNTRY = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN COUNTRY = 'GB' THEN SALES END) AS RATIO
FROM TABLE1
GROUP BY DATE, APP;
Based on the likelihood of the GB sales being zero, you can tweak the GB's ELSE condition, maybe ELSE 1, to avoid Divide by zero error. It really depends on how you want to handle exceptions.
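One possible way to handle the zero case, sketched with Python's built-in sqlite3 module: wrap the denominator in the standard NULLIF function so a zero GB total turns the ratio into NULL instead of raising an error. The table name sales_by_country and the extra 2017-01-03 row are assumptions added for the demonstration; the other rows are from the question. (SQLite itself already returns NULL on division by zero, but other engines raise an error, which is what NULLIF guards against.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_by_country (date TEXT, app TEXT, country TEXT, sales REAL);
    INSERT INTO sales_by_country VALUES
        ('2017-01-01', 'XYZ', 'US', 10000),
        ('2017-01-01', 'XYZ', 'GB',  2000),
        ('2017-01-02', 'XYZ', 'US', 30000),
        ('2017-01-02', 'XYZ', 'GB',  1000),
        ('2017-01-03', 'XYZ', 'US',   500);  -- made-up day with no GB sales
""")

# NULLIF(x, 0) yields NULL when x is 0, so the division yields NULL too.
ratios = conn.execute("""
    SELECT date, app,
           SUM(CASE WHEN country = 'US' THEN sales ELSE 0 END)
           / NULLIF(SUM(CASE WHEN country = 'GB' THEN sales ELSE 0 END), 0) AS ratio
    FROM sales_by_country
    GROUP BY date, app
    ORDER BY date
""").fetchall()

for row in ratios:
    print(row)
```

Whether NULL is the right result for a missing denominator depends on the report; it could be wrapped in COALESCE if a sentinel value is preferred.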
You can use one query with grouping and provide the condition once:
SELECT date, app,
SUM(CASE WHEN country = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN country = 'GB' THEN SALES END) AS ratio
FROM your_table
WHERE date between '2017-01-01' AND '2017-01-10'
GROUP BY date, app;
However, this gives you zero if there are no records for US and NULL if there are no records for GB. If you need to return different values for those cases, you can use another CASE WHEN surrounding the division. For example, to return -1 and -2 respectively, you can use:
SELECT date, app,
CASE WHEN COUNT(CASE WHEN country = 'US' THEN 1 END) = 0 THEN -1
WHEN COUNT(CASE WHEN country = 'GB' THEN 1 END) = 0 THEN -2
ELSE SUM(CASE WHEN country = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN country = 'GB' THEN SALES END)
END AS ratio
FROM your_table
WHERE date between '2017-01-01' AND '2017-01-10'
GROUP BY date, app;
DROP TABLE IF EXISTS t;
CREATE TABLE t (
date DATE,
app VARCHAR(5),
country VARCHAR(5),
sales DECIMAL(10,2)
);
INSERT INTO t VALUES
('2017-01-01','XYZ','US',10000),
('2017-01-01','XYZ','GB',2000),
('2017-01-02','XYZ','US',30000),
('2017-01-02','XYZ','GB',1000);
WITH q AS (
SELECT
date,
app,
country,
SUM(sales) AS sales
FROM t
GROUP BY date, app, country
) SELECT
q1.date,
q1.app,
q1.country || ' vs ' || NVL(q2.country,'-') AS ratio_between,
CASE WHEN q2.sales IS NULL OR q2.sales = 0 THEN 0 ELSE ROUND(q1.sales / q2.sales, 2) END AS ratio
FROM q AS q1
LEFT JOIN q AS q2 ON q2.date = q1.date AND
q2.app = q1.app AND
q2.country != q1.country
-- WHERE q1.country = 'US'
ORDER BY q1.date;
Results for any country vs any country (WHERE q1.country='US' is commented out)
date,app,ratio_between,ratio
2017-01-01,XYZ,GB vs US,0.20
2017-01-01,XYZ,US vs GB,5.00
2017-01-02,XYZ,GB vs US,0.03
2017-01-02,XYZ,US vs GB,30.00
Results for US vs any other country (WHERE q1.country='US' uncommented)
date,app,ratio_between,ratio
2017-01-01,XYZ,US vs GB,5.00
2017-01-02,XYZ,US vs GB,30.00
The trick is in the JOIN clause.
The subquery q, which aggregates the data by date, app and country, is joined with itself on date and app.
This way, for every date, app and country we get a match with every country on the same date and app. Adding q1.country != q2.country excludes the same-country pairs, which are marked below with *:
date,app,country,sales,date,app,country,sales
*2017-01-01,XYZ,GB,2000.00,2017-01-01,XYZ,GB,2000.00*
2017-01-01,XYZ,GB,2000.00,2017-01-01,XYZ,US,10000.00
2017-01-01,XYZ,US,10000.00,2017-01-01,XYZ,GB,2000.00
*2017-01-01,XYZ,US,10000.00,2017-01-01,XYZ,US,10000.00*
2017-01-02,XYZ,GB,1000.00,2017-01-02,XYZ,US,30000.00
*2017-01-02,XYZ,GB,1000.00,2017-01-02,XYZ,GB,1000.00*
*2017-01-02,XYZ,US,30000.00,2017-01-02,XYZ,US,30000.00*
2017-01-02,XYZ,US,30000.00,2017-01-02,XYZ,GB,1000.00
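For readers who want to try the self-join without the original database, here is a sketch of the same query run through Python's built-in sqlite3 module. It is a port under assumptions, not the author's exact script: COALESCE replaces NVL (which SQLite lacks), and a secondary sort key is added so the row order is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (date TEXT, app TEXT, country TEXT, sales REAL);
    INSERT INTO t VALUES
        ('2017-01-01', 'XYZ', 'US', 10000),
        ('2017-01-01', 'XYZ', 'GB',  2000),
        ('2017-01-02', 'XYZ', 'US', 30000),
        ('2017-01-02', 'XYZ', 'GB',  1000);
""")

# q is joined with itself on date and app; the != condition drops same-country pairs.
pairs = conn.execute("""
    WITH q AS (
        SELECT date, app, country, SUM(sales) AS sales
        FROM t
        GROUP BY date, app, country
    )
    SELECT q1.date, q1.app,
           q1.country || ' vs ' || COALESCE(q2.country, '-') AS ratio_between,
           CASE WHEN q2.sales IS NULL OR q2.sales = 0 THEN 0
                ELSE ROUND(q1.sales / q2.sales, 2) END AS ratio
    FROM q AS q1
    LEFT JOIN q AS q2
           ON q2.date = q1.date AND q2.app = q1.app AND q2.country != q1.country
    ORDER BY q1.date, ratio_between
""").fetchall()

for row in pairs:
    print(row)
```

Adding WHERE q1.country = 'US' before the ORDER BY would keep only the US-versus-other rows.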

AVG only valid dates

I have a simple search query like this one:
SELECT COUNT(id),
COUNT(CASE WHEN nation = 'german' THEN 1 END),
COUNT(CASE WHEN nation = 'french' THEN 1 END),
AVG(AGE(birthday))
FROM persons;
My problem is that I get an error:
ERROR: date out of range for timestamp
I suppose I get this error because not every person has a birthday saved.
birthday is a date-field
How can I prevent this error, and only average birthdays that are valid dates? Thanks.
How about this:
SELECT COUNT(id),
COUNT(CASE WHEN nation = 'german' THEN 1 END),
COUNT(CASE WHEN nation = 'french' THEN 1 END),
AVG(AGE(birthday))
FROM persons where birthday is not null;
I would use another case statement to exclude the invalid null values from the avg(age()) calculation:
SELECT
COUNT(id),
COUNT(CASE WHEN nation = 'german' THEN 1 END),
COUNT(CASE WHEN nation = 'french' THEN 1 END),
AVG(CASE WHEN birthday IS NOT NULL THEN AGE(birthday) END)
FROM persons;
If you were to add a where birthday is not null clause the average would be correct (or as correct as it can be) but the counts would be off due to the excluded rows not being counted.
See this SQL Fiddle demo and notice how the counts differ between the two queries.
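The difference between the two approaches can be reproduced with a small sketch using Python's built-in sqlite3 module. SQLite has no AGE() function, so a crude year difference against a fixed reference year (2020) stands in for it here; that substitution, the table contents and the variable names are assumptions made only for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER, nation TEXT, birthday TEXT);
    INSERT INTO persons VALUES
        (1, 'german', '2000-01-01'),
        (2, 'german', NULL),
        (3, 'french', '1990-01-01'),
        (4, 'french', NULL);
""")

# Stand-in for AGE(birthday): whole years between birthday and 2020.
age_expr = "2020 - CAST(strftime('%Y', birthday) AS INTEGER)"

# Variant 1: exclude NULL birthdays inside the AVG only; all rows are still counted.
with_case = conn.execute(f"""
    SELECT COUNT(id),
           COUNT(CASE WHEN nation = 'german' THEN 1 END),
           COUNT(CASE WHEN nation = 'french' THEN 1 END),
           AVG(CASE WHEN birthday IS NOT NULL THEN {age_expr} END)
    FROM persons
""").fetchone()

# Variant 2: filter in WHERE; the average is the same but the counts shrink.
with_where = conn.execute(f"""
    SELECT COUNT(id),
           COUNT(CASE WHEN nation = 'german' THEN 1 END),
           COUNT(CASE WHEN nation = 'french' THEN 1 END),
           AVG({age_expr})
    FROM persons
    WHERE birthday IS NOT NULL
""").fetchone()

print(with_case)
print(with_where)
```

Both variants produce the same average, but only the first keeps the persons without a birthday in the counts.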

Group by T-SQL vs. MySQL (single column)

I am new to SQL Server/used to MySQL databases and I am running into an issue that I never ran into with MySQL. I am looking to pull all current policy numbers, the name of the company/person it belongs to, their total premium, and whether or not they have what we call 'equipment breakdown' coverage. This is all pretty simple, the issue I am having is with grouping. I want to group by one column only, aka one distinct policy number, the company name, a sum of the premium (it is possible to have several premium amounts both negative and positive so I want to sum these to see what the true total is), and a simple Yes or No column for equipment breakdown.
Here is the query I am running:
SELECT pol_num as policy_number,
insd_name as insureds_name,
SUM(amt) as 'total_premium',
(SELECT
CASE
WHEN cvg_desc = 'Equipment Breakdown'
THEN 'Y'
ELSE 'N'
END) as 'equipment_breakdown'
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' between d_pol_eff and d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY pol_num
ORDER BY policy_number
I get an error saying that I need to group by insd_name and cvg_desc as well, but I DON'T want that as it gives me duplicate policy numbers.
Here is an example of what I get when I group everything it tells me to:
policy_number  insureds_name  total_premium  equipment_breakdown
001            company a             0.00    n
001            company a        25,000.00    n
001            company a       -10,000.00    n
002            company b           100.00    y
002            company b        10,000.00    y
Here is an example of the results I want:
policy_number  insureds_name  total_premium  equipment_breakdown
001            company a        15,000.00    n
002            company b        10,100.00    y
Basically, I just want to group by the policy number and sum the premium amounts. Above is how I would achieve this in MySQL, how can I achieve the results I am looking for in SQL Server?
Thanks
MySQL doesn't require all non-aggregate fields to be included in the GROUP BY clause, even though omitting them can yield unexpected results. SQL Server does require this, so you are forced to decide how you want to handle multiple insd_name values for a given pol_num: you can use MAX() or MIN(), or, if the values are always the same, just add them to your GROUP BY:
SELECT pol_num AS policy_number
, MAX(insd_name) AS insureds_name
, SUM(amt) AS 'total_premium'
, MAX(CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y'
ELSE 'N'
END) AS 'equipment_breakdown'
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' BETWEEN d_pol_eff AND d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY pol_num
ORDER BY policy_number
Or:
SELECT pol_num AS policy_number
, insd_name AS insureds_name
, SUM(amt) AS 'total_premium'
, CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y'
ELSE 'N'
END AS 'equipment_breakdown'
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' BETWEEN d_pol_eff AND d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY pol_num
, insd_name
, CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y'
ELSE 'N'
END
ORDER BY policy_number
It looks like the cvg_desc column is probably what's messing you up. You want to group by the resulting Y or N from your CASE statement, but SQL Server is grouping by the original cvg_desc column. You could approach this in a way that resolves the CASE statement before it groups. For example, wrap the main query in a common table expression (CTE), which is sort of like an inline view. Then, with the equipment breakdown column reduced to just a Y or an N, a subsequent query from the CTE with your SUM aggregation on premium should give you the results you desire:
WITH Policies(policy_number, insureds_name, premium, equipment_breakdown) AS
(
SELECT
pol_num
,insd_name
,amt
,(CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y' ELSE 'N' END)
AS 'equipment_breakdown'
FROM
bapu.dbo.fact_prem
WHERE
'2014-05-06' BETWEEN d_pol_eff AND d_pol_exp
AND
amt_type = 'Premium'
AND
amt_desc = 'Written Premium'
)
SELECT
policy_number
,insureds_name
,SUM(premium) AS total_premium
,equipment_breakdown
FROM
Policies
GROUP BY
policy_number
,insureds_name
,equipment_breakdown
You'll need an aggregate function on the fields you don't want to group by. A simple one to use is MAX, which works with most types:
SELECT pol_num as policy_number,
MAX(insd_name) as insureds_name,
SUM(amt) as 'total_premium',
MAX(CASE
WHEN cvg_desc = 'Equipment Breakdown'
THEN 'Y'
ELSE 'N'
END) as 'equipment_breakdown'
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' between d_pol_eff and d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY pol_num
ORDER BY policy_number
The reason SQL Server wants this is that it likes to give deterministic answers, for example
column_a | column_b
1 | 1
1 | 2
...grouped by only column_a would in MySQL give either 1 or 2 as an answer for column_b, while SQL Server wants you to tell it explicitly which one to use.
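A small sketch of the aggregate-everything approach, run with Python's built-in sqlite3 module. The cut-down fact_prem table and its rows (including the 'Liability' and 'Property' coverage names) are invented for the demonstration; the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical cut-down version of fact_prem with made-up rows.
    CREATE TABLE fact_prem (pol_num TEXT, insd_name TEXT, amt REAL, cvg_desc TEXT);
    INSERT INTO fact_prem VALUES
        ('001', 'company a',      0.0, 'Liability'),
        ('001', 'company a',  25000.0, 'Property'),
        ('001', 'company a', -10000.0, 'Property'),
        ('002', 'company b',    100.0, 'Equipment Breakdown'),
        ('002', 'company b',  10000.0, 'Property');
""")

# Every selected column is either grouped (pol_num) or aggregated.
# MAX over the Y/N flag returns 'Y' if any row of the policy has the coverage,
# because 'Y' sorts after 'N'.
policies = conn.execute("""
    SELECT pol_num AS policy_number,
           MAX(insd_name) AS insureds_name,
           SUM(amt) AS total_premium,
           MAX(CASE WHEN cvg_desc = 'Equipment Breakdown' THEN 'Y' ELSE 'N' END)
               AS equipment_breakdown
    FROM fact_prem
    GROUP BY pol_num
    ORDER BY policy_number
""").fetchall()

for row in policies:
    print(row)
```

Note that the flag is computed per row and then aggregated. Taking MAX(cvg_desc) first would return 'Property' for policy 002 in this sample data and miss the equipment breakdown coverage.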
I would probably write this as below -- did not test
SELECT pol_num as policy_number,
insd_name as insureds_name,
SUM(amt) as total_premium,
CASE
WHEN cvg_desc = 'Equipment Breakdown'
THEN 'Y'
ELSE 'N'
END as equipment_breakdown
FROM bapu.dbo.fact_prem
WHERE '2014-05-06' between d_pol_eff and d_pol_exp
AND amt_type = 'Premium'
AND amt_desc = 'Written Premium'
GROUP BY
pol_num, insd_name,
CASE
WHEN cvg_desc = 'Equipment Breakdown'
THEN 'Y'
ELSE 'N'
END
ORDER BY policy_number

Merging data SQL Query

I have a query request where I have to show each customer's activity for a web site, but with only one row per customer instead of the customer appearing once for each activity.
Below is the query I tried, but it returns many more rows than that. Please help me avoid the duplicates and show only one row per customer covering every activity.
SELECT i.customer_id, i.SEGMENT AS Pistachio_segment,
(CASE when S.SUBSCRIPTION_TYPE = '5' then 'Y' else 'N' end ) PB_SUBS,
(CASE WHEN S.SUBSCRIPTION_TYPE ='12' THEN 'Y' ELSE 'N' END) Daily_test,
(CASE when S.SUBSCRIPTION_TYPE ='8' then 'Y' else 'N' end) COOK_4_2
FROM IDEN_WITH_MAIL_ID i JOIN CUSTOMER_SUBSCRIPTION_FCT S
ON I.IDENTITY_ID = S.IDENTITY_ID and I.CUSTOMER_ID = S.CUSTOMER_ID
WHERE s.site_code ='PB' and s.subscription_end_date is null
Sounds like you need to group by customer_id and perform aggregations for the other columns you are selecting. For example:
sum(case when s.subscription_type = '5' then 1 else 0 end) as pb_subs_count
You could try one of two things:
Use a GROUP BY statement to combine all records with the same id, e.g.,
...
WHERE s.site_code ='PB' and s.subscription_end_date is null
GROUP BY i.customer_id
Use the DISTINCT command in your SELECT, e.g.,
SELECT DISTINCT i.customer_id, i.SEGMENT, ...
You could use an aggregation (e.g. SUM) grouped on customer_id, but what do you expect to happen to the other fields? For example, if you have SUBSCRIPTION_TYPE 5 and 13 for the same customer (2 rows), which value do you want?
Perhaps you are looking for something like this:
SELECT i.customer_id, i.SEGMENT AS Pistachio_segment,
MAX(CASE when S.SUBSCRIPTION_TYPE = '5' then 'Y' else 'N' end ) PB_SUBS,
MAX(CASE WHEN S.SUBSCRIPTION_TYPE ='12' THEN 'Y' ELSE 'N' END) Daily_test,
MAX(CASE when S.SUBSCRIPTION_TYPE ='8' then 'Y' else 'N' end) COOK_4_2
FROM IDEN_WITH_MAIL_ID i JOIN CUSTOMER_SUBSCRIPTION_FCT S
ON I.IDENTITY_ID = S.IDENTITY_ID and I.CUSTOMER_ID = S.CUSTOMER_ID
WHERE s.site_code ='PB' and s.subscription_end_date is null
GROUP BY i.customer_id, i.SEGMENT
I can't be sure, though, without knowing more about the tables involved.
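To illustrate how MAX(CASE ...) collapses several subscription rows into one row per customer, here is a sketch using Python's built-in sqlite3 module. The single subs table stands in for the joined and filtered result of the two real tables, and its rows are invented; only the subscription type codes and output column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical single table standing in for the joined, filtered result.
    CREATE TABLE subs (customer_id INTEGER, segment TEXT, subscription_type TEXT);
    INSERT INTO subs VALUES
        (1, 'gold',   '5'),
        (1, 'gold',   '12'),
        (2, 'silver', '8'),
        (3, 'gold',   '13');
""")

# Each flag is 'Y' if any of the customer's rows has that subscription type.
flags = conn.execute("""
    SELECT customer_id, segment,
           MAX(CASE WHEN subscription_type = '5'  THEN 'Y' ELSE 'N' END) AS pb_subs,
           MAX(CASE WHEN subscription_type = '12' THEN 'Y' ELSE 'N' END) AS daily_test,
           MAX(CASE WHEN subscription_type = '8'  THEN 'Y' ELSE 'N' END) AS cook_4_2
    FROM subs
    GROUP BY customer_id, segment
    ORDER BY customer_id
""").fetchall()

for row in flags:
    print(row)
```

Customer 1 has two source rows but appears once, with both of the matching flags set.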