How to run case counts by group? - sql

I am just beginning to teach myself SQL (I've been going at it for a week now and feel I have been doing pretty well to this point).
I have a practice database that I'm just messing around with -- there are two tables (one titled "progress" and one titled "users").
Progress
This table includes a foreign key from "users" identifying students enrolled in 5 different coding courses (CPP, SQL, HTML, Javascript, and Java) and indicates whether a student is enrolled, has started a course, or has completed a course.
Users
This table includes the primary key for identifying students as well as their demographic information (addresses, email domain for university, etc...).
I want to be able to count the number of students enrolled in the 5 courses for each university. I have been able to do this for one university at a time but I want something that will do that for all 617 different universities at once.
WITH placeholder AS (
SELECT *
FROM users
JOIN progress
ON users.user_id = progress.user_id
GROUP BY email_domain
ORDER BY email_domain
)
select email_domain,
Sum(CASE WHEN learn_cpp = "completed" OR learn_cpp = "started" THEN 1 ELSE 0 END) AS 'CPP Enrollment',
Sum(CASE WHEN learn_sql = "completed" OR learn_SQL = "started" THEN 1 ELSE 0 END) AS 'SQL Enrollment',
Sum(CASE WHEN learn_html = "completed" OR learn_html = "started" THEN 1 ELSE 0 END) AS 'HTML Enrollment',
Sum(CASE WHEN learn_javascript = "completed" OR learn_javascript = "started" THEN 1 Else 0 END) AS 'Javascript Enrollment',
Sum(CASE WHEN learn_java = "completed" OR learn_java = "started" THEN 1 ELSE 0 END) AS 'Java Enrollment'
FROM placeholder;
This returns the correct enrollment count across all universities but only has the first university email domain (shown below).
aa.edu 238 317 183 306 119
I want the enrollment counts for each course by university (there should be 167 rows with enrollment counts for each course in the columns).

donPablo caught this quickly in the comments.
I moved the GROUP BY and ORDER BY clauses out of the CTE (the WITH block that holds the join) to the end of the outer SELECT, so that the SUM(CASE ...) counts are computed per email_domain instead of as a single total.
This produced the result I was looking for.
Thank you for your quick response!
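For anyone landing here later, the fix described above can be sketched as follows. This is a minimal reconstruction run against an in-memory SQLite database, not the original practice database: the three sample users, the use of an empty string for "not enrolled", and the snake_case column aliases are all assumptions made for illustration. String literals use single quotes, which is the standard SQL form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email_domain TEXT);
    CREATE TABLE progress (
        user_id INTEGER REFERENCES users(user_id),
        learn_cpp TEXT, learn_sql TEXT, learn_html TEXT,
        learn_javascript TEXT, learn_java TEXT
    );
    -- Made-up sample rows; '' stands for "not enrolled" (an assumption).
    INSERT INTO users VALUES (1, 'aa.edu'), (2, 'aa.edu'), (3, 'bb.edu');
    INSERT INTO progress VALUES
        (1, 'started',   'completed', '',        '', ''),
        (2, 'completed', '',          'started', '', ''),
        (3, '',          'started',   '',        '', 'completed');
""")

# GROUP BY and ORDER BY sit at the end of the aggregating query,
# so each SUM is evaluated once per email_domain.
query = """
    SELECT u.email_domain,
           SUM(CASE WHEN p.learn_cpp        IN ('started', 'completed') THEN 1 ELSE 0 END) AS cpp_enrollment,
           SUM(CASE WHEN p.learn_sql        IN ('started', 'completed') THEN 1 ELSE 0 END) AS sql_enrollment,
           SUM(CASE WHEN p.learn_html       IN ('started', 'completed') THEN 1 ELSE 0 END) AS html_enrollment,
           SUM(CASE WHEN p.learn_javascript IN ('started', 'completed') THEN 1 ELSE 0 END) AS javascript_enrollment,
           SUM(CASE WHEN p.learn_java       IN ('started', 'completed') THEN 1 ELSE 0 END) AS java_enrollment
    FROM users AS u
    JOIN progress AS p ON u.user_id = p.user_id
    GROUP BY u.email_domain
    ORDER BY u.email_domain
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample rows above this prints one row per domain rather than a single combined row.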

Related

Pivot aggregates in SQL

I'm trying to find a way to pivot the table below (I guess you would say it's in "long" format) into the ("wider") format where all the columns are essentially explicitly Boolean. I hope this simple example gets across what I'm trying to do.
Note there are about 74 people (so the output table will have 223 columns: 1 + 74 x 3).
I can't figure out an easy way to do it, other than a horrible chain of left joins on "Town" built from statements like
... left join(
select
town,
case when person = 'Richard' then 1 else 0 end as "Richard",
Fee as "Richard Fee"
from services
where person = 'Richard'
left join...
Can some smart person suggest a way to do this using PIVOT functions in SQL?
I am using Snowflake (and dbt so I can get some jinja into play if really necessary to loop through all the people).
Input:
Desired output:
ps. I know this is a ridiculous SQL ask, but this is the "output the client wants" so I have this undesirable task to fulfil.
If persons are known in advance then you could use conditional aggregation:
SELECT town,
MAX(CASE WHEN person = 'Richard' THEN 1 ELSE 0 END) AS "Richard",
MAX(CASE WHEN person = 'Richard' THEN Fee END) AS "Richard Fee",
MAX(CASE WHEN person = 'Richard' THEN Service END) AS "Richard Service",
MAX(CASE WHEN person = 'Caitlin' THEN 1 ELSE 0 END) AS "Caitlin",
...
FROM services
GROUP BY town;
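Since the column list is the same three expressions repeated per person, it can be generated from a list of names. The sketch below does that in Python against an in-memory SQLite table rather than Snowflake; the `services` rows, the town names, and the `person_columns` helper are hypothetical and exist only to make the example runnable. The same loop could presumably be written as a dbt/jinja `for` block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (town TEXT, person TEXT, fee INTEGER, service TEXT);
    -- Made-up sample rows.
    INSERT INTO services VALUES
        ('Leeds', 'Richard', 100, 'Audit'),
        ('Leeds', 'Caitlin', 250, 'Tax'),
        ('York',  'Caitlin',  80, 'Audit');
""")

# Hard-coded, trusted names only: they are pasted into the SQL text.
people = ["Richard", "Caitlin"]


def person_columns(name):
    """Return the three pivot column expressions for one person."""
    return (
        f"MAX(CASE WHEN person = '{name}' THEN 1 ELSE 0 END) AS \"{name}\", "
        f"MAX(CASE WHEN person = '{name}' THEN fee END) AS \"{name} Fee\", "
        f"MAX(CASE WHEN person = '{name}' THEN service END) AS \"{name} Service\""
    )


query = (
    "SELECT town, "
    + ", ".join(person_columns(p) for p in people)
    + " FROM services GROUP BY town ORDER BY town"
)

cursor = conn.execute(query)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns)
for row in rows:
    print(row)
```

A town where a person has no row gets 0 in the flag column and NULL in the fee and service columns, because the CASE without ELSE yields NULL and MAX ignores NULLs.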

How do I select this grouped data correctly?

First of all, sorry for the generic title, I don't know how exactly to word the question I have.
I am working with a legacy database and can't make changes to the schema, so I'm forced to work with what I've got.
Table setup (columns):
Id (a normal int id)
UniqueId (a column that holds the uniqueIds for each location)
status (a varchar column that can contain one of three statuses: 'completed', 'failed', 'Attention')
Count (an int column that represents how many users fell into each status)
Example data :
UniqueId Status Count
679FCE83-B245-E511-A42C-90B11C2CD708 completed 64
679FCE83-B245-E511-A42C-90B11C2CD708 Attention 1
679FCE83-B245-E511-A42C-90B11C2CD708 failed 101
4500990D-F516-E411-BB09-90B11C2CD708 completed 100
4500990D-F516-E411-BB09-90B11C2CD708 Attention 17
4500990D-F516-E411-BB09-90B11C2CD708 failed 516
557857BD-6B46-E511-A42C-90B11C2CD708 completed 67
557857BD-6B46-E511-A42C-90B11C2CD708 Attention 4
557857BD-6B46-E511-A42C-90B11C2CD708 failed 103
What I am trying to do is select all of the records, grouped by uniqueId, with a separate column for each of the statuses containing their individual counts. The results would look something like this...
UniqueId, count(completed), count(failed), count(Attention)
679FCE83-B245-E511-A42C-90B11C2CD708 64 101 1
4500990D-F516-E411-BB09-90B11C2CD708 100 516 17
557857BD-6B46-E511-A42C-90B11C2CD708 67 103 4
I'm sure I'm missing something basic with this, but I can't seem to find the words to Google my way out of this one.
Could someone push me in the right direction?
You can use conditional aggregation:
select uniqueid,
sum(case when status = 'completed' then count else 0 end) as completed,
sum(case when status = 'failed' then count else 0 end) as failed,
sum(case when status = 'Attention' then count else 0 end) as attention
from t
group by uniqueid;
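To check this against the example data from the question, here is a small sketch using an in-memory SQLite database. The table name `t` follows the answer; the column types are assumptions. SQLite compares strings case-sensitively by default, so the literals are written exactly as they are stored, and the `count` column is double-quoted to keep it distinct from the COUNT function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (
        id INTEGER PRIMARY KEY,
        uniqueid TEXT,
        status TEXT,
        "count" INTEGER
    );
    INSERT INTO t (uniqueid, status, "count") VALUES
        ('679FCE83-B245-E511-A42C-90B11C2CD708', 'completed', 64),
        ('679FCE83-B245-E511-A42C-90B11C2CD708', 'Attention', 1),
        ('679FCE83-B245-E511-A42C-90B11C2CD708', 'failed', 101),
        ('4500990D-F516-E411-BB09-90B11C2CD708', 'completed', 100),
        ('4500990D-F516-E411-BB09-90B11C2CD708', 'Attention', 17),
        ('4500990D-F516-E411-BB09-90B11C2CD708', 'failed', 516),
        ('557857BD-6B46-E511-A42C-90B11C2CD708', 'completed', 67),
        ('557857BD-6B46-E511-A42C-90B11C2CD708', 'Attention', 4),
        ('557857BD-6B46-E511-A42C-90B11C2CD708', 'failed', 103);
""")

query = """
    SELECT uniqueid,
           SUM(CASE WHEN status = 'completed' THEN "count" ELSE 0 END) AS completed,
           SUM(CASE WHEN status = 'failed'    THEN "count" ELSE 0 END) AS failed,
           SUM(CASE WHEN status = 'Attention' THEN "count" ELSE 0 END) AS attention
    FROM t
    GROUP BY uniqueid
"""

# Key the result by uniqueid so the check does not depend on row order.
result = {row[0]: row[1:] for row in conn.execute(query)}
for uniqueid, counts in result.items():
    print(uniqueid, *counts)
```

Each uniqueid collapses to one row holding the completed, failed, and Attention counts, matching the desired output in the question.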

Change the value of a sum in sql

I'm doing a query to obtain the numbers of people for a Christmas dinner.
The people include the workers and their relatives. The relatives are stored in a different table.
Children and adults eat a different menu and we organize tables by families.
I'm already using this query
select worker_name,
count(*) as total_per_family,
SUM(CASE WHEN age < 18 THEN 1 ELSE 0 END) as children,
SUM(CASE WHEN age >= 18 THEN 1 ELSE 0 END) as adults
from
(
/*subquery*/
)
group by worker_name
order by worker_name;
This query returns the number of children and adults related to the worker, and count gives me the total.
The problem is that I need to add the worker to the adults sum.
Is there a way to modify adults? Either setting its initial value to 1 or adding 1 after the sum is done but before the count is obtained.
Modifying your query to read
SUM(CASE WHEN AGE>=18 THEN 1 ELSE 0 END) + 1 as adults
would probably be a first approach. The aggregate SUM() would be computed first, with 1 added thereafter as your initial suggestion indicated.
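A runnable sketch of that approach, using an in-memory SQLite database. The question elides the subquery, so a hypothetical `relatives` table with made-up rows stands in for it. Adding 1 to COUNT(*) as well, so that the family total also includes the worker, is an assumption about what the asker wants; drop it if the total should cover relatives only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the elided subquery: one row per relative.
    CREATE TABLE relatives (worker_name TEXT, age INTEGER);
    INSERT INTO relatives VALUES
        ('Ana', 40), ('Ana', 10), ('Ana', 7),
        ('Ben', 35);
""")

query = """
    SELECT worker_name,
           COUNT(*) + 1 AS total_per_family,                      -- relatives + the worker
           SUM(CASE WHEN age < 18 THEN 1 ELSE 0 END) AS children,
           SUM(CASE WHEN age >= 18 THEN 1 ELSE 0 END) + 1 AS adults  -- adult relatives + the worker
    FROM relatives
    GROUP BY worker_name
    ORDER BY worker_name
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One caveat: a worker with no relatives produces no rows in the grouped result at all, so they would need to be brought in separately (for example by starting from the workers table with an outer join).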

Access SQL: How to specify which record to return based on the "more important" condition?

I have 2 tables (MS ACCESS):
Table "Orders"
OrderID Product Product_Group Client Client_Group Revenue
1 Cars Vehicles Men People 10 000
2 Houses NC_Assets Women People 15 000
3 Houses NC_Assets Partnersh Companies 12 000
4 Cars Vehicles Corps Companies 3 000
Table "Grouping" (a "-" marks an empty cell)
Product  Product_Group  Client  Client_Group  Tax rate
Cars     -              -       Companies     Taxable 30%
-        Vehicles       -       Companies     Taxable 15%
Houses   -              -       People        Taxable 13%
Houses   -              Women   -             Taxable 15%
I want to join these tables to see which orders will fall into which taxable group. As you can see some products/clients are mapped differently than their groups -> if that is the case, the query should return only one record for this pair and exclude any pairing containing their groups. In pseudo-code:
If there's a product - client grouping, return this record, else
If there's a product - client group grouping, return this record, else
If there's a product group - client grouping, return this record, else
If there's a product group - client group grouping, return this record
End if * 4
In that order.
Now my query (pseudo):
SELECT [Orders].*, [Grouping].* FROM [Orders] LEFT JOIN [Grouping] ON
(([Orders].Product = [Grouping].Product OR [Orders].Product_Group = [Grouping].Product_Group) AND
([Orders].Client = [Grouping].Client OR [Orders].Client_Group = [Grouping].Client_Group))
Returns both Cars-Companies and Vehicles-Companies. I'm out of ideas how to set it up to get only the most granular records from each combination. UNION? NOT EXISTS?
Any help appreciated.
"I want to join these tables to see how many orders qualify as good, mediocre etc."
Sounds like you want counts of the particular conditions...Assuming you have a SUM and CASE (I haven't written queries for MS Access in about 10 years...), here's some pseudo-code that should get you started:
SELECT SUM(CASE WHEN {mediocre-conditions} THEN 1 ELSE 0 END) AS MediocreCount,
SUM(CASE WHEN {good-conditions} THEN 1 ELSE 0 END) AS GoodCount,
SUM(CASE WHEN {great-conditions} THEN 1 ELSE 0 END) AS GreatCount
FROM [Orders] LEFT JOIN [Grouping] ON (([Orders].Product = [Grouping].Product OR [Orders].Product_Group = [Grouping].Product_Group) AND ([Orders].Client = [Grouping].Client OR [Orders].Client_Group = [Grouping].Client_Group))
[update] I don't like giving bad answers, so did a quick look...based on this link: Does MS Access support "CASE WHEN" clause if connect with ODBC?, it appears you may be able to do:
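SELECT SUM(IIF({mediocre-conditions},1,0)) AS MediocreCount,
SUM(IIF({good-conditions},1,0)) AS GoodCount,
SUM(IIF({great-conditions},1,0)) AS GreatCount
FROM [Orders] LEFT JOIN [Grouping] ON (([Orders].Product = [Grouping].Product OR [Orders].Product_Group = [Grouping].Product_Group) AND ([Orders].Client = [Grouping].Client OR [Orders].Client_Group = [Grouping].Client_Group))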
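On the "most granular record" part of the question, one possible approach is to score each matching mapping row by how specific it is and keep only the top score per order. The sketch below runs in SQLite from Python, not in Access, so it is an illustration of the idea rather than a drop-in query. Several things are assumed: the priority order is product - client, then product - client group, then product group - client, then product group - client group; empty cells in the mapping table are NULL; the mapping table is named `tax_grouping` here; and the `score` weighting (2 for a direct product match, 1 for a direct client match) is just one way to encode that order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, product TEXT, product_group TEXT,
        client TEXT, client_group TEXT
    );
    CREATE TABLE tax_grouping (
        product TEXT, product_group TEXT, client TEXT, client_group TEXT,
        tax_rate TEXT
    );
    INSERT INTO orders VALUES
        (1, 'Cars',   'Vehicles',  'Men',       'People'),
        (2, 'Houses', 'NC_Assets', 'Women',     'People'),
        (3, 'Houses', 'NC_Assets', 'Partnersh', 'Companies'),
        (4, 'Cars',   'Vehicles',  'Corps',     'Companies');
    -- NULL marks a column the mapping row leaves empty (an assumption).
    INSERT INTO tax_grouping VALUES
        ('Cars',   NULL,       NULL,    'Companies', '30%'),
        (NULL,     'Vehicles', NULL,    'Companies', '15%'),
        ('Houses', NULL,       NULL,    'People',    '13%'),
        ('Houses', NULL,       'Women', NULL,        '15%');
""")

query = """
    WITH matches AS (
        SELECT o.order_id,
               g.tax_rate,
               -- Higher score = more specific match.
               CASE WHEN o.product = g.product THEN 2 ELSE 0 END
             + CASE WHEN o.client  = g.client  THEN 1 ELSE 0 END AS score
        FROM orders AS o
        JOIN tax_grouping AS g
          ON (o.product = g.product OR o.product_group = g.product_group)
         AND (o.client = g.client OR o.client_group = g.client_group)
    )
    SELECT o.order_id, m.tax_rate
    FROM orders AS o
    LEFT JOIN matches AS m
      ON m.order_id = o.order_id
     AND m.score = (SELECT MAX(m2.score) FROM matches AS m2
                    WHERE m2.order_id = o.order_id)
    ORDER BY o.order_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data, order 4 keeps the Cars - Companies row (30%) and drops Vehicles - Companies, and order 2 keeps Houses - Women (15%) over Houses - People. Orders 1 and 3 match nothing and come back with NULL. If two mapping rows tie on the top score, both are returned. Access has no WITH or CASE, so there the same idea would need a saved query or subquery plus IIF.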

Why does <> operator not work as expected on my varbinary(20) sql data?

I have a MySQL database containing a table with some data in columns of type varbinary(20).
Don't ask me why; it just is and I need to deal with it.
This data represents the number of modules a student has completed as part of a course. There is also another column called criterias that represents the number of modules that must be completed in order for the course to be marked complete. When I am doing a count of the number of completed courses, I want to count the rows where
completed = criterias AND completed != 0 ( ...oh and where the student is enrolled)
(because in some cases the criteria is 0, so I want to skip any courses where a student has completed 0 of 0 modules).
Additionally, for some reason the value of 0 in the varbinary column shows up as 30 (if anyone can offer an explanation for this then great), and so where there are 3 modules that need to be completed, the value in the criterias column will be 33.
So my SQL query looks like this -
count(CASE WHEN (completed <> 30 AND (completed = criterias) AND enrolled = 'Yes') THEN 1 END) AS Total_Complete
However, this count seems to include courses where the student has completed 0 of 0 modules, so it would appear that my completed <> 30 is not working.
As I was writing this I answered my own question, but I thought I'd leave it up here in case it helps anyone else.
The solution was to ignore the fact that, for whatever reason, 0 was represented by 30 and change my SQL to -
count(CASE WHEN (completed <> 0 AND (completed = criterias) AND enrolled = 'Yes') THEN 1 END) AS Total_Complete
This worked
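A likely explanation for the 30, offered as a best guess since the client tool isn't named: 0x30 is the ASCII code for the character '0' and 0x33 is the code for '3', so the column probably stores the digits as text and the client displays the bytes in hexadecimal. When MySQL compares a string with a number it converts the string to a number first, so `completed <> 30` becomes 0 <> 30, which is true for every "zero" row. The Python sketch below only mimics that conversion; `as_number` is a hypothetical helper and a simplification, not MySQL's actual conversion routine.

```python
# The bytes a hex-displaying client would show as 30 and 33.
stored_zero = bytes.fromhex("30")
stored_three = bytes.fromhex("33")

print(stored_zero, stored_three)  # the raw bytes are the ASCII digits '0' and '3'


def as_number(raw: bytes) -> float:
    """Simplified stand-in for a string-to-number conversion of digit text."""
    return float(raw.decode("ascii"))


# completed <> 30: the stored '0' converts to 0, and 0 <> 30 holds,
# so the row is NOT filtered out.
zero_row_passes_30_check = as_number(stored_zero) != 30

# completed <> 0: the stored '0' converts to 0, and 0 <> 0 does not hold,
# so the row IS filtered out, which is the intended behaviour.
zero_row_passes_0_check = as_number(stored_zero) != 0

print(zero_row_passes_30_check, zero_row_passes_0_check)
```

That would explain why comparing against 0 works while comparing against 30 lets the 0-of-0 rows through.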