Pivot aggregates in SQL

I'm trying to find a way to pivot the table below (I guess you would say it's in "long" format) into the ("wider") format where all the columns are essentially explicitly Boolean. I hope this simple example gets across what I'm trying to do.
Note there are about 74 people (so the output table will have 223 columns: 1 + 74 x 3).
I can't figure out an easy way to do it other than horribly with a huge number of left joins along "Town" by statements like
... left join(
select
town,
case when person = 'Richard' then 1 else 0 end as "Richard",
Fee as "Richard Fee"
from services
where person = 'Richard'
left join...
Can some smart person suggest a way to do this using PIVOT functions in SQL?
I am using Snowflake (and dbt so I can get some jinja into play if really necessary to loop through all the people).
Input:
Desired output:
ps. I know this is a ridiculous SQL ask, but this is the "output the client wants" so I have this undesirable task to fulfil.

If the people are known in advance, you can use conditional aggregation:
SELECT town,
MAX(CASE WHEN person = 'Richard' THEN 1 ELSE 0 END) AS "Richard",
MAX(CASE WHEN person = 'Richard' THEN Fee END) AS "Richard Fee",
MAX(CASE WHEN person = 'Richard' THEN Service END) AS "Richard Service",
MAX(CASE WHEN person = 'Caitlin' THEN 1 ELSE 0 END) AS "Caitlin",
...
FROM services
GROUP BY town;
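With about 74 people, typing those columns by hand gets tedious, so here is a minimal sketch of generating them from a list. It runs against SQLite through Python's sqlite3 module rather than Snowflake, and the sample rows, the `people` list and the `pivot_columns` helper are all made up for illustration. In dbt the same loop could presumably be written as a Jinja for-loop over the list of people.

```python
import sqlite3

# Hypothetical sample rows; the real table from the question is not shown.
rows = [
    ("Leeds", "Richard", 100, "Plumbing"),
    ("Leeds", "Caitlin", 80, "Wiring"),
    ("York", "Richard", 120, "Plumbing"),
]
people = ["Richard", "Caitlin"]  # assumed to be known in advance


def pivot_columns(person: str) -> str:
    """Return the three conditional-aggregation columns for one person."""
    p = person.replace("'", "''")  # escape quotes inside the string literal
    q = person.replace('"', '""')  # escape quotes inside the column alias
    return (
        f"MAX(CASE WHEN person = '{p}' THEN 1 ELSE 0 END) AS \"{q}\",\n"
        f"  MAX(CASE WHEN person = '{p}' THEN fee END) AS \"{q} Fee\",\n"
        f"  MAX(CASE WHEN person = '{p}' THEN service END) AS \"{q} Service\""
    )


# One SELECT with 1 + 3 * len(people) columns, grouped by town.
sql = (
    "SELECT town,\n  "
    + ",\n  ".join(pivot_columns(p) for p in people)
    + "\nFROM services\nGROUP BY town\nORDER BY town"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (town TEXT, person TEXT, fee INTEGER, service TEXT)")
conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?)", rows)
result = conn.execute(sql).fetchall()
```

A town where a person has no row gets 0 in the flag column and NULL in that person's Fee and Service columns, because MAX ignores the NULLs produced by a CASE without an ELSE.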

Related

How to run case counts by group?

I am just beginning to teach myself SQL (I've been going at it for a week now and feel I have been doing pretty well to this point).
I have a practice database that I'm just messing around with -- there are two tables (one titled "progress" and one titled "users").
Progress
This table includes a foreign key from "users" identifying students enrolled in 5 different coding courses (CPP, SQL, HTML, Javascript, and Java) and indicates whether a student is enrolled, has started a course, or has completed a course.
Users
This table includes the primary key for identifying students as well as their demographic information (addresses, email domain for university, etc...).
I want to be able to count the number of students enrolled in the 5 courses for each university. I have been able to do this for one university at a time but I want something that will do that for all 617 different universities at once.
WITH placeholder AS (
SELECT *
FROM users
JOIN progress
ON users.user_id = progress.user_id
GROUP BY email_domain
ORDER BY email_domain
)
select email_domain,
Sum(CASE WHEN learn_cpp = "completed" OR learn_cpp = "started" THEN 1 ELSE 0 END) AS 'CPP Enrollment',
Sum(CASE WHEN learn_sql = "completed" OR learn_SQL = "started" THEN 1 ELSE 0 END) AS 'SQL Enrollment',
Sum(CASE WHEN learn_html = "completed" OR learn_html = "started" THEN 1 ELSE 0 END) AS 'HTML Enrollment',
Sum(CASE WHEN learn_javascript = "completed" OR learn_javascript = "started" THEN 1 Else 0 END) AS 'Javascript Enrollment',
Sum(CASE WHEN learn_java = "completed" OR learn_java = "started" THEN 1 ELSE 0 END) AS 'Java Enrollment'
FROM placeholder;
This returns the correct enrollment count across all universities but only has the first university email domain (shown below).
aa.edu 238 317 183 306 119
I want the enrollment counts for each course by university (there should be 167 rows with enrollment counts for each course in the columns).
donPablo caught this quickly in the comments.
I moved my GROUP BY and ORDER BY clauses out of the CTE that joins the tables and to the end of the query, so that they apply to the outer SELECT that calculates the enrollment counts.
This produced the result I was looking for.
Thank you for your quick response!
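For anyone landing here later, the corrected query described above could look roughly like the sketch below. It is run against SQLite via Python's sqlite3, uses made-up rows, and only covers two of the five courses to keep it short; the `cpp_enrollment` and `sql_enrollment` aliases are my own naming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email_domain TEXT);
CREATE TABLE progress (user_id INTEGER, learn_cpp TEXT, learn_sql TEXT);
-- Hypothetical sample data, not taken from the practice database.
INSERT INTO users VALUES (1, 'aa.edu'), (2, 'aa.edu'), (3, 'bb.edu');
INSERT INTO progress VALUES
  (1, 'completed', 'started'),
  (2, '',          'started'),
  (3, 'started',   '');
""")

query = """
WITH placeholder AS (
  -- The CTE only joins; it no longer groups or orders.
  SELECT users.email_domain, progress.learn_cpp, progress.learn_sql
  FROM users
  JOIN progress ON users.user_id = progress.user_id
)
SELECT email_domain,
  SUM(CASE WHEN learn_cpp IN ('completed', 'started') THEN 1 ELSE 0 END) AS cpp_enrollment,
  SUM(CASE WHEN learn_sql IN ('completed', 'started') THEN 1 ELSE 0 END) AS sql_enrollment
FROM placeholder
GROUP BY email_domain   -- one output row per university
ORDER BY email_domain
"""
result = conn.execute(query).fetchall()
```

With the GROUP BY on the outer SELECT, each SUM is computed per email domain instead of once over the whole table.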

How come my count() passes evaluation?

I am fairly new to SQL and I am trying to write a query that finds the countries in which only 3 specific languages (Spanish, Italian, German) are spoken and no others.
select country
from langusage
group by country
having count(case when language in ('spanish','german','italian') then 1 else 5 end)=3
The output is all countries that have at least 1 of the aforementioned languages. How come they pass the '=3' test?
The reason is that count(1) = count(5). count() counts the number of non-NULL values, so a row adds 1 to the count whether the CASE returns 1 or 5.
You intended sum():
select country
from langusage
group by country
having sum(case when language in ('spanish', 'german', 'italian') then 1 else 5 end) = 3
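A small sketch of the difference, using SQLite through Python's sqlite3 and made-up rows (countries A, B and C are hypothetical, and the column is called `language` as in the question). It assumes each country/language pair appears only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE langusage (country TEXT, language TEXT)")
conn.executemany("INSERT INTO langusage VALUES (?, ?)", [
    ("A", "spanish"), ("A", "german"), ("A", "italian"),  # exactly the three
    ("B", "spanish"), ("B", "german"), ("B", "french"),   # three rows, one other language
    ("C", "spanish"),                                     # only one of the three
])

case_expr = "CASE WHEN language IN ('spanish', 'german', 'italian') THEN 1 ELSE 5 END"


def countries(agg: str) -> list:
    """Run the HAVING test with the given aggregate (COUNT or SUM)."""
    sql = (
        f"SELECT country FROM langusage GROUP BY country "
        f"HAVING {agg}({case_expr}) = 3 ORDER BY country"
    )
    return [row[0] for row in conn.execute(sql)]


with_count = countries("COUNT")  # counts rows, since 1 and 5 are both non-NULL
with_sum = countries("SUM")      # adds 1 per wanted language, 5 per other one
```

COUNT lets country B through because it simply has three rows; SUM gives B a total of 7, so only A passes.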

postgresql dynamically name columns in case statement

I'm looking for a way to dynamically or automatically name the columns in my case statement below. Scenario - I'm trying to find out how many different companies of various industries are found in each country. The countries are the rows while the categories are the columns.
I'm using PostgreSQL, so PIVOT won't work, and I don't have a new enough version to use crosstab.
I want to be able to replicate this for much larger scenarios where I won't have to worry about 'hardcoding' the cat_nbr and column names like I do here.
SELECT country,
count(CASE WHEN cat_nbr = 1 THEN company_code END) retail,
count(CASE WHEN cat_nbr = 2 THEN company_code END) finance,
count(CASE WHEN cat_nbr = 3 THEN company_code END) oil,
count(CASE WHEN cat_nbr = 4 THEN company_code END) tech
FROM global_companies
GROUP BY country
In case it isn't clear, the table has these columns:
country - cat_nbr - company_code - cat_desc.
cat_desc holds the words I have hardcoded above ('retail', 'finance', etc.).
Is there some way I can do this with less hardcoding of each cat_nbr/cat_desc? There are lots and lots of cat_nbrs and cat_descs.
You cannot create a query whose result has a dynamic number of columns. That's impossible, even with crosstab.
You can, however,
create a query that returns a SQL statement, which you can then execute from the client, or
create something like \crosstabview in the client.
You can read more information about this in my question, "How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?".
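As a rough illustration of the first option (build the statement, then execute it from the client), here is a sketch in Python. It uses SQLite via sqlite3 instead of PostgreSQL, invented sample rows, and a hypothetical `quote_ident` helper; the two-step idea is the same either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE global_companies "
    "(country TEXT, cat_nbr INTEGER, company_code TEXT, cat_desc TEXT)"
)
# Hypothetical sample data.
conn.executemany("INSERT INTO global_companies VALUES (?, ?, ?, ?)", [
    ("US", 1, "c1", "retail"),
    ("US", 2, "c2", "finance"),
    ("US", 2, "c3", "finance"),
    ("UK", 1, "c4", "retail"),
])


def quote_ident(name: str) -> str:
    """Quote a value for use as a column alias."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: ask the database which categories exist.
cats = conn.execute(
    "SELECT DISTINCT cat_nbr, cat_desc FROM global_companies ORDER BY cat_nbr"
).fetchall()

# Step 2: build one COUNT(CASE ...) column per category.
cols = ",\n  ".join(
    f"COUNT(CASE WHEN cat_nbr = {int(nbr)} THEN company_code END) AS {quote_ident(desc)}"
    for nbr, desc in cats
)
sql = f"SELECT country,\n  {cols}\nFROM global_companies\nGROUP BY country\nORDER BY country"

# Step 3: execute the generated statement as a second query.
cur = conn.execute(sql)
headers = [d[0] for d in cur.description]
result = cur.fetchall()
```

The column list is fixed at the moment the second statement is built, which is why this takes two round trips rather than one query.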
Instead of hardcoding category names, you could hardcode country names for the columns and let the rows be dynamic as usual.
SELECT cat_nbr,
COUNT(CASE WHEN Country = 'US' THEN company_code END) AS NumUS,
COUNT(CASE WHEN Country = 'UK' THEN company_code END) AS NumUK,
COUNT(CASE WHEN Country = 'FR' THEN company_code END) AS NumFR,
...
FROM global_companies
GROUP BY cat_nbr;
Another alternative is to aggregate the data into JSON or array structures.
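A sketch of what the JSON shape could look like: one object per country, keyed by category. To stay independent of any engine's JSON functions, this version runs a plain GROUP BY in SQLite (via Python's sqlite3) and folds the rows into JSON on the client; doing the aggregation inside the database is the other way to get the same result. The sample rows are made up.

```python
import json
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE global_companies "
    "(country TEXT, cat_nbr INTEGER, company_code TEXT, cat_desc TEXT)"
)
# Hypothetical sample data.
conn.executemany("INSERT INTO global_companies VALUES (?, ?, ?, ?)", [
    ("US", 1, "c1", "retail"),
    ("US", 2, "c2", "finance"),
    ("US", 2, "c3", "finance"),
    ("UK", 1, "c4", "retail"),
])

# Long format: one row per (country, category) with its company count.
long_rows = conn.execute(
    "SELECT country, cat_desc, COUNT(company_code) "
    "FROM global_companies GROUP BY country, cat_desc "
    "ORDER BY country, cat_desc"
)

# Fold the long rows into one dictionary per country.
counts = defaultdict(dict)
for country, cat_desc, n in long_rows:
    counts[country][cat_desc] = n

as_json = {country: json.dumps(cats, sort_keys=True) for country, cats in counts.items()}
```

No category name is hardcoded anywhere, at the cost of getting one JSON value per country instead of real columns.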

select sum(a), sum(b where c=1) from db; sql conditions in select statement

I guess I just lack the keywords to search for, but this is burning on my mind:
How can I add a condition to the sum function in the select statement, like
select sum(a), sum(b where c=1) from db;?
This means I want to see the sum of column a and the sum of column b, but only for the records in which column c has the value 1.
The output of Heidi just says "bad syntax near WHERE". Is there any other way?
Thanks in advance and best regards from Berlin, Joachim
The exact syntax may differ depending on the database engine, but it will be along the lines of
SELECT
sum(a),
sum(CASE WHEN c = 1 THEN b ELSE 0 END)
FROM
db
select sum(case when c=1 then b else 0 end)
This technique is useful when you need a lot of aggregates on the same set of data - you can query the entire table without applying a where filter, and have a bunch of these which give you aggregated data for a specific filter.
It's also useful when you need a lot of counts based on filters - you can do sums of 1 or 0:
select sum(case when {somecondition} then 1 else 0 end)
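Putting both ideas together, here is a minimal runnable sketch using SQLite through Python's sqlite3. The three rows are invented; the table and column names (`db`, `a`, `b`, `c`) follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (a INTEGER, b INTEGER, c INTEGER)")
# Hypothetical sample data.
conn.executemany("INSERT INTO db VALUES (?, ?, ?)", [
    (1, 10, 1),
    (2, 20, 0),
    (3, 30, 1),
])

result = conn.execute("""
SELECT
  SUM(a),                                   -- all rows
  SUM(CASE WHEN c = 1 THEN b ELSE 0 END),   -- b only where c = 1
  SUM(CASE WHEN c = 1 THEN 1 ELSE 0 END)    -- how many rows have c = 1
FROM db
""").fetchone()
```

All three aggregates come from a single pass over the table, with no WHERE clause limiting the rows seen by SUM(a).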

SQL-Server - Ability to pass subset parameter in the select?

I'm trying to create a query that will output a total number, as well as a subset of the total number, in SQL Server. I can think of a way to do this via subqueries, but that seems like a ton of work. Is there a faster way to write this query?
Table name: orders
OrderID  Date      Type         OrderSize
1        1/1/2012  Electronics  $282.02
2        1/1/2012  Electronics  $1,000.56
3        1/1/2012  Books        $17.25
4        1/1/2012  Books        $10.00
What I am trying to output would look like this:
Date      ElectronicOrders  ElectronicOrderSize  BookOrders  BookOrderSize
1/1/2012  2                 $1,282.58            2           $27.25
I could create a temp table, then run 2 update queries - 1 WHERE Type = 'Electronics' and 1 WHERE Type = 'Books'.
What I have seen in some programming languages, such as R, is the ability to subset a variable. Is there a way for me to say something like:
count(OrderID, Type = 'Electronics') as ElectronicOrders, sum(OrderSize, Type = 'Electronics') as ElectronicOrderSize
Or am I stuck with subqueries and UPDATE queries?
I haven't ever gotten the new PIVOT syntax to make sense in my head, but you can build a pivot table by grouping and applying aggregate functions to a CASE expression.
select [date],
  sum(case when type = 'Electronics' then ordersize else 0 end) as ElectronicsSum,
  sum(case when type = 'Electronics' then 1 else 0 end) as ElectronicsCount,
  sum(case when type = 'Books' then ordersize else 0 end) as BooksSum,
  sum(case when type = 'Books' then 1 else 0 end) as BooksCount
from orders
group by [date]
I put a fiddle thing up to test it out. If Aaron B. posts up a solution, give him the answer credit, I might not have even recognized the pivotyness of it.
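In case the fiddle is unavailable, here is a small sketch that reproduces the idea with the four rows from the question. It runs on SQLite through Python's sqlite3 rather than SQL Server, so the [date] brackets are dropped, and it stores OrderSize as a plain number instead of a formatted currency string (an assumption about the real column type).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OrderID INTEGER, Date TEXT, Type TEXT, OrderSize REAL)")
# The four sample rows from the question, with amounts as plain numbers.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "1/1/2012", "Electronics", 282.02),
    (2, "1/1/2012", "Electronics", 1000.56),
    (3, "1/1/2012", "Books", 17.25),
    (4, "1/1/2012", "Books", 10.00),
])

query = """
SELECT Date,
  SUM(CASE WHEN Type = 'Electronics' THEN 1 ELSE 0 END)         AS ElectronicOrders,
  SUM(CASE WHEN Type = 'Electronics' THEN OrderSize ELSE 0 END) AS ElectronicOrderSize,
  SUM(CASE WHEN Type = 'Books' THEN 1 ELSE 0 END)               AS BookOrders,
  SUM(CASE WHEN Type = 'Books' THEN OrderSize ELSE 0 END)       AS BookOrderSize
FROM orders
GROUP BY Date
"""
result = conn.execute(query).fetchall()
```

This yields one row per date with the count and total for each type, matching the layout asked for in the question, and needs no subqueries or UPDATE statements.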